
Re: GRDDL and HTML

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Mon, 08 Jan 2007 23:22:49 -0500
Message-ID: <45A31899.3040700@ibiblio.org>
To: Norman Gray <norman@astro.gla.ac.uk>, public-grddl-comments@w3.org

Norman,
   
    We've added a use-case to the GRDDL Use-Case document[1] that we
believe addresses both the use of GRDDL transformations on non-XML HTML
(i.e., as you rightly pointed out, how it is possible) and then presents
the case for why XML (XHTML) is preferred.

Tell us if you find it satisfactory!

[1]
http://www.w3.org/2001/sw/grddl-wg/doc43/scenario-gallery.htm#html_tidy_use_case

thanks!
       harry

Norman Gray wrote:
>
>
> Harry, hello again.
>
> On 2006 Nov 27 , at 17.07, Harry Halpin wrote:
>
>>> At the same time, I can't help feeling that saying just `if it's not
>>> well-formed, all bets are off' and `it is possible...', while true,
>>> is rather avoiding the issue.  In the case of someone generating
>>> (X)HTML which wraps third party content (I'm thinking again of Yahoo
>>> wrapping user-generated HTML), I think they could reasonably expect
>>> the GRDDL spec to give _some_ clue about what ought to happen when
>>> they put a valid and metadata-rich wrapper round invalid and
>>> RDF-less content.  I think they should also reasonably expect that
>>> GRDDL processors _would_ have a go, and if so it would be good for
>>> the spec to bless that.  In this case, I think it would be useful to
>>> make it clear that if they do emit ill-formed XHTML and end up
>>> saying `:your_mother a :hamster.', then it's formally their fault,
>>> and no-one's allowed to sue the poor little GRDDL processor, which
>>> was only doing its best in adverse circumstances.
>>
>>     I think one part of GRDDL is the focus on "the author of a
>> document states that the transformation will provide a faithful
>> rendition of the source document, or some portion of the source
>> document, that preserves its meaning in RDF." [2] This puts the
>> burden on the author to explicitly license the transform. One line of
>> argument could be that if the author wanted to license a faithful
>> rendition, they would want that rendition to be as "deterministic"
>> and unlikely to break as possible, and that would be one reason to use
>> XHTML instead of tagsoup.
>
> Indeed.  Authors certainly ought to produce valid/well-formed XHTML,
> and given that they do, there neither is nor should be any ambiguity
> or indeterminacy about what RDF it transforms into.
>
> I'm thinking of the case where an author is ignorant (they're
> following a recipe), or where they know what they're doing, but have
> made an engineering decision not to obsess about cleaning up their
> HTML because...
>
>> [...]many pages are generated using HTML that "in the small" for a
>> set of particular web-pages is itself generic and regular even, so
>> that the  author could be able to determine a transformation to RDF
>> and specify it.
>
> ...which I entirely agree with.
>
>
>
>>> As a tangential point, what about the case where a GRDDL processor
>>> is asked to handle an XHTML document which has a DTD, but which uses
>>> the xmlns:data-view technique for linking to the GRDDL
>>> transformation?  It's therefore well-formed but invalid.  That case
>>> is excluded by both section 2 and section 4.  Are all bets off?
>>
>>     Do you mean an XHTML DTD that does not allow xmlns:data-view? I
>> believe that should not be a problem. Could you give us a test case
>> (i.e. a sample input document and your suggested output or problems
>> that it brings up)?
>
> OK.  This is probably more an issue of wording than anything deeply
> technical.
>
> How about this?
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml"
>       xmlns:data-view="http://www.w3.org/2003/g/data-view#"
>       data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
>           http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl
>           http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl">
> <head>
>   <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
> [...]
>
> That's a dialect of XHTML that _is_ constrained by DTD syntax (so the
> spec's section 2 doesn't apply), but it's invalid, because it has the
> xmlns: and data-view: attributes (so section 4 doesn't apply).
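
The point that a processor can read the transformation list from this document without ever validating it can be illustrated with a rough Python sketch. It uses only the standard library's non-validating parser; the function name root_transformations and the completed head/body of the sample are assumptions made for illustration (the original example is elided after the title), not anything taken from the GRDDL spec.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the data-view prefix in the example above.
DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"

# The example document, with a hypothetical minimal head/body filled in
# so that it is well-formed. It is still invalid against the Strict DTD.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:data-view="http://www.w3.org/2003/g/data-view#"
      data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
          http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl
          http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl">
<head><title>Joe Lambda's Home page</title></head>
<body/>
</html>"""


def root_transformations(xml_text):
    """Return the URIs listed in data-view:transformation on the root element.

    The parser checks well-formedness only; the DTD named in the DOCTYPE
    is never fetched or applied.
    """
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % DATA_VIEW_NS, "")
    # The attribute holds a whitespace-separated list of URIs.
    return value.split()


if __name__ == "__main__":
    for uri in root_transformations(SAMPLE):
        print(uri)
```

Nothing in this sketch depends on whether the document is valid, which is the sense in which the valid/invalid distinction makes no difference to such a processor.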
>
> This isn't just niggling: is any purpose served by the restriction of
> section 4 to _valid_ XHTML?  Writing valid XHTML is of course good for
> one's immortal soul, but if you give out invalid but well-formed
> XHTML, with a GRDDL transformation which produces the RDF you want,
> why should the GRDDL processor care?
>
> All that section 4 actually needs to do is define a special-case
> mechanism for a particular category of well-formed XML documents,
> which (for irrelevant reasons) can't use the section 2 mechanism.  Thus
> my suggestion is (i) that a GRDDL processor should be required to have
> no problems with XHTML such as the above, and (ii) that the three
> instances of the word 'valid' in section 4 -- and indeed elsewhere --
> could be deleted without loss.
>
> This does also imply that a document like "<html ...><wibble/><head
> profile='...'> ... </html>" would be acceptable.  But so what? 
> Indeed, the only way that either of these cases could be distinguished
> from the 'well-formed XML' of section 2 is if the GRDDL processor took
> the trouble to validate the document: this would simply be mad, so
> that the distinction between valid and invalid (but well-formed)
> documents in the spec is a distinction without a difference.
>
> You want text?  Delete the 'valid' words, and:
>> Stated more formally:
>>
>> If an XML document has an attribute with XPath /html/head/@profile,
>> and that attribute contains the string
>> "http://www.w3.org/2003/g/data-view", then the document has a GRDDL
>> transformation for each resource named by
>> /html/head/link[@rel='transformation']/@href
> That clearly applies to a much larger class of documents than just
> XHTML, but again that needn't be GRDDL's problem.
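
The suggested rule can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name profile_transformations is hypothetical, and element names are matched by local name with any namespace ignored, which is looser than the literal XPath expressions (those would match only elements in no namespace).

```python
import xml.etree.ElementTree as ET

# Profile string named in the proposed wording.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def profile_transformations(xml_text):
    """Return link[@rel='transformation']/@href values under /html/head
    when the head's profile attribute contains the GRDDL profile string.

    Only well-formedness is checked; no validation takes place.
    """
    root = ET.fromstring(xml_text)
    if local(root.tag) != "html":
        return []
    hrefs = []
    for head in root:
        if local(head.tag) != "head":
            continue
        # "contains the string", as in the proposed text: a substring test.
        if GRDDL_PROFILE not in head.get("profile", ""):
            continue
        for link in head:
            if local(link.tag) == "link" and link.get("rel") == "transformation":
                href = link.get("href")
                if href is not None:
                    hrefs.append(href)
    return hrefs


# A well-formed but invalid document in the style of the <wibble/> example.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><wibble/>
<head profile="http://www.w3.org/2003/g/data-view">
<link rel="transformation"
      href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl"/>
</head><body/></html>"""

if __name__ == "__main__":
    print(profile_transformations(DOC))
```

The stray wibble element is simply skipped, so the invalid document yields the same transformation list as a valid one would.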
>
> All the best,
>
> Norman
>
>
> ------------------------------------------------------------------------------
>
> Norman Gray  /  http://nxg.me.uk
> eurovotech.org  /  University of Leicester, UK
>
>
>
>


-- 
		-harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426
Received on Tuesday, 9 January 2007 04:22:59 UTC
