

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Mon, 08 Jan 2007 23:22:49 -0500
Message-ID: <45A31899.3040700@ibiblio.org>
To: Norman Gray <norman@astro.gla.ac.uk>, public-grddl-comments@w3.org

    We've added a use-case to the GRDDL Use-Case document[1] that we
believe both addresses the use of GRDDL transformations on non-XML HTML
(i.e., as you rightly pointed out, how it is possible) and presents
the case for why XML (XHTML) is preferred.

Tell us if you find it satisfactory!



Norman Gray wrote:
> Harry, hello again.
> On 2006 Nov 27 , at 17.07, Harry Halpin wrote:
>>> At the same time, I can't help feeling that saying just `if it's not
>>> well-formed, all bets are off' and `it is possible...', while true,
>>> is rather avoiding the issue.  In the case of someone generating
>>> (X)HTML which wraps third party content (I'm thinking again of Yahoo
>>> wrapping user-generated HTML), I think they could reasonably expect
>>> the GRDDL spec to give _some_ clue about what ought to happen when
>>> they put a valid and metadata-rich wrapper round invalid and
>>> RDF-less content.  I think they should also reasonably expect that
>>> GRDDL processors _would_ have a go, and if so it would be good for
>>> the spec to bless that.  In this case, I think it would be useful to
>>> make it clear that if they do emit ill-formed XHTML and end up
>>> saying `:your_mother a :hamster.', then it's formally their fault,
>>> and no-one's allowed to sue the poor little GRDDL processor, which
>>> was only doing its best in adverse circumstances.
>>     I think one part of GRDDL is the focus on "the author of a
>> document states that the transformation will provide a faithful
>> rendition of the source document, or some portion of the source
>> document, that preserves its meaning in RDF." [2] This puts the
>> burden on the author to explicitly license the transform. One line of
>> argument could be that if the author wanted to license a faithful
>> rendition, they would want that rendition to be as "deterministic"
>> and unlikely to break as possible, and that would be one reason to use
>> XHTML instead of tagsoup.
> Indeed.  Authors certainly ought to produce valid/well-formed XHTML,
> and given that they do, there neither is nor should be any ambiguity
> or indeterminacy about what RDF it transforms into.
> I'm thinking of the case where an author is ignorant (they're
> following a recipe), or where they know what they're doing, but have
> made an engineering decision not to obsess about cleaning up their
> HTML because...
>> [...]many pages are generated using HTML that "in the small" for a
>> set of particular web-pages is itself generic and regular even, so
>> that the  author could be able to determine a transformation to RDF
>> and specify it.
> ...which I entirely agree with.
>>> As a tangential point, what about the case where a GRDDL processor
>>> is asked to handle an XHTML document which has a DTD, but which uses
>>> the xmlns:data-view technique for linking to the GRDDL
>>> transformation?  It's therefore well-formed but invalid.  That case
>>> is excluded by both section 2 and section 4.  Are all bets off?
>>     Do you mean an XHTML DTD that does not allow xmlns:data-view? I
>> believe that should not be a problem. Could you give us a test case
>> (i.e. a sample input document and your suggested output or problems
>> that it brings up)?
> OK.  This is probably more an issue of wording than anything deeply
> technical.
> How about this?
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml"
>       xmlns:data-view="http://www.w3.org/2003/g/data-view#"
> data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
> http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl
> http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl">
> <head>
>   <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
> [...]
> That's a dialect of XHTML that _is_ constrained by DTD syntax (so the
> spec's section 2 doesn't apply), but it's invalid, because it has the
> xmlns: and data-view: attributes (so section 4 doesn't apply).
> This isn't just niggling: is any purpose served by the restriction of
> section 4 to _valid_ XHTML?  Writing valid XHTML is of course good for
> one's immortal soul, but if you give out invalid but well-formed
> XHTML, with a GRDDL transformation which produces the RDF you want,
> why should the GRDDL processor care?
> All that section 4 actually needs to do is define a special-case
> mechanism for a particular category of well-formed XML documents,
> which (for irrelevant reasons) can't use the section 2 mechanism.  Thus
> my suggestion is (i) that a GRDDL processor should be required to have
> no problems with XHTML such as the above, and (ii) that the three
> instances of the word 'valid' in section 4 -- and indeed elsewhere --
> could be deleted without loss.
> This does also imply that a document like "<html ...><wibble/><head
> profile='...'> ... </html>" would be acceptable.  But so what? 
> Indeed, the only way that either of these cases could be distinguished
> from the 'well-formed XML' of section 2 is if the GRDDL processor took
> the trouble to validate the document: this would simply be mad, so
> that the distinction between valid and invalid (but well-formed)
> documents in the spec is a distinction without a difference.
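
To illustrate the point about validation, here is a minimal sketch (not
taken from the GRDDL spec or from any existing processor) of reading the
data-view:transformation attribute from the test case above with a
non-validating parser. It assumes Python's standard xml.etree.ElementTree;
the function name root_transformations is hypothetical, and the elided
part of the test case is closed off with a minimal head and body purely
so that the sketch has a well-formed document to parse.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# The test case from the message. The original is truncated after the
# title; the closing tags and empty body are an assumption added here
# only to make the document well-formed.
DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:data-view="http://www.w3.org/2003/g/data-view#"
      data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl
http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl">
<head>
  <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
</head>
<body/>
</html>"""


def root_transformations(xml_text):
    """Return the URIs listed in the root element's
    data-view:transformation attribute (hypothetical helper)."""
    # ElementTree only checks well-formedness; it does not fetch the
    # DTD named in the DOCTYPE and does not validate against it.
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # The attribute holds a whitespace-separated list of URIs.
    return value.split()


transformations = root_transformations(DOC)
```

The parser never consults the DTD, so the fact that the document is
invalid against XHTML 1.0 Strict has no effect on which transformations
are found.
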
> You want text?  Delete the 'valid' words, and:
>> Stated more formally:
>> If an XML document has an attribute with XPath /html/head/@profile,
>> and that attribute contains the string
>> "http://www.w3.org/2003/g/data-view", then the document has a GRDDL
>> transformation for each resource named by
>> /html/head/link[@rel='transformation']/@href
> That clearly applies to a much larger class of documents than just
> XHTML, but again that needn't be GRDDL's problem.
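
The proposed rule can be sketched in a few lines. This is only an
illustration under stated assumptions, not the spec's algorithm: it uses
Python's standard xml.etree.ElementTree, it matches html, head and link
by local name regardless of namespace (the rule as worded does not say
how namespaces are treated), and the names head_transformations and
local, along with the example.org URI, are hypothetical.

```python
import xml.etree.ElementTree as ET

PROFILE = "http://www.w3.org/2003/g/data-view"


def local(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def head_transformations(xml_text):
    """Return the href of every /html/head/link[@rel='transformation']
    when /html/head/@profile contains the GRDDL profile string."""
    root = ET.fromstring(xml_text)  # well-formedness check only
    if local(root.tag) != "html":
        return []
    found = []
    for head in root:
        if local(head.tag) != "head":
            continue  # e.g. a stray <wibble/> is simply ignored
        if PROFILE not in head.get("profile", ""):
            continue  # the rule asks for a substring match
        for link in head:
            if (local(link.tag) == "link"
                    and link.get("rel") == "transformation"
                    and link.get("href")):
                found.append(link.get("href"))
    return found


# Well-formed but invalid, in the spirit of the <wibble/> example.
WITH_PROFILE = (
    "<html xmlns='http://www.w3.org/1999/xhtml'><wibble/>"
    "<head profile='http://www.w3.org/2003/g/data-view'>"
    "<link rel='transformation' href='http://example.org/hypothetical.xsl'/>"
    "</head></html>"
)
WITHOUT_PROFILE = (
    "<html xmlns='http://www.w3.org/1999/xhtml'><head>"
    "<link rel='transformation' href='http://example.org/hypothetical.xsl'/>"
    "</head></html>"
)
```

Nothing in the sketch depends on the document being valid XHTML, or on
its being XHTML at all, which is the larger class of documents mentioned
above.
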
> All the best,
> Norman
> ------------------------------------------------------------------------------
> Norman Gray  /  http://nxg.me.uk
> eurovotech.org  /  University of Leicester, UK


Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426
Received on Tuesday, 9 January 2007 04:22:59 UTC
