- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Mon, 08 Jan 2007 23:22:49 -0500
- To: Norman Gray <norman@astro.gla.ac.uk>, public-grddl-comments@w3.org
Norman,

We've added a use-case to the GRDDL Use-Case document[1] that we believe both addresses the use of GRDDL transformations on non-XML HTML (i.e., as you rightly pointed out, how it is possible) and presents the case for why XML (XHTML) is preferred. Tell us if you find it satisfactory!

[1] http://www.w3.org/2001/sw/grddl-wg/doc43/scenario-gallery.htm#html_tidy_use_case

thanks!
harry

Norman Gray wrote:
>
> Harry, hello again.
>
> On 2006 Nov 27, at 17.07, Harry Halpin wrote:
>
>>> At the same time, I can't help feeling that saying just `if it's not
>>> well-formed, all bets are off' and `it is possible...', while true,
>>> is rather avoiding the issue. In the case of someone generating
>>> (X)HTML which wraps third-party content (I'm thinking again of Yahoo
>>> wrapping user-generated HTML), I think they could reasonably expect
>>> the GRDDL spec to give _some_ clue about what ought to happen when
>>> they put a valid and metadata-rich wrapper round invalid and
>>> RDF-less content. I think they should also reasonably expect that
>>> GRDDL processors _would_ have a go, and if so it would be good for
>>> the spec to bless that. In this case, I think it would be useful to
>>> make it clear that if they do emit ill-formed XHTML and end up
>>> saying `:your_mother a :hamster.', then it's formally their fault,
>>> and no-one's allowed to sue the poor little GRDDL processor, which
>>> was only doing its best in adverse circumstances.
>>
>> I think one part of GRDDL is the focus on "the author of a
>> document states that the transformation will provide a faithful
>> rendition of the source document, or some portion of the source
>> document, that preserves its meaning in RDF." [2] This puts the
>> burden on the author to explicitly license the transform.
>> One line of argument could be that if the author wanted to license a
>> faithful rendition, they would want that rendition to be as
>> "deterministic" and unlikely to break as possible, and that would be
>> one reason to use XHTML instead of tag soup.
>
> Indeed. Authors certainly ought to produce valid/well-formed XHTML,
> and given that they do, there neither is nor should be any ambiguity
> or indeterminacy about what RDF it transforms into.
>
> I'm thinking of the case where an author is ignorant (they're
> following a recipe), or where they know what they're doing, but have
> made an engineering decision not to obsess about cleaning up their
> HTML because...
>
>> [...] many pages are generated using HTML that "in the small", for a
>> set of particular web pages, is itself generic and even regular, so
>> that the author could determine a transformation to RDF and specify
>> it.
>
> ...which I entirely agree with.
>
>>> As a tangential point, what about the case where a GRDDL processor
>>> is asked to handle an XHTML document which has a DTD, but which uses
>>> the xmlns:data-view technique for linking to the GRDDL
>>> transformation? It's therefore well-formed but invalid. That case
>>> is excluded by both section 2 and section 4. Are all bets off?
>>
>> Do you mean that the XHTML DTD does not allow xmlns:data-view? I
>> believe that should not be a problem. Could you give us a test case
>> (i.e. a sample input document and your suggested output, or the
>> problems that it brings up)?
>
> OK. This is probably more an issue of wording than anything deeply
> technical.
>
> How about this?
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml"
>       xmlns:data-view="http://www.w3.org/2003/g/data-view#"
>       data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
>           http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl
>           http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl">
> <head>
> <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
> [...]
>
> That's a dialect of XHTML that _is_ constrained by DTD syntax (so the
> spec's section 2 doesn't apply), but it's invalid, because it has the
> xmlns: and data-view: attributes (so section 4 doesn't apply).
>
> This isn't just niggling: is any purpose served by restricting
> section 4 to _valid_ XHTML? Writing valid XHTML is of course good for
> one's immortal soul, but if you give out invalid but well-formed
> XHTML, with a GRDDL transformation which produces the RDF you want,
> why should the GRDDL processor care?
>
> All that section 4 actually needs to do is define a special-case
> mechanism for a particular category of well-formed XML documents,
> which (for irrelevant reasons) can't use the section 2 mechanism. Thus
> my suggestion is (i) that a GRDDL processor should be required to have
> no problems with XHTML such as the above, and (ii) that the three
> instances of the word 'valid' in section 4 -- and indeed elsewhere --
> could be deleted without loss.
>
> This does also imply that a document like "<html ...><wibble/><head
> profile='...'> ... </html>" would be acceptable. But so what?
> Indeed, the only way that either of these cases could be distinguished
> from the 'well-formed XML' of section 2 is if the GRDDL processor took
> the trouble to validate the document: this would simply be mad, so
> the distinction between valid and invalid (but well-formed)
> documents in the spec is a distinction without a difference.
>
> You want text? Delete the 'valid' words, and:
>
>> Stated more formally:
>>
>> If an XML document has an attribute with XPath /html/head/@profile,
>> and that attribute contains the string
>> "http://www.w3.org/2003/g/data-view", then the document has a GRDDL
>> transformation for each resource named by
>> /html/head/link[@rel='transformation']/@href
>
> That clearly applies to a much larger class of documents than just
> XHTML, but again that needn't be GRDDL's problem.
>
> All the best,
>
> Norman
>
> ------------------------------------------------------------------------------
> Norman Gray / http://nxg.me.uk
> eurovotech.org / University of Leicester, UK

--
-harry

Harry Halpin, University of Edinburgh
http://www.ibiblio.org/hhalpin 6B522426
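A minimal sketch of Norman's argument, assuming Python's standard xml.etree.ElementTree: a processor that only parses (and never validates) finds transformations through both mechanisms discussed, the root-level data-view:transformation attribute and the /html/head/@profile plus link[@rel='transformation'] rule, so his well-formed but invalid example is handled without complaint. The function grddl_transformations and the SAMPLE document are hypothetical illustrations, not GRDDL-specified names. The sketch assumes html/head/link are in the XHTML namespace, reads @profile and @rel as whitespace-separated token lists (one possible reading of "contains the string"), and does not resolve relative URIs.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def grddl_transformations(xml_text):
    """Return the transformation URIs a well-formed (X)HTML document declares.

    ElementTree only checks well-formedness; it does not validate against
    the DOCTYPE, so invalid-but-well-formed XHTML is accepted as-is.
    """
    root = ET.fromstring(xml_text)
    found = []

    # Mechanism 1: a data-view:transformation attribute on the root element,
    # holding a whitespace-separated list of URIs.
    attr = root.get("{%s}transformation" % DATA_VIEW_NS)
    if attr:
        found.extend(attr.split())

    # Mechanism 2: /html/head/@profile names the GRDDL profile, and each
    # /html/head/link[@rel='transformation']/@href names a transformation.
    # Assumption: html, head and link are in the XHTML namespace.
    if root.tag == "{%s}html" % XHTML_NS:
        head = root.find("{%s}head" % XHTML_NS)
        if head is not None and GRDDL_PROFILE in head.get("profile", "").split():
            for link in head.findall("{%s}link" % XHTML_NS):
                if "transformation" in link.get("rel", "").split():
                    href = link.get("href")
                    if href:
                        found.append(href)
    return found


# Hypothetical sample: Norman's example plus a <wibble/> before <head>,
# i.e. well-formed but not valid against the XHTML 1.0 Strict DTD.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:data-view="http://www.w3.org/2003/g/data-view#"
      data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
          http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl">
  <wibble/>
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Joe Lambda's Home page</title>
    <link rel="transformation"
          href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl"/>
  </head>
  <body/>
</html>"""

if __name__ == "__main__":
    # Prints each discovered transformation URI, one per line.
    for uri in grddl_transformations(SAMPLE):
        print(uri)
```

Run as a script, this prints the three grok*.xsl URIs, two from the root attribute and one from the link element. A processor built this way has no way to tell valid from merely well-formed input unless it adds a separate validation step, which is Norman's "distinction without a difference".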
Received on Tuesday, 9 January 2007 04:22:59 UTC