- From: Norman Gray <norman@astro.gla.ac.uk>
- Date: Mon, 27 Nov 2006 21:03:57 +0000
- To: Harry Halpin <hhalpin@ibiblio.org>
- Cc: public-grddl-comments@w3.org
Harry, hello again.

On 2006 Nov 27, at 17.07, Harry Halpin wrote:

>> At the same time, I can't help feeling that saying just `if it's
>> not well-formed, all bets are off' and `it is possible...', while
>> true, is rather avoiding the issue. In the case of someone
>> generating (X)HTML which wraps third party content (I'm thinking
>> again of Yahoo wrapping user-generated HTML), I think they could
>> reasonably expect the GRDDL spec to give _some_ clue about what
>> ought to happen when they put a valid and metadata-rich wrapper
>> round invalid and RDF-less content. I think they should also
>> reasonably expect that GRDDL processors _would_ have a go, and if
>> so it would be good for the spec to bless that. In this case, I
>> think it would be useful to make it clear that if they do emit
>> ill-formed XHTML and end up saying `:your_mother a :hamster.', then
>> it's formally their fault, and no-one's allowed to sue the poor
>> little GRDDL processor, which was only doing its best in adverse
>> circumstances.
>
> I think one part of GRDDL is the focus on "the author of a
> document states that the transformation will provide a faithful
> rendition of the source document, or some portion of the source
> document, that preserves its meaning in RDF." [2] This puts the
> burden on the author to explicitly license the transform. One line
> of argument could be that if the author wanted to license a
> faithful rendition, they would want that rendition to be as
> "deterministic" and unlikely to break as possible, and that would be
> one reason to use XHTML instead of tagsoup.

Indeed. Authors certainly ought to produce valid/well-formed XHTML, and given that they do, there neither is nor should be any ambiguity or indeterminacy about what RDF it transforms into.
I'm thinking of the case where an author is ignorant (they're following a recipe), or where they know what they're doing, but have made an engineering decision not to obsess about cleaning up their HTML because...

> [...]many pages are generated using HTML that "in the small" for a
> set of particular web-pages is itself generic and regular even, so
> that the author could be able to determine a transformation to RDF
> and specify it.

...which I entirely agree with.

>> As a tangential point, what about the case where a GRDDL processor
>> is asked to handle an XHTML document which has a DTD, but which
>> uses the xmlns:data-view technique for linking to the GRDDL
>> transformation? It's therefore well-formed but invalid. That
>> case is excluded by both section 2 and section 4. Are all bets off?
>
> Do you mean a DTD that XHTML does not allow xmlns:data-view? I
> believe that should not be a problem. Could you give us a test case
> (i.e. a sample input document and your suggested output or problems
> that it brings up)

OK. This is probably more an issue of wording than anything deeply technical. How about this?

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:data-view="http://www.w3.org/2003/g/data-view#"
          data-view:transformation="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl
            http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl
            http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl">
    <head>
    <title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
    [...]

That's a dialect of XHTML that _is_ constrained by DTD syntax (so the spec's section 2 doesn't apply), but it's invalid, because it has the xmlns: and data-view: attributes (so section 4 doesn't apply).

This isn't just niggling: is any purpose served by the restriction of section 4 to _valid_ XHTML?
Writing valid XHTML is of course good for one's immortal soul, but if you give out invalid but well-formed XHTML, with a GRDDL transformation which produces the RDF you want, why should the GRDDL processor care? All that section 4 actually needs to do is define a special-case mechanism for a particular category of well-formed XML documents, which (for irrelevant reasons) can't use the section 2 mechanism.

Thus my suggestion is (i) that a GRDDL processor should be required to have no problems with XHTML such as the above, and (ii) that the three instances of the word 'valid' in section 4 -- and indeed elsewhere -- could be deleted without loss.

This does also imply that a document like "<html ...><wibble/><head profile='...'> ... </html>" would be acceptable. But so what? Indeed, the only way that either of these cases could be distinguished from the 'well-formed XML' of section 2 is if the GRDDL processor took the trouble to validate the document: this would simply be mad, so that the distinction between valid and invalid (but well-formed) documents in the spec is a distinction without a difference.

You want text? Delete the 'valid' words, and:

> Stated more formally:
>
> If an XML document has an attribute with XPath /html/head/@profile,
> and that attribute contains the string
> "http://www.w3.org/2003/g/data-view", then the document has a GRDDL
> transformation for each resource named by
> /html/head/link[@rel='transformation']/@href

That clearly applies to a much larger class of documents than just XHTML, but again that needn't be GRDDL's problem.

All the best,

Norman

--
----------------------------------------------------------------------------
Norman Gray / http://nxg.me.uk
eurovotech.org / University of Leicester, UK
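
As an illustration of the "Stated more formally" rule quoted above, here is a minimal sketch of how a processor might apply it to any well-formed document without ever validating. It is not taken from the GRDDL spec or from any existing processor. The function name grddl_transformations, the sample document, and the example.org URL are hypothetical, and the sketch assumes the html, head and link elements are in the XHTML namespace, which the bare XPath in the rule leaves unstated.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"           # assumed namespace for html/head/link
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def grddl_transformations(xml_text):
    """Return the hrefs named by the profile rule, or [] if the rule does not apply.

    The parser only checks well-formedness; no DTD validation takes place,
    so valid and invalid-but-well-formed input are treated identically.
    """
    root = ET.fromstring(xml_text)  # raises ET.ParseError on ill-formed input
    if root.tag != f"{{{XHTML_NS}}}html":
        return []
    head = root.find(f"{{{XHTML_NS}}}head")
    # "contains the string" is read here as a plain substring test.
    if head is None or GRDDL_PROFILE not in head.get("profile", ""):
        return []
    return [
        link.get("href")
        for link in head.findall(f"{{{XHTML_NS}}}link")
        if link.get("rel") == "transformation" and link.get("href") is not None
    ]


# Hypothetical, deliberately invalid (but well-formed) document with a stray <wibble/>.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <wibble/>
  <head profile="http://www.w3.org/2003/g/data-view">
    <link rel="transformation" href="http://example.org/hypothetical.xsl"/>
  </head>
</html>"""

print(grddl_transformations(sample))
```

Nothing in the sketch could tell a valid document from the <wibble/> one without an extra validation step, which is the point being made about the word 'valid' in section 4.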
Received on Monday, 27 November 2006 21:04:23 UTC