- From: Norman Gray <norman@astro.gla.ac.uk>
- Date: Mon, 27 Nov 2006 16:19:01 +0000
- To: Harry Halpin <hhalpin@ibiblio.org>
- Cc: public-grddl-comments@w3.org
Harry and all, hello.

Thanks for your comments. Oh dear: the following's become rather long, but it does have suggested text in it....

On 2006 Nov 27, at 08.18, Harry Halpin wrote:

> As for your comments on non-XML and HTML, it does appear that
> since GRDDL is defined over the XPath Data Model, [...]

That's not terrifically explicit in the GRDDL specification: "XPath" doesn't appear anywhere in the document, and it talks throughout of X(HT)ML _documents_. It's broadly implied by the fact that GRDDL is specified as an XSLT transform, but that's all as far as I can see.

Why or when (just by the way) would you want to get XML from somewhere other than a document? I can think of a few fairly wacky scenarios, but one surely reasonable one is where you're using a GRDDL processor to grub through an XML database. Perhaps you have a collection of rather heterogeneous data objects in an XML database, and you decide that you can most neatly manage metadata there by including GRDDL transformations in strategic places. If you want to do that, then you'd be wise not to serialise and reparse the contents, but to pipe the database contents straight into a SAX stream and plug that into your transformer. [thinks: hmm, _I_ have a heterogeneous XML database, and it just now occurs to me that this scenario may not be fanciful after all ...]

The vCard-to-SAX case is reasonable, too, I think, modulo some subtleties about where the GRDDL declarations actually appear.

I'm not claiming this is a big deal (I'm not climbing on a hobbyhorse, don't worry), just that abstracting the definition keeps things a little more flexible for the future. And XML's not about angle-brackets!

More significantly, you mentioned, Harry:

> However, we might add since there is not a standardized "tagsoup"
> algorithm, it makes sense while people *can* pull GRDDL results out
> of non-XML HTML, it is much safer to do so with XHTML. So the WG
> will likely only fully endorse using GRDDL with XHTML, although we
> will mention it is possible to use it with non-XML HTML "at your
> own risk" in our Use Case docs.

That seems perfectly reasonable, and, come to think of it, it wouldn't be reasonable to talk about `errors' in this context, since it wouldn't be feasible for such a spec to mandate error behaviour in anything but uselessly generic terms.

At the same time, I can't help feeling that saying just `if it's not well-formed, all bets are off' and `it is possible...', while true, is rather avoiding the issue. In the case of someone generating (X)HTML which wraps third-party content (I'm thinking again of Yahoo wrapping user-generated HTML), I think they could reasonably expect the GRDDL spec to give _some_ clue about what ought to happen when they put a valid and metadata-rich wrapper round invalid and RDF-less content. I think they should also reasonably expect that GRDDL processors _would_ have a go, and if so it would be good for the spec to bless that. In this case, I think it would be useful to make it clear that if they do emit ill-formed XHTML and end up saying `:your_mother a :hamster.', then it's formally their fault, and no-one's allowed to sue the poor little GRDDL processor, which was only doing its best in adverse circumstances.

Thanks for the link in the minutes to Henry Thompson's description [1] of the recent TAG issue TagSoupIntegration-54 [2]. This seems to be exactly the same problem, and I'm encouraged by the parenthetical remark in:

> Is the indefinite persistence of 'tag soup' HTML* consistent with a
> sound architecture for the Web? If so, (and the going-in assumption
> is that it _is_ so), what changes, if any, to fundamental Web
> technologies are necessary to integrate 'tag soup' with
> HTML and well-formed XML?
So while I certainly agree that it's not sensible for GRDDL to break its neck specifying what to do when faced with tag-soup nonsense, and inappropriate to specify a particular parser, I feel there's still probably scope for a couple of formal `shoulds' in there. How about:

``GRDDL processors SHOULD attempt to recover gracefully from well-formedness or validity errors, and SHOULD retain any RDF generated from this process. In this case (and only in this case), processors MAY use information about the document type (gained from a Content-Type header or otherwise) to assist in the best-effort parsing of the document. If such an error-recovery strategy is employed, a GRDDL processor MAY rely on the generated RDF as if it had been extracted from a conformant document.''

That would be coupled with a remark:

``Document authors are responsible for the RDF statements generated by a correctly-applied GRDDL transformation, and must be aware that, confronted with ill-formed or invalid XML, GRDDL processors are free to use a range of strategies to recover from errors, and free to rely on the RDF thus generated.''

That doesn't really commit anyone to anything, but it enshrines Postel's law in an appropriate balance of permissions and suggestions; it appropriately deprecates ill-formed documents; it makes it clear that authors should make their documents valid or bear the consequences; and it makes it clear whose fault it is (the author's) if a GRDDL processor relies on RDF statements gleaned from a misunderstanding of an ill-formed document (though most of the time I'm sure this would be just fine).

As a tangential point, what about the case where a GRDDL processor is asked to handle an XHTML document which has a DTD, but which uses the xmlns:data-view technique for linking to the GRDDL transformation? Since the XHTML DTD doesn't declare that xmlns:data-view attribute, the document is well-formed but invalid. That case is excluded by both section 2 and section 4. Are all bets off?
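The recovery behaviour in the proposed wording above might look something like the following minimal Java sketch. Everything in it is hypothetical: the class `BestEffortGrddl` and method `findTransformations` come from no real GRDDL implementation, and the fallback is a deliberately crude regular-expression scan for `<link rel="transformation">` elements, standing in for a proper tag-soup parser. It tries a strict XML parse first; only on a well-formedness error does it recover, using the declared media type merely as a hint (case-insensitive matching for `text/html`), and it flags the result as recovered so that anything relying on the resulting RDF knows it came from a best-effort reading. For brevity it ignores validity errors and the `head profile` declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical processor front end: strict XML parse first, best-effort scan on failure. */
class BestEffortGrddl {

    /** Transformation URIs found, and whether error recovery was needed to find them. */
    static final class Result {
        final List<String> transformations;
        final boolean recovered;

        Result(List<String> transformations, boolean recovered) {
            this.transformations = transformations;
            this.recovered = recovered;
        }
    }

    // name="value" or name='value'; deliberately crude, not a real attribute parser
    private static final Pattern ATTR =
        Pattern.compile("([\\w:-]+)\\s*=\\s*[\"']([^\"']*)[\"']");

    static Result findTransformations(String doc, String contentType) {
        try {
            return new Result(strictParse(doc), false);
        } catch (SAXParseException e) {
            // Ill-formed input: recover, using the declared media type only as a hint.
            boolean html = contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("text/html");
            return new Result(bestEffortScan(doc, html), true);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static List<String> strictParse(String doc) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
        Document d = db.parse(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        NodeList links = d.getElementsByTagName("link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            if (hasTransformationRel(link.getAttribute("rel"))) {
                out.add(link.getAttribute("href"));
            }
        }
        return out;
    }

    private static List<String> bestEffortScan(String doc, boolean html) {
        // HTML element and attribute names are case-insensitive; XML names are not.
        int flags = html ? Pattern.CASE_INSENSITIVE : 0;
        Matcher tag = Pattern.compile("<link\\b[^>]*>", flags).matcher(doc);
        List<String> out = new ArrayList<>();
        while (tag.find()) {
            String rel = "";
            String href = null;
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String name = html ? attr.group(1).toLowerCase(Locale.ROOT) : attr.group(1);
                if (name.equals("rel")) rel = attr.group(2);
                else if (name.equals("href")) href = attr.group(2);
            }
            if (href != null && hasTransformationRel(rel)) out.add(href);
        }
        return out;
    }

    private static boolean hasTransformationRel(String rel) {
        // rel holds a whitespace-separated list of link types
        for (String token : rel.trim().split("\\s+")) {
            if (token.equals("transformation")) return true;
        }
        return false;
    }
}
```

For an ill-formed document served as `text/html`, the sketch still finds the transformation link but reports `recovered == true`; the same bytes served as `application/xhtml+xml` get the case-sensitive scan, so an upper-case `<LINK>` is not found.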
Is it still appropriate to be commenting on this, or does everyone feel it's resolved or uninteresting? I get the impression that the telecon notes aren't intended to be Rulings; is that correct?

I hope this is still of use.

All the best,

Norman

[1] http://lists.w3.org/Archives/Public/www-tag/2006Oct/0062.html
[2] http://www.w3.org/2001/tag/issues.html#TagSoupIntegration-54

--
Norman Gray / http://nxg.me.uk
eurovotech.org / University of Leicester, UK
Received on Monday, 27 November 2006 16:19:23 UTC