- From: Maciej Stachowiak <mjs@apple.com>
- Date: Tue, 22 Sep 2009 16:47:53 -0700
- To: Mark Birbeck <mark.birbeck@webbackplane.com>
- Cc: Jonas Sicking <jonas@sicking.cc>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
- Message-id: <28BBBFBD-6C38-4800-8FFE-E01A144B96A8@apple.com>
On Sep 22, 2009, at 3:42 PM, Mark Birbeck wrote:

> HI Jonas,
>
>> It certainly matters. If for example if method 1 or 2 were used then
>> no prefix mappings would be found at all in the DOM output from a
>> HTML parser. So it really *does* matter how you do prefix mapping. And as
>> far as DOM 2 goes, I think 1 or 2 are the intuitive solutions so if
>> we're not using those then I *really* think it's important to specify
>> so.
>>
>> In any case, I think I've spent enough time on this issue. I can't
>> really articulate the problem any more than I have. I hope this issue
>> is solved by the time last call rolls around.
>
> I see that you are frustrated, but you seem to think that the issue is
> that no-one understands your position.
>
> We *do* understand your position, and are trying to explain to you,
> that -- with all due respect -- it is based on a misunderstanding.
>
> You are looking at implementation specifics, and as many people have
> explained, implementation is not the issue. This is because the spec
> is defining an algorithm, which entitles people to implement things
> how they see fit, on whatever platform they want to write for, using
> whatever language they want to use.

What Jonas is saying is that the spec algorithms as stated don't let you choose between implementation strategies that at first glance seem equally valid but in fact will give different results. He gave some specific examples - how to get prefix mappings in a DOM, how to extract triples from an HTML document that would result in reparenting, and whether prefix mappings should be assigned to elements at parse time or extraction time if the DOM can be mutated after parsing.

It seems like people reject his arguments for what superficially appear to be mutually contradictory reasons: (a) that RDFa doesn't really use Namespaces in XML, it just uses a syntax that looks the same but could have been anything; (b) that RDFa normatively references Namespaces in XML for implementation requirements; (c) that RDFa is defined purely at the raw source text level (even though the spec's processing rules speak of an abstract tree model); (d) that RDFa can be applied directly to situations where original source text is not available or may not even exist.

I'm pretty puzzled by the argument that RDFa is defined in terms of raw source text. The start of Section 5 of XHTML+RDFa says:

"Processing need not follow the DOM traversal technique outlined here, although the effect of following some other manner of processing must be the same as if the processing outlined here were followed. The processing model is explained using the idea of DOM traversal which makes it easier to describe (particularly in relation to the [evaluation context])."

And indeed Section 5 describes processing in terms of DOM concepts such as "document object", "child element", "document order" and so forth. Later, Section 5.5 describes its algorithm as "the DOM traversal technique defined here".

It seems to me like it would be much more fruitful to go with this DOM-like formalism instead of pretending that things are actually defined at the textual level. They are not - nowhere does RDFa describe how to get from source characters to its tree model for processing; that is all left up to other specs (and with the understanding that implementations may do things without a tree, as long as they give equivalent results).

Buying into the DOM-based model that XHTML+RDFa already uses for its processing rules would immediately answer many of Jonas's questions:

- HTML5+RDFa should be processed by taking the DOM that results from the HTML5 parsing algorithm.
  As with XHTML+RDFa, you don't have to literally create a DOM, but your output must be equivalent to the processing defined in DOM terms.
- DOM mutations that happen before RDFa extraction *do* potentially affect the extracted triples.
- HTML source documents that are parsed in a way that reparents nodes are processed according to the DOM that the parser produces.
- There is no need to first serialize a DOM in order to process it according to RDFa.

The only detail that would have to be filled in, if we accept the DOM-based model that the spec already uses, is how to find the prefix mappings. Either an XHTML+RDFa erratum or HTML5+RDFa could specify that any attribute with a qualified name (tagName) that starts with "xmlns:" creates a prefix mapping.

Buying into the DOM approach would also address Henri's objection about bad spec layering.

Regards,
Maciej
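
For illustration only, here is a minimal sketch of the prefix-mapping rule suggested above: walk from an element up through its ancestors and treat every attribute whose qualified name starts with "xmlns:" as a prefix declaration. The function name `prefix_mappings`, the sample document, and the rule that the nearest declaration wins are assumptions made for this sketch, not text from any RDFa spec. Python's `xml.dom.minidom` stands in for the DOM here purely for convenience; an HTML5 parser would be the real source of the tree.

```python
from xml.dom import minidom

XMLNS_PREFIX = "xmlns:"


def prefix_mappings(element):
    """Return the prefix -> URI mappings in scope for `element`.

    Hypothetical helper: it looks only at attribute qualified names,
    so it does not depend on how the parser assigned namespaces.
    """
    mappings = {}
    node = element
    # Stop once we walk past the root element and reach the Document node.
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            if name.startswith(XMLNS_PREFIX):
                prefix = name[len(XMLNS_PREFIX):]
                # Assumed rule: a declaration nearer to the element wins.
                mappings.setdefault(prefix, value)
        node = node.parentNode
    return mappings


doc = minidom.parseString(
    '<html xmlns:foaf="http://xmlns.com/foaf/0.1/">'
    '<body xmlns:dc="http://purl.org/dc/terms/"><p>text</p></body>'
    '</html>'
)
p = doc.getElementsByTagName("p")[0]
mappings = prefix_mappings(p)
```

Because the lookup runs against the live tree at extraction time, an attribute added or changed by script before extraction would change the result, which is the behaviour described in the second bullet above.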
Received on Tuesday, 22 September 2009 23:48:36 UTC