- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 6 Mar 2009 14:20:13 +0200
- To: Tim Berners-Lee <timbl@w3.org>
- Cc: HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, public-xhtml2@w3.org, "www-tag@w3.org WG" <www-tag@w3.org>
On Mar 6, 2009, at 02:49, Tim Berners-Lee wrote:

> On 2009-03-02, at 01:23, Henri Sivonen wrote:
>
>> I'm not suggesting change [to RDFa] for the sake of change. My
>> interest here is keeping things so that text/html and
>> application/xhtml+xml can be consumed with a single namespace-aware
>> application-layer code path using the infoset representation API of
>> the application developer's choice, given a conforming
>> XML+Namespaces parser for that API and a conforming HTML parser for
>> that API. That is, I'm interested in keeping the software
>> architecture for consuming applications sane. I think language
>> design that implies bad software architecture can't be good Web
>> Architecture. The single code path architecture also precludes
>> taking branches on version identifiers and such.
>>
>> Concretely, given the software architecture of Validator.nu (which
>> is SAX2-based and pretty good architecture in the absence of RDFa),
>> I couldn't add RDFa validation with the xmlns:foo syntax without
>> either:
>> 1) Resorting to bad software architecture by implementing notably
>> different above-parser code paths for text/html and XML.
>> OR
>> 2) Changing text/html parsing to map xmlns:foo to the infoset
>> differently from how already-shipped Gecko, Opera and WebKit have
>> mapped xmlns:foo in text/html to the infoset (by considering how
>> they map to DOM Level 2 and then by applying the DOM Level 3 to
>> infoset mapping).
>
> Yes, the goal of having one code path on top of a namespace-aware
> API is important.
>
> When one has a namespace-aware API, shame not to have the namespaces.
> What are the arguments against implementing xmlns: recognition in
> *future* HTML5 parsers?

There are three different issues here:

1) What are the arguments against implementing xmlns: recognition in
such a way that xmlns:foo changes the way element and attribute names
of the form foo:bar are mapped to the DOM/infoset by the parser?
(RDFa doesn't need this.)

2) What are the arguments against implementing xmlns: recognition in
such a way that xmlns:foo is presented as a namespace mapping to the
application layer without making it affect the way the parser maps
element or attribute names of the form foo:bar to the DOM/infoset?
(This would be sufficient to enable RDFa with the xmlns:foo syntax
with one application-layer code path for text/html and XML.)

3) What are the arguments against doing #1 or #2 for xmlns="..." also?

Others have focused on question #1 in their replies. I'll address it
as well, but first, let's get #3 out of the way:

We can't let xmlns="..." change the way unprefixed element names are
mapped to the DOM/infoset, because there are all sorts of xmlns="..."
values out there even on pages that depend on the elements getting
the HTML treatment. Changing xmlns="..." to assign unprefixed names
to arbitrary namespaces in text/html would Break the Web. (IIRC,
Opera has previously experimented with this and found the change not
feasible.)

IIRC, we can't let the xmlns="..." attribute itself be assigned to
the "http://www.w3.org/2000/xmlns/" namespace in the DOM in
text/html, because there is a CSS selector trick that uses the
mapping difference to detect whether a polyglot document got parsed
as text/html or as XML. (Sorry, this statement is based on a vague
recollection. I don't have a proper reference for this.)

Onto question #1, then:

Changing how element and attribute names of the form foo:bar are
mapped to the DOM/infoset is a problem, because there is already
content out there that uses colonified pseudo-XML in text/html.

Conceivably, existing content may also use DOM Level 1
document.createElement() and setAttribute() to create such elements
and attributes from script. Currently, such scripts create
[namespace, local] pairs that are consistent with the parser-created
[namespace, local] pairs. Conceivably, existing content may also use
CSS selectors to match against these elements or attributes.
Currently, such selectors match predictably and consistently across
different browser engines and across parser-created and
script-created names. (I say "conceivably" above, because I don't
have the results of a crawl of the Web at my disposal.)

If we changed how the parser maps such names to the DOM/infoset, the
selectors in existing content would stop matching against previously
matched parser-created names, and selector matching would be
inconsistent between parser-created names and DOM Level 1
script-created names. Demo:

http://hsivonen.iki.fi/test/moz/selector-colon.html
http://hsivonen.iki.fi/test/moz/selector-colon.xhtml
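To make the consistency concrete, here's a minimal sketch in the
spirit of the demo above (the foo:bar and baz:quux names are made up
for illustration, and the last line assumes a browser that implements
the Selectors API):

  // In text/html today, both the parser and the DOM Level 1 APIs
  // treat "foo:bar" as one flat name; the colon is just a character
  // in the name, not a prefix bound to a namespace.
  var el = document.createElement("foo:bar"); // flat name, no namespace split
  el.setAttribute("baz:quux", "1");           // likewise a flat attribute name
  document.body.appendChild(el);

  // A selector with an escaped colon matches the script-created
  // element exactly as it matches a parser-created <foo:bar> tag:
  alert(document.querySelector("foo\\:bar") === el); // true

A parsing change that started splitting foo:bar into a prefix and a
namespace-bound local name would break that symmetry.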
In short, the deployment of pseudo-XML as text/html has poisoned
text/html in such a way that it is harder to change text/html to be
more like real XML.

And then question #2, which is the question most relevant to the RDFa
case:

It is not known whether changing (only) the way attributes of the
form xmlns:foo are mapped to the DOM/infoset (such that xmlns:foo
becomes ["http://www.w3.org/2000/xmlns/", "foo"] instead of
["", "xmlns:foo"]) would Break the Web if all classes of products
implemented the same text/html to DOM/infoset mapping for all
conforming and meaningful syntax (i.e. xml:lang and non-conforming
stuff excluded, but xmlns:foo included if it were made conforming
along with RDFa).

The main problem here is how wanting things and bearing the cost are
distributed across the subcommunities at the W3C. By subcommunities,
I mean communities such as the browsable Web community, the Semantic
Web community, the former SGML community turned XML community, the
WS-* community, etc. The XHTML2 WG doesn't quite fit any of these, so
I guess it needs to be considered a community of its own for the
purposes of the point I'm trying to make.

The community that would bear the cost of finding out whether the
parsing change would Break the Web is the browsable Web community.
First, a massive Web crawl analysis by a browsable Web search engine
operator such as Google or Microsoft would be needed to show that
there aren't obvious reasons for breakage. If no problem showed up in
the crawl, a browser vendor would then have to implement the change,
ship a browser with it, and see if users complain. However, the
community that would bear the cost of the experiment wasn't asking
for the feature in the first place. Instead, it was either the
Semantic Web community or the XHTML2 WG who wanted to use the
xmlns:foo syntax for RDFa. (I don't know which, or both.)

I think it's a problem for collaboration where the subcommunities
interact at the W3C if one subcommunity wants things and another
bears the cost. There is a previous example of inflicting permanent
complexity (i.e. cost) onto an adjacent subcommunity: Namespaces in
XML wasn't something that the SGML documentation community (turned
XML community) needed. Instead, Namespaces in XML were a requirement
posed by the Semantic Web community's RDF/XML.

This requirement permanently complicated the processing model for the
SGML community turned XML community:

http://www.flightlab.com/~joe/sgml/sanity.txt

This complexity remains even after the Semantic Web community started
to steer away from RDF/XML toward alternative serializations such as
N3, Turtle, etc.

I think a similar up-front infliction of complexity onto an adjacent
community is happening with CURIEs (regardless of xmlns:foo vs.
@prefix): the browsable Web community would have to take on permanent
processing-model complexity to address the wishes of another
community. (This is why I advocate using full absolute URIs instead
of @prefix in RDFa.)
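To illustrate the kind of processing-model complexity CURIEs imply
for consumers, here's a hedged sketch (the function name and the
naive absolute-URI test are mine, and it deliberately ignores most of
the actual CURIE processing rules):

  // Resolving the property attribute of an RDFa node. A full
  // absolute URI is usable as-is; a CURIE forces the consumer to
  // walk the ancestor chain tracking in-scope xmlns:* declarations.
  function resolveProperty(el) {
    var value = el.getAttribute("property");
    if (!value) {
      return null;
    }
    if (value.indexOf("://") !== -1) {
      return value; // naive absolute-URI test; fine for a sketch
    }
    var parts = value.split(":");
    if (parts.length != 2) {
      return null;
    }
    for (var n = el; n && n.getAttribute; n = n.parentNode) {
      var binding = n.getAttribute("xmlns:" + parts[0]);
      if (binding) {
        return binding + parts[1]; // expand the CURIE
      }
    }
    return null; // no in-scope binding for the prefix
  }

With full absolute URIs, the whole prefix-tracking loop disappears.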
To get back to the point of who bears the cost of experimenting with
changes to the text/html to DOM/infoset mapping: it has been argued
(over on the WHATWG list) that HTML5 is adding SVG to text/html, so
why shouldn't RDFa with its current syntax get the same privilege?
Adding SVG to text/html is indeed a non-trivial change that also
requires shipping an implementation to browser users to find out
whether it Breaks the Web. However, in the case of SVG, three of the
top four browser engines have a significant application-layer
investment in XHTML+SVG compound document support, but unleashing the
rewards of this investment has been blocked by the complication of
migrating existing content management systems to XML and by even the
XHTML part not degrading gracefully in IE. Thus, for the browser
vendors who've implemented SVG, the gamble of experimenting with SVG
in text/html has the significant potential upside of reaping
significantly better returns on the application-layer SVG investment.
The gamble of making RDFa-motivated parsing changes has no such
upside for those who'd need to make the gamble.

To avoid problems like this, the least that subcommunities who wish
to extend (X)HTML but who don't bear the cost of experimenting with
parsing changes could do is to stay away from the syntactic areas
that prima facie might be problematic and stay in the areas that
prima facie most likely aren't problematic.

> (I can't imagine that there are a lot of people who have
> accidentally used the string xmlns: inside attribute names in the
> past. :)

Web authors do all sorts of things. :-/

> There would still be a need for kludge code for legacy browsers, but
> with time some applications would just propose to work with XHTML
> and HTML in newer browsers. (For example, things which need other
> new features anyway). Others would keep the kludge code in forever.
> But it would be a huge step forward toward solving this issue.

Even if this particular issue could be papered over, the question
remains whether the W3C would let other groups keep poking the
problematic areas of text/html & application/xhtml+xml polyglot
syntax, keeping HTML parser implementors and definers on a treadmill
solving the new issues thereby created.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 6 March 2009 12:20:58 UTC