- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 6 Mar 2009 14:20:13 +0200
- To: Tim Berners-Lee <timbl@w3.org>
- Cc: HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, public-xhtml2@w3.org, "www-tag@w3.org WG" <www-tag@w3.org>
On Mar 6, 2009, at 02:49, Tim Berners-Lee wrote:

> On 2009-03-02, at 01:23, Henri Sivonen wrote:
>
>> I'm not suggesting change [to RDFa] for the sake of change. My
>> interest here is keeping things so that text/html and
>> application/xhtml+xml can be consumed with a single namespace-aware
>> application-layer code path using the infoset representation API of
>> the application developer's choice, given a conforming
>> XML+Namespaces parser for that API and a conforming HTML parser for
>> that API. That is, I'm interested in keeping the software
>> architecture for consuming applications sane. I think language
>> design that implies bad software architecture can't be good Web
>> Architecture. The single code path architecture also precludes
>> taking branches on version identifiers and such.
>>
>> Concretely, given the software architecture of Validator.nu (which
>> is SAX2-based and pretty good architecture in the absence of RDFa),
>> I couldn't add RDFa validation with the xmlns:foo syntax without
>> either:
>> 1) Resorting to bad software architecture by implementing notably
>> different above-parser code paths for text/html and XML.
>> OR
>> 2) Changing text/html parsing to map xmlns:foo to the infoset
>> differently from how already-shipped Gecko, Opera and WebKit map
>> xmlns:foo in text/html to the infoset (by considering how they map
>> to DOM Level 2 and then applying the DOM Level 3 to infoset
>> mapping).
>
> Yes, the goal of having one code path on top of a namespace-aware
> API is important.
>
> When one has a namespace-aware API, it's a shame not to have the
> namespaces. What are the arguments against implementing xmlns:
> recognition in *future* HTML5 parsers?

There are three different issues here:

1) What are the arguments against implementing xmlns: recognition in
such a way that xmlns:foo changes the way element and attribute names
of the form foo:bar are mapped to the DOM/infoset by the parser?
(RDFa doesn't need this.)

2) What are the arguments against implementing xmlns: recognition in
such a way that xmlns:foo is presented as a namespace mapping to the
application layer without making it affect the way the parser maps
element or attribute names of the form foo:bar to the DOM/infoset?
(This would be sufficient to enable RDFa with the xmlns:foo syntax
with one application-layer code path for text/html and XML.)

3) What are the arguments against doing #1 or #2 for xmlns="..." also?

Others have focused on question #1 in their replies. I'll address it
as well, but first, let's get #3 out of the way:

We can't let xmlns="..." change the way unprefixed element names are
mapped to the DOM/infoset, because there are all sorts of xmlns="..."
values out there even on pages that depend on the elements getting
the HTML treatment. Changing xmlns="..." to assign unprefixed names
to arbitrary namespaces in text/html would Break the Web. (IIRC,
Opera has previously experimented with this and found the change not
feasible.)

IIRC, we also can't let the xmlns="..." attribute itself be assigned
to the "http://www.w3.org/2000/xmlns/" namespace in the DOM in
text/html, because there is a CSS selector trick that uses the
mapping difference to detect whether a polyglot document got parsed
as text/html or as XML. (Sorry, this statement is based on a vague
recollection. I don't have a proper reference for it.)
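To make the mapping difference concrete, here is a minimal Java
sketch (JAXP plus the W3C DOM only; the example namespace URI is made
up, and the DOM Level 1 calls stand in for what a text/html parser
effectively produces):

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Attr;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.xml.sax.InputSource;

    public class XmlnsMapping {
        private static final String XMLNS_NS =
            "http://www.w3.org/2000/xmlns/";

        public static void main(String[] args) throws Exception {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);

            // XML side: xmlns:foo becomes an attribute in the xmlns
            // namespace, i.e. ["http://www.w3.org/2000/xmlns/", "foo"].
            Document xml = f.newDocumentBuilder().parse(new InputSource(
                new StringReader("<r xmlns:foo='http://example.org/ns'/>")));
            Attr decl = xml.getDocumentElement()
                           .getAttributeNodeNS(XMLNS_NS, "foo");
            System.out.println(decl.getNamespaceURI() + " "
                + decl.getLocalName());
            // => http://www.w3.org/2000/xmlns/ foo

            // text/html side: shipped parsers produce the equivalent of
            // a DOM Level 1 attribute whose *name* merely contains a
            // colon, i.e. ["", "xmlns:foo"].
            Document html = f.newDocumentBuilder().newDocument();
            Element div = html.createElement("div");
            div.setAttribute("xmlns:foo", "http://example.org/ns");
            Attr fake = div.getAttributeNode("xmlns:foo");
            System.out.println(fake.getNamespaceURI() + " "
                + fake.getLocalName());
            // => null null (in browsers the pair is [null, "xmlns:foo"];
            // Java DOM reports null for Level 1 local names)
        }
    }

On the XML side the declaration is visible to namespace-aware
lookups; on the text/html side it is just an oddly named no-namespace
attribute.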
Onto question #1, then:

Changing how element and attribute names of the form foo:bar are
mapped to the DOM/infoset is a problem, because there is already
content out there that uses colonified pseudo-XML in text/html.

Conceivably, existing content may also use DOM Level 1
document.createElement() and setAttribute() to create such elements
and attributes from script. Currently, such scripts create
[namespace, local] pairs that are consistent with the parser-created
[namespace, local] pairs. Conceivably, existing content may also use
CSS selectors to match against these elements or attributes.
Currently, such selectors match predictably and consistently across
different browser engines and across parser-created and
script-created names. (I say "conceivably" above, because I don't
have the results of a Web crawl at my disposal.)

If we changed how the parser maps such names to the DOM/infoset, the
selectors in existing content would stop matching previously matched
parser-created names, and selector matching would become inconsistent
between parser-created names and DOM Level 1 script-created names.
Demo:
http://hsivonen.iki.fi/test/moz/selector-colon.html
http://hsivonen.iki.fi/test/moz/selector-colon.xhtml

In short, the deployment of pseudo-XML as text/html has poisoned
text/html in such a way that it is harder to change it to be more
like real XML.

And then question #2, which is the question most relevant to
addressing only the RDFa case:

It is not known whether (only) changing the way attributes of the
form xmlns:foo are mapped to the DOM/infoset (such that xmlns:foo
becomes ["http://www.w3.org/2000/xmlns/", "foo"] instead of
["", "xmlns:foo"]) would Break the Web, if all classes of products
implemented the same text/html to DOM/infoset mapping for all
conforming and meaningful syntax (i.e. xml:lang and non-conforming
stuff excluded, but xmlns:foo included if it were made conforming
along with RDFa).
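To sketch what #2 would enable (illustrative Java, not Validator.nu
code; it assumes a hypothetical HTML5 parser that reports xmlns:foo
through SAX2's startPrefixMapping() the way XML parsers do), a single
handler relying only on namespace events could serve both content
types:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    // Illustrative only: one handler for both text/html and XML,
    // relying solely on SAX2 namespace events rather than on which
    // parser is driving it.
    public class PrefixTrackingHandler extends DefaultHandler {
        // In-scope [prefix, uri] bindings, innermost last.
        private final Deque<String[]> bindings =
            new ArrayDeque<String[]>();

        @Override
        public void startPrefixMapping(String prefix, String uri) {
            // A conforming XML parser fires this for xmlns:foo; under
            // option #2, an HTML5 parser would fire it in text/html too.
            bindings.addLast(new String[] { prefix, uri });
        }

        @Override
        public void endPrefixMapping(String prefix) {
            // Bindings nest per element, so the matching one is at
            // (or near) the end.
            Iterator<String[]> it = bindings.descendingIterator();
            while (it.hasNext()) {
                if (it.next()[0].equals(prefix)) {
                    it.remove();
                    break;
                }
            }
        }

        // Resolve a prefix against the innermost in-scope binding.
        public String resolvePrefix(String prefix) {
            Iterator<String[]> it = bindings.descendingIterator();
            while (it.hasNext()) {
                String[] binding = it.next();
                if (binding[0].equals(prefix)) {
                    return binding[1];
                }
            }
            return null;
        }

        @Override
        public void startElement(String ns, String local, String qName,
                Attributes atts) {
            // An RDFa extractor would hook in here and call
            // resolvePrefix() identically for both serializations.
        }
    }

The same instance could then be driven by any org.xml.sax.XMLReader,
whether backed by an XML parser or an HTML5 parser; that is the
single code path I'm after.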
The main problem here is how wanting things and bearing the cost of
them spread across the subcommunities at the W3C. By subcommunities,
I mean communities such as the browsable Web community, the Semantic
Web community, the former SGML community turned XML community, the
WS-* community, etc. The XHTML2 WG doesn't quite fit into these, so I
guess it needs to be considered a community of its own for the
purposes of the point I'm trying to make.

The community that would bear the cost of finding out whether the
parsing change would Break the Web is the browsable Web community.
First, a massive Web crawl analysis by a browsable Web search engine
operator such as Google or Microsoft would be needed to show that
there aren't obvious reasons for breakage. If no problem were shown
that way, a browser vendor would then have to implement the change,
ship a browser with it, and see if users complain.

However, the community that'd bear the cost of the experiment wasn't
asking for the feature in the first place. Instead, it was either the
Semantic Web community or the XHTML2 WG who wanted to use the
xmlns:foo syntax for RDFa. (I don't know which, or both.)

I think it's a problem for collaboration where the subcommunities
interact at the W3C if one subcommunity wants things and another
bears the cost.

There is a previous example of inflicting permanent complexity (i.e.
cost) onto an adjacent subcommunity: Namespaces in XML weren't
something that the SGML documentation community (turned XML
community) needed. Instead, Namespaces in XML were a requirement
posed by the Semantic Web community's RDF/XML. This requirement
permanently complicated the processing model for the SGML community
turned XML community:
http://www.flightlab.com/~joe/sgml/sanity.txt
This complexity remains even after the Semantic Web community started
to steer away from RDF/XML to alternative serializations such as N3,
Turtle, etc.

I think a similar up-front infliction of complexity onto an adjacent
community is happening with CURIEs (regardless of xmlns:foo vs.
@prefix): the browsable Web community would have to take on permanent
processing model complexity to address the wishes of another
community. (This is why I advocate using full absolute URIs instead
of @prefix in RDFa.)
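To illustrate the complexity delta (a hedged sketch; the method names
are mine and this is not any spec's algorithm), compare what a
consumer has to do for a full absolute URI versus a CURIE:

    import java.util.Map;

    // Illustrative sketch of the processing-model difference, not a
    // spec-defined algorithm.
    public class CurieCost {
        // With full absolute URIs, consuming the value is trivial:
        static String propertyAsUri(String attributeValue) {
            return attributeValue; // e.g. "http://purl.org/dc/terms/title"
        }

        // With CURIEs, every consumer needs the in-scope prefix
        // context and has to decide what to do when there isn't one:
        static String propertyAsCurie(String attributeValue,
                Map<String, String> inScopePrefixes) {
            int colon = attributeValue.indexOf(':');
            if (colon == -1) {
                return null; // or a default-vocabulary rule:
                             // more model surface
            }
            String prefix = attributeValue.substring(0, colon);
            String base = inScopePrefixes.get(prefix);
            if (base == null) {
                return null; // unbound prefix: yet another
                             // error-handling branch
            }
            return base + attributeValue.substring(colon + 1);
        }
    }

Note that the CURIE branch also drags in everything needed to compute
the in-scope prefix map in the first place, which is exactly the
bookkeeping shown in the earlier SAX2 sketch.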
To get back to the point of who bears the cost of experimenting with
changes to the text/html to DOM/infoset mapping: It has been argued
(over on the WHATWG list) that HTML5 is adding SVG to text/html, so
why shouldn't RDFa with its current syntax get the same privilege?
Adding SVG to text/html is indeed a non-trivial change that also
requires shipping an implementation to browser users to find out
whether it Breaks the Web. However, in the case of SVG, three of the
top four browser engines have a significant application-layer
investment in XHTML+SVG compound document support, but unleashing the
rewards of this investment has been blocked by the complication of
migrating existing content management systems to XML and by even the
XHTML part not degrading gracefully in IE. Thus, for the browser
vendors who've implemented SVG, the gamble of experimenting with SVG
in text/html has the significant potential upside of reaping notably
better returns on the application-layer SVG investment. The gamble of
making RDFa-motivated parsing changes has no such upside for those
who'd need to make the gamble.

To avoid problems like this, the least that subcommunities who wish
to extend (X)HTML, but who don't bear the cost of experimenting with
parsing changes, could do is to stay away from the syntactic areas
that prima facie might be problematic and stay in the areas that
prima facie most likely aren't problematic.

> (I can't imagine that there are a lot of people who have
> accidentally used the string xmlns: inside attribute names in the
> past. :)

Web authors do all sorts of things. :-/

> There would still be a need for kludge code for legacy browsers, but
> with time some applications would just propose to work with XHTML
> and HTML in newer browsers. (For example, things which need other
> new features anyway). Others would keep the kludge code in forever.
> But it would be a huge step forward toward solving this issue.

Even if this particular issue could be papered over, the question
remains whether the W3C would let other groups keep poking the
problematic areas of text/html and application/xhtml+xml polyglot
syntax, keeping HTML parser implementors and definers on a treadmill
of solving the new issues thereby created.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/