Re: RDFa and Web Directions North 2009 from Henri Sivonen on 2009-02-17 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 17 Feb 2009 19:04:58 +0200
To: Ben Adida <ben@adida.net>
Cc: Karl Dubost <karl@la-grange.net>, Mark Birbeck <mark.birbeck@webbackplane.com>, Sam Ruby <rubys@intertwingly.net>, Kingsley Idehen <kidehen@openlinksw.com>, Dan Brickley <danbri@danbri.org>, Michael Bolger <michael@michaelbolger.net>, public-rdfa@w3.org, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Tim Berners-Lee <timbl@w3.org>, Dan Connolly <connolly@w3.org>, Ian Hickson <ian@hixie.ch>
Message-Id: <CD3801E9-BB71-4B4B-9168-D8A608F9E089@iki.fi>
On Feb 17, 2009, at 09:28, Ben Adida wrote:

> Henri Sivonen wrote:
>>> JavaScript
>>>
>>>    * RDFa Bookmarklets
>>>      Author: Ben Adida
>>>      http://www.w3.org/2006/07/SWD/RDFa/impl/js/
>>
>> This one reads xmlns:foo using a namespace-unaware DOM Level 1 view.
>>
>> See points 4 through 7 in my earlier email.
>
> You provided an example that supposedly breaks RDFa parsing. I showed
> you a library that works without any code fork on HTML/XHTML. Now,
> you've come up with new conditions.
>
> If I show you that these conditions are, in fact, satisfied, will there
> be more conditions? Is there an official list of conditions that you're
> reading from?

I don't have an official list beyond the HTML Design Principles  
document. However, I did include *a* list of points in
http://lists.w3.org/Archives/Public/public-rdfa/2009Feb/0079.html

Please consider my concrete points about SAX and XOM in particular as
applying to any Namespace-wise correct XML API in at least semi-popular
use for an at least semi-popular programming language.

The name of the principle DOM Consistency names the most important API  
without getting into the thinner air of Infoset. However, non-browser  
HTML consumers are important, too, so the consistency should apply to  
non-DOM APIs, too.

Here's a non-exhaustive list of APIs that should be considered in  
addition to Web DOM as exposed to JavaScript:
  * The internal API of Gecko
  * The internal API of WebKit
  * The internal API of Presto (assuming it has a separate internal API)
  * Java DOM
  * SAX2
  * JDOM
  * dom4j
  * XOM
  * StAX
  * reXML
  * Python minidom
  * ElementTree
  * .Net System.Xml pull API
  * .Net System.Xml tree API
  * libxml2 DOM
  * libxml2 stream API
  * The API of expat
  * XNI

When considering how to expose HTML5 as if it were XHTML5 to all these  
APIs, it is important to consider whether the syntax of RDFa would  
require additional code in the HTML5 parser that none of the currently  
drafted HTML5 features (including MathML support and the commented out  
SVG support) require. (Point #13 in my email referenced above.)

It helps to consider multiple APIs in one go through the concept of  
Infoset. See http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset

> So let's examine:
>
>> 4) The qname is an artifact of [....]
>
> We're not using QNames.

I meant the qname "xmlns:foo" as opposed to the pair
["http://www.w3.org/2000/xmlns/","foo"].

>> 5) Given the points above, you should also do dispatch on the
>> [namespace,local] pair on the HTML side.
>
> Only if RDFa fields were defined as QNames. They're not.

I'm not talking about the attribute values. I'm talking about using  
the DOM in a way that is architecturally sound from the point of view  
of Namespaces--i.e. using the Level 2 view instead of the Level 1  
view. (This may seem theoretical; see below.)
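
To make the two views concrete, here is a minimal sketch using Python
minidom (one of the APIs listed above) as a stand-in for a browser DOM.
The document string and the example.org URI are made up for
illustration. For a parser-created declaration, the Level 1 view keys
the attribute by the qname "xmlns:foo", while the Level 2 view keys it
by the [namespace,local] pair:

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Hypothetical document; the prefix binding is created by the parser.
doc = minidom.parseString(
    '<html xmlns:foo="http://example.org/foo#"><p/></html>'
)
root = doc.documentElement

# Level 1 view: the attribute is looked up by its qname string.
level1_value = root.getAttribute("xmlns:foo")

# Level 2 view: the same attribute is looked up by [namespace, local].
level2_value = root.getAttributeNS(XMLNS_NS, "foo")

# The attribute node itself carries both representations.
attr = root.getAttributeNodeNS(XMLNS_NS, "foo")
qname = attr.name
pair = (attr.namespaceURI, attr.localName)
```

For parser-inserted attributes both lookups agree in this API. The
interesting cases are the ones where they do not.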

>> 6) All features going into HTML5 should be robust and sane under
>> scripting even if the people proposing the feature were interested
>> in read-only use cases outside browsers. This includes keeping
>> script-generated DOMs serializable.
>>
>> 7) If, in order to satisfy point #2 above, your feature requires
>> using getAttribute (without NS) on getting but setAttributeNS (with
>> NS) on setting (to keep the XML DOM serializable!), your feature
>> isn't satisfying point #6.
>
> Mark is right on this: RDFa parsing remains easy and consistent. The
> key is to *never* use setAttributeNS or getAttributeNS. Since RDFa
> doesn't use QNames, that's not surprising.
>
> We do use xmlns prefix bindings, but we don't need to rely on the
> browser to parse those bindings, we can do that ourselves easily,
> exactly as if we had used @prefix all along.
>
> So, to implementors, we simply say: use setAttribute and getAttribute,
> *never* setAttributeNS or getAttributeNS.

If you are advocating never using the Level 2 view, you are basically  
advocating a programming model that isn't architecturally soundly  
layered on top of Namespaces in XML--on top of which all XML  
vocabularies defined by the W3C have been layered.

When you were speccing this, didn't it seem like a problem
(Architectural, architectural or otherwise) to you that you needed to
adopt implementation advice that effectively says "Never use DOM Level 2"?
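
As a rough illustration of why mixing the two setter families matters,
here is a sketch in Python minidom (again only a stand-in; browser DOMs
may behave differently, and the URIs are invented). An attribute created
with the Level 1 setter has no namespace, so a consumer that looks up
prefix bindings through the Level 2 view does not find it:

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"

doc = minidom.parseString("<html><p/></html>")
root = doc.documentElement
p = root.firstChild

# Level 1 setter: the attribute gets the name "xmlns:foo" but no namespace.
root.setAttribute("xmlns:foo", "http://example.org/foo#")

# Level 2 setter: the attribute is stored under [XMLNS_NS, "bar"].
p.setAttributeNS(XMLNS_NS, "xmlns:bar", "http://example.org/bar#")

# The Level 1 getter finds both attributes by qname.
foo_via_level1 = root.getAttribute("xmlns:foo")
bar_via_level1 = p.getAttribute("xmlns:bar")

# The Level 2 getter only finds the one set with setAttributeNS.
foo_via_level2 = root.getAttributeNS(XMLNS_NS, "foo")  # empty string
bar_via_level2 = p.getAttributeNS(XMLNS_NS, "bar")
```

So "always use the Level 1 methods" is self-consistent only as long as
every producer and every consumer in the pipeline follows the same rule.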

Although this looks like a non-problem in browsers because the  
Namespace-unaware DOM Level 1 view is available, it is a technical  
problem with APIs that only provide a Namespace-aware representation.  
For example, XOM doesn't allow attributes called xmlns:foo in the data  
model. Non-browser consumers are important, and it should be perfectly  
reasonable to use XOM in such a consumer.
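
I don't have a XOM snippet at hand here, but Python's ElementTree (also
on the list above) shows a comparable situation, so take the following
as an analogy rather than a statement about XOM itself. The sample
markup is hypothetical. Namespace declarations are not exposed as
attributes at all, so code written as getAttribute("xmlns:foo") has
nothing to read; the bindings are only available through a separate
namespace-declaration channel:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical RDFa-style input with one prefix binding.
SOURCE = (
    b'<html xmlns:foo="http://example.org/foo#">'
    b'<p property="foo:bar">x</p></html>'
)

root = ET.fromstring(SOURCE)

# The declaration is consumed by the namespace-aware parser and does not
# show up in the element's attribute map.
visible_attributes = dict(root.attrib)

# Prefix bindings have to be collected from "start-ns" events instead.
bindings = {}
for _event, (prefix, uri) in ET.iterparse(
    io.BytesIO(SOURCE), events=("start-ns",)
):
    bindings[prefix] = uri
```

An RDFa consumer on such an API therefore needs a code path that differs
from the "just read the xmlns:foo attribute" advice.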

There's also a technical issue for browsers: For resolving a prefix in
a namespace mapping context on the XML side in a browser, it would
make sense to intern the prefix being queried and then do pointer
compares against interned local names of attributes in the
"http://www.w3.org/2000/xmlns/" namespace while traversing up the tree.
If you ever wanted browsers to implement any RDFa features natively,
being sensitive to xmlns:foo attributes set with setAttribute() would
preclude a pointer compare-based lookup and would require actually
inspecting string data unless the internal data structures of browsers
were changed. (But see point #13 in my previous email on the topic of
changing the data structures.)
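
The lookup I have in mind can be sketched roughly as follows. This is a
toy model in Python, not the actual data structures of Gecko, WebKit or
Presto; the Element and Attr classes and the resolve_prefix function are
hypothetical, and sys.intern plus "is" stands in for atom tables and
pointer compares:

```python
import sys

XMLNS_NS = sys.intern("http://www.w3.org/2000/xmlns/")


class Attr:
    def __init__(self, namespace, local, value):
        # Names are interned so that equal names are the same object.
        self.namespace = None if namespace is None else sys.intern(namespace)
        self.local = sys.intern(local)
        self.value = value


class Element:
    def __init__(self, parent=None):
        self.parent = parent
        self.attrs = []  # toy model: no de-duplication of attributes

    def set_attribute_ns(self, namespace, qname, value):
        # Level 2 style setter: stores [namespace, local].
        self.attrs.append(Attr(namespace, qname.split(":", 1)[-1], value))

    def set_attribute(self, name, value):
        # Level 1 style setter: no namespace, the whole name is the key.
        self.attrs.append(Attr(None, name, value))


def resolve_prefix(element, prefix):
    """Walk up the tree comparing interned names by identity only."""
    wanted = sys.intern(prefix)
    node = element
    while node is not None:
        for attr in node.attrs:
            if attr.namespace is XMLNS_NS and attr.local is wanted:
                return attr.value
        node = node.parent
    return None


root = Element()
root.set_attribute_ns(XMLNS_NS, "xmlns:dc", "http://purl.org/dc/elements/1.1/")
child = Element(parent=root)
child.set_attribute("xmlns:foaf", "http://xmlns.com/foaf/0.1/")

dc_uri = resolve_prefix(child, "dc")      # found via identity compares
foaf_uri = resolve_prefix(child, "foaf")  # missed: set without a namespace
```

To also honor the binding set through set_attribute(), resolve_prefix
would have to examine the characters of every attribute name for an
"xmlns:" prefix, which is the string inspection mentioned above.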

(Politically, it seems rather radical to remove Namespaces in XML from  
the XML Architecture layer cake when recommending how to process a W3C  
XML vocabulary. Even if HTML5 may at times be seen as politically  
radical at the W3C, it isn't *this* radical! HTML5 tries to hide  
namespace *syntax* from the view as much as possible, but the data
model is still the Namespace-aware DOM.)

> I've expanded our evolving example to show that, no matter the mime type:
>
> - you can dynamically add RDFa to the DOM using setAttribute.
>
> - you can serialize the resulting DOM appropriately using innerHTML.

I admit that I was surprised that this step didn't throw on the XML  
side.

> I think it's safe to say that we have a robust and sane way to
> consistently parse RDFa in both HTML and XHTML DOMs, with robustness
> across DOM manipulation and serialization, even when using xmlns:*.
> The key is to remember that we don't use QNames.

Let's see if it's robust when a script mutates a parser-inserted  
attribute:
http://hsivonen.iki.fi/test/moz/xmlns-dom-setter-cc.xhtml

Not robust in Opera.

> So, I maintain my claim that the opposition to xmlns:* is mainly one
> of personal taste, not a technical problem of any sort. I'm certainly
> receptive to this issue of taste, which is why I'm happy with our
> @prefix explorations.
>
> But it's important to be clear about why we're doing this: it isn't a
> technical limitation or even a question of developer consistency.

The technical issue with using Level 1 setters isn't as bad in  
browsers as I had thought, but I disagree with your dismissal of the  
technical issue (see the XOM and browser-internal cases above).

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 17 February 2009 17:05:50 UTC