Re: RDFa and Web Directions North 2009 from Mark Birbeck on 2009-02-16 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Mon, 16 Feb 2009 08:32:06 +0000
To: Henri Sivonen <hsivonen@iki.fi>
Cc: Sam Ruby <rubys@intertwingly.net>, Kingsley Idehen <kidehen@openlinksw.com>, Dan Brickley <danbri@danbri.org>, Michael Bolger <michael@michaelbolger.net>, public-rdfa@w3.org, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Tim Berners-Lee <timbl@w3.org>, Dan Connolly <connolly@w3.org>, Ian Hickson <ian@hixie.ch>
Message-ID: <ed77aa9f0902160032k24cc0f5cq602b45bbd705d587@mail.gmail.com>

Hi Henri,

I hope you'll forgive me for not replying to each of the points in
your email in turn, but the bulk of it does seem to cover ground that
has already been addressed. (You still seem to insist that somehow
there must be a code fork if we want to have script that runs in both
HTML and XHTML DOMs, yet three people have now explained that there is
no need for one.)

So if I may, I'll just highlight a couple of things.

The first is about whether 'URIs as identifiers' are an "annoyance", or
whether the fundamental architecture of RDF needs revising. With
respect, I can only say that you need to lighten up a bit; it may
surprise you to learn that many things you don't like, or cannot see a
use for, have enormous communities built around them and are solving
serious problems every day. No-one is asking you to like them, but don't
assume that just because something doesn't fit with your world-view,
there is no value in what has been done.

The second is that you say that if I "wish to get new features added
to HTML5", I need to do this or that. I would like to make clear that
whether HTML5 supports RDFa is neither here nor there to me. (You'll
notice that I only joined this debate to clarify a point about
processing.)

This is because, provided that HTML5 doesn't break the DOM (i.e., the
JavaScript interfaces remain backwards compatible), RDFa is *already
in*: by usage rather than by standard.

I should clarify that I am not speaking for others in the RDFa
community here (or indeed the RDF one), but, hey...we're a broad
church. :)

I'm speaking only for myself, and in my view the rationale for HTML5,
and the process it is following, simply does not square with something
as open as RDFa. Consequently, HTML5 is unlikely ever to accommodate
RDFa 'officially'. (More on that at [1].)

Your email illustrates this quite well: the more we discuss this, the
more objections there seem to be, rather than fewer.

But since RDFa is just a bunch of attributes, and incorporating them
into any document is straightforward, I personally don't have any
problem with that.

Regards,

Mark

[1] <http://webbackplane.com/mark-birbeck/blog/2009/01/rdfa-means-extensibility>

On Mon, Feb 16, 2009 at 7:54 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Feb 14, 2009, at 01:57, Mark Birbeck wrote:
>
>> You seem to be implying that there is a fundamental impediment to
>> creating an RDFa parser using the tools available in an HTML DOM. You
>> base this assertion on Henri's document, but all his script shows is
>> that objects in an HTML DOM don't have namespace information
>> available.
>>
>> That's no surprise.
>>
>> My response is that this is irrelevant.
>
>  1) Content consumer software should work both with HTML (text/html) and
> XHTML (application/xhtml+xml) if it works with one of them.
>
>  2) For sane *software* architecture, code above the HTML/XML parsing layer
> should be able to run its dispatch code without any conditional branches on
> the HTMLness or XMLness of the origin of the data it is operating on. This
> applies to native browser code, JavaScript code running in a browser and
> non-browser (X)HTML consumers. (Even easy-looking tiny variations add up.)
>
>  3) The point above is not about abstract XML architecture. It is an actual
> way of implementing software including (but not limited to) Gecko, WebKit,
> Presto (as far as can be guessed without seeing the code) and Validator.nu.
> Furthermore, the dominant design
> (http://en.wikipedia.org/wiki/Dominant_Design) of HTML5 parsers for
> non-browser applications is that they expose an XML API so that the
> application-level code is written as if working with an XML parser parsing
> an equivalent XHTML5 file.
>
>  4) The qname is an artifact of the Namespaces in XML layer in XML and
> should not be significant to the application. The correct way to do
> namespace-wise correct dispatch is to dispatch on the [namespace,local]
> pair. If you are inspecting the qname of an attribute or element for any
> reason other than round-tripping serialization, you are Doing it Wrong.
>
>  5) Given the points above, you should also do dispatch on the
> [namespace,local] pair on the HTML side.
>
>  6) All features going into HTML5 should be robust and sane under scripting
> even if the people proposing the feature were interested in read-only use
> cases outside browsers. This includes keeping script-generated DOMs
> serializable.
>
>  7) If, in order to satisfy point #2 above, your feature requires using
> getAttribute (without NS) on getting but setAttributeNS (with NS) on setting
> (to keep the XML DOM serializable!), your feature isn't satisfying point #6.
>
>  8) So far, experience shows that even violations of the above points
> that look small--such as lang vs. xml:lang--are more hurtful than people
> imagine at first. Examples:
>  a) Browsers need to inspect two attributes instead of one to discover the
> language.
>  b) To abstract problem a) away in non-browser applications in a
> high-performance manner (in terms of CPU instructions executed per
> application-made query for an attribute), the static RAM footprint of the
> Validator.nu HTML Parser is bloated by pointer size times 2328!
>  c) The lang & xml:lang part of the HTML5 spec has had the highest incidence
> of validator bugs per spec sentence. (Bugs are bad and costly.)
> Hence, all violations of the above points should be taken very seriously,
> even if in isolation the violations seem too small to be indignant about.
> Violations for xml:lang legacy are somewhat excusable. Introducing new
> violations isn't.
>
>  9) If you are defining something in terms of the namespace mapping
> context, but you can't use DOM Level 3 lookupPrefix() to implement it
> (without violating point #2), you are Doing it Wrong.
>
> 10) Browsers aren't the only kind of Web content consumer software. What you
> are specifying should work with XML API environments other than the browser
> flavor of DOM.
>
> 11) SAX2--arguably the most correct and complete XML API there is--when run
> in the Namespace-aware mode (i.e. the correct mode considering contemporary
> XML architecture) doesn't expose the namespace declarations as attributes.
> Therefore, a SAX2-based RDFa-in-XHTML consumer needs to use the
> non-attribute abstraction (startPrefixMapping()) for gathering the namespace
> mapping context. However, the same application-level code (see point #2)
> wouldn't work with an HTML5 parser that implements mapping from text/html to
> SAX2 as defined today in the HTML 5 draft and as sufficient for all the
> HTML5 features drafted so far.
>
> 12) XOM--arguably the most correct of the well-known XML tree APIs for
> Java--doesn't expose the namespace declarations as attributes. Therefore, a
> XOM-based RDFa-in-XHTML consumer needs to use the non-attribute abstraction
> for using the namespace mapping context. However, the same application-level
> code (see point #2) wouldn't work with an HTML5 parser that implements
> mapping from text/html to XOM as defined today in the HTML 5 draft and as
> sufficient for all the HTML5 features drafted so far. (XOM even disallows
> including attributes named xmlns:foo in the tree.)
>
> 13) If points 9 through 12 were addressed by changing HTML5 parsers to
> expose attributes called xmlns:foo as namespace mapping context, the change
> to HTML5 to enable RDFa would be notably more complex than just adding a few
> attributes.
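
Points 11 and 12 can be illustrated with a small sketch. It is only an illustration under stated assumptions: it uses Python's standard library, with `html.parser` standing in for a text/html parser (it is not an HTML5-conformant parser) and `xml.sax` in namespace-aware mode standing in for a SAX2-style XML API. The sample document and the names `HtmlAttrNames`, `SaxAttrNames`, `html_view` and `sax_view` are hypothetical. The text/html side reports `xmlns:foaf` as an ordinary attribute, while the namespace-aware XML side reports the declaration only through `startPrefixMapping()` and names attributes by `(namespace, local)` pairs.

```python
import io
import xml.sax
from html.parser import HTMLParser
from xml.sax.handler import ContentHandler, feature_namespaces

# Hypothetical sample: one prefix declaration and one CURIE-valued attribute.
DOC = '<div xmlns:foaf="http://xmlns.com/foaf/0.1/" property="foaf:name">Mark</div>'


class HtmlAttrNames(HTMLParser):
    """Stand-in text/html side: records attribute names as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.attr_names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; "xmlns:foaf" arrives as a plain name.
        self.attr_names.extend(name for name, _ in attrs)


class SaxAttrNames(ContentHandler):
    """Namespace-aware XML side: declarations arrive via startPrefixMapping()."""

    def __init__(self):
        super().__init__()
        self.prefix_mappings = []
        self.attr_names = []

    def startPrefixMapping(self, prefix, uri):
        self.prefix_mappings.append((prefix, uri))

    def startElementNS(self, name, qname, attrs):
        # Attribute names are (namespace URI, local name) pairs; no xmlns:* entries.
        self.attr_names.extend(attrs.getNames())


def html_view(doc):
    parser = HtmlAttrNames()
    parser.feed(doc)
    parser.close()
    return parser.attr_names


def sax_view(doc):
    handler = SaxAttrNames()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # SAX2 namespace-aware mode
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(doc.encode("utf-8")))
    return handler.prefix_mappings, handler.attr_names
```

Run on `DOC`, `html_view` returns `['xmlns:foaf', 'property']`, while `sax_view` returns `([('foaf', 'http://xmlns.com/foaf/0.1/')], [(None, 'property')])`. Code that gathers prefixes by scanning attribute names therefore sees the declaration on one side but not the other.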
>
>> An RDFa parser needs to be able to 'spot' whether an attribute name
>> begins 'xmlns:', but for that we don't need namespace support -- it's
>> just string matching, no different to detecting an attribute like
>> @data-length [1].
>
> getAttributeNS(null, "data-length") works consistently in text/html and
> application/xhtml+xml.
>
>>> And I wrote that "HTML parsing rules differ in visible ways from XHTML.
>>> Ways that affect the specific names of attributes chose[sic] in RDFa."
>>
>> But the attributes in RDFa are not prefixed -- @about, @resource,
>> @datatype and @content are new attributes, whilst @rel, @rev, @href
>> and @src already exist -- so I don't see in what way the names were
>> 'chosen' in a way that was influenced by XHTML.
>
> Thank you for not prefixing the attribute names. However, you did make
> the attribute values sensitive to the namespace mapping context.
>
>>> A list of the parsers alluded to above would be helpful as an existence
>>> proof for the above assertion.
>>
>> I think you have this the wrong way round.
>>
>> The parsing algorithm for RDFa refers to attributes and elements,
>> navigated by recursively traversing the hierarchy. It's therefore
>> applicable to anything that has such a hierarchical structure, and
>> that allows attribute values to be retrieved. Both HTML and XHTML DOMs
>> fit this description.
>
> But do they fit the description with the exact same above-parser code? (See
> my point #2 above.)
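
The position in the quoted paragraph above, that processing needs only a hierarchy of nodes and a way to read attribute values, can be sketched against a parser-neutral tree. This is a toy, not the RDFa processing rules: the `Node` class and the functions `expand_curie` and `walk` are hypothetical, only `@about`, `@property` and `@content` are handled, and prefixes are gathered by plain string matching on attribute names beginning `xmlns:`, as in the string-matching approach quoted earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parser-neutral element: a name, attributes, children."""
    name: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


def expand_curie(value, mappings):
    """Simplified expansion: 'prefix:ref' -> mapped URI + ref, else unchanged."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in mappings:
        return mappings[prefix] + reference
    return value


def walk(node, mappings=None, subject=None, out=None):
    """Recursively collect (subject, predicate, literal) from @about/@property/@content."""
    # Copy so that declarations stay scoped to this subtree.
    mappings = dict(mappings or {})
    out = [] if out is None else out
    # Prefixes are found purely by string matching on attribute names.
    for name, value in node.attrs.items():
        if name.startswith("xmlns:"):
            mappings[name[len("xmlns:"):]] = value
    subject = node.attrs.get("about", subject)
    if "property" in node.attrs and "content" in node.attrs:
        predicate = expand_curie(node.attrs["property"], mappings)
        out.append((subject, predicate, node.attrs["content"]))
    for child in node.children:
        walk(child, mappings, subject, out)
    return out
```

Whether this same walker can run unchanged over a namespace-aware XML API, which may not expose `xmlns:*` declarations as attributes at all, is the question Henri raises in reply above.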
>
>> So I'd like to see a proof that shows that this simple architecture
>> makes it impossible to create an RDFa parser on top of an HTML DOM.
>> Henri has not provided a proof of anything other than that an HTML DOM
>> doesn't support namespaces, yet for some reason this 'non-proof' gets
>> circulated as fact.
>
> It is not circulated as proof that you can't implement an RDFa parser on top
> of an HTML DOM. It is circulated as proof that you can't implement an RDFa
> parser that a) works without conditional branches on HTMLness/XMLness,
> b) doesn't violate Namespace-wise correct coding practices and c) works on
> *both* HTML and XML parser output.
>
>>> Your recent statement that "I can assure you that the parsing rules were
>>> very explicitly written in such a way that the only thing they require to
>>> do their work is a hierarchy of nodes, and the ability to obtain the value
>>> of an attribute.", while technically true, tends to obscure more than
>>> reveal when it comes to these differences.
>>
>> Again...what differences? I'm still confused as to what it is that
>> we're being different to.
>>
>> Just in case what you are getting at is that there is somehow a
>> difference between parsing RDFa in XHTML and parsing RDFa in HTML, I
>> can only say again that there isn't -- there is only one parsing
>> algorithm in RDFa.
>
> See my points 9 through 12 above.
>
> Do the existing RDFa parsers run different code (i.e. taking different
> branches) above the HTML and XML parsers?
>
> Obviously, you can make an RDFa parser for text/html if the API the parser
> exposes violates the Infoset or differs from browser behavior, and either you
> run different code for expanding CURIEs in the text/html and
> application/xhtml+xml cases or you run Namespace-wise bogus code for the XML
> case.
>
>>> Actually, I say differences.  I only have an existence proof for one
>>> difference at the moment.  Is there more?  Beats me.  Hence my assertion
>>> that a definitive list would be helpful.
>>
>> As I said, the "existence proof" of which you speak (Henri's one),
>> proves only that namespace properties do not exist in an HTML DOM,
>> whilst they do in an XHTML DOM.
>>
>> That's very different from being an "existence proof" that there are
>> two (or more) algorithms for parsing RDFa in a DOM, since RDFa does
>> not require namespaces per se.
>
> Again, points 9 through 12 above.
>
>> The only reason I entered this debate was to clarify the single point
>> that you made, propagating Henri's false claim -- that since the HTML
>> DOM does not provide namespace information, it is therefore not
>> possible (or 'more difficult') to create an RDFa parser.
>
> If you violate point #2, you make things more difficult. By how much? See
> point #8.
>
> This problem can be addressed by using absolute URIs instead of CURIEs and
> phasing out CURIEs by declaring xmlns:http="http:" on the XML side during
> the transition. (If that makes the predicates annoyingly long, what you have
> is a fundamental problem with the idea of using URIs as identifiers as
> opposed to using them for application-level addressing on the Internet. In
> that case, you should address that problem directly on the level of the RDF
> model instead of trying to push the annoyance around syntactically.)
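
The transition trick described above relies on a simple identity: if the prefix `http` is mapped to the string `http:`, an absolute URI read as a CURIE expands back to itself. A minimal sketch, assuming a simplified CURIE expansion (split at the first colon, then concatenate the mapped value with the reference); the function name `expand_curie` and the sample URIs are illustrative only.

```python
def expand_curie(value: str, mappings: dict[str, str]) -> str:
    """Simplified CURIE expansion: split at the first colon and, if the
    prefix is mapped, concatenate the mapped value with the reference."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in mappings:
        return mappings[prefix] + reference
    return value


# Hypothetical in-scope mappings, as if declared with xmlns:http="http:"
# and xmlns:foaf="http://xmlns.com/foaf/0.1/".
MAPPINGS = {"http": "http:", "foaf": "http://xmlns.com/foaf/0.1/"}

absolute = "http://xmlns.com/foaf/0.1/name"
# "http" + ":" + "//xmlns.com/..." expands to "http:" + "//xmlns.com/...".
assert expand_curie(absolute, MAPPINGS) == absolute
# The CURIE form and the absolute form name the same predicate.
assert expand_curie("foaf:name", MAPPINGS) == absolute
```

Under this simplified expansion, an absolute URI is also left unchanged when no `http` mapping is in scope.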
>
> If you wish to get new features added to HTML5 and the proposed syntax
> depends on element or attribute names that contain the colon (xmlns:foo in
> this case), you are just asking for trouble because the colon is special in
> XML but not in text/html (and if you ask for it to be made special in
> text/html, too, you are asking for more than just adding a few attributes).
>
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
>
>
>



-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)
Received on Monday, 16 February 2009 08:32:46 UTC