Re: Request to publish HTML+RDFa (draft 3) as FPWD from Maciej Stachowiak on 2009-09-22 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 22 Sep 2009 16:47:53 -0700
To: Mark Birbeck <mark.birbeck@webbackplane.com>
Cc: Jonas Sicking <jonas@sicking.cc>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-id: <28BBBFBD-6C38-4800-8FFE-E01A144B96A8@apple.com>

On Sep 22, 2009, at 3:42 PM, Mark Birbeck wrote:

> Hi Jonas,
>
>> It certainly matters. If, for example, method 1 or 2 were used then
>> no prefix mappings would be found at all in the DOM output from an
>> HTML parser. So it really *does* matter how you do prefix mapping.
>> And as
>> far as DOM 2 goes, I think 1 or 2 are the intuitive solutions so if
>> we're not using those then I *really* think it's important to specify
>> so.
>>
>> In any case, I think I've spent enough time on this issue. I can't
>> really articulate the problem any more than I have. I hope this issue
>> is solved by the time last call rolls around.
>
> I see that you are frustrated, but you seem to think that the issue is
> that no-one understands your position.
>
> We *do* understand your position, and are trying to explain to you,
> that -- with all due respect -- it is based on a misunderstanding.
>
> You are looking at implementation specifics, and as many people have
> explained, implementation is not the issue. This is because the spec
> is defining an algorithm, which entitles people to implement things
> how they see fit, on whatever platform they want to write for, using
> whatever language they want to use.

What Jonas is saying is that the spec algorithms as stated don't let
you choose between implementation strategies that at first glance seem
equally valid but in fact will give different results. He gave some
specific examples: how to get prefix mappings in a DOM, how to
extract triples from an HTML document whose parsing results in
reparenting, and whether prefix mappings should be assigned to
elements at parse time or extraction time if the DOM can be mutated
after parsing.
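
To make the first example concrete, here is a minimal sketch in Python
using xml.dom.minidom. The two strategy functions are hypothetical
illustrations, not Jonas's numbered methods and not anything the spec
defines. The sketch assumes an HTML parser leaves an "xmlns:dc"
attribute with no namespace, while an XML parser places it in the
xmlns namespace.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"
DC = "http://purl.org/dc/elements/1.1/"


def mappings_namespace_aware(element):
    """Hypothetical strategy A: only trust attributes in the xmlns namespace."""
    return {
        attr.name[len("xmlns:"):]: attr.value
        for attr in element.attributes.values()
        if attr.namespaceURI == XMLNS_NS and attr.name.startswith("xmlns:")
    }


def mappings_by_name(element):
    """Hypothetical strategy B: any attribute named "xmlns:*" is a mapping."""
    return {
        attr.name[len("xmlns:"):]: attr.value
        for attr in element.attributes.values()
        if attr.name.startswith("xmlns:")
    }


doc = minidom.Document()

# As an XML parser would build it: the attribute is in the xmlns namespace.
xml_div = doc.createElement("div")
xml_div.setAttributeNS(XMLNS_NS, "xmlns:dc", DC)

# Assumed HTML parser output: same attribute name, but no namespace at all.
html_div = doc.createElement("div")
html_div.setAttribute("xmlns:dc", DC)

for label, element in (("XML", xml_div), ("HTML", html_div)):
    print(label, mappings_namespace_aware(element), mappings_by_name(element))
```

Both strategies agree on the XML-built element, but on the HTML-built
one strategy A finds no mappings while strategy B finds "dc". Neither
is obviously wrong as an implementation choice, which is why the spec
would need to say which one is meant.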

It seems like people reject his arguments for what superficially  
appear to be mutually contradictory reasons: (a) that RDFa doesn't  
really use Namespaces in XML, it just uses a syntax that looks the  
same but could have been anything; (b) that RDFa normatively  
references Namespaces in XML for implementation requirements; (c) that  
RDFa is defined purely at the raw source text level (even though the  
spec's processing rules speak of an abstract tree model); (d) that  
RDFa can be applied directly to situations where original source text  
is not available or may not even exist.

I'm pretty puzzled by the argument that RDFa is defined in terms of
raw source text. The start of section 5 of XHTML+RDFa says:

"Processing need not follow the DOM traversal technique outlined here,  
although the effect of following some other manner of processing must  
be the same as if the processing outlined here were followed. The  
processing model is explained using the idea of DOM traversal which  
makes it easier to describe (particularly in relation to the  
[evaluation context])."

And indeed Section 5 describes processing in terms of DOM concepts  
such as "document object", "child element", "document order" and so  
forth. Later Section 5.5 describes its algorithm as "the DOM traversal  
technique defined here".

It seems to me like it would be much more fruitful to go with this DOM- 
like formalism instead of pretending that things are actually defined  
at the textual level. They are not - nowhere does RDFa describe how to  
get from source characters to its tree model for processing, that is  
all left up to other specs (and with the understanding that  
implementations may do things without a tree, as long as they give  
equivalent results).

Buying into the DOM-based model that XHTML+RDFa already uses for its  
processing rules would immediately answer many of Jonas's questions:

- HTML5+RDFa should be processed by taking the DOM that results from  
the HTML5 parsing algorithm. As with XHTML+RDFa, you don't have to  
literally create a DOM, but your output must be equivalent to the  
processing defined in DOM terms.
- DOM mutations that happen before RDFa extraction *do* potentially  
affect the extracted triples.
- HTML source documents that are parsed in a way that reparents nodes
are processed according to the resulting (reparented) DOM, not the
nesting in the source text.
- There is no need to first serialize a DOM in order to process it  
according to RDFa.
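
The points about mutation and reparenting can be illustrated with a
toy extractor in Python. This is a sketch under loud assumptions: the
extract function is hypothetical and far simpler than the real RDFa
processing rules (it only pairs each @property with the nearest
enclosing @about), and the markup is made up for the example.

```python
from xml.dom import minidom


def extract(node, subject=None, out=None):
    """Toy extraction (not the RDFa algorithm): walk the tree in document
    order and pair each @property with the nearest enclosing @about."""
    if out is None:
        out = []
    if node.nodeType == node.ELEMENT_NODE:
        if node.hasAttribute("about"):
            subject = node.getAttribute("about")
        if node.hasAttribute("property"):
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            out.append((subject, node.getAttribute("property"), text))
    for child in node.childNodes:
        extract(child, subject, out)
    return out


doc = minidom.parseString(
    "<body>"
    '<div about="#alice"><span property="foaf:name">Alice</span></div>'
    '<div about="#bob"/>'
    "</body>"
)

before = extract(doc.documentElement)

# Move the span under the second div, as a script (or a parser that
# reparents nodes) might. appendChild detaches it from its old parent.
span = doc.getElementsByTagName("span")[0]
second_div = doc.getElementsByTagName("div")[1]
second_div.appendChild(span)

after = extract(doc.documentElement)

print(before)
print(after)
```

The same node yields a different subject once it has moved, and the
extractor never needed the DOM to be serialized back to text. An
implementation working from the original source characters would
report the "before" result even though the tree now says otherwise.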

The only detail that would have to be filled in, if we accept the DOM-
based model that the spec already uses, is how to find the prefix
mappings. Either an XHTML+RDFa erratum or HTML5+RDFa could specify
that any attribute whose qualified name (Attr.name) starts with
"xmlns:" creates a prefix mapping.

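A rule along those lines could be sketched as follows in Python. The
helper names in_scope_prefixes and resolve_curie are hypothetical, the
nearest-ancestor-wins scoping is my assumption about how such a rule
would be applied, and the DOM is built without namespaces to mimic
what I assume an HTML parser produces.

```python
from xml.dom import minidom


def in_scope_prefixes(element):
    """Collect mappings from every "xmlns:*" attribute on the element and
    its ancestors. The nearest declaration of a prefix wins (assumption)."""
    mappings = {}
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for attr in node.attributes.values():
            if attr.name.startswith("xmlns:"):
                # setdefault keeps the value found closest to the element.
                mappings.setdefault(attr.name[len("xmlns:"):], attr.value)
        node = node.parentNode
    return mappings


def resolve_curie(element, curie):
    """Expand "prefix:reference" using the mappings in scope right now.
    Returns None when the prefix has no mapping."""
    prefix, _, reference = curie.partition(":")
    base = in_scope_prefixes(element).get(prefix)
    return None if base is None else base + reference


doc = minidom.Document()
body = doc.createElement("body")
doc.appendChild(body)
body.setAttribute("xmlns:dc", "http://purl.org/dc/elements/1.1/")

span = doc.createElement("span")
span.setAttribute("property", "dc:title")
body.appendChild(span)

first = resolve_curie(span, span.getAttribute("property"))

# Mutate the DOM after "parsing": the mapping is looked up at extraction
# time, so the change is visible to the next lookup.
body.setAttribute("xmlns:dc", "http://example.org/other#")
second = resolve_curie(span, span.getAttribute("property"))

print(first)
print(second)
```

Because the lookup reads attributes at extraction time, this sketch
also answers the parse-time versus extraction-time question in favour
of extraction time, consistent with mutations affecting the triples.
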
Buying into the DOM approach would also address Henri's objection  
about bad spec layering.

Regards,
Maciej
Received on Tuesday, 22 September 2009 23:48:36 UTC