Re: HTML 4 Profile for RDFa from Julian Reschke on 2009-05-23 (public-html@w3.org from May 2009)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sat, 23 May 2009 14:34:07 +0200
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A17ED3F.6020106@gmx.de>

Philip Taylor wrote:
> ...
>> That being said, it wouldn't hurt to have a section that defines 
>> special aspects of processing RDFa from a DOM instead of an HTML 
>> document (as a series of bytes/characters).
> 
> I think it would hurt if some RDFa implementations (that used a DOM) 
> extracted one set of triples, and some other implementations (that don't 
> use a DOM) extracted a different set of triples, so if there are 
> multiple sections defining different styles of processing then it'll 
> have to be very careful to produce identical results.

Yes.

>> Is it still underspecified once we require a valid HTML5 document as 
>> input?
> 
> Probably not. But I wouldn't consider it acceptable to require a valid 
> document as input - people make mistakes all the time, and I want them 
> to get consistent (and hopefully predictable) RDF triples out of it 
> regardless of what implementation they use, so the specification has to 
> deal precisely with invalid input. See 
> http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0156.html 
> for an example of someone with precisely this kind of error.

Understood; I just wanted to understand the scope of the problem.

>>> For this to make sense in real HTML implementations, the definition 
>>> should be in terms of the document layer rather than the byte layer. 
>>
>> Disagreed. Many implementations never build a DOM. We're not only 
>> talking about browsers here.
> 
> By "DOM" I generally mean any kind of tree structure of elements and 
> attributes, either as an explicit data structure (DOM, XOM, ElementTree) 
> or implicit (SAX). Would any RDFa implementation *not* parse the input 
> HTML into that kind of structure and operate over the elements and 
> attributes as distinct objects? (e.g. would they just use regular 
> expressions over the input byte stream? That seems quite infeasible to 
> me...)

Depends on the definition of "tree structure". I've been involved in 
code that just uses a tokenizer and a specialized stack, and 
implementations like these will not do the re-arranging of elements that 
the HTML5 spec specifies for some kinds of broken input.
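
To illustrate the tokenizer-and-stack style, here is a minimal sketch. It 
assumes Python's html.parser as a stand-in tokenizer; the names 
StackWalker and text_paths are hypothetical and do not come from any RDFa 
implementation. It tracks open elements with a plain stack and performs 
none of the HTML5 tree-construction fix-ups.

```python
from html.parser import HTMLParser


class StackWalker(HTMLParser):
    """Track open elements with a plain stack (no HTML5 tree construction)."""

    def __init__(self):
        super().__init__()
        self.stack = []        # names of currently open elements
        self.text_paths = []   # (element path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Simplification: void elements such as <br> are not special-cased.
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_paths.append((tuple(self.stack), text))


def text_paths(markup):
    """Return the element path seen by the stack for each run of text."""
    walker = StackWalker()
    walker.feed(markup)
    walker.close()
    return walker.text_paths
```

For the invalid input `<table><div>Test</div></table>`, text_paths reports 
the text under ("table", "div"). An HTML5 tree builder would instead 
foster-parent the div out of the table, so an RDFa processor walking that 
tree would see a different ancestor chain.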

>>> How are xmlns:* attributes meant to be processed? E.g. what is the 
>>> expected output in the following cases:
>>>
>>> <div xmlns:T="test:">
>>>   <span typeof="t:x" property="t:y">Test</span>
>>> </div>
>>>
>>> <div XMLNS:t="test:">
>>>   <span typeof="t:x" property="t:y">Test</span>
>>> </div>
>>> [...]
>>
>> I would expect the results to be the same for XHTML and HTML 
>> serializations.
> 
> It would be good to be the same as far as possible, but in general that 
> is impossible to implement in a browser-based environment (or anything 
> built on any HTML parser I'm familiar with), because the case of 
> attributes is lost when parsing. We want to allow implementations in 
> browser-based environments, and we want them to match any other 
> implementations, so implementations in any other environment must handle 
> case-sensitivity in the same way.

That's impossible, at least for now, as RDFa-in-XHTML relies on 
XML-NS-wellformedness (so XMLNS:* would be recognized as namespace 
declaration, right?).
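
The loss of attribute case on the HTML side can be seen with a small 
sketch. It assumes Python's html.parser as a representative HTML parser; 
the names AttrCollector and attribute_names are hypothetical. Other HTML 
parsers may expose attributes differently, so treat this as an 
illustration rather than a statement about any particular RDFa processor.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record attribute names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands over attribute names already lowercased.
        self.seen.extend(name for name, _value in attrs)


def attribute_names(markup):
    """Return the attribute names reported for all start tags in markup."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen
```

Both `<div xmlns:T="test:">` and `<div XMLNS:t="test:">` come out as 
xmlns:t here, so code built on this parser cannot tell the two test cases 
above apart, whatever a namespace-aware XML parser does with them.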

BR, Julian

Received on Saturday, 23 May 2009 12:35:11 UTC