Re: HTML 4 Profile for RDFa from Philip Taylor on 2009-05-23 (public-html@w3.org from May 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Sat, 23 May 2009 13:22:37 +0100
To: Julian Reschke <julian.reschke@gmx.de>
CC: Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A17EA8D.3080208@cam.ac.uk>

Julian Reschke wrote:
> Philip Taylor wrote:
>> [...]
>> What is "the @xml:lang attribute"? Is it the attribute with local name 
> 
> It's unambiguous as long as we talk about a stream of characters, right?

Yes (assuming it's clear that it means the sequence of characters that 
matches the 'attribute name' part of whatever grammar defines the HTML 
syntax and ASCII-case-insensitively matches the string "xml:lang", and 
assuming we don't worry about e.g. multiple xml:lang attributes being 
(invalidly) specified on the same element).
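
As a rough sketch of that matching rule (not taken from any spec text):
the helper names below are hypothetical, and the attributes are assumed
to arrive as (name, value) pairs in source order. Only A-Z is folded,
because a Unicode-aware lower() also maps some non-ASCII characters onto
ASCII letters.

```python
import string

# Fold only A-Z to a-z; every other character is left untouched.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(name):
    """ASCII-lowercase a string without touching non-ASCII characters."""
    return name.translate(_ASCII_FOLD)


def is_xml_lang_name(name):
    """True if name ASCII-case-insensitively matches "xml:lang"."""
    return ascii_lower(name) == "xml:lang"


def xml_lang_values(attrs):
    """Return the values of all matching attributes, in source order.

    More than one entry in the result means the element (invalidly)
    specified the attribute several times.
    """
    return [value for name, value in attrs if is_xml_lang_name(name)]
```

A caller would still have to decide what to do when xml_lang_values()
returns more than one entry; the sketch only makes that case visible.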

>> "xml:lang" in no namespace (as would be produced by an HTML 5 parser 
>> (and by current HTML browser parser implementations))? or the 
>> attribute with local name "lang" in the namespace 
>> "http://www.w3.org/XML/1998/namespace" (as would be produced by an XML 
>> parser, and could be inserted in an HTML document via DOM APIs)? or 
>> both (in which case both could be specified on one element, in 
>> addition to "lang" in no namespace)?
> 
> Both can only be specified in the DOM, but not in a serialization (or am 
> I missing something?).

I think that's roughly correct:

In an XML serialisation with no scripting, you can only get the 
attribute "lang" in "http://www.w3.org/XML/1998/namespace".

In an HTML5 text/html serialisation with no scripting, you can only get 
the attribute "xml:lang" in no namespace.
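
A small sketch of the difference, using Python's standard library. Note
the assumption: html.parser is only a stand-in here (it is not an HTML5
parser), and the FirstTagAttrs class is a hypothetical helper, but it
shows the same split between a namespaced "lang" and a plain "xml:lang".

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XML_NS = "http://www.w3.org/XML/1998/namespace"


class FirstTagAttrs(HTMLParser):
    """Record the attributes of the first start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


def attrs_via_xml(markup):
    # ElementTree reports namespaced attributes as "{namespace}local".
    return dict(ET.fromstring(markup).attrib)


def attrs_via_html(markup):
    # The HTML-style parser keeps the whole name as one string.
    parser = FirstTagAttrs()
    parser.feed(markup)
    parser.close()
    return parser.attrs


markup = '<p xml:lang="en">Hello</p>'
xml_attrs = attrs_via_xml(markup)    # key is "{" + XML_NS + "}lang"
html_attrs = attrs_via_html(markup)  # key is "xml:lang"
```

The same characters in the source end up as two different attributes,
depending on which parser produced the tree.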

It would be easy to invent a new serialisation that does let you declare 
both attributes, e.g. http://simon.html5.org/specs/sdf

> That being said, it wouldn't hurt to have a section that defines special 
> aspects of processing RDFa from a DOM instead of an HTML document (as a 
> series of bytes/characters).

I think it would hurt if some RDFa implementations (that use a DOM) 
extracted one set of triples, and other implementations (that don't 
use a DOM) extracted a different set of triples. So if there are 
multiple sections defining different styles of processing, the 
specification will have to be very careful that they all produce 
identical results.

>> [...]
>>   <table>
>>     <tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
>>       <td> This text goes inside the table </td>
>>       This text gets parsed to *outside* the table
>>       <td> This text goes inside the table </td>
>>     </tr>
>>   </table>
>> [...] 
> Is it still underspecified once we require a valid HTML5 document as input?

Probably not. But I wouldn't consider it acceptable to require a valid 
document as input - people make mistakes all the time, and I want them 
to get consistent (and hopefully predictable) RDF triples out of it 
regardless of what implementation they use, so the specification has to 
deal precisely with invalid input. See 
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0156.html 
for an example of someone with precisely this kind of error.

>> For this to make sense in real HTML implementations, the definition 
>> should be in terms of the document layer rather than the byte layer. 
> 
> Disagreed. Many implementations never build a DOM. We're not only 
> talking about browsers here.

By "DOM" I generally mean any kind of tree structure of elements and 
attributes, either as an explicit data structure (DOM, XOM, ElementTree) 
or implicit (SAX). Would any RDFa implementation *not* parse the input 
HTML into that kind of structure and operate over the elements and 
attributes as distinct objects? (e.g. would they just use regular 
expressions over the input byte stream? That seems quite infeasible to 
me...)

>> How are xmlns:* attributes meant to be processed? E.g. what is the 
>> expected output in the following cases:
>>
>> <div xmlns:T="test:">
>>   <span typeof="t:x" property="t:y">Test</span>
>> </div>
>>
>> <div XMLNS:t="test:">
>>   <span typeof="t:x" property="t:y">Test</span>
>> </div>
>> [...]
> 
> I would expect the results to be the same for XHTML and HTML 
> serializations.

It would be good for the results to be the same as far as possible, but 
in general that is impossible to implement in a browser-based 
environment (or anything built on any HTML parser I'm familiar with), 
because the case of attribute names is lost when parsing. We want to 
allow implementations in 
browser-based environments, and we want them to match any other 
implementations, so implementations in any other environment must handle 
case-sensitivity in the same way.
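
To illustrate the lost case, here is a minimal sketch using Python's
html.parser as a stand-in for an HTML parser (an assumption: it is not
a browser parser, though it lowercases attribute names in the same
way). AttrNames and attribute_names are hypothetical helper names.

```python
from html.parser import HTMLParser


class AttrNames(HTMLParser):
    """Collect attribute names exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.extend(name for name, _value in attrs)


def attribute_names(markup):
    parser = AttrNames()
    parser.feed(markup)
    parser.close()
    return parser.seen


# Both spellings from the test cases collapse to the same name, so code
# working on the parsed tree cannot tell which one the author wrote.
upper_prefix = attribute_names('<div xmlns:T="test:"></div>')
upper_xmlns = attribute_names('<div XMLNS:t="test:"></div>')
```

Both lists contain only "xmlns:t", so whatever rule is chosen for
resolving the "t:" prefix has to work without the original case.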

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Saturday, 23 May 2009 12:23:20 UTC