- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 23 May 2009 13:17:45 +0200
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Philip Taylor wrote:
> ...
> Indeed, it would be good to have this defined with the level of precision
> that HTML 5 has, so we can be sure implementations will be able to agree
> on how to extract RDFa from text/html content.
>
> A few significant issues that I see in the current version:
>
> What is "the @xml:lang attribute"? Is it the attribute with local name
It's unambiguous as long as we talk about a stream of characters, right?
> "xml:lang" in no namespace (as would be produced by an HTML 5 parser
> (and by current HTML browser parser implementations))? or the attribute
> with local name "lang" in the namespace
> "http://www.w3.org/XML/1998/namespace" (as would be produced by an XML
> parser, and could be inserted in an HTML document via DOM APIs)? or both
> (in which case both could be specified on one element, in addition to
> "lang" in no namespace)?
Both can only be specified in the DOM, but not in a serialization (or am
I missing something?).
That being said, it wouldn't hurt to have a section that defines special
aspects of processing RDFa from a DOM instead of an HTML document (as a
series of bytes/characters).
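To make the two readings of "@xml:lang" concrete, here is a minimal sketch in Python. It assumes the standard library's html.parser as a stand-in for a streaming, non-DOM HTML processor (it is not an HTML 5 parser) and xml.etree.ElementTree as the XML parser. The names MARKUP and AttrCollector are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Same characters, fed to two different kinds of parser.
MARKUP = '<p xml:lang="en" lang="en">Hello</p>'


class AttrCollector(HTMLParser):
    """Records the attribute names reported for each start tag."""

    def __init__(self):
        super().__init__()
        self.attr_names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        self.attr_names.extend(name for name, _ in attrs)


collector = AttrCollector()
collector.feed(MARKUP)
collector.close()
html_names = collector.attr_names

# ElementTree reports namespaced attributes as "{namespace-uri}localname".
xml_names = list(ET.fromstring(MARKUP).attrib)

print(html_names)
print(xml_names)
```

The HTML-side tokenizer reports a plain attribute literally named "xml:lang", while the XML parser reports "lang" in the http://www.w3.org/XML/1998/namespace namespace. A processor that works on the character stream sees one spelling; the difference only appears once a parser has assigned names and namespaces.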
> "If the object of a triple would be an XMLLiteral, and the input to the
> processor is not well-formed [XML]" - I don't understand what that means
> in an HTML context. Is it meant to mean something like "the bytes in the
> HTML file that correspond to the contents of the relevant element could
> be parsed as well-formed XML (modulo various namespace declaration
> issues)"? If so, that seems impossible to implement. The input to the
> RDFa processor will most likely be a DOM, possibly manipulated by the
> DOM APIs rather than coming straight from an HTML parser, so it may
> never have had a byte representation at all.
>
> Even without scripting, there isn't always a contiguous sequence of
> bytes corresponding to the content of an element. E.g. if the HTML input
> is:
> <table>
> <tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
> <td> This text goes inside the table </td>
> This text gets parsed to *outside* the table
> <td> This text goes inside the table </td>
> </tr>
> </table>
> then (according to the HTML 5 parsing algorithm, and implemented in (at
> least) Firefox) the content of the <tr> element includes the first and
> third lines of text, but not the second. How would you decide whether
> the content is well-formed XML?
Is it still underspecified once we require a valid HTML5 document as input?
> For this to make sense in real HTML implementations, the definition
> should be in terms of the document layer rather than the byte layer.
Disagreed. Many implementations never build a DOM. We're not only
talking about browsers here.
> ...
> How are xmlns:* attributes meant to be processed? E.g. what is the
> expected output in the following cases:
>
> <div xmlns:T="test:">
> <span typeof="t:x" property="t:y">Test</span>
> </div>
>
> <div XMLNS:t="test:">
> <span typeof="t:x" property="t:y">Test</span>
> </div>
>
> <div xmlns:T="test:">
> <span typeof="T:x" property="T:y">Test</span>
> </div>
>
> <div xmlns:t="test:">
> <div xmlns:t="">
> <span typeof="t:x" property="t:y">Test</span>
> </div>
> </div>
I would expect the results to be the same for XHTML and HTML serializations.
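As a rough illustration of where the two serializations could diverge on the case-variant examples, the sketch below collects xmlns:* declarations the way a streaming HTML tokenizer reports them. It assumes Python's html.parser (which lowercases attribute names and is not an HTML 5 parser); PrefixCollector and declared_prefixes are hypothetical helpers, and the sketch does not say what the expected RDFa output should be.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Collects xmlns:* declarations as the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attribute names are already lowercased at this point.
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value


def declared_prefixes(markup):
    """Return the prefix-to-URI mappings seen in the given markup."""
    parser = PrefixCollector()
    parser.feed(markup)
    parser.close()
    return parser.prefixes


upper_prefix = declared_prefixes(
    '<div xmlns:T="test:"><span typeof="t:x" property="t:y">Test</span></div>')
upper_xmlns = declared_prefixes(
    '<div XMLNS:t="test:"><span typeof="t:x" property="t:y">Test</span></div>')
upper_both = declared_prefixes(
    '<div xmlns:T="test:"><span typeof="T:x" property="T:y">Test</span></div>')

print(upper_prefix, upper_xmlns, upper_both)
# Attribute values keep their case, so a case-sensitive lookup of "T" fails.
print("T" in upper_both)
```

On the HTML side all three variants end up declaring the prefix "t", because attribute names are case-folded while the "T:x" values are not. An XML parser keeps "xmlns:T" as written, so getting the same results for both serializations would require the spec to say how prefix case is handled.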
> <div xmlns:t="test1:" id="d">
> <span typeof="t:x" property="t:y">Test</span>
> </div>
> <script>
> document.getElementById('d').setAttributeNS(
> 'http://www.w3.org/2000/xmlns/', 'xmlns:t', 'test2:');
> /* (now the element has two distinct attributes,
> each in different namespaces) */
> </script>
That example illustrates why it's dangerous to focus too much on
processing in the DOM. Many RDFa processors will never execute the
script. So I think considerations like the one above should be treated
as a distinct problem (potentially in an appendix of the spec).
> ...
BR, Julian
Received on Saturday, 23 May 2009 11:18:36 UTC