- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 23 May 2009 13:17:45 +0200
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Philip Taylor wrote:
> ...
> Indeed, it would be good have this defined with the level of precision
> that HTML 5 has, so we can be sure implementations will be able to agree
> on how to extract RDFa from text/html content.
>
> A few significant issues that I see in the current version:
>
> What is "the @xml:lang attribute"? Is it the attribute with local name

It's unambiguous as long as we talk about a stream of characters, right?

> "xml:lang" in no namespace (as would be produced by an HTML 5 parser
> (and by current HTML browser parser implementations))? or the attribute
> with local name "lang" in the namespace
> "http://www.w3.org/XML/1998/namespace" (as would be produced by an XML
> parser, and could be inserted in an HTML document via DOM APIs)? or both
> (in which case both could be specified on one element, in addition to
> "lang" in no namespace)?

Both can only be specified in the DOM, but not in a serialization (or am
I missing something?).

That being said, it wouldn't hurt to have a section that defines special
aspects of processing RDFa from a DOM instead of an HTML document (as a
series of bytes/characters).

> "If the object of a triple would be an XMLLiteral, and the input to the
> processor is not well-formed [XML]" - I don't understand what that means
> in an HTML context. Is it meant to mean something like "the bytes in the
> HTML file that correspond to the contents of the relevant element could
> be parsed as well-formed XML (modulo various namespace declaration
> issues)"? If so, that seems impossible to implement. The input to the
> RDFa processor will most likely be a DOM, possibly manipulated by the
> DOM APIs rather than coming straight from an HTML parser, so it may
> never have had a byte representation at all.
>
> Even without scripting, there isn't always a contiguous sequence of
> bytes corresponding to the content of an element. E.g. if the HTML input
> is:
>   <table>
>     <tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
>       <td> This text goes inside the table </td>
>       This text gets parsed to *outside* the table
>       <td> This text goes inside the table </td>
>     </tr>
>   </table>
> then (according to the HTML 5 parsing algorithm, and implemented in (at
> least) Firefox) the content of the <tr> element includes the first and
> third lines of text, but not the second. How would you decide whether
> the content is well-formed XML?

Is it still underspecified once we require a valid HTML5 document as input?

> For this to make sense in real HTML implementations, the definition
> should be in terms of the document layer rather than the byte layer.

Disagreed. Many implementations never build a DOM. We're not only
talking about browsers here.

> ...
> How are xmlns:* attributes meant to be processed? E.g. what is the
> expected output in the following cases:
>
>   <div xmlns:T="test:">
>     <span typeof="t:x" property="t:y">Test</span>
>   </div>
>
>   <div XMLNS:t="test:">
>     <span typeof="t:x" property="t:y">Test</span>
>   </div>
>
>   <div xmlns:T="test:">
>     <span typeof="T:x" property="T:y">Test</span>
>   </div>
>
>   <div xmlns:t="test:">
>     <div xmlns:t="">
>       <span typeof="t:x" property="t:y">Test</span>
>     </div>
>   </div>

I would expect the results to be the same for XHTML and HTML serializations.

>   <div xmlns:t="test1:" id="d">
>     <span typeof="t:x" property="t:y">Test</span>
>   </div>
>   <script>
>     document.getElementById('d').setAttributeNS(
>       'http://www.w3.org/2000/xmlns/', 'xmlns:t', 'test2:');
>     /* (now the element has two distinct attributes,
>        each in different namespaces) */
>   </script>

That example illustrates why it's dangerous to focus too much on
processing in the DOM. Many RDFa processors will never execute the script.

So I think considerations like the one above should be treated as a
distinct problem (potentially in an appendix of the spec).

> ...

BR, Julian
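
A minimal sketch of the two readings of "the @xml:lang attribute" discussed above. It assumes Python's standard-library html.parser as a rough stand-in for an HTML tokenizer (it is not an HTML 5 parser) and xml.etree.ElementTree as the XML parser; the helper names are hypothetical and nothing here is an RDFa processor.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attribute names of every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        self.names.extend(name for name, _ in attrs)


def html_attr_names(markup):
    """Attribute names as seen by an HTML-style tokenizer (no namespaces)."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def xml_attr_names(markup):
    """Attribute names of the root element as seen by a namespace-aware XML parser."""
    return list(ET.fromstring(markup).attrib)


markup = '<p xml:lang="en">Test</p>'

# HTML-style parsing: one attribute literally named "xml:lang", no namespace.
html_names = html_attr_names(markup)

# XML parsing: the attribute "lang" in the XML namespace (Clark notation).
xml_names = xml_attr_names(markup)

print(html_names)
print(xml_names)
```

The same characters yield differently named attributes depending on the parser, which is why the question only becomes ambiguous once processing is defined on a parsed tree rather than on the serialization.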
Received on Saturday, 23 May 2009 11:18:36 UTC