- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 14 May 2009 21:11:21 +0100
- To: Sam Ruby <rubys@intertwingly.net>
- CC: Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Sam Ruby wrote:
> Shane McCarron wrote:
>> Folks,
>>
>> Thanks to you all for encouraging me to create a draft profile for
>> RDFa in HTML 4. This document has no official standing of course - it
>> is just something we at ApTest have been using for a while as a way of
>> pushing metadata into traditional web sites and user agents.
>>
>> You can find the latest version at
>> http://www3.aptest.com/standards/rdfa-html/
>>
>> Feel free to send comments to me directly or to the public-rdfa@w3.org
>> list if you want to share them with the community. I look forward to
>> seeing what you think!
>
> A promising start!
>
> I would hope that we could work together to get HTML 5 included and the
> various issues that have been discussed to date resolved.

Indeed, it would be good to have this defined with the level of precision that HTML 5 has, so we can be sure implementations will agree on how to extract RDFa from text/html content. A few significant issues that I see in the current version:

What is "the @xml:lang attribute"? Is it the attribute with local name "xml:lang" in no namespace (as would be produced by an HTML 5 parser, and by current HTML browser parser implementations)? Or the attribute with local name "lang" in the namespace "http://www.w3.org/XML/1998/namespace" (as would be produced by an XML parser, and could be inserted into an HTML document via DOM APIs)? Or both (in which case both could be specified on one element, in addition to "lang" in no namespace)?

"If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]" - I don't understand what that means in an HTML context. Is it meant to mean something like "the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)"? If so, that seems impossible to implement.
The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all. Even without scripting, there isn't always a contiguous sequence of bytes corresponding to the content of an element. E.g. if the HTML input is:

    <table>
      <tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
        <td> This text goes inside the table </td>
        This text gets parsed to *outside* the table
        <td> This text goes inside the table </td>
      </tr>
    </table>

then (according to the HTML 5 parsing algorithm, and as implemented in at least Firefox) the content of the <tr> element includes the first and third lines of text, but not the second. How would you decide whether the content is well-formed XML?

For this to make sense in real HTML implementations, the definition should be in terms of the document layer rather than the byte layer. (The XMLLiteral should be an XML-fragment serialisation of the element, and some error handling (like ignoring the triple) would occur if it's impossible to serialise as XML, similar to the requirements in <http://www.whatwg.org/specs/web-apps/current-work/multipage/the-xhtml-syntax.html#serializing-xhtml-fragments>.)

How are xmlns:* attributes meant to be processed? E.g. what is the expected output in the following cases?

    <div xmlns:T="test:">
      <span typeof="t:x" property="t:y">Test</span>
    </div>

    <div XMLNS:t="test:">
      <span typeof="t:x" property="t:y">Test</span>
    </div>

    <div xmlns:T="test:">
      <span typeof="T:x" property="T:y">Test</span>
    </div>

    <div xmlns:t="test:">
      <div xmlns:t="">
        <span typeof="t:x" property="t:y">Test</span>
      </div>
    </div>

    <div xmlns:t="test1:" id="d">
      <span typeof="t:x" property="t:y">Test</span>
    </div>
    <script>
      document.getElementById('d').setAttributeNS(
        'http://www.w3.org/2000/xmlns/', 'xmlns:t', 'test2:');
      /* (now the element has two distinct attributes, each in a different namespace) */
    </script>

Should the same processing rules be used for documents from both HTML and XHTML parsers, or would DOM-based implementations need to detect where the input came from and switch processing rules accordingly? If there is a difference, what happens if I adoptNode from an XHTML document into an HTML document, or vice versa?

-- 
Philip Taylor
pjt47@cam.ac.uk
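The document-layer approach suggested in the message (serialise the element's contents as an XML fragment, and ignore the triple if that is impossible) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not part of RDFa or HTML 5: `Element`, `NotSerialisable` and `xml_literal_or_none` are hypothetical names, the name check is a simplified ASCII-only approximation of the XML Name production, and namespace handling (the xmlns:* questions raised above) is deliberately left out.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Element:
    """Hypothetical minimal stand-in for a DOM element (not a real DOM API)."""
    name: str
    attrs: List[Tuple[str, str]] = field(default_factory=list)
    children: List[Union["Element", str]] = field(default_factory=list)


# Simplified XML 1.0 Name check: ASCII-only, an assumption made for brevity.
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*\Z")
# Characters XML 1.0 forbids: C0 controls except tab, LF and CR, plus U+FFFE/U+FFFF.
_BAD_CHARS_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


class NotSerialisable(Exception):
    """Raised when a node has no well-formed XML representation."""


def _check_name(name: str) -> None:
    if not _NAME_RE.match(name):
        raise NotSerialisable(f"invalid XML name: {name!r}")


def _check_chars(text: str) -> None:
    if _BAD_CHARS_RE.search(text):
        raise NotSerialisable("character not allowed in XML")


def _serialise(node: Union[Element, str], out: List[str]) -> None:
    if isinstance(node, str):
        _check_chars(node)
        out.append(escape(node))  # escapes &, < and >
        return
    _check_name(node.name)
    out.append("<" + node.name)
    for attr_name, attr_value in node.attrs:
        _check_name(attr_name)
        _check_chars(attr_value)
        out.append(f" {attr_name}={quoteattr(attr_value)}")  # quoted and escaped
    out.append(">")
    for child in node.children:
        _serialise(child, out)
    out.append(f"</{node.name}>")  # always an explicit end tag, even for <br>


def xml_literal_or_none(element: Element) -> Optional[str]:
    """Serialise the element's children as they are in the document tree
    (not the source bytes); None means "ignore the triple"."""
    out: List[str] = []
    try:
        for child in element.children:
            _serialise(child, out)
    except NotSerialisable:
        return None
    return "".join(out)


# Usage: a <tr> whose children are the two <td> cells, as the parser built them.
row = Element("tr", children=[Element("td", children=["inside the table"]),
                              Element("td", children=["inside the table"])])
print(xml_literal_or_none(row))  # prints <td>inside the table</td><td>inside the table</td>
```

Because this works on the parsed tree, the foster-parented text in the <table> example is simply not among the <tr> element's children, so the question of which bytes belong to the element never arises; an attribute name such as `a"b`, which an HTML parser will happily produce, makes the sketch return None instead.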
Received on Thursday, 14 May 2009 20:12:02 UTC