- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 26 May 2009 22:06:22 -0400
- To: Toby Inkster <tai@g5n.co.uk>
- CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTMLWG WG <public-html@w3.org>
Toby Inkster wrote:
> On Mon, 2009-05-25 at 20:55 -0400, Manu Sporny wrote:
>> So, thoughts on this issue?
>
> I don't think that a big song and dance is needed over this. The issue
> seems pretty simple to me.

Hmm, I don't think it is that simple, and here's why...

If you have the following markup:

<div about="#foo" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <span property="dc:description"><br>para1</span>
</div>

A SAX-based parser (such as Expat), parsing an XHTML document will fail
to generate a triple due to a parser error. Even if you do some sort of
self-healing and continue processing the document, the XMLLiteral should
not be produced because the contents are not well-formed XML.

However, an HTML5lib-based parser would correct the input to the
following before a purely DOM-based RDFa processor could see the
contents of the SPAN element:

<div about="#foo" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <span property="dc:description"><br/>para1</span>
</div>

which would then generate the following triple:

<#foo>
   <http://purl.org/dc/elements/1.1/description>
      '<br xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/" />para1'^^rdf:XMLLiteral .

So, we have the exact same markup generating two completely different
sets of XMLLiteral triples. If one of our goals is to generate the same
triples across different types of markup - we are failing to do so with
the current set of processing rules.

> Sometimes an RDFa parser, dealing with HTML,
> will hit a situation where it needs to generate an XMLLiteral from
> non-wellformed HTML. In these situations, it seems to me that we have a
> choice of three potential "the parser MUST" actions, all of which are
> roughly consistent with RDFa in XHTML:
>
> 1. The parser MUST ignore this triple altogether. A simple solution, and
> it means that the HTML graph would be a subset of the XHTML graph. RDF
> vocabularies are generally defined so that if a graph G is true, then
> any graph H such that H is a subset of G is also true.

The XHTML parser can't ignore the triple due to a parser error, or if it
corrects the parser error, shouldn't output the malformed XMLLiteral.
The HTML5lib parser will never see that the XMLLiteral was malformed.

> 2. The parser MUST add the triple to the graph as normal, but MUST NOT
> set the literal's datatype to XMLLiteral. They could either leave the
> literal as an untyped literal (that happened to have a lot of angled
> brackets in it) or perhaps set it to some HTMLLiteral datatype of our
> own concoction.

This would be a problem because the XML-based parser implementations
would switch the datatype of the object to something like
XMLCharacterStream, while the html5lib parser would output an
XMLLiteral. I don't believe that there is any such thing as a malformed
XMLLiteral in HTML5... is there? Can anybody think of an example of an
invalid XMLLiteral in an html5 parser?

> 3. The parser MUST coerce the HTML fragment into a well-formed (but not
> necessarily valid) XHTML fragment. The HTML5 draft gives us decent
> algorithms for doing this.

It does, but HTML5 has nothing to do with XHTML1.1 and XHTML2 - why
should we apply HTML5's parsing rules to XHTML1.1 and XHTML2 documents?

I don't think that this is something we can 'MUST' ourselves out of...
relaxing the conformance requirements to not include XMLLiterals seems
to be a mechanism that would:

a) Allow variance in IF and HOW XMLLiterals are generated - which will
   vary based on if a document is being parsed by a SAX-based XML parser
   in XHTML1.1, or a DOM-based Javascript parser in HTML5.
b) Not automatically disqualify all DOM-based HTML5 implementations, or
   non-raw-stream-based XHTML1.1 implementations.

Although, even this approach bothers me quite a bit... as does getting
rid of XMLLiterals altogether.

-- manu

--
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/
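
The XML half of the example in the message can be checked with a short sketch. It assumes Python's standard xml.parsers.expat binding as the strict parser; the helper name is_well_formed and the two string constants are made up for illustration and appear nowhere in the thread. It covers only the Expat side of the argument, not the html5lib correction step.

```python
import xml.parsers.expat

# The markup from the message, collapsed onto one line. HTML_STYLE uses
# the unclosed <br>; XML_STYLE uses the self-closed <br/> form.
XMLNS = 'xmlns:dc="http://purl.org/dc/elements/1.1/"'
HTML_STYLE = (
    '<div about="#foo" ' + XMLNS + '>'
    '<span property="dc:description"><br>para1</span>'
    '</div>'
)
XML_STYLE = HTML_STYLE.replace("<br>", "<br/>")


def is_well_formed(markup):
    """Return True if Expat parses the markup without raising an error.

    Hypothetical helper: a real RDFa processor would do far more than a
    well-formedness check before emitting an XMLLiteral.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(markup, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


print(is_well_formed(HTML_STYLE))  # False: </span> does not match the open <br>
print(is_well_formed(XML_STYLE))   # True
```

A strict XML pipeline therefore has nothing well-formed to put into an XMLLiteral for the first form, while a DOM-based processor behind an error-correcting HTML parser would only ever see something like the second.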
Received on Wednesday, 27 May 2009 02:06:59 UTC