- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Mon, 25 May 2009 20:55:46 -0400
- To: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTMLWG WG <public-html@w3.org>
What follows is an offline conversation Shane and I had regarding XMLLiterals in the RDFa in HTML spec. It concerns the "an RDFa parser must generate the same triples across all HTML family languages" goal.

[17:32:39] Manu Sporny: I'm getting increasingly concerned over our attempts to solve the XMLLiteral problem (in RDFa in HTML).
[17:33:27] … especially since browser DOM-based implementations are not going to be able to do what raw-stream implementations can do.
[17:34:36] … specifically, requiring well-formedness.
[17:36:32] … I think we have two really nasty issues right now: The first being xmlns + case sensitivity: both of which are fixed if we move to @prefix and declare that prefix is always case-sensitive (and implement the legacy case-insensitivity stuff for xmlns that we've been talking about in the community over the past couple of days).
[17:36:52] … The second is XMLLiterals - to which I don't really have a good solution...
[17:37:09] … They're just borked based on the current DOM implementations.
[17:37:12] Shane McCarron: we have no option about well formedness of xml literals. it's a requirement
[17:37:18] … not our requirement. RDF
[17:37:56] Manu Sporny: I agree - it's the implementation of the extraction method of an XML Literal that I'm having a problem with.
[17:39:11] … especially since the extraction method varies greatly between HTML4, XHTML and HTML5.
[17:40:05] Shane McCarron: I have no problem at all with just eliminating XMLLiterals altogether.
[17:40:22] Manu Sporny: Did you see that <table> <tr></tr><span>foobar</span><tr></tr></table> example that was outlined on the mailing list and how html5lib handles that case?
[17:40:47] … Right, I'm in favor of eliminating XMLLiterals completely... or replacing it with something like XMLCharacterSequence.
[17:41:25] … The application can attempt to do something with XMLCharacterSequence to transform it into an XMLLiteral, but I certainly don't think we should be doing anything with it.
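The html5lib case raised at 17:40:22 is easier to see when the raw token stream is laid out explicitly. As a minimal sketch using only the Python standard library (the class name `TagOrderRecorder` is made up for illustration), a raw-stream tokenizer such as `html.parser.HTMLParser` reports tags in source order, so the `<span>` stays between `<table>` and `</table>`. An HTML5 tree builder such as html5lib applies the HTML5 "foster parenting" rule instead and moves the `<span>` in front of the `<table>` in the resulting DOM.

```python
from html.parser import HTMLParser


class TagOrderRecorder(HTMLParser):
    """Hypothetical helper: logs tags and text in raw source order.

    HTMLParser only tokenizes. It builds no tree, so it performs none of
    the HTML5 tree-construction repairs (such as foster parenting).
    """

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.events.append(f"</{tag}>")

    def handle_data(self, data):
        # Skip whitespace-only text between tags to keep the log short.
        if data.strip():
            self.events.append(data)


SOURCE = "<table> <tr></tr><span>foobar</span><tr></tr></table>"

recorder = TagOrderRecorder()
recorder.feed(SOURCE)
recorder.close()
print(recorder.events)
# ['<table>', '<tr>', '</tr>', '<span>', 'foobar', '</span>', '<tr>', '</tr>', '</table>']
```

A DOM built from the same string by an HTML5 parser would hold the span as a sibling placed before the table. A raw-stream processor and a DOM-based processor therefore disagree about which content sits inside which element before any RDFa rules are applied.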
[17:41:32] Shane McCarron: the example in the mailing list is a red herring imho.
[17:41:36] Manu Sporny: although, that's a pretty huge change to the spec.
[17:41:38] … why?
[17:42:07] Shane McCarron: because it is invalid and we ONLY define behavior for valid input. I know that gives you heartburn, but I can't help that
[17:42:24] … there are millions of error conditions that we would need to document. so we document none of them.
[17:42:51] Manu Sporny: No, I'm not a "document all the error conditions" person - I think that's ridiculous.
[17:43:46] … My argument is that we can't, in good faith, expect that this XMLLiteral thing is going to work.
[17:44:04] … because there is so much erroneous XML text out there.
[17:44:16] … and because of what DOM implementations do to the original document.
[17:45:33] … XMLLiterals (and XMLCharacterSequences) are just flat out not implementable in Javascript (to the same degree that they're implementable in raw input data streams).
[17:45:43] … s/data streams/ data stream parsers/
[17:46:11] … In any case, I think we need to seriously re-think this whole XMLLiteral thing...
[17:47:57] … That or provide an API for Javascript to get the raw document content (which may already exist).
[17:49:11] Shane McCarron: would only help client side. what about dom based implementations server side or in the toolchain?
[17:49:35] Manu Sporny: yeah, you're right... that's still an issue...
[17:49:59] Shane McCarron: it's a way bigger issue imho. the interesting part of the semantic web is NOT client side
[17:50:04] … at least not at this level
[17:50:05] Manu Sporny: which means XML Literals are very difficult to do not only in the browser, but out of the browser as well.
[17:51:10] … The core of the issue is that it bothers me that we're defining behavior for something we know to not be implementable in DOM-based implementations.
[17:51:55] Shane McCarron: err... well, we didn't know that at the time. we defined this YEARS ago
[17:53:30] Manu Sporny: Right, I'm not throwing blame for past decisions made - time lends clarity to things like this... but it feels like we're trying to just put in support for XMLLiterals without looking at the DOM-based implementation landscape first.
[17:53:56] … There's a strong argument for XML Literals, which is "if it isn't well-formed XML, then it's not an XML Literal - so don't generate a triple".
[17:55:03] … but then, you're never going to have the same sorts of ill-formed XMLLiterals in tag-soup parsers or HTML5, since it'll re-arrange the DOM to ensure a well-formed XML Literal (in some cases).
[17:56:20] … So now you'll have tagsoup/HTML5 DOM-based parsers outputting a valid XML Literal when their non-DOM based, raw stream parsers know better than to output the same invalid XML Literal.
[17:56:42] Shane McCarron: understood
[17:58:07] Manu Sporny: I think I just convinced myself to be strongly against XMLLiterals in RDFa in HTML.
[17:59:02] Shane McCarron: Yeah. It makes some things a lot easier, and I don't know that they add that much to the grammar. On the other hand, it means we need to define what happens when datatype="rdf:XMLLiteral" is used
[18:00:25] Manu Sporny: XMLLiteral is generated whenever we do datatype="rdf:XMLLiteral" or when we do this: <span property="foo">and then <em>mixed content</em></span>
[18:00:48] Shane McCarron: right, and for mixed content
[18:01:33] Manu Sporny: OR - we (I) can just suck it up and warn people that when reusing XMLLiteral content, the content may be different depending on the type of parser that extracted the data, and to not depend on the content for anything mission critical.
[18:02:00] … Then we'd at least have an "I told you so" in the spec.
[18:03:32] Shane McCarron: that's sort of what I said in a recent mail on the topic. those parsers would be non-conforming, but... yeah.
[18:04:21] Manu Sporny: I think saying all Javascript/HTML5/DOM-based parsers are non-conforming would be a bad move.
[18:04:38] Shane McCarron: yeah that's sort of an issue
[18:05:00] Manu Sporny: We could say that XMLLiteral processing on the raw data stream is not required for conformance?
[18:06:41] Shane McCarron: Doesn't solve the basic issue. different parsers return different triples from the same input.

So, thoughts on this issue?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/
Received on Tuesday, 26 May 2009 00:56:26 UTC