XMLLiteral handling in RDFa in HTML

What follows is an offline conversation Shane and I had regarding
XMLLiterals in the RDFa in HTML spec. It concerns the "an RDFa parser
must generate the same triples across all HTML family languages" goal.

[17:32:39] Manu Sporny: I'm getting increasingly concerned over our
attempts to solve the XMLLiteral problem (in RDFa in HTML).
[17:33:27] … especially since browser DOM-based implementations are not
going to be able to do what raw-stream implementations can do.
[17:34:36] … specifically, requiring well-formedness.
[17:36:32] … I think we have two really nasty issues right now: The
first being xmlns + case sensitivity: both of which are fixed if we move
to @prefix and declare that prefix is always case-sensitive (and
implement the legacy case-insensitivity stuff for xmlns that we've been
talking about in the community over the past couple of days).
[17:36:52] … The second is XMLLiterals - to which I don't really have a
good solution...
[17:37:09] … They're just borked based on the current DOM implementations.
[17:37:12] Shane McCarron: we have no option about well-formedness of
XML literals.  it's a requirement
[17:37:18] … not our requirement.  RDF
[17:37:56] Manu Sporny: I agree - it's the implementation of the
extraction method of an XML Literal that I'm having a problem with.
[17:39:11] … especially since the extraction method varies greatly
between HTML4, XHTML and HTML5.
[17:40:05] Shane McCarron: I have no problem at all with just
eliminating XMLLiterals altogether.
[17:40:22] Manu Sporny: Did you see that <table>
<tr></tr><span>foobar</span><tr></tr></table> example that was outlined
on the mailing list and how html5lib handles that case?
[17:40:47] … Right, I'm in favor of eliminating XMLLiterals
completely... or replacing it with something like XMLCharacterSequence.
[17:41:25] … The application can attempt to do something with
XMLCharacterSequence to transform it into an XMLLiteral, but I certainly
don't think we should be doing anything with it.
[17:41:32] Shane McCarron: the example in the mailing list is a red
herring imho.
[17:41:36] Manu Sporny: although, that's a pretty huge change to the spec.
[17:41:38] … why?
[17:42:07] Shane McCarron: because it is invalid and we ONLY define
behavior for valid input.  I know that gives you heartburn, but I can't
help that
[17:42:24] … there are millions of error conditions that we would need
to document.  so we document none of them.
[17:42:51] Manu Sporny: No, I'm not a "document all the error
conditions" person - I think that's ridiculous.
[17:43:46] … My argument is that we can't, in good faith, expect that
this XMLLiteral thing is going to work.
[17:44:04] … because there is so much erroneous XML text out there.
[17:44:16] … and because of what DOM implementations do to the original
document.
[17:45:33] … XMLLiterals (and XMLCharacterSequences) are just flat out
not implementable in Javascript (to the same degree that they're
implementable in raw input data streams).
[17:45:43] … s/data streams/ data stream parsers/
[17:46:11] … In any case, I think we need to seriously re-think this
whole XMLLiteral thing...
[17:47:57] … That or provide an API for Javascript to get the raw
document content (which may already exist).
[17:49:11] Shane McCarron: would only help client side.  what about dom
based implementations server side or in the toolchain?
[17:49:35] Manu Sporny: yeah, you're right... that's still an issue...
[17:49:59] Shane McCarron: it's a way bigger issue imho.  the interesting
part of the semantic web is NOT client side
[17:50:04] … at least not at this level
[17:50:05] Manu Sporny: which means XML Literals are very difficult to
do not only in the browser, but out of the browser as well.
[17:51:10] … The core of the issue is that it bothers me that we're
defining behavior for something we know to not be implementable in
DOM-based implementations.
[17:51:55] Shane McCarron: err... well, we didn't know that at the time.
 we defined this YEARS ago
[17:53:30] Manu Sporny: Right, I'm not throwing blame for past decisions
made - time lends clarity to things like this... but it feels like we're
trying to just put in support for XMLLiterals without looking at the
DOM-based implementation landscape first.
[17:53:56] … There's a strong argument for XML Literals, which is "if it
isn't well-formed XML, then it's not an XML Literal - so don't generate
a triple".
[17:55:03] … but then, you're never going to have the same sorts of
ill-formed XMLLiterals in tag-soup parsers or HTML5, since it'll
re-arrange the DOM to ensure a well-formed XML Literal (in some cases).
[17:56:20] … So now you'll have tagsoup/HTML5 DOM-based parsers
outputting a valid XML Literal where their non-DOM-based, raw-stream
counterparts know better than to output the original, invalid XML Literal.
[17:56:42] Shane McCarron: understood
[17:58:07] Manu Sporny: I think I just convinced myself to be strongly
against XMLLiterals in RDFa in HTML.
[17:59:02] Shane McCarron: Yeah.  It makes some things a lot easier, and
I don't know that they add that much to the grammar.  On the other hand,
it means we need to define what happens when datatype="rdf:XMLLiteral"
is used
[18:00:25] Manu Sporny: XMLLiteral is generated whenever we do
datatype="rdf:XMLLiteral" or when we do this: <span property="foo">and
then <em>mixed content</em></span>
[18:00:48] Shane McCarron: right, and for mixed content
[18:01:33] Manu Sporny: OR - we (I) can just suck it up and warn people
that, when reusing XMLLiteral content, the content may be different
depending on the type of parser that extracted the data, and not to
depend on the content for anything mission critical.
[18:02:00] … Then we'd at least have an "I told you so" in the spec.
[18:03:32] Shane McCarron: that's sort of what I said in a recent mail
on the topic.  those parsers would be non-conforming, but... yeah.
[18:04:21] Manu Sporny: I think saying all Javascript/HTML5/DOM-based
parsers are non-conforming would be a bad move.
[18:04:38] Shane McCarron: yeah that's sort of an issue
[18:05:00] Manu Sporny: We could say that XMLLiteral processing on the
raw data stream is not required for conformance?
[18:06:41] Shane McCarron: Doesn't solve the basic issue.  different
parsers return different triples from the same input.
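
To make that last point concrete, here is a minimal Python sketch of the
divergence. It is not taken from any RDFa implementation: the function
names are hypothetical, and RepairingSerializer is only a toy stand-in
for what a tag-soup or HTML5 DOM does when it repairs markup (here it
merely closes tags that were left open). The raw-stream path applies the
"not well-formed, so no triple" rule to the original text; the DOM-based
path only ever sees the repaired tree.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape


def raw_stream_literal(fragment):
    """Return the fragment if it is well-formed XML content, else None."""
    try:
        # Wrap in a dummy root so mixed content can be parsed as a document.
        ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError:
        return None  # not well-formed: no XML Literal, so no triple
    return fragment


class RepairingSerializer(HTMLParser):
    """Toy stand-in for a tag-soup DOM: closes any tags left open."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.out.append("<" + tag + ">")  # attributes dropped for brevity
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything up to and including the matching open tag.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append("</" + top + ">")
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(data))


def dom_based_literal(fragment):
    """Serialize the repaired tree, then apply the same well-formedness rule."""
    parser = RepairingSerializer()
    parser.feed(fragment)
    parser.close()
    while parser.open_tags:  # the "repair": close whatever is still open
        parser.out.append("</" + parser.open_tags.pop() + ">")
    return raw_stream_literal("".join(parser.out))


broken = "and then <em>mixed content"  # the </em> is missing
print(raw_stream_literal(broken))
print(dom_based_literal(broken))
```

For the broken fragment, the raw-stream function returns None (no
triple), while the DOM-based function returns a well-formed literal with
the missing end tag supplied. For well-formed input the two agree. Real
HTML5 tree builders repair far more aggressively than this toy does, so
the gap in practice is likely wider.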

So, thoughts on this issue?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.

Received on Tuesday, 26 May 2009 00:56:25 UTC