- From: Mark Birbeck <mark.birbeck@x-port.net>
- Date: Sun, 4 Feb 2007 22:41:55 +0000
- To: public-rdf-in-xhtml-tf@w3.org
Hello all,

I don't think the issue has been understood correctly, so I'll re-construct the thought processes I went through when working through the use of rdf:XMLLiteral, way, way back, in an early draft. I'm not at all suggesting that my solution is beyond dispute :), but if people want to change it, I think we all need to understand what the problems were that we were originally trying to address.

The issue is not about which of "XML mark-up" or "strings" is the most common situation; I take it as given that 'strings' will be more common. :) The issue is essentially whether there is any need to distinguish between them, and if there isn't, whether we can use that fact to our advantage to make RDFa easier to author.

I think it's important to state the problem this way round, since if there *is* a problem with always using XMLLiteral, then of course we can't do what I originally proposed!

PLAIN VERSUS TYPED LITERALS

To set the context, the first thing to remember is that in RDF, xsd:string is *not* the default datatype. In RDF, plain literals do not have *any* datatype. The word 'string' is being used very loosely in this discussion--from the subject line of the thread to comments added--and we need to be clear on what is being proposed.

At first sight the lack of any type seems fine; after all, why should we worry that this:

  <div about="">
    <h1 property="dc:title">RDFa Primer</h1>
  </div>

can produce this:

  <> dc:title "RDFa Primer" .

But unfortunately we _do_ need to worry. If we take, for example, Einstein's famous 1946 article on nuclear weapons, we would obviously mark it up as follows:

  <div about="">
    <h1 property="dc:title">
      E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
    </h1>
  </div>

We have to ask what we would *like* this mark-up to generate, and I think it's clear we'd want this:

  <> dc:title "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time"^^rdf:XMLLiteral .
But of course this is the crux of the problem; our preference for the first example was a plain literal, but our preference for the second was an XML literal, so we must now ask what it is that could 'trigger' this difference in parsing behaviour.

PROPOSAL 1: ALL TEXT IS PLAIN LITERAL

The first option is to say that actually there is no trigger, and that _all_ text should be treated as a plain literal unless the author says otherwise. So our example would produce this:

  <> dc:title "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time" .

To create our original triples, the author would make use of @datatype, and write this:

  <div about="">
    <h1 property="dc:title" datatype="rdf:XMLLiteral">
      E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
    </h1>
  </div>

At the time I was working on this I rejected this as probably the worst solution. :) My reasoning was simply that in examples such as this, the title is _already_ mark-up, since it originates from an XHTML document. The author clearly knows what they are doing, and so for them to have to repeat the fact that the title is mark-up is counter-intuitive, and breaks with the idea that we are 'decorating' XHTML, rather than fundamentally modifying it.

PROPOSAL 2: ALL TEXT IS XSD:STRING

The second option is also to say there is no trigger, but that instead of using plain literals, the data is automatically typed as an xsd:string:

  <> dc:title "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time"^^xsd:string .

Although this solves some use cases, as I'll discuss at the end it doesn't solve all, and I think we should be very careful with this.

PROPOSAL 3: ALL TEXT IS XML LITERAL

The third option--as we know, the one I actually went with--is to flip things round, and ask whether the ordinary string (or plain literal) couldn't be represented by an rdf:XMLLiteral? So this:

  <div about="">
    <h1 property="dc:title">RDFa Primer</h1>
  </div>

parses as this:

  <> dc:title "RDFa Primer"^^rdf:XMLLiteral .
In other words, the 'trigger' to create an rdf:XMLLiteral is any use of @property where the object of the statement appears in *mark-up*.

There is a strong logic to this. First, the object _really has_ appeared in mark-up. But second, at the level of XML itself, it is not a problem that we don't have any 'tags' surrounding our text, since (as XSLT exploits heavily) "RDFa Primer" is as much XML as "<div>42</div>" is.

For those not familiar with this idea, I'll explain. Most people are probably familiar with XSLT, so we'll use that to illustrate. When XSLT 'outputs' XML, it creates 'external general parsed entities', which are defined as:

  [78] extParsedEnt ::= TextDecl? content

The key definition for us here is that of 'content', appearing after the optional TextDecl:

  [43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

This covers all the 'atoms' of XML, such as elements, character data, comments, processing instructions, and so on. In other words, the output of an XSLT process does not have to be a full XML document, with only one root node, etc. It could be a string, a comment, a processing instruction, an element, a list of elements, an element followed by a comment followed by an element...you get the picture.

I've used XSLT to illustrate the concept, since that is probably what many are familiar with, but much closer to home the RDF Concepts document talks of rdf:XMLLiteral in *exactly* this way. The document links to production 43--the production I quoted above--which means that RDF _already_ defines an XML literal as not just an XML element, but as any of the 'atoms' of XML--strings, comments, PIs, nodelists, etc.

More significantly for our discussion, the RDF Concepts document has this note:

  Note: RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:XMLLiteral corresponding to a single text node of the same string.
(See the end of section 5.1.)

What I had in mind was that some server storing the data as triples would somehow 'augment' the rdf:XMLLiteral data type to include something more specific; at least xsd:string, but perhaps also xsd:date, xsd:integer, and so on.

I'll come back to this 'casting' or post-processing in a moment, but the main point is that there is a strong argument for saying that any data that originates from an XHTML document is *by definition* an EGPE, and therefore at the very least cannot be a plain literal (and so #1 is out).

I'd also argue that we should be wary of making the default xsd:string since once done it can't be 'undone'. I don't have time to develop this point now, but at root is the fact that in XML Schemas, an xsd:integer is *not* derived from xsd:string. (A vote against #2.)

NOTE: Just to tie up all loose ends, for the author who _wants_ plain literals--i.e., no datatype at all--the original proposal contained the idea that @content should provide 'non-typed' literals:

  <meta property="dc:title" content="RDFa Primer" />

  <> dc:title "RDFa Primer" .

The rationale was that attributes can't contain mark-up anyway, so @content could never contain an XMLLiteral.

FINALLY...SPARQL

So, now we've looked at the question from the point of view of the mark-up, we should look at the problem raised by Ivan concerning SPARQL. The main point made is that by using rdf:XMLLiteral queries don't always match correctly.

However, I don't think that choosing plain literals or xsd:strings over rdf:XMLLiterals will necessarily solve the problem Ivan is seeing, and I would suggest that in situations where you are querying data that you have no control over, the str() function should generally be used. (I'd also be interested to double-check whether the behaviour seen is correct in relation to SPARQL itself, but I'll have to look at that later.)
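As a rough illustration of why comparing on the lexical form (which is what str() returns for a literal) is more robust than comparing whole terms, here is a minimal Python sketch. It is a toy model and not a SPARQL engine: the Literal type and the two comparison helpers are hypothetical names invented for this example, and a plain literal is modelled simply as a literal whose datatype is None.

```python
from typing import NamedTuple, Optional

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal: lexical form plus datatype."""
    lexical: str
    datatype: Optional[str] = None  # None models a plain (untyped) literal


def term_equal(a: Literal, b: Literal) -> bool:
    # Whole-term comparison: lexical form AND datatype must both agree.
    return a == b


def lexical_equal(a: Literal, b: Literal) -> bool:
    # Compare lexical forms only, ignoring the datatype,
    # in the spirit of filtering on str(?x) in a query.
    return a.lexical == b.lexical


# The triple an RDFa parser would store under the XMLLiteral default...
stored = Literal("RDFa Primer", RDF_XMLLITERAL)
# ...and the plain literal a query author is likely to type.
wanted = Literal("RDFa Primer")

print(term_equal(stored, wanted))     # False
print(lexical_equal(stored, wanted))  # True
```

The same mismatch would show up in this model if the stored value were typed as xsd:string and the query used a plain literal, which is why changing the default alone does not make the matching problem go away.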
CONCLUSION

The ideal solution, in my view, is that we stick to rdf:XMLLiterals, but at some stage in the processing some level of augmentation takes place, and data that is identifiable as an XML Schema simple type is typed as such. This step could be carried out on the server that is storing the data into a triple store, but it might be possible to define the necessary regular expressions to incorporate this step into the RDFa specification.

Regards,

Mark

On 02/02/07, Wing C Yung <wingyung@us.ibm.com> wrote:
>
> Just wanted to chime in on the following, if it's not too late:
>
> http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2007Jan/0017
>
> > I am inclined to agree with you on the default datatype: it should just
> > be a string, except if you really want some XML. What do others think?
> >
> > -Ben
>
> We (our Semantic Web group here at IBM Cambridge) agree that it should be a
> string since this almost certainly going to be the common case. In our use
> of RDFa, we always want strings. XMLLiterals should be specified with the
> datatype attribute.
>
> Wing Yung
> Internet Technology
> wingyung@us.ibm.com
> 617.693.3763
>

--
Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/
b: http://internet-apps.blogspot.com/

Download our XForms processor from
http://www.formsPlayer.com/
Received on Sunday, 4 February 2007 22:42:12 UTC