- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Sat, 14 Jun 2003 19:08:04 +0100
- To: <www-rdf-interest@w3.org>
[+BCC to Tim Bray]

Tim Bray has again brought up the age-old debate about the inadequacy of RDF/XML, this time by linking [1] to yet another person who openly slams RDF/XML [2] without, as far as I know, following the old "don't criticize if you can't do any better" maxim.

Bray has, of course, tried his hand at an alternate RDF XML serialization, RPV [3]. RPV has some pretty major shortcomings, in my opinion. For example, one can't use QNames as an abbreviation method; the best one can do is to provide a "base" for subjs/preds/objts. It also doesn't seem to contain any facility for using bNodes--correct me if I'm wrong.

I've previously tried to come up with alternate serializations myself, notably BSWL [4] and N3-in-XML [5], but this time I wanted to try a different approach. I believe that N-Triples is a good starting point for any serialization due to its extraordinary level of parseability. It is not, however, easy to author (no QNames, one triple per line), nor is it based on XML, which indicates to me that it is unlikely ever to progress from being a simple RDFCore WG test format to something used on a wider scale.

So this is a proposal to enrich N-Triples using XML.

At the basic level, XENT (an obvious but chance acronym) is very much like N-Triples, with very minor XMLification. A <Graph> element is used to wrap an entire document, and namespaces can be declared on it. URIs now use a 'URI syntax (prefixed with an apostrophe) instead of <URI>, since <URI> would obviously be illegal in XML. Each triple is wrapped in a <t> element, and there is no longer any need for the trailing period that was previously used for backwards compatibility with N3. Line breaks can be added at will, since <t> rather than the newline is now used to delimit triples.

    <Graph xmlns="@@">
    <t>'http://example.org/ 'http://example.org/#author
       'http://example.org/#bob </t>
    </Graph>

QNames are allowed in place of URIs.
You just write them directly in the text itself--example coming up. (Aside: I expect that the major criticism of this format will be its lack of recourse to innate XML machinery for expressing the various parts of the triples; more on why I believe that this is actually a *benefit* later on.)

bNodes are represented using a $label syntax--this keeps parsing costs down, and eliminates the _: prefix hack. Literals are now wrapped in an <s> element. Example:-

    <Graph xmlns="@@"
       xmlns:ex="http://example.org/stuff/1.0/" >
    <t>'http://www.w3.org/TR/rdf-syntax-grammar ex:editor $Dave</t>
    <t>$Dave ex:fullName <s>Dave Beckett</s></t>
    <t>$Dave ex:homePage 'http://purl.org/net/dajobe/</t>
    </Graph>

The last bits of syntax to introduce are the <properties> and <objects> elements. Consider this N-Triples graph:-

    _:Sean <...#name> "Sean B. Palmer" .
    _:Sean <...#homepage> <http://purl.org/net/sbp/> .
    _:Sean <...#nick> "sbp" .

The subject is repeated quite a lot. Using a <properties> element, one can reduce the repetition by stating the subject once and following it with a list of predicate/object pairs.

    <Graph xmlns="@@"
       xmlns:foaf="http://xmlns.com/foaf/0.1/" >
    <t>$Sean <properties>
       foaf:name <s>Sean B. Palmer</s>
       foaf:homepage 'http://purl.org/net/sbp/
       foaf:nick <s>sbp</s>
    </properties> </t>
    </Graph>

I think that this is highly readable, writable, and parseable. In actual fact, even the non-abbreviated syntax isn't so bad for that particular example (note that I've added an example of the <objects> element to this one, too; it lists several objects for a single subject and predicate):-

    <Graph xmlns="@@"
       xmlns:foaf="http://xmlns.com/foaf/0.1/" >
    <t>$Sean foaf:name <s>Sean B. Palmer</s></t>
    <t>$Sean foaf:homepage 'http://purl.org/net/sbp/</t>
    <t>$Sean foaf:nick <objects><s>sbp</s> <s>SeanP</s></objects></t>
    </Graph>

For lots of pred/objt repetition, though, <properties> and <objects> will be useful.
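
To make the syntax described so far a little more concrete, here is a rough Python sketch of how a XENT document might be expanded back into N-Triples-style terms. It is only an illustration under stated assumptions, not a reference parser: the function names (parse_xent, term, items and so on) are made up for this sketch, namespace prefixes are treated as document-wide rather than properly scoped, and <s> content is flattened to plain text with no datatype or language handling.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def is_elem(item, name):
    """True if item is a child element (not a text token) with the given local name."""
    return not isinstance(item, str) and local(item.tag) == name


def items(elem):
    """Yield whitespace-separated text tokens and child elements in document order."""
    yield from (elem.text or "").split()
    for child in elem:
        yield child
        yield from (child.tail or "").split()


def term(item, ns):
    """Turn one token into an N-Triples-style term."""
    if not isinstance(item, str):  # assumed to be an <s> literal
        text = "".join(item.itertext())
        return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')
    if item.startswith("'"):  # 'URI
        return "<%s>" % item[1:]
    if item.startswith("$"):  # $label bNode
        return "_:" + item[1:]
    prefix, _, name = item.partition(":")  # otherwise a QName
    return "<%s%s>" % (ns[prefix], name)


def objects(item, ns):
    """Expand an <objects> element into several terms; anything else is one term."""
    if is_elem(item, "objects"):
        return [term(o, ns) for o in items(item)]
    return [term(item, ns)]


def parse_xent(text):
    """Return a list of (subject, predicate, object) strings for a XENT document."""
    data = text.encode("utf-8")
    # Assumption: prefixes are collected for the whole document, ignoring scoping.
    ns = dict(decl for _, decl in ET.iterparse(io.BytesIO(data), events=("start-ns",)))
    triples = []
    for t in ET.fromstring(data):
        if local(t.tag) != "t":
            continue
        subj, *rest = items(t)
        if len(rest) == 1 and is_elem(rest[0], "properties"):
            rest = list(items(rest[0]))  # predicate/object pairs for this subject
        pairs = iter(rest)
        for pred, obj in zip(pairs, pairs):  # a dangling predicate is silently dropped
            for o in objects(obj, ns):
                triples.append((term(subj, ns), term(pred, ns), o))
    return triples
```

Calling parse_xent on any of the examples above should give one tuple per triple, with <properties> and <objects> expanded out; joining each tuple with spaces and appending " ." gets back to something very close to N-Triples.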
Here's another quick example:-

    <Graph xmlns="@@"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:ex="http://example.org/stuff/1.0/" >
    <t>'http://www.w3.org/TR/rdf-syntax-grammar <properties>
       dc:title <s>RDF/XML Syntax Specification (Revised)</s>
       ex:editor $Dave
    </properties> </t>
    <t>$Dave ex:fullName <s>Dave Beckett</s></t>
    <t>$Dave ex:homePage 'http://purl.org/net/dajobe/</t>
    </Graph>

Of course, one might be led into believing that datatyping all of the tokens with <uri>, <bNode> and <literal> elements and using elements for QNames would be easier on parsers, but I challenge anyone raising this criticism to actually *prove* that that is the case. If I receive positive feedback on this serialization attempt (though I don't particularly expect it...) I may attempt to put my money where my mouth is, as it were, and write a parser.

In the meantime, my rationalization is that XML parsers tend to be in languages that can cope with a little string munging: all one has to do is make sure that it is possible to:-

* Keep a list of the namespaces declared, and their short names (both XSLT and any XML parser worth its salt can do this)
* Be able to tokenize strings splitting on whitespace (easy programming task)
* Be able to datatype based on whether a token starts with "$" or "'", and get the substring from [1:] if it's not a QName, and split on the colon and get the mapping to the URI otherwise (a bit of work, but I'm sure that this is possible in XSLT and it's obviously laughably easy in Perl, Python, Java, C, C++, etc.)

That's it.

There are some issues, but they're mainly just todos.

* Internationalization. Probably can inherit most of the solutions from N-Triples.
* Datatypes and lang on literals. <s lang="en">string</s> and <s dt="datatypeURI">string</s> perhaps.
* XML literals. Perhaps <x> should preserve XML literals, and everything else has to get flattened to text. Or!
Perhaps <s> should be an XML literal, and people can use <![CDATA[]]> to flatten anything down should they need to. Tricky.
* Collections? Reification? Shouldn't be too hard to add. <t> could be used as an object/subject for reification, perhaps, though then you can't give it an id (unless you add an attribute to the <t> element, perhaps).
* No more truly blank nodes. Does this even really matter?
* The 'URI syntax trick could be eliminated by saying that any QName/URI things whose prefixes have been declared using XML namespaces are QNames, and anything else is a URI, but that's horrid, and it's only one character.

This is just a quick sketch and I don't have many free cycles with which to work on it, but I'll try to contribute to any resultant thread as much as I can. Comments are most welcome, of course.

Thanks,

[1] http://www.tbray.org/ongoing/When/200x/2003/06/13/SemWeb
[2] http://www-uk.hpl.hp.com/people/marbut/isTheSemanticWebHype.pdf
[3] http://www.textuality.com/xml/RPV.html
[4] http://infomesh.net/2001/07/bswl/
[5] http://lists.w3.org/Archives/Public/www-rdf-interest/2002Mar/0128

--
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/
Received on Saturday, 14 June 2003 14:08:15 UTC