RE: RDF/XML does not support numeric terms [was: RE: 8 points for GRDDL (output other than RDF/XML?)]

On 2006-03-10, Mark Birbeck wrote:

> Misha's point is that since the RDF/XML syntax relies on *QNames* to 
> represent URIs, and *QNames* do not allow a numeric part as the local 
> name, then RDF/XML does not support his schemes.

Oh. XML indeed doesn't allow numeric local parts in QNames, but at the
same time RDF/XML as a whole does support more or less arbitrary URIs.
They just have to be expressed as attribute values rather than as
element names.
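
To make the distinction concrete: a URI can serve as an element name
(i.e. as a property in striped RDF/XML) only if it can be split into a
namespace and a local part that is a valid NCName. rdf:about and
rdf:resource, on the other hand, accept any URI as an attribute value.
What follows is a minimal sketch of that check. It assumes a simplified,
ASCII-only NCName rule (the real XML production is broader), and the
example.org URIs are hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production:
# start with a letter or underscore, continue with letters, digits,
# '.', '-' or '_'. The real production also admits many non-ASCII
# characters, so this check is stricter than XML itself.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_for_qname(uri):
    """Split uri into (namespace, local_name) so that local_name is an NCName.

    Returns None when no such split exists, i.e. when the URI cannot be
    written as an element name (a property) in striped RDF/XML. It can
    still appear as an rdf:about or rdf:resource attribute value.
    """
    # Scan from the left so the first match is the longest possible
    # local name; start at 1 so the namespace part is never empty.
    for i in range(1, len(uri)):
        local = uri[i:]
        if NCNAME.fullmatch(local):
            return uri[:i], local
    return None


print(split_for_qname("http://example.org/terms/title"))
# ('http://example.org/terms/', 'title')
print(split_for_qname("http://example.org/codes/123"))
# None
```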

My problem is that it never occurred to me that anybody would want
numeric element names in the first place. The primary benefit of XML
over plain text or binary formats is that it has a regular,
standardized, human-readable syntax and a broad selection of tools. It
is easy to define, learn and use XML-based markup languages for data
that looks like a document or otherwise has a hierarchical, tree-like
structure. Simple property-value metadata is like that, and translating
it into good XML style leads to human-readable element and attribute
names. Purely numeric data, like general graphs, is excluded from that
style: it either has to be encoded out-of-line in attribute values or
handled at a higher level (e.g. in reference semantics).

As a whole, RDF is built on a graph data model and refers to things
either by general URIs or by association via blank nodes. It doesn't
really translate too gracefully into human-parsable XML. I think dealing
with purely numeric namespaces is just one instance of this more general
difference in aims: QNames have to be non-numeric because purely numeric
data is not human-parsable, while XML is supposed to be. Conversely, if
you need to deal with primarily machine-parsable data like ISBNs, or
with general, non-tree-like graphs, trying to fit it into XML's model of
documents as ordered trees might not be such a good idea. If you're
going to be dealing with data like that, you've already gone beyond
human-readability, so you might as well go all the way and ditch the
document-like syntax. That leads to RDF/XML documents encoded entirely
using rdf:about and rdf:resource, with no striping deeper than the first
level, or perhaps to alternative syntaxes like N3 and N-Triples.
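
For comparison, a line-based syntax such as N-Triples writes every URI
out in full, so a purely numeric final segment is no problem at all,
even in the predicate position. This is a minimal sketch of a serializer
for URI-only triples. It assumes well-formed absolute URIs as input,
leaves out literals and blank nodes, and uses hypothetical example.org
URIs.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples.

    Every term is written out as a full <URI>, so purely numeric final
    segments need no special handling. Literals and blank nodes are left
    out to keep the sketch short.
    """
    lines = []
    for triple in triples:
        for term in triple:
            # Reject some characters that are not allowed inside an
            # N-Triples <...> term (not an exhaustive URI validation).
            if any(c in term for c in '<>"{}|^`\\ '):
                raise ValueError(f"not a plain URI: {term!r}")
        s, p, o = triple
        lines.append(f"<{s}> <{p}> <{o}> .")
    return "\n".join(lines)


print(to_ntriples([("http://example.org/item/1",
                    "http://example.org/codes/123",
                    "http://example.org/item/2")]))
# <http://example.org/item/1> <http://example.org/codes/123> <http://example.org/item/2> .
```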

If you really, really want to have it both ways, I think a better
approach would be to create a separate set of valid, descriptive QNames
for use with the striped RDF/XML syntax, and to equate those
programmatically, out-of-band, with identifiers from the purely numeric
namespace. This is what ISO did with their OID hierarchy (each branch in
the numeric hierarchy can have a human-readable object descriptor, and
their use is encouraged), and I think IPTC NewsCodes are a prime example
of a namespace where adding such extra structure would be easy. This
would yield RDF/XML documents that are much easier to read and write by
hand than either allowing numeric local parts in element names or simply
making complicated URIs easier to express and leaving it at that.
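
The out-of-band equivalence could be as simple as a published table. A
writer uses the descriptive names in striped RDF/XML, and a consumer
normalizes them to the numeric identifiers before merging data. This is
a minimal sketch under that assumption. Both namespaces and all codes
below are hypothetical and do not correspond to any real vocabulary.

```python
# Hypothetical namespaces: a descriptive one usable in QNames, and a purely
# numeric one. Neither corresponds to a real vocabulary.
DESCRIPTIVE_NS = "http://example.org/subject/"
NUMERIC_NS = "http://example.org/subjectcode/"

# Out-of-band table equating descriptive local names with numeric codes,
# in the spirit of OID object descriptors. The codes are made up.
DESCRIPTORS = {
    "sport": "1001",
    "economy": "1002",
}
# Reverse table, numeric code -> descriptive local name.
CODES = {code: name for name, code in DESCRIPTORS.items()}


def to_numeric(uri):
    """Map a descriptive URI to its numeric equivalent; leave others unchanged."""
    if uri.startswith(DESCRIPTIVE_NS):
        code = DESCRIPTORS.get(uri[len(DESCRIPTIVE_NS):])
        if code is not None:
            return NUMERIC_NS + code
    return uri


def to_descriptive(uri):
    """Map a numeric URI back to its descriptive equivalent, if one is known."""
    if uri.startswith(NUMERIC_NS):
        name = CODES.get(uri[len(NUMERIC_NS):])
        if name is not None:
            return DESCRIPTIVE_NS + name
    return uri
```

The descriptive names stay valid NCNames, so they can be used as
element names, while the numeric URIs remain the canonical identifiers.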

> It's one of the many examples of inappropriate use of QNames in 
> different languages, and was the whole reason I have proposed CURIEs 
> [1].

I don't want to sound like a troll, but... While CURIEs make sense in
the current RDF/XML environment, I think the need to abbreviate URIs
says more about the lack of proper user interfaces that hide
machine-readable identifiers from users than about any shortcomings of
a given machine-parsable syntax for RDF.
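
For reference, the core idea behind a CURIE is that the part after the
colon is simply appended to the URI the prefix stands for, and it does
not have to be an NCName. This is a minimal sketch of that prefix
expansion, assuming a plain prefix table. The prefix name and URI are
hypothetical, and the sketch leaves out details of the actual CURIE
proposal.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' by concatenating the prefix's URI and the reference.

    Unlike a QName, the reference part is not required to be an NCName,
    so purely numeric references such as 'codes:123' are accepted.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + reference


# Hypothetical prefix table.
PREFIXES = {"codes": "http://example.org/codes/"}
print(expand_curie("codes:123", PREFIXES))
# http://example.org/codes/123
```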
-- 
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Sunday, 12 March 2006 22:28:47 UTC