rdfms-literals-as-resources and rdfms-xmllang from Sampo Syreeni on 2001-07-02 (www-rdf-comments@w3.org from July to September 2001)

From: Sampo Syreeni <decoy@iki.fi>
Date: Mon, 2 Jul 2001 11:32:10 +0300 (EEST)
To: <www-rdf-comments@w3.org>
Message-ID: <Pine.SOL.4.30.0107021018340.18560-100000@kruuna.Helsinki.FI>

>Should literals be considered a type of resource, possibly "data:" URIs
>rather than a special case in the model?

I agree that special-casing literals is messy. There are at least the issues
of literal type (XML? number? plaintext?), of literals being possibly
unbounded in length (e.g. in RSS), and of unstructured string literals
inviting use as difficult-to-handle BLOBs. All of these make data-oriented
RDF applications difficult to implement.

However, I think using data: URIs is even worse. Forcing all literal data to
obey URI syntax and encoding rules is complicated and counter-productive.
Then, if one thinks about what happens when we need to extensively annotate
a data: URI (e.g. give it a type, a language, whatnot), it is easy to see
that the conceptual model will suffer from data multiplication, because every
annotating triple repeats the full data. Such models
become difficult to update (unlike URIs referencing data, data: URIs *are*
the data and are expected to change), bump into canonicalization issues
beyond those imposed by URIs used as names only, and in addition have all
the trouble current literals do. The simplification would be merely
conceptual, and would encourage what can be viewed as abuse of the
triple-based data model.
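
To make the data multiplication point concrete, here is a minimal Python
sketch. It assumes a plain-text literal percent-encoded into an RFC 2397
style data: URI, and models triples as plain tuples. The helper to_data_uri
and the ex:* property names are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote

def to_data_uri(text, media_type="text/plain"):
    """Percent-encode a string literal into a data: URI (hypothetical helper)."""
    return "data:" + media_type + "," + quote(text, safe="")

literal = "Hello, world & friends"
uri = to_data_uri(literal)

# Every annotation carries the whole encoded literal as its subject.
# The ex:* names are made up for this sketch.
triples = [
    (uri, "ex:type", "ex:PlainText"),
    (uri, "ex:language", "ex:en"),
]

# Number of triples that repeat the full data in the subject position.
copies = sum(1 for s, _, _ in triples if s == uri)

# Changing the data changes the "name", so every triple must be rewritten.
new_uri = to_data_uri(literal + "!")
stale = [t for t in triples if t[0] != new_uri]
```

In this sketch both triples become stale after a one-character change to the
literal, which is the update problem described above.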

In my mind the ideal solution would be to drop literals, and force actual
content to always be defined as external resources, perhaps referring back
to the concrete XML instance originating the triples by default. This would
leverage the existing content negotiation and typing framework present in
the Net, help solve rdfms-xmllang as well, and lead to a completely uniform
data model, since resources would now be completely divorced from the
reference-based RDF model.

But as this would also force significant changes to M&S, would incur a
dereferencing penalty on the retrieval of simple string literals, would not
permit abbreviated syntax to be used with literals at all, and would have to
deal with how to reliably generate the URIs referencing what were previously
literal strings, my vote goes for keeping literals as they are.

As for rdfms-xmllang, I suggest that xml:lang be dropped from the model. One
should only be able to describe resources, not literals. Language data would
not be automatically derived from xml:lang, but would have to be explicitly
included in the model by a triple such as (:a hasLanguage :en), avoiding any
ambiguities. If the language facility is needed, one can then encode the
data as a separate XML resource, giving the language metadata inline via
xml:lang, or set up a URI reference and use a triple instead.
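
As a rough illustration of the explicit-triple approach, the following Python
sketch stores triples as tuples and looks the language up from the model
alone, never from xml:lang. The identifiers :doc, :a, :en and the properties
hasContent and hasLanguage are placeholders; language_of is a hypothetical
helper, not part of any RDF toolkit.

```python
# Triples as (subject, predicate, object); all names are placeholders.
model = {
    (":doc", "hasContent", ":a"),
    (":a", "hasLanguage", ":en"),
}

def language_of(resource, graph):
    """Return the set of languages explicitly asserted for a resource."""
    return {o for s, p, o in graph if s == resource and p == "hasLanguage"}

langs_a = language_of(":a", model)      # language stated by a triple
langs_doc = language_of(":doc", model)  # nothing asserted, so nothing inferred
```

A resource with no hasLanguage triple simply has no language in the model,
so there is nothing ambiguous to inherit or guess.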

Sampo Syreeni, aka decoy, mailto:decoy@iki.fi, gsm: +358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front

Received on Monday, 2 July 2001 04:32:13 UTC