Re: Typed literals: current status from Patrick Stickler on 2002-10-22 (w3c-rdfcore-wg@w3.org from October 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Tue, 22 Oct 2002 09:52:04 +0300
To: "Dan Connolly" <connolly@w3.org>, "ext Brian McBride" <bwm@hplb.hpl.hp.com>
Cc: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <004c01c27997$87bae3f0$c99316ac@NOE.Nokia.com>

[Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com]

----- Original Message ----- 
From: "ext Brian McBride" <bwm@hplb.hpl.hp.com>
To: "Dan Connolly" <connolly@w3.org>
Cc: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>; <w3c-rdfcore-wg@w3.org>
Sent: 21 October, 2002 19:24
Subject: Re: Typed literals: current status

> I agree that doesn't look good either, which would suggest we drop the lang 
> tag from the abstract syntax, but that invalidates the Nokia data, which is 
> undesirable.
> 
> I'm suggesting the WG consider which is the least damaging option we 
> have.  Patrick: comment?

Well, our present usage is presuming value-based semantics for
inline literals, where we would have (using the most recent syntax):

   mars:shortLabel rdfs:range mars:Token .
   some:Resource mars:shortLabel "Foo"-en .
   some:Resource mars:shortLabel "Bar"-fi .

Now, since we can't do that, we're going to have to instead say

   some:Resource mars:shortLabel mars:Token"Foo"-en .

We need the language classification in order to differentiate
between different language variants, and we also need the
datatype to constrain values to valid tokens. It is not
acceptable to only say

   some:Resource mars:shortLabel "Foo"-en .

as the property value is a token, not an abitrary Unicode string.

Yes, theoretically, we could go with some other representation,
such as using a bnode, but that complicates all code which then
has to deal with alternate graph representations for language
qualified values versus non-language qualified values, rather
than just an optional lang suffix on the label which is much 
easier and more consistent to deal with.

Alternately, we could use reification and qualify each statement
for language, but again, that is a very heavy solution and
grossly redundant.

Ideally, RDF would have a proper scoping mechanism a'la XTM
for such things, but as it doesn't, the above approach seems
the most straightforward.

If it comes down to choosing between no lang tag and literals
denoting pairs of value + lang then I'd definitely opt for no 
lang tag on literals and keeping the typed literals denoting
values, as the pair interpretation simply breaks everything IMO.

Changing the software to deal with some alternate way of 
capturing the language qualification is not my worry. No, it's
not trivial, but it's not all that terrible either. But having
to change deployed content, and get numerous existing systems
that push metadata into centralized tools is going to be
*extremely* painful, if not impossible (those who work for
big companies will fully appreciate this). It's concievable
that the old-style RDF can be transformed into the new-style
RDF at various key points in the process, but most likely
our RDF (at the creation source at least) will remain persistently 
deviant.

To be quite frank, RDF over XML has not been an easy sell, and
given the distance that RDF is now diverging from the way we
do things (or visa versa) it looks probably that RDF will become 
unsellable for future incarnations of our present systems, at least 
at the more general level, even if future XML is mined into RDF 
for some purposes. 

But hey, you can't please everyone all of the time. I guess
we may have to get what we need elsewhere and relegate RDF
to the fringe rather than core of our processes.

So I guess the WG can omit lang tags from literals entirely.
It's looking like it won't matter to us one way or another.

Patrick

Received on Tuesday, 22 October 2002 02:52:10 UTC