Re: heading toward datatyping telecon

Literals as subjects don't work because they are just
ambiguous strings.

E.g., consider the following simple example:

   "fi" <rdf:type> <urn:iso:639> .
   "fi" <rdf:type> <urn:iso:3166_1> .

One defines a language, the other a country. Yet these
get merged into ambiguous "knowledge" about the subject "fi".

What is it? A language or a country? Both? How does
one differentiate between where it denotes *only* the 
country or the language?
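
A rough sketch of that merge, using a toy in-memory store rather than
any real RDF toolkit. The type URI urn:iso:639 (standing in for the
language code list) and the helper name merge_by_subject are
assumptions made only for illustration:

```python
from collections import defaultdict

# Two statements from two different sources. The subject is a bare
# literal, so nothing distinguishes one "fi" from the other.
# urn:iso:639 is an assumed URI for the language code list.
triples = [
    ('"fi"', "rdf:type", "urn:iso:639"),      # meant as: a language code
    ('"fi"', "rdf:type", "urn:iso:3166_1"),   # meant as: a country code
]

def merge_by_subject(statements):
    """Group statements by subject, as happens when graphs are merged."""
    merged = defaultdict(set)
    for subject, predicate, obj in statements:
        merged[subject].add((predicate, obj))
    return dict(merged)

merged = merge_by_subject(triples)

# Only one subject survives, carrying both types at once.
print(len(merged))
print(sorted(merged['"fi"']))
```

After the merge there is no way to ask for "fi" as *only* the country
or *only* the language; the string itself carries no such distinction.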

So, unfortunately, literals as subjects don't work (or at 
least don't scale reliably, beyond very small closed systems).

The problem is that so many folks are used to using strings
as values, relying on static, native understanding embodied
in the systems they use, that they keep trying to use strings
in their knowledge bases, when in fact, in most cases, they
should be using resources identified by URIs.

Thus rather than

   <urn:foo> dc:title [ rdf:value "Foo";
                        dc:language [ rdf:value "fi";
                                      rdf:type <urn:iso:3166_1>;
                                      rdf:label [ rdf:value "Finnish";
                                                  dc:language [ rdf:value "en"; ... ] ];
                                      rdf:label [ rdf:value "Suomi";
                                                  dc:language [ rdf:value "fi"; ... ] ];
                                      ... ]
                      ].

(where '...' represents possible infinite recursion of "local"
specification of type due to the fact that literals are not
resources about which global knowledge can be defined...)

We should be doing something like

   <urn:foo> dc:title [ rdf:value "Foo";
                        dc:language <voc://iso.ch/3166_1/fi> ].
   <voc://iso.ch/3166_1/fi> rdf:label [ rdf:value "Finnish";
                                        dc:language <voc://iso.ch/3166_1/en> ].
   <voc://iso.ch/3166_1/fi> rdf:label [ rdf:value "Suomi";
                                        dc:language <voc://iso.ch/3166_1/fi> ].

Representation as resources can be achieved for most, if not all,
literals, such that the use of actual RDF literals would remain only
for actual strings to which no further interpretation is intended to
be applied.
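
To make the contrast concrete, here is a minimal sketch of the
resource-based approach, again with plain Python tuples instead of a
real RDF store. The URIs are the ones from the example above; the
table layout and the helper names label_for and describe_title are
hypothetical:

```python
FI = "voc://iso.ch/3166_1/fi"
EN = "voc://iso.ch/3166_1/en"

# Knowledge about the resource is stated once, globally:
# (resource, label text, language of the label)
labels = [
    (FI, "Finnish", EN),
    (FI, "Suomi", FI),
]

# A title elsewhere only points at the language resource:
# (document, title text, language of the title)
titles = [("urn:foo", "Foo", FI)]

def label_for(resource, language):
    """Return the label of a resource in the given language, if any."""
    for subject, text, lang in labels:
        if subject == resource and lang == language:
            return text
    return None

def describe_title(document, display_language):
    """Return (title, name of the title's language) for a document."""
    for subject, text, lang in titles:
        if subject == document:
            return text, label_for(lang, display_language)
    return None

print(describe_title("urn:foo", EN))
print(describe_title("urn:foo", FI))
```

The title statement carries no nested, locally repeated description of
its language; everything known about the language hangs off one URI and
can be extended by anyone without touching the title.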

Whatever we work out with regard to typed data literals, let's also
try to find a solution that encourages more useful and scalable
knowledge representations -- particularly those employing controlled
vocabularies -- and which will (hopefully) encourage folks to move
away from the problematic use of ambiguous strings.

Cheers,

Patrick

PS: BTW, the specification of the 'voc:' URI scheme for vocabularies, 
taxonomies, and codes will be published as an I-D shortly.

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Monday, 5 November 2001 04:52:01 UTC