Re: How does RDF get extended to new datatypes? from Thomas Baker on 2013-04-29 (public-rdf-wg@w3.org from April 2013)

From: Thomas Baker <tom@tombaker.org>
Date: Mon, 29 Apr 2013 09:18:20 -0400
To: Sandro Hawke <sandro@w3.org>
Cc: Pat Hayes <phayes@ihmc.us>, W3C RDF WG <public-rdf-wg@w3.org>
Message-ID: <20130429131820.GA56627@julius>
On Sun, Apr 28, 2013 at 04:52:30PM -0400, Sandro Hawke wrote:
> >Here's an example. DCMI declares twelve URIs as rdf:type rdfs:Datatype.  In
> >DCMI terminology, the following are URIs for "Syntax Encoding Schemes" [1].
> >
> >     http://purl.org/dc/terms/Box
> >     http://purl.org/dc/terms/ISO3166
> >     http://purl.org/dc/terms/ISO639-2
> >     http://purl.org/dc/terms/ISO639-3
> >     http://purl.org/dc/terms/Period
> >     http://purl.org/dc/terms/Point
> >     http://purl.org/dc/terms/RFC1766
> >     http://purl.org/dc/terms/RFC3066
> >     http://purl.org/dc/terms/RFC4646
> >     http://purl.org/dc/terms/RFC5646
> >     http://purl.org/dc/terms/URI
> >     http://purl.org/dc/terms/W3CDTF
> >
> >ISO3166, for example, is defined as "The set of codes listed in ISO 3166-1 for
> >the representation of names of countries."
> >
> >Most of these twelve URIs date from 2000 [2]. The ones coined after 2000 were
> >for updated versions of the ISO and RFC specifications. If I correctly recall,
> >the idea of saying that these are RDFS datatypes was first proposed circa
> >2002 by Eric Miller.  In the mid 2000s, the DCMI Usage Board reviewed all of
> >the existing "encoding schemes" [3] to decide whether they represented
> >Vocabulary Encoding Schemes (which are something like SKOS Concept Schemes,
> >only without necessarily being expressed in SKOS or having URIs for individual
> >terms) or Syntax Encoding Schemes (the twelve listed above).
> >
> >At the time, we interpreted the ISO 3166 specification, for example, as
> >representing a lexical space (e.g., "AS", "AU"...), a value space ("American
> >Samoa", "Australia"...), and a lexical-to-value mapping ("AS" = "American
> >Samoa", as specified in [4]).
> 
> Interesting.    Kind of an aside: why use datatypes instead of just
> properties?    Has it turned out to be better this way?   My
> understanding is that in RDF modeling, when you have an
> (inverse-functional) mapping from something to strings, you have to
> choose whether to call it a datatype or just have it be a property.
> My sense is that the only good time to make it a new datatype is if
> you're going to have hardcoded software support for it, as in many
> SPARQL  engines.
> 
> But maybe there's some other reason to use datatypes....?

A good question!  The reason is historical.  Dublin Core started out in 1995
with 13 "elements" (later 15).  RDF "properties" had not yet been invented.

In the late 1990s, the emphasis was on trying out the 15 elements in different
contexts and on moving from DC 1.0 to DC 1.1, but implementers wanted to
"qualify" an element with information that "specifies a context for the
interpretation of a given element" (from the report of the March 1997 workshop
where the notion of "qualifiers" was introduced [1]).

The idea was that given:

    X dc:subject  "China -- History"
    X dc:date     "1995-03-15"

...information could be added to say:

    "China -- History" is from the Library of Congress Subject Headings

    "1995-03-15" is formatted according to the W3C Date and Time Formats Specification

The idea was that this additional information could be ignored (or simply not
understood) by a consuming application and one would still be left with a
usable value, even if its interpretation would be less precise.  By analogy to
natural language, qualifiers were initially seen as "adjectives" that could
safely be dropped, leaving just the "nouns" (elements).

The set of qualifiers published in 2000 [2] distinguished two types of
qualifiers: "element refinements", which were expressed in the first RDF
schemas as subproperties -- an interpretation which came to supplant their
interpretation as "adjectives"; and "element encoding schemes", such as those in
the examples above.

At the time, any proposal to turn encoding schemes into separate properties
would have met with strong resistance because of the focus on Core Elements.

It's not obvious to me why using user-defined datatypes -- datatypes without
widespread hard-coded software support, possibly pointing to pre-RDF,
Print-World specifications -- would not still be a reasonable way to tag
strings with useful (but ignorable) context.  For example, in order to
semanticize some messy, legacy, pre-RDF, string-based data as Linked Data,
would it not make more sense to tack on a few user-defined datatypes than to
coin (and use) a lot of new properties?  An unknown datatype can more safely be
ignored than an unknown property.
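To make that last point concrete, here is a minimal Python sketch, not an RDF library API. It assumes a hypothetical consumer that models a literal as a (lexical form, datatype IRI) pair, understands only dc:subject and dc:date, and knows a partial lexical-to-value mapping for dcterms:ISO3166 (just the "AS" and "AU" codes mentioned earlier). The property http://example.org/w3cdtfDate is invented for illustration: it stands in for expressing "formatted per W3CDTF" as a new property rather than as a datatype.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


@dataclass(frozen=True)
class Literal:
    """A literal as a lexical form plus an optional datatype IRI."""
    lexical: str
    datatype: Optional[str] = None


# Partial, illustrative lexical-to-value mapping for dcterms:ISO3166.
# A real mapping would cover the whole ISO 3166-1 code list.
ISO3166_SAMPLE = {"AS": "American Samoa", "AU": "Australia"}

# Datatypes and properties this hypothetical consumer understands.
# Codes outside ISO3166_SAMPLE raise KeyError (not in this sample lexical space).
KNOWN_DATATYPES: Dict[str, Callable[[str], object]] = {
    DCTERMS + "ISO3166": lambda lex: ISO3166_SAMPLE[lex],
}
KNOWN_PROPERTIES = {DC + "subject", DC + "date"}


def interpret(lit: Literal) -> object:
    """Apply the lexical-to-value mapping if the datatype is known;
    otherwise ignore the datatype and keep the lexical string."""
    mapping = KNOWN_DATATYPES.get(lit.datatype) if lit.datatype else None
    return mapping(lit.lexical) if mapping else lit.lexical


Triple = Tuple[str, str, Literal]


def usable(triples: List[Triple]) -> List[Tuple[str, object]]:
    """Keep only statements whose property the consumer understands."""
    return [(p, interpret(o)) for _s, p, o in triples if p in KNOWN_PROPERTIES]


# Same information, once as a datatype and once as a (hypothetical) property.
as_datatype = [("X", DC + "date", Literal("1995-03-15", DCTERMS + "W3CDTF"))]
as_property = [("X", "http://example.org/w3cdtfDate", Literal("1995-03-15"))]

print(interpret(Literal("AU", DCTERMS + "ISO3166")))  # Australia
print(usable(as_datatype))  # [('http://purl.org/dc/elements/1.1/date', '1995-03-15')]
print(usable(as_property))  # []
```

In this sketch the date tagged with W3CDTF still reaches a consumer that has never heard of W3CDTF, as a plain string, whereas the same information carried by an unknown property is dropped entirely.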

Tom

[1] http://www.dlib.org/dlib/june97/metadata/06weibel.html
[2] http://dublincore.org/documents/usageguide/qualifiers.shtml

-- 
Tom Baker <tom@tombaker.org>
Received on Monday, 29 April 2013 13:18:58 UTC