Re: How does RDF get extended to new datatypes?

* Sandro Hawke <sandro@w3.org> [2013-04-25 09:37-0400]
> On 04/24/2013 10:06 AM, Antoine Zimmermann wrote:
> >It seems to me that this problem is due to the removal of the
> >notion of datatype map. In 2004, applications could implement the
> >D-entailment they liked, with D being a partial mapping from IRI
> >to datatypes.
> >Now, there are just IRIs in D. The association between the IRI and
> >the datatype it should denote is completely unspecified. The only
> >indication that the application can have to implement a datatype
> >map is that XSD URIs must denote the corresponding XSD datatypes.
> >
> >I have trouble understanding why datatype maps should be removed.
> >I don't remember any discussions saying that they should be
> >changed to a set. This change, which now creates issues, suddenly
> >appeared in RDF Semantics ED, with no apparent indication that it
> >was motivated by complaints about the 2004 design.
> >
> >Currently, I see a downside of having a plain set, as it does not
> >specify what datatype the IRIs correspond to, while I do not
> >see the positive side of having a plain set. Can someone provide
> >references to evidence that this change is required or has more
> >advantages than it has drawbacks?
> >
> 
> You seem to have a very different usage scenario in mind than I do.
> 
> My primary use case (and I'm sorry I sometimes forget there are
> others) is the situation where n independent actors publish data
> in RDF, on the web, to be consumed by m independent actors.   The n
> publishers each makes a choice about which vocabulary to use; the m
> consumers each get to see what vocabularies are used and then have
> to decide which IRIs to recognize.  There are market forces at work,
> as publishers want to be as accurate and expressive as possible, but
> they also want to stick to IRIs that will be recognized.  Consumers
> want to make use of as much data as possible, but every new IRI they
> recognize is more work, sometimes a lot more work, so they want to
> keep the recognized set small.
> 
> In this kind of situation, datatype IRIs are just like every other
> IRI; all the "standardization" effects are the same.   It's great
> for both producers and consumers if we can pick a core set of IRIs
> that producers can assume consumers will recognize.   Things also
> work okay if a closed group of producers and consumers agree to use
> a different set.   But one of the great strengths of RDF is that the
> set can be extended without a need for prior agreement.  A producer
> can simply start to use some new IRI, and consumers can dereference
> it, learn what it means, and change their code to recognize it.   Of
> course, it's still painful (details, details), but it's probably not
> as painful as switching to a new data format with a new media type.
> In fact, because it can be done independently for each class,
> property, individual, and datatype, and data can be presented many
> ways at once, I expect it to be vastly less painful.

In case it's helpful, a quick intro to SPARQL's extensible handling of
datatypes:

SPARQL basic graph patterns work with term equivalence for datatyped
literals as well as every other term. { ?s ?p "1"^^xs:integer }
matches anything in the graph which has an object with a lexical form
of "1" and a datatype of xs:integer. If your entailment regime allowed
you to parse <s1> <p1> "1"^^xs:byte and conclude <s1> <p1>
"1"^^xs:integer, that's fine, but SPARQL doesn't demand that (nor does
the test suite encourage it). In this regard, any extensibility comes
from an entailment regime which infers new triples from old ones.
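
A minimal Python sketch of that term-equality matching, assuming a
toy in-memory graph of tuples; the Literal type, match_object helper
and sample triples are hypothetical names for illustration, not part
of any SPARQL implementation:

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"

class Literal(NamedTuple):
    """An RDF literal as a (lexical form, datatype IRI) pair."""
    lexical: str
    datatype: str

def match_object(graph, pattern_obj):
    """Return the triples whose object is the same RDF term as pattern_obj.

    Only term equality is used; no datatype value is ever computed.
    """
    return [t for t in graph if t[2] == pattern_obj]

# Hypothetical sample data.
graph = [
    ("s1", "p1", Literal("1", XSD + "byte")),
    ("s2", "p1", Literal("1", XSD + "integer")),
    ("s3", "p1", Literal("01", XSD + "integer")),
]

# Corresponds to the pattern { ?s ?p "1"^^xs:integer }.
matches = match_object(graph, Literal("1", XSD + "integer"))
```

Only the s2 triple matches. The xs:byte literal and the "01" literal
have the same value but are different terms, so they would match only
if an entailment regime had first added the corresponding triples.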

For FILTERs, SPARQL has a table of stuff you MUST do in order to call
yourself conformant <http://www.w3.org/TR/sparql11-query/#OperatorMapping>.
The ordering of the table says that if you see
  FILTER ("ii"^^my:romanNumeral = "2"^^xs:integer)
you scan down the types supported for = and ask "is it a numeric?
is it a dateTime? ..." with the last resort at the bottom, where the
A=B operator compares two RDF terms.

The basic implementation must understand a prescribed set of XSD
datatypes and all of the operators in the Operator Mapping.
Extensions may extend the set of things they recognize as numerics
(and thus can compare to other numerics) or even add rows to the
Operator Mapping (e.g. support for dates as well as dateTimes).
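
A rough Python sketch of that scan for the = operator, under loud
assumptions: only the numeric row and the final term-equality row are
modelled, the "recognized numerics" are a dict from datatype IRI to a
parser, and my:romanNumeral, its namespace and parse_roman are made
up for illustration:

```python
from typing import Callable, Dict, NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"
MY = "http://example.org/my#"  # hypothetical namespace for my:romanNumeral

class Literal(NamedTuple):
    lexical: str
    datatype: str

class SparqlTypeError(Exception):
    """Stands in for a SPARQL expression type error."""

def parse_roman(lexical: str) -> int:
    """Toy roman numeral parser (hypothetical datatype mapping)."""
    values = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
    total, largest = 0, 0
    for ch in reversed(lexical.lower()):
        v = values[ch]
        total += v if v >= largest else -v  # subtractive notation, e.g. "iv"
        largest = max(largest, v)
    return total

# Datatypes treated as numerics: a tiny subset for the basic case,
# plus one extra entry for the extended case.
BASIC: Dict[str, Callable[[str], object]] = {
    XSD + "integer": int,
    XSD + "double": float,
}
EXTENDED = {**BASIC, MY + "romanNumeral": parse_roman}

def sparql_equals(a: Literal, b: Literal, numerics) -> bool:
    # Row: numeric = numeric, compared by value.
    if a.datatype in numerics and b.datatype in numerics:
        return numerics[a.datatype](a.lexical) == numerics[b.datatype](b.lexical)
    # (Rows for strings, booleans and dateTimes are omitted here.)
    # Last row: RDF term equality. Identical terms are equal; two
    # different literals of unrecognized type cannot be compared.
    if a == b:
        return True
    raise SparqlTypeError("cannot compare %r and %r" % (a, b))

two_roman = Literal("ii", MY + "romanNumeral")
two_int = Literal("2", XSD + "integer")

try:
    basic_result = sparql_equals(two_roman, two_int, BASIC)
except SparqlTypeError:
    basic_result = "error"
extended_result = sparql_equals(two_roman, two_int, EXTENDED)
```

With the basic table the comparison falls through to the last row and
raises an error; with the extra numeric registered it is compared by
value and comes out true.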

Operators like <, >, *, +, etc. have a sort of monotonic behavior: if
I ask FILTER("ii"^^my:romanNumeral<"iii"^^my:romanNumeral) in a basic
implementation, the filter throws an error and effectively fails. If
I ask !("ii"^^my:romanNumeral<"iii"^^my:romanNumeral), it also fails,
so for these operators you *generally* get strictly more solutions in
an extended implementation than in a basic one.
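
To make the error behavior concrete, here is a small Python sketch,
assuming a toy evaluator in which an unrecognized datatype raises a
type error and FILTER treats any error as "drop the solution". The
names less_than, negate, filter_keeps and the ROMAN_VALUES lookup are
hypothetical:

```python
class SparqlTypeError(Exception):
    """Stands in for a SPARQL expression type error."""

ROMAN = "http://example.org/my#romanNumeral"  # hypothetical datatype IRI
ROMAN_VALUES = {"i": 1, "ii": 2, "iii": 3}     # toy lexical-to-value map

def less_than(a, b, recognized):
    """Evaluate a < b for (lexical, datatype) pairs, or raise a type error."""
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a not in recognized or dt_b not in recognized:
        raise SparqlTypeError("no '<' row for these operand types")
    return recognized[dt_a](lex_a) < recognized[dt_b](lex_b)

def negate(expr):
    """Logical not; an error raised by the operand propagates unchanged."""
    return lambda: not expr()

def filter_keeps(expr):
    """FILTER keeps a solution only if the expression evaluates to true."""
    try:
        return bool(expr())
    except SparqlTypeError:
        return False

basic = {}                                     # roman numerals unknown
extended = {ROMAN: ROMAN_VALUES.__getitem__}   # roman numerals recognized

def results(recognized):
    lt = lambda: less_than(("ii", ROMAN), ("iii", ROMAN), recognized)
    return filter_keeps(lt), filter_keeps(negate(lt))

basic_results = results(basic)        # both filters drop the solution
extended_results = results(extended)  # "<" keeps it, "!(<)" drops it
```

In the basic case both the test and its negation eliminate the
solution, so recognizing the datatype can only turn an error into an
answer for one of them.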


> So, given this usage scenario, I can't see how D helps anybody
> except as a shorthand for saying "the IRIs which are recognized as
> datatype identifiers".
> 
> Pat, does this answer the question of how RDF gets extended to a new
> datatype?    I'm happy to try to work this through in more detail,
> if anyone's interested.
> 
>      -- Sandro
> 
> 
> >
> >AZ.
> >
> >Le 24/04/2013 05:09, Pat Hayes a écrit :
> >>I think we still have a datatype issue that needs a little thought.
> >>
> >>The D in D-entailment is a parameter. Although RDF is usually treated
> >>as having its own special datatypes and the compatible XSD types as
> >>being the standard D, it is quite possible to use RDF with a larger D
> >>set, so that as new datatypes come along (eg geolocation datatypes,
> >>or time-interval datatypes, or physical unit datatypes, to mention
> >>three that I know have been suggested) and, presumably, get canonized
> >>by appropriate standards bodies (maybe not the W3C, though) for use
> >>by various communities, they can be smoothly incorporated into RDF
> >>data without a lot of fuss and without re-writing the RDF specs.
> >>
> >>Do we want to impose any conditions on this process? How can a reader
> >>of some RDF know which datatypes are being recognized by this RDF?
> >>What do we say about how to interpret a literal whose datatype IRI
> >>you don't recognize? Should it be OK to throw an error at that point,
> >>or should it *not* be OK to do that? Should we require that RDF
> >>extensions with larger D's only recognize IRIs that have been
> >>standardly specified in some way? How would we say this?
> >>
> >>The current semantic story is that a literal
> >>"foo"^^unknown:datatypeIRI  is (1) syntactically OK (2) not an error
> >>but (3) has no special meaning and is treated just like an unknown
> >>IRI, ie it presumably denotes something, but we don't know what. Is
> >>this good enough?
> >>
> >>Pat
> >>
> >>------------------------------------------------------------
> >>IHMC                      (850)434 8903 or (650)494 3973
> >>40 South Alcaniz St.      (850)202 4416   office
> >>Pensacola                 (850)202 4440   fax
> >>FL 32502                  (850)291 0667   mobile
> >>phayesAT-SIGNihmc.us      http://www.ihmc.us/users/phayes
> >>
> >
> 
> 

-- 
-ericP

Received on Thursday, 25 April 2013 14:49:58 UTC