Re: How does RDF get extended to new datatypes?

A custom D-entailment, like any entailment regime, requires
pre-programming. For example, an RDF system cannot adapt to RDFS
entailment unless the RDFS entailment rules have been pre-programmed.

It must be understood that for each datatype map D, D-entailment is a 
different entailment regime that requires its own pre-programmed processing.

The way I see RDF 1.0 Semantics is as follows:

1) there are 3 entailment regimes that are fully defined by the spec
(Simple, RDF and RDFS).

2) there is a family of entailment regimes that are not completely 
defined by the spec but that can simply be defined by providing a 
datatype map (that is, by providing fixed interpretations in the form of 
datatypes for a given set of IRIs).
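To make 2) concrete, here is a minimal Python sketch of a datatype map as a partial mapping from IRIs to datatypes, assuming a datatype is reduced to a lexical-space membership test plus a lexical-to-value (L2V) mapping. The names Datatype, D and literal_value are hypothetical, and the xsd:integer and xsd:boolean definitions are simplified (they ignore XSD whitespace handling), so this is an illustration, not a conforming implementation.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Datatype:
    """A datatype: a lexical space (as a membership test) plus an L2V mapping."""
    name: str
    in_lexical_space: Callable[[str], bool]
    lexical_to_value: Callable[[str], Any]


# Simplified XSD datatypes (whitespace facets ignored).
INTEGER = Datatype(
    "integer",
    lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    int,
)
BOOLEAN = Datatype(
    "boolean",
    lambda s: s in {"true", "false", "1", "0"},
    lambda s: s in {"true", "1"},
)

# A datatype map D: a partial mapping from IRIs to datatypes.
D: Dict[str, Datatype] = {
    XSD + "integer": INTEGER,
    XSD + "boolean": BOOLEAN,
}


def literal_value(lex: str, dt_iri: str, dmap: Dict[str, Datatype]) -> Optional[Any]:
    """Value of lex^^<dt_iri> under dmap.

    Returns None when dt_iri is not in the domain of dmap (no special
    meaning); raises ValueError when the literal is ill-typed.
    """
    dt = dmap.get(dt_iri)
    if dt is None:
        return None
    if not dt.in_lexical_space(lex):
        raise ValueError(f"ill-typed literal {lex!r}^^<{dt_iri}>")
    return dt.lexical_to_value(lex)
```

With this D, "01"^^xsd:integer and "1"^^xsd:integer both get the value 1, which is the kind of conclusion D-entailment relies on; the point is that it is the map, not just the set of keys, that fixes it.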


What 2) provides could be changed to a more general notion of 
"customised entailment regimes" where any of the given regimes could be 
extended by specifying constraints on certain IRIs.

For instance, I could define RDFS+graph, where I give special meaning to
http://ex.com/Graph, http://ex.com/subGraphOf, http://ex.com/emptyGraph,
etc., with special constraints on the interpretation of these IRIs.

But in any case, it seems obvious to me that no such extension can
be defined *solely* by the *set* of "recognised" IRIs. At some point,
one needs to provide the constraints explicitly.
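A small Python sketch of why the set alone is not enough, using hypothetical names and reducing each datatype to a lexical-space test plus an L2V mapping: two datatype maps with exactly the same set of recognised IRIs (here the http://example.com/dt used in the example quoted further down this thread) disagree on whether "01"^^<http://example.com/dt> and "1"^^<http://example.com/dt> denote the same value, and hence on whether a triple using one entails the same triple using the other. Both readings of the IRI are assumptions made purely for illustration.

```python
import re
from typing import Any, Callable, Dict, Optional, Tuple

DT = "http://example.com/dt"

# A datatype reduced to (lexical-space test, lexical-to-value mapping).
Datatype = Tuple[Callable[[str], bool], Callable[[str], Any]]

# Two hypothetical datatype maps with the SAME domain {DT}.
d_int: Dict[str, Datatype] = {
    DT: (lambda s: re.fullmatch(r"[0-9]+", s) is not None, int)  # DT read as an integer type
}
d_str: Dict[str, Datatype] = {
    DT: (lambda s: True, lambda s: s)  # DT read as a string type
}


def value(lex: str, dmap: Dict[str, Datatype]) -> Optional[Any]:
    """Value denoted by lex^^<DT> under dmap, or None if lex is ill-typed."""
    in_lex, l2v = dmap[DT]
    return l2v(lex) if in_lex(lex) else None


def same_value(lex1: str, lex2: str, dmap: Dict[str, Datatype]) -> bool:
    """True if lex1^^<DT> and lex2^^<DT> are guaranteed to denote the same value,
    in which case [] p lex1^^<DT> D-entails [] p lex2^^<DT>."""
    v1, v2 = value(lex1, dmap), value(lex2, dmap)
    return v1 is not None and v1 == v2


# Same set of recognised IRIs...
assert d_int.keys() == d_str.keys()
# ...but different entailments.
print(same_value("01", "1", d_int))  # True
print(same_value("01", "1", d_str))  # False
```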


An example of a custom datatype found in the wild is Virtuoso's Geometry:

http://www.openlinksw.com/schemas/virtrdf#Geometry

It maps to a datatype where the lexical space is of the form
POINT(<lat> <long>), where <lat> and <long> are decimals, between -90
and +90 for <lat> and between -180 and +180 for <long>. The value space
is the set of geographic points on the idealised Earth sphere. The L2V
mapping is the obvious one.
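For concreteness, a minimal Python sketch of this datatype as I describe it above. It follows that description only, not Virtuoso's actual implementation; the function names are hypothetical and the decimal syntax is a simplified xsd:decimal. Note that the "obvious" L2V mapping is not injective: at the poles the longitude is irrelevant, and -180 and +180 are the same meridian, so the sketch normalises these cases to one canonical pair.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Simplified xsd:decimal lexical form: optional sign, digits, optional fraction.
_DECIMAL = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
# Lexical form as described above: POINT(<lat> <long>).
_POINT = re.compile(rf"POINT\(({_DECIMAL})\s+({_DECIMAL})\)")

Point = Tuple[Decimal, Decimal]  # (latitude, longitude) in degrees


def _parse(lex: str) -> Optional[Point]:
    """Return (lat, long) if lex is in the lexical space, else None."""
    m = _POINT.fullmatch(lex)
    if m is None:
        return None
    lat, lon = Decimal(m.group(1)), Decimal(m.group(2))
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return (lat, lon)


def in_lexical_space(lex: str) -> bool:
    return _parse(lex) is not None


def lexical_to_value(lex: str) -> Point:
    """Map a lexical form to a canonical (lat, long) pair for a point on the sphere."""
    parsed = _parse(lex)
    if parsed is None:
        raise ValueError(f"not in lexical space: {lex!r}")
    lat, lon = parsed
    if abs(lat) == 90:
        lon = Decimal(0)  # every longitude names the same pole
    elif lon == -180:
        lon = Decimal(180)  # -180 and +180 are the same meridian
    return (lat, lon)


print(lexical_to_value("POINT(90 10)") == lexical_to_value("POINT(90 -120)"))  # True
```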

Note that the URL http://www.openlinksw.com/schemas/virtrdf#Geometry 
leads to a 404, so Linked Data intuitions would not work for this one.


AZ.


On 26/04/2013 15:42, Andy Seaborne wrote:
> Eric's description covers what I have seen most of - an unknown datatype
> URI is treated as an RDF term, not a value (Pat's "no special meaning").
>
> I have not seen a datatype map in the wild in machine-readable form, and
> hence have not encountered an RDF processor that adapts to a new datatype
> without pre-programming.
>
>      Andy
>
> On 26/04/13 12:34, Sandro Hawke wrote:
>> Okay, that makes sense.   I'm ambivalent.   On the one hand, I prefer
>> the linked data approach (any real spec is going to be built on
>> references to other specs; why not make those references machine
>> readable?) but on the other hand I see your point that we shouldn't
>> change things from the 2004 spec without more evidence than has been
>> presented here.
>>
>>           -- Sandro
>>
>> On 04/26/2013 03:36 AM, Antoine Zimmermann wrote:
>>> On 26/04/2013 00:41, Sandro Hawke wrote:
>>>> I think you're saying that in the 2004 semantics, one can't just say
>>>>
>>>>     (a) In addition to some XSD stuff, this system also implements
>>>>     datatype http://example.org/daterange
>>>
>>> "http://example.org/daterange" is an IRI, not a datatype. So what
>>> datatype does this system implements? Intuitively, this should mean
>>> that it implements the datatype identified by
>>> "http://example.org/daterange". But what this datatype is? It's not an
>>> XSD datatype, so the standards do not say. There is no datatype map
>>> given, so my RDF-2004 assumptions do not allow me to decide. Still, I
>>> have my Linked-data assumptions that tell me I just have to look up
>>> and figure out. All right, let's do that. This URL is redirecting to
>>> http://example.iana.org/, which does not tell me anything useful.
>>> So your entailment regime is incompletely defined.
>>>
>>>>
>>>> Instead, by the 2004 spec, one has to say:
>>>>
>>>>     (b) In addition to some XSD stuff, this system also implements
>>>>     datatype http://example.org/daterange as meaning the datatype such
>>>>     that the value space is all pairs of time instants, with the
>>>>     first element < the second element, the lexical space is the
>>>>     concatenation of two elements from the lexical space of
>>>>     xs:dateTime, separated by a "..", and the mapping between the two
>>>>     is such that... etc., etc.
>>>>
>>>> Is that right?
>>>
>>> That's pretty much it. This could take another form, and informed
>>> Linked Data specialists would take care that the IRI dereferences to a
>>> description of the datatype, which would suffice as a way to indicate
>>> what the IRI maps to (that is, in practice, D *can* be specified by
>>> simply providing the set of IRIs and indicating that the actual
>>> datatype is described in the document to which the IRIs dereference).
>>>
>>>
>>>> And Pat's proposal would make it so people would be
>>>> saying (a) instead of (b)?
>>>>
>>>>        -- Sandro
>>>>
>>>>
>>>>
>>>> On 04/25/2013 11:05 AM, Antoine Zimmermann wrote:
>>>>>
>>>>> On 25/04/2013 15:37, Sandro Hawke wrote:
>>>>>> On 04/24/2013 10:06 AM, Antoine Zimmermann wrote:
>>>>>>> It seems to me that this problem is due to the removal of the notion
>>>>>>> of datatype map. In 2004, applications could implement the
>>>>>>> D-entailment they liked, with D being a partial mapping from IRIs to
>>>>>>> datatypes.
>>>>>>> Now, there are just IRIs in D. The association between an IRI and
>>>>>>> the datatype it should denote is completely unspecified. The only
>>>>>>> indication the application has for implementing a datatype map is
>>>>>>> that XSD URIs must denote the corresponding XSD datatypes.
>>>>>>>
>>>>>>> I have trouble understanding why datatype maps should be removed. I
>>>>>>> don't remember any discussion saying that they should be changed to
>>>>>>> a set. This change, which now creates issues, suddenly appeared in
>>>>>>> the RDF Semantics ED, with no apparent indication that it was
>>>>>>> motivated by complaints about the 2004 design.
>>>>>>>
>>>>>>> Currently, I see a downside to having a plain set, as it does not
>>>>>>> specify which datatypes the IRIs correspond to, while I do not see
>>>>>>> the upside of having a plain set. Can someone provide references to
>>>>>>> evidence that this change is required or has more advantages than
>>>>>>> drawbacks?
>>>>>>>
>>>>>>
>>>>>> You seem to have a very different usage scenario in mind than I do.
>>>>>
>>>>> I do not have any scenario or use case in mind. In RDF 1.0, given an
>>>>> entailment regime and a set of triples, it was possible to determine
>>>>> which entailments are valid and which are not with respect to the
>>>>> given regime, regardless of anybody's usage scenario. In particular,
>>>>> given a datatype map D, anybody who is given a set of triples and uses
>>>>> the D-entailment regime would derive exactly the same triples, because
>>>>> D says how to interpret the datatype IRIs. It is not related to
>>>>> scenarios or use cases.
>>>>>
>>>>> In the current RDF Semantics, if you have a D, you just know which
>>>>> IRIs are recognised as datatype IRIs, but you have no indication of
>>>>> which datatypes they denote. So, if D = {http://example.com/dt}, it is
>>>>> not possible to know what the following triple entails:
>>>>>
>>>>>  [] <http://ex.com/p> "a"^^<http://example.com/dt> .
>>>>>
>>>>> To be able to entail anything from it, you would need to know which
>>>>> datatype the IRI maps to. That's why we need a mapping somewhere,
>>>>> somehow. And the mapping affects the entailment regime, so it
>>>>> makes sense to have it as a parameter of the regime.
>>>>>
>>>>> This is very different from the case where an application makes a
>>>>> certain use of an IRI. For instance, displaying instances of
>>>>> foaf:Person in a certain way in a webpage does not change anything in
>>>>> the conclusions you can normatively draw from the set of triples in
>>>>> any entailment regime.
>>>>>
>>>>>
>>>>>> My primary use case (and I'm sorry I sometimes forget there are
>>>>>> others) is the situation where n independent actors publish data in
>>>>>> RDF, on the web, to be consumed by m independent actors. The n
>>>>>> publishers each make a choice about which vocabulary to use; the m
>>>>>> consumers each get to see what vocabularies are used and then have to
>>>>>> decide which IRIs to recognize. There are market forces at work, as
>>>>>> publishers want to be as accurate and expressive as possible, but
>>>>>> they also want to stick to IRIs that will be recognized. Consumers
>>>>>> want to make use of as much data as possible, but every new IRI they
>>>>>> recognize is more work, sometimes a lot more work, so they want to
>>>>>> keep the recognized set small.
>>>>>>
>>>>>> In this kind of situation, datatype IRIs are just like every other
>>>>>> IRI; all the "standardization" effects are the same.
>>>>>
>>>>> That would be true if we did not have the D-entailment machinery.
>>>>> Applications can apply specific treatments to specific IRIs, including
>>>>> datatype IRIs (for instance, display dates using French conventions).
>>>>> But if we introduce the D-entailment regime, it means we want to
>>>>> impose more precise constraints on how to interpret the IRIs (that is,
>>>>> more than just "I recognise this set of IRIs").
>>>>>
>>>>>> It's great for both producers and consumers if we can pick a core
>>>>>> set of IRIs that producers can assume consumers will recognize.
>>>>>> Things also work okay if a closed group of producers and consumers
>>>>>> agree to use a different set. But one of the great strengths of RDF
>>>>>> is that the set can be extended without a need for prior agreement.
>>>>>> A producer can simply start to use some new IRI, and consumers can
>>>>>> dereference it, learn what it means, and change their code to
>>>>>> recognize it. Of course, it's still painful (details, details), but
>>>>>> it's probably not as painful as switching to a new data format with
>>>>>> a new media type. In fact, because it can be done independently for
>>>>>> each class, property, individual, and datatype, and data can be
>>>>>> presented many ways at once, I expect it to be vastly less painful.
>>>>>
>>>>> What you say is perfectly true and I agree with it wholeheartedly.
>>>>> However, I do not think it is relevant to the D-entailment debate (or
>>>>> maybe only marginally).
>>>>>
>>>>>
>>>>>> So, given this usage scenario, I can't see how D helps anybody except
>>>>>> as a shorthand for saying "the IRIs which are recognized as datatype
>>>>>> identifiers".
>>>>>
>>>>> In 2004, it says more: it says "These are the datatype IRIs of my
>>>>> custom D-entailment regime, and these non-XSD datatype IRIs are
>>>>> interpreted in this way, according to these datatypes". It could be
>>>>> done independently of the D-entailment machinery, in the internal
>>>>> specifics of an application, but having it in the standard allows
>>>>> one to refer to the normative mechanism.
>>>>>
>>>>>>
>>>>>> Pat, does this answer the question of how RDF gets extended to a new
>>>>>> datatype?    I'm happy to try to work this through in more detail, if
>>>>>> anyone's interested.
>>>>>
>>>>> So, to summarise what I understand of your position, you say that
>>>>> the D-entailment machinery isn't very useful at all, or only in a
>>>>> weak version. Fair enough. As I said during the meeting, I'm not
>>>>> strongly resisting the change, but in general I am reluctant to
>>>>> make any change to a standard that is not motivated by clear evidence
>>>>> that it improves the existing situation. If any criticism arises from
>>>>> our design of D-entailment, it is far easier to justify no change
>>>>> ("we want to keep backward compatibility, persistence of definitions,
>>>>> avoid changes to implementations, etc.") than a change.
>>>>>
>>>>>
>>>>> AZ.
>>>>>
>>>>>>
>>>>>>       -- Sandro
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> AZ.
>>>>>>>
>>>>>>> On 24/04/2013 05:09, Pat Hayes wrote:
>>>>>>>> I think we still have a datatype issue that needs a little thought.
>>>>>>>>
>>>>>>>> The D in D-entailment is a parameter. Although RDF is usually
>>>>>>>> treated as having its own special datatypes and the compatible XSD
>>>>>>>> types as being the standard D, it is quite possible to use RDF with
>>>>>>>> a larger D set, so that as new datatypes come along (e.g.
>>>>>>>> geolocation datatypes, or time-interval datatypes, or physical unit
>>>>>>>> datatypes, to mention three that I know have been suggested) and,
>>>>>>>> presumably, get canonized by appropriate standards bodies (maybe not
>>>>>>>> the W3C, though) for use by various communities, they can be
>>>>>>>> smoothly incorporated into RDF data without a lot of fuss and
>>>>>>>> without re-writing the RDF specs.
>>>>>>>>
>>>>>>>> Do we want to impose any conditions on this process? How can a
>>>>>>>> reader of some RDF know which datatypes are being recognized by
>>>>>>>> this RDF? What do we say about how to interpret a literal whose
>>>>>>>> datatype IRI you don't recognize? Should it be OK to throw an error
>>>>>>>> at that point, or should it *not* be OK to do that? Should we
>>>>>>>> require that RDF extensions with larger D's only recognize IRIs that
>>>>>>>> have been standardly specified in some way? How would we say this?
>>>>>>>>
>>>>>>>> The current semantic story is that a literal
>>>>>>>> "foo"^^unknown:datatypeIRI is (1) syntactically OK, (2) not an
>>>>>>>> error, but (3) has no special meaning and is treated just like an
>>>>>>>> unknown IRI, i.e. it presumably denotes something, but we don't
>>>>>>>> know what. Is this good enough?
>>>>>>>>
>>>>>>>> Pat
>>>>>>>>
>>>>>>>> ------------------------------------------------------------
>>>>>>>> IHMC                    (850)434 8903 or (650)494 3973
>>>>>>>> 40 South Alcaniz St.    (850)202 4416   office
>>>>>>>> Pensacola               (850)202 4440   fax
>>>>>>>> FL 32502                (850)291 0667   mobile
>>>>>>>> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tel:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/

Received on Friday, 26 April 2013 14:17:52 UTC