Re: How does RDF get extended to new datatypes? from Andy Seaborne on 2013-04-26 (public-rdf-wg@w3.org from April 2013)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 26 Apr 2013 14:42:46 +0100
To: public-rdf-wg@w3.org
Message-ID: <517A8456.6010003@epimorphics.com>
Eric's description covers what I have seen most often: an unknown datatype 
URI is treated as an RDF term, not a value (Pat's "no special meaning").

I have not seen a datatype map in the wild in machine-readable form and 
hence have not encountered an RDF processor that adapts to a new datatype 
without pre-programming.
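
As a rough illustration of that behaviour, here is a minimal Python sketch.
It is not taken from any particular RDF library: KNOWN_DATATYPES, Literal and
interpret are hypothetical names, and only two XSD datatypes are assumed to be
pre-programmed. A literal with a recognized datatype IRI becomes a value; any
other literal stays an opaque term.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical pre-programmed table: datatype IRI -> lexical-to-value function.
KNOWN_DATATYPES = {
    XSD + "integer": int,
    XSD + "boolean": lambda s: {"true": True, "1": True,
                                "false": False, "0": False}[s],
}


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def interpret(lit):
    """Return a value for a recognized datatype, else the literal itself."""
    to_value = KNOWN_DATATYPES.get(lit.datatype)
    if to_value is None:
        # Unknown datatype IRI: keep the literal as an opaque RDF term.
        return lit
    return to_value(lit.lexical)
```

Supporting a new datatype in this sketch means editing KNOWN_DATATYPES by
hand, which is the pre-programming step mentioned above.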

	Andy

On 26/04/13 12:34, Sandro Hawke wrote:
> Okay, that makes sense.   I'm ambivalent.   On the one hand, I prefer
> the linked data approach (any real spec is going to be built on
> references to other specs; why not make those references machine
> readable?) but on the other hand I see your point that we shouldn't
> change things from the 2004 spec without more evidence than has been
> presented here.
>
>           -- Sandro
>
> On 04/26/2013 03:36 AM, Antoine Zimmermann wrote:
>> Le 26/04/2013 00:41, Sandro Hawke a écrit :
>>> I think you're saying that in the 2004 semantics, one can't just say
>>>
>>>     (a) In addition to some XSD stuff, this system also implements
>>>     datatype http://example.org/daterange
>>
>> "http://example.org/daterange" is an IRI, not a datatype. So what
>> datatype does this system implement? Intuitively, this should mean
>> that it implements the datatype identified by
>> "http://example.org/daterange". But what is this datatype? It's not an
>> XSD datatype, so the standards do not say. There is no datatype map
>> given, so my RDF-2004 assumptions do not allow me to decide. Still, I
>> have my Linked-data assumptions that tell me I just have to look it up
>> and figure it out. All right, let's do that. This URL redirects to
>> http://example.iana.org/, which does not tell me anything useful.
>> So your entailment regime is incompletely defined.
>>
>>>
>>> Instead, by the 2004 spec, one has to say:
>>>
>>>     (b) In addition to some XSD stuff, this system also implements
>>>     datatype http://example.org/daterange as meaning the datatype such
>>>     that the value space is all pairs of time instants, with the
>>>     first element < the second element, and the lexical space is
>>>     the concatenation of two elements from the lexical space of
>>>     xs:dateTime, separated by a "..", and the mapping between the two is
>>>     such that....etc, etc.
>>>
>>> Is that right?
>>
>> That's pretty much it. This could take another form, and informed
>> Linked Data specialists would take care that the IRI dereferences to a
>> description of the datatype, which would suffice as a way to indicate
>> what the IRI maps to (that is, in practice, D *can* be specified by
>> simply providing the set of IRIs and indicating that the actual
>> datatype is described in the document to which the IRIs dereference).
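
For concreteness, the datatype described in (b) could be sketched in Python
roughly as follows. This is only an illustration under assumptions: the
function name daterange_value is hypothetical, the lexical-to-value mapping
that (b) leaves unspecified is assumed to be the obvious one, and
datetime.fromisoformat accepts only a subset of the xs:dateTime lexical space.

```python
from datetime import datetime


def daterange_value(lexical):
    """Map a lexical form 'start..end' to a pair of time instants.

    Raises ValueError when the string is outside the assumed lexical space
    or the pair is outside the value space (start must precede end).
    """
    start_text, sep, end_text = lexical.partition("..")
    if not sep:
        raise ValueError("missing '..' separator")
    # Assumption: both halves use the same style (both with or both
    # without a timezone offset), so the two instants are comparable.
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    if not start < end:
        raise ValueError("first instant must precede the second")
    return (start, end)
```

A processor would still need to be told that http://example.org/daterange
maps to this function; the code does not remove the need for that
association.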
>>
>>
>>> And Pat's proposal would make it so people would be
>>> saying (a) instead of (b)?
>>>
>>>        -- Sandro
>>>
>>>
>>>
>>> On 04/25/2013 11:05 AM, Antoine Zimmermann wrote:
>>>>
>>>> Le 25/04/2013 15:37, Sandro Hawke a écrit :
>>>>> On 04/24/2013 10:06 AM, Antoine Zimmermann wrote:
>>>>>> It seems to me that this problem is due to the removal of the notion
>>>>>> of datatype map. In 2004, applications could implement the
>>>>>> D-entailment they liked, with D being a partial mapping from IRI to
>>>>>> datatypes.
>>>>>> Now, there are just IRIs in D. The association between the IRI and
>>>>>> the datatype it should denote is completely unspecified. The only
>>>>>> indication that the application can have to implement a datatype map
>>>>>> is that XSD URIs must denote the corresponding XSD datatypes.
>>>>>>
>>>>>> I have trouble understanding why datatype maps should be removed. I
>>>>>> don't remember any discussions saying that they should be changed
>>>>>> to a set. This change, which now creates issues, suddenly appears in
>>>>>> the RDF Semantics ED, with no apparent indication that it was
>>>>>> motivated by complaints about the 2004 design.
>>>>>>
>>>>>> Currently, I see a downside of having a plain set, as it does not
>>>>>> specify what datatype the IRIs correspond to, while I do not see
>>>>>> the positive side of having a plain set. Can someone provide
>>>>>> references to evidence that this change is required or has more
>>>>>> advantages than it has drawbacks?
>>>>>>
>>>>>
>>>>> You seem to have a very different usage scenario in mind than I do.
>>>>
>>>> I do not have any scenario or use case in mind. In RDF 1.0, given an
>>>> entailment regime and a set of triples, it was possible to determine
>>>> what are the valid entailments and what are non-entailments wrt the
>>>> given regime, regardless of anybody's usage scenario. In particular,
>>>> given a datatype map D, anybody who's given a set of triples and uses
>>>> the D-entailment regime would derive exactly the same triples because
>>>> D says how to interpret the datatype IRIs. It is not related to
>>>> scenarios or use cases.
>>>>
>>>> In the current RDF Semantics, if you have a D, you just know what IRIs
>>>> are recognised as datatypes, but you have no indication about what
>>>> datatypes they denote. So, say D = {http://example.com/dt}, it is not
>>>> possible to know what the following triple entails:
>>>>
>>>>  [] <http://ex.com/p> "a"^^<http://example.com/dt> .
>>>>
>>>> To be able to entail anything from it, you would need to know what
>>>> datatype the IRI maps to. That's why we need somewhere, somehow, a
>>>> mapping. And the mapping affects the entailment regime, so it
>>>> makes sense to have it as a parameter of the regime.
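
The point about the mapping being a parameter can be illustrated with a small
Python sketch. Everything here is an assumption made for illustration: the two
maps, and the helper same_value, are hypothetical, and a datatype is reduced
to just its lexical-to-value function. The same IRI is recognized in both
cases, yet the two maps disagree on whether "01" and "1" denote the same
value, and so on whether a triple with one literal entails the triple with
the other.

```python
DT = "http://example.com/dt"

# Two hypothetical datatype maps: IRI -> lexical-to-value function.
MAP_AS_INTEGER = {DT: int}
MAP_AS_STRING = {DT: str}


def same_value(datatype_map, iri, lexical_a, lexical_b):
    """True or False if the map decides the question, None if the IRI is unmapped."""
    to_value = datatype_map.get(iri)
    if to_value is None:
        return None
    return to_value(lexical_a) == to_value(lexical_b)


print(same_value(MAP_AS_INTEGER, DT, "01", "1"))  # True
print(same_value(MAP_AS_STRING, DT, "01", "1"))   # False
print(same_value({}, DT, "01", "1"))              # None
```

Knowing only that D = {http://example.com/dt} does not say which of the first
two answers applies.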
>>>>
>>>> This is very different from the case where an application is making a
>>>> certain usage of an IRI. For instance, displaying instances of
>>>> foaf:Person in a certain way in a webpage does not change anything in
>>>> the conclusions you can normatively draw from the set of triples in
>>>> any entailment regime.
>>>>
>>>>
>>>>> My primary use case (and I'm sorry I sometimes forget there are
>>>>> others) is the situation where n independent actors publish data in
>>>>> RDF, on the web, to be consumed by m independent actors.  The n
>>>>> publishers each make a choice about which vocabulary to use; the m
>>>>> consumers each get to see what vocabularies are used and then have to
>>>>> decide which IRIs to recognize.  There are market forces at work, as
>>>>> publishers want to be as accurate and expressive as possible, but
>>>>> they also want to stick to IRIs that will be recognized.  Consumers
>>>>> want to make use of as much data as possible, but every new IRI they
>>>>> recognize is more work, sometimes a lot more work, so they want to
>>>>> keep the recognized set small.
>>>>>
>>>>> In this kind of situation, datatype IRIs are just like every other IRI;
>>>>> all the "standardization" effects are the same.
>>>>
>>>> That would be true if we did not have the D-entailment machinery.
>>>> Applications can apply specific treatments to specific IRIs, including
>>>> datatype IRIs (for instance, display dates using French conventions).
>>>> But if we introduce the D-entailment regime, it means we want to
>>>> impose more precise constraints on how to interpret the IRIs (that is,
>>>> more than just "I recognise this set of IRIs").
>>>>
>>>>> It's great for both producers and consumers if we can pick a core set
>>>>> of IRIs that producers can assume consumers will recognize.  Things
>>>>> also work okay if a closed group of producers and consumers agree to
>>>>> use a different set. But one of the great strengths of RDF is that the
>>>>> set can be extended without a need for prior agreement.  A producer
>>>>> can simply start to use some new IRI, and consumers can dereference
>>>>> it, learn what it means, and change their code to recognize it.  Of
>>>>> course, it's still painful (details, details), but it's probably not
>>>>> as painful as switching to a new data format with a new media type.
>>>>> In fact, because it can be done independently for each class,
>>>>> property, individual, and datatype, and data can be presented many
>>>>> ways at once, I expect it to be vastly less painful.
>>>>
>>>> What you say is perfectly true and I agree with it wholeheartedly.
>>>> However, I do not think it is relevant to the D-entailment debate (or
>>>> maybe only marginally).
>>>>
>>>>
>>>>> So, given this usage scenario, I can't see how D helps anybody
>>>>> except as a shorthand for saying "the IRIs which are recognized as
>>>>> datatype identifiers".
>>>>
>>>> In 2004, it says more: it says "These are the datatype IRIs of my
>>>> custom D-entailment regime, and these non-XSD datatype IRIs are
>>>> interpreted in this way, according to these datatypes". It could be done
>>>> independently of the D-entailment machinery, in the internal
>>>> specifics of an application, but having it in the standard allows
>>>> one to refer to the normative mechanism.
>>>>
>>>>>
>>>>> Pat, does this answer the question of how RDF gets extended to a new
>>>>> datatype?    I'm happy to try to work this through in more detail, if
>>>>> anyone's interested.
>>>>
>>>> So, to summarise what I understand about your position, you say that
>>>> the D-entailment machinery isn't very useful at all, or only in a
>>>> weak version of it. Fair enough. As I said during the meeting, I'm not
>>>> strongly resisting the change but in general, I am reluctant to
>>>> make any change to a standard that is not motivated by clear evidence
>>>> that it improves the existing situation. If any criticism arises from
>>>> our design of D-entailment, it is far easier to justify a no-change
>>>> ("we want to keep backward compatibility, persistence of definitions,
>>>> avoid changes to implementations, etc") than a change.
>>>>
>>>>
>>>> AZ.
>>>>
>>>>>
>>>>>       -- Sandro
>>>>>
>>>>>
>>>>>>
>>>>>> AZ.
>>>>>>
>>>>>> Le 24/04/2013 05:09, Pat Hayes a écrit :
>>>>>>> I think we still have a datatype issue that needs a little thought.
>>>>>>>
>>>>>>> The D in D-entailment is a parameter. Although RDF is usually
>>>>>>> treated as having its own special datatypes and the compatible XSD
>>>>>>> types as being the standard D, it is quite possible to use RDF with
>>>>>>> a larger D set, so that as new datatypes come along (eg geolocation
>>>>>>> datatypes, or time-interval datatypes, or physical unit datatypes,
>>>>>>> to mention three that I know have been suggested) and, presumably,
>>>>>>> get canonized by appropriate standards bodies (maybe not the W3C,
>>>>>>> though) for use by various communities, they can be smoothly
>>>>>>> incorporated into RDF data without a lot of fuss and without
>>>>>>> re-writing the RDF specs.
>>>>>>>
>>>>>>> Do we want to impose any conditions on this process? How can a
>>>>>>> reader of some RDF know which datatypes are being recognized by this
>>>>>>> RDF? What do we say about how to interpret a literal whose datatype
>>>>>>> IRI you don't recognize? Should it be OK to throw an error at that
>>>>>>> point, or should it *not* be OK to do that? Should we require that
>>>>>>> RDF extensions with larger D's only recognize IRIs that have been
>>>>>>> standardly specified in some way? How would we say this?
>>>>>>>
>>>>>>> The current semantic story is that a literal
>>>>>>> "foo"^^unknown:datatypeIRI  is (1) syntactically OK (2) not an error
>>>>>>> but (3) has no special meaning and is treated just like an unknown
>>>>>>> IRI, ie it presumably denotes something, but we don't know what. Is
>>>>>>> this good enough?
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> IHMC                    (850)434 8903 or (650)494 3973
>>>>>>> 40 South Alcaniz St.    (850)202 4416   office
>>>>>>> Pensacola               (850)202 4440   fax
>>>>>>> FL 32502                (850)291 0667   mobile
>>>>>>> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
Received on Friday, 26 April 2013 13:43:31 UTC