Re: How does RDF get extended to new datatypes? from Sandro Hawke on 2013-04-26 (public-rdf-wg@w3.org from April 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 26 Apr 2013 07:34:17 -0400
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
CC: public-rdf-wg@w3.org
Message-ID: <517A6639.2070100@w3.org>
Okay, that makes sense.   I'm ambivalent.   On the one hand, I prefer 
the linked data approach (any real spec is going to be built on 
references to other specs; why not make those references machine 
readable?) but on the other hand I see your point that we shouldn't 
change things from the 2004 spec without more evidence than has been 
presented here.

          -- Sandro

On 04/26/2013 03:36 AM, Antoine Zimmermann wrote:
> On 26/04/2013 00:41, Sandro Hawke wrote:
>> I think you're saying that in the 2004 semantics, one can't just say
>>
>>     (a) In addition to some XSD stuff, this system also implements
>>     datatype http://example.org/daterange
>
> "http://example.org/daterange" is an IRI, not a datatype. So what 
> datatype does this system implement? Intuitively, this should mean
> that it implements the datatype identified by
> "http://example.org/daterange". But what is this datatype? It's not an
> XSD datatype, so the standards do not say. There is no datatype map
> given, so my RDF-2004 assumptions do not allow me to decide. Still, I
> have my Linked Data assumptions, which tell me I just have to look it up
> and figure it out. All right, let's do that. This URL redirects to
> http://example.iana.org/, which does not tell me anything useful.
> So your entailment regime is incompletely defined.
>
>>
>> Instead, by the 2004 spec, one has to say:
>>
>>     (b) In addition to some XSD stuff, this system also implements
>>     datatype http://example.org/daterange as meaning the datatype such
>>     that the value space is all pairs of time instants, with the
>>     first element < the second element, and the lexical space which is
>>     the concatenation of two elements from the lexical space of
>>     xs:dateTime, separated by a "..", and the mapping between the two is
>>     such that....etc, etc.
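
A minimal sketch, in Python, of the kind of thing definition (b) has to pin down for http://example.org/daterange. Everything beyond the quoted description is an assumption: the function name `parse_daterange` is hypothetical, `datetime.fromisoformat` accepts only a subset of the xs:dateTime lexical space, and the part of the mapping that the message leaves as "etc, etc." is filled in the simplest way, by parsing each half independently.

```python
from datetime import datetime
from typing import Tuple

# The IRI from the thread; what it denotes is exactly what (b) must spell out.
DATERANGE_IRI = "http://example.org/daterange"


def parse_daterange(lexical: str) -> Tuple[datetime, datetime]:
    """Assumed lexical-to-value mapping: 'start..end' -> (start, end)."""
    start_text, separator, end_text = lexical.partition("..")
    if not separator:
        raise ValueError("lexical form must contain a '..' separator")

    # Simplification: fromisoformat stands in for a real xs:dateTime parser.
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)

    # Python cannot order a timezone-aware instant against a naive one.
    if (start.tzinfo is None) != (end.tzinfo is None):
        raise ValueError("both instants must agree on having a timezone")

    # Value space constraint from (b): first element < second element.
    if not start < end:
        raise ValueError("first instant must precede the second")
    return (start, end)


value = parse_daterange(
    "2013-04-25T00:00:00+00:00..2013-04-26T00:00:00+00:00"
)
```

A lexical form whose second instant is not later than the first falls outside the value space and is rejected here with a `ValueError`; whether an implementation should raise or treat the literal as ill-typed is not something the thread settles.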
>>
>> Is that right?
>
> That's pretty much it. This could take another form, and informed 
> Linked Data specialists would take care that the IRI dereferences to a 
> description of the datatype, which would suffice as a way to indicate 
> what the IRI maps to (that is, in practice, D *can* be specified by 
> simply providing the set of IRIs and indicating that the actual 
> datatype is described in the document to which the IRIs dereference).
>
>
>> And Pat's proposal would make it so people would be
>> saying (a) instead of (b)?
>>
>>        -- Sandro
>>
>>
>>
>> On 04/25/2013 11:05 AM, Antoine Zimmermann wrote:
>>>
>>> On 25/04/2013 15:37, Sandro Hawke wrote:
>>>> On 04/24/2013 10:06 AM, Antoine Zimmermann wrote:
>>>>> It seems to me that this problem is due to the removal of the notion
>>>>> of datatype map. In 2004, applications could implement the
>>>>> D-entailment they liked, with D being a partial mapping from IRI to
>>>>> datatypes.
>>>>> Now, there are just IRIs in D. The association between the IRI and
>>>>> the datatype it should denote is completely unspecified. The only
>>>>> indication that the application can have to implement a datatype map
>>>>> is that XSD URIs must denote the corresponding XSD datatypes.
>>>>>
>>>>> I have trouble understanding why datatype maps should be removed. I
>>>>> don't remember any discussion saying that they should be changed to
>>>>> a set. This change, which now creates issues, suddenly appears in the
>>>>> RDF Semantics ED, with no apparent indication that it was motivated
>>>>> by complaints about the 2004 design.
>>>>>
>>>>> Currently, I see a downside of having a plain set, as it does not
>>>>> specify what datatype the IRIs correspond to, while I do not see
>>>>> the positive side of having a plain set. Can someone provide
>>>>> references to evidence that this change is required or has more
>>>>> advantages than it has drawbacks?
>>>>>
>>>>
>>>> You seem to have a very different usage scenario in mind than I do.
>>>
>>> I do not have any scenario or use case in mind. In RDF 1.0, given an
>>> entailment regime and a set of triples, it was possible to determine
>>> which entailments are valid and which are non-entailments with respect
>>> to the given regime, regardless of anybody's usage scenario. In
>>> particular, given a datatype map D, anybody who is given a set of
>>> triples and uses the D-entailment regime would derive exactly the same
>>> triples, because D says how to interpret the datatype IRIs. It is not
>>> related to scenarios or use cases.
>>>
>>> In the current RDF Semantics, if you have a D, you just know what IRIs
>>> are recognised as datatypes, but you have no indication about what
>>> datatypes they denote. So, say D = {http://example.com/dt}, it is not
>>> possible to know what the following triple entails:
>>>
>>>  [] <http://ex.com/p> "a"^^<http://example.com/dt> .
>>>
>>> To be able to entail anything from it, you would need to know what
>>> datatype the IRI maps to. That's why we need somewhere, somehow, a
>>> mapping. And the mapping affects the entailment regime, so it
>>> makes sense to have it as a parameter of the regime.
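
A rough Python sketch of the difference being described, under stated assumptions: a datatype is reduced to its lexical-to-value function, and `hypothetical_dt` is an invented stand-in for whatever http://example.com/dt might denote (the thread deliberately leaves that unknown). With D as a plain set, the only question that can be answered is whether an IRI is recognised; with D as a map, the value of a literal can be computed.

```python
import re
from typing import Any, Callable, Dict, Optional, Set

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX_DT = "http://example.com/dt"  # the IRI used in the triple in the thread


def xsd_integer(lexical: str) -> int:
    """Simplified xsd:integer lexical-to-value mapping."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError("not in the xsd:integer lexical space")
    return int(lexical)


def hypothetical_dt(lexical: str) -> str:
    """Invented mapping, purely for illustration; not defined anywhere."""
    return lexical.upper()


# D as a plain set: says which IRIs are recognised, nothing more.
recognized_iris: Set[str] = {XSD_INTEGER, EX_DT}

# D as a datatype map (2004 style): IRI -> datatype.
datatype_map: Dict[str, Callable[[str], Any]] = {
    XSD_INTEGER: xsd_integer,
    EX_DT: hypothetical_dt,
}


def literal_value(
    lexical: str,
    datatype_iri: str,
    dmap: Dict[str, Callable[[str], Any]],
) -> Optional[Any]:
    """Return the literal's value, or None if the map has no entry."""
    lexical_to_value = dmap.get(datatype_iri)
    if lexical_to_value is None:
        return None
    return lexical_to_value(lexical)


with_map = literal_value("a", EX_DT, datatype_map)
without_map = literal_value("a", EX_DT, {})
```

Two applications sharing `datatype_map` compute the same value for "a"^^<http://example.com/dt>; two applications sharing only `recognized_iris` have nothing that forces them to agree.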
>>>
>>> This is very different from the case where an application is making a
>>> certain usage of an IRI. For instance, displaying instances of
>>> foaf:Person in a certain way in a webpage does not change anything
>>> about the conclusions you can normatively draw from the set of
>>> triples in any entailment regime.
>>>
>>>
>>>> My primary use case (and I'm sorry I sometimes forget there are
>>>> others) is the situation where n independent actors publish data in
>>>> RDF, on the web, to be consumed by m independent actors. The n
>>>> publishers each make a choice about which vocabulary to use; the m
>>>> consumers each get to see what vocabularies are used and then have to
>>>> decide which IRIs to recognize. There are market forces at work, as
>>>> publishers want to be as accurate and expressive as possible, but they
>>>> also want to stick to IRIs that will be recognized. Consumers want to
>>>> make use of as much data as possible, but every new IRI they recognize
>>>> is more work, sometimes a lot more work, so they want to keep the
>>>> recognized set small.
>>>>
>>>> In this kind of situation, datatype IRIs are just like every other IRI;
>>>> all the "standardization" effects are the same.
>>>
>>> That would be true if we did not have the D-entailment machinery.
>>> Applications can apply specific treatments to specific IRIs, including
>>> datatype IRIs (for instance, display dates using French conventions).
>>> But if we introduce the D-entailment regime, it means we want to
>>> impose more precise constraints on how to interpret the IRIs (that is,
>>> more than just "I recognise this set of IRIs").
>>>
>>>> It's great for both producers and consumers if we can pick a core set
>>>> of IRIs that producers can assume consumers will recognize. Things
>>>> also work okay if a closed group of producers and consumers agree to
>>>> use a different set. But one of the great strengths of RDF is that the
>>>> set can be extended without a need for prior agreement. A producer can
>>>> simply start to use some new IRI, and consumers can dereference it,
>>>> learn what it means, and change their code to recognize it. Of course,
>>>> it's still painful (details, details), but it's probably not as painful
>>>> as switching to a new data format with a new media type. In fact,
>>>> because it can be done independently for each class, property,
>>>> individual, and datatype, and data can be presented many ways at once,
>>>> I expect it to be vastly less painful.
>>>
>>> What you say is perfectly true and I agree with it wholeheartedly.
>>> However, I do not think it is relevant to the D-entailment debate (or
>>> maybe only marginally).
>>>
>>>
>>>> So, given this usage scenario, I can't see how D helps anybody except
>>>> as a shorthand for saying "the IRIs which are recognized as datatype
>>>> identifiers".
>>>
>>> In 2004, it says more: it says "These are the datatype IRIs of my
>>> custom D-entailment regime, and these non-XSD datatype IRIs are
>>> interpreted in this way, according to these datatypes". It could be done
>>> independently of the D-entailment machinery, in the internal
>>> specificities of an application, but having it in the standard allows
>>> one to refer to the normative mechanism.
>>>
>>>>
>>>> Pat, does this answer the question of how RDF gets extended to a new
>>>> datatype?    I'm happy to try to work this through in more detail, if
>>>> anyone's interested.
>>>
>>> So, to summarise what I understand about your position, you say that
>>> the D-entailment machinery isn't all that useful, or only in a
>>> weak version of it. Fair enough. As I said during the meeting, I'm not
>>> strongly resisting the change but, in general, I am reluctant to
>>> make any change to a standard that is not motivated by clear evidence
>>> that it improves the existing situation. If any criticism arises from
>>> our design of D-entailment, it is far easier to justify a no-change
>>> ("we want to keep backward compatibility, persistence of definitions,
>>> avoid changes to implementations, etc") rather than a change.
>>>
>>>
>>> AZ.
>>>
>>>>
>>>>       -- Sandro
>>>>
>>>>
>>>>>
>>>>> AZ.
>>>>>
>>>>> On 24/04/2013 05:09, Pat Hayes wrote:
>>>>>> I think we still have a datatype issue that needs a little thought.
>>>>>>
>>>>>> The D in D-entailment is a parameter. Although RDF is usually treated
>>>>>> as having its own special datatypes and the compatible XSD types as
>>>>>> being the standard D, it is quite possible to use RDF with a larger D
>>>>>> set, so that as new datatypes come along (eg geolocation datatypes,
>>>>>> or time-interval datatypes, or physical unit datatypes, to mention
>>>>>> three that I know have been suggested) and, presumably, get canonized
>>>>>> by appropriate standards bodies (maybe not the W3C, though) for use
>>>>>> by various communities, they can be smoothly incorporated into RDF
>>>>>> data without a lot of fuss and without re-writing the RDF specs.
>>>>>>
>>>>>> Do we want to impose any conditions on this process? How can a reader
>>>>>> of some RDF know which datatypes are being recognized by this RDF?
>>>>>> What do we say about how to interpret a literal whose datatype IRI
>>>>>> you don't recognize? Should it be OK to throw an error at that point,
>>>>>> or should it *not* be OK to do that? Should we require that RDF
>>>>>> extensions with larger D's only recognize IRIs that have been
>>>>>> standardly specified in some way? How would we say this?
>>>>>>
>>>>>> The current semantic story is that a literal
>>>>>> "foo"^^unknown:datatypeIRI  is (1) syntactically OK (2) not an error
>>>>>> but (3) has no special meaning and is treated just like an unknown
>>>>>> IRI, ie it presumably denotes something, but we don't know what. Is
>>>>>> this good enough?
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>> ------------------------------------------------------------
>>>>>> IHMC                    (850)434 8903 or (650)494 3973
>>>>>> 40 South Alcaniz St.    (850)202 4416   office
>>>>>> Pensacola               (850)202 4440   fax
>>>>>> FL 32502                (850)291 0667   mobile
>>>>>> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
Received on Friday, 26 April 2013 11:34:26 UTC