Re: How does RDF get extended to new datatypes? from Sandro Hawke on 2013-04-25 (public-rdf-wg@w3.org from April 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 25 Apr 2013 18:41:40 -0400
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
CC: public-rdf-wg@w3.org
Message-ID: <5179B124.7060700@w3.org>
I think you're saying that in the 2004 semantics, one can't just say

    (a) In addition to some XSD stuff, this system also implements
    datatype http://example.org/daterange

Instead, by the 2004 spec, one has to say:

    (b) In addition to some XSD stuff, this system also implements
    datatype http://example.org/daterange as meaning the datatype such
    that the value space is all pairs of time instants, with the
    first element < the second element, and the lexical space which is
    the concatenation of two elements from the lexical space of
    xs:dateTime, separated by a "..", and the mapping between the two is
    such that....etc, etc.

Is that right?   And Pat's proposal would make it so people would be 
saying (a) instead of (b)?
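
To make the contrast between (a) and (b) concrete, here is a rough sketch
in Python. It is not taken from either the 2004 or the current draft
semantics; the names DATERANGE_IRI, parse_daterange, RECOGNIZED_IRIS,
DATATYPE_MAP and literal_value are all made up for illustration, and
datetime.fromisoformat only approximates the lexical space of xs:dateTime.

```python
from datetime import datetime
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical IRI, borrowed from the example in the text above.
DATERANGE_IRI = "http://example.org/daterange"

DateRange = Tuple[datetime, datetime]


def parse_daterange(lexical: str) -> DateRange:
    """Assumed lexical-to-value mapping: two dateTimes separated by '..'.

    Raises ValueError when the string is outside the lexical space or
    when the first instant does not precede the second.
    """
    start_text, separator, end_text = lexical.partition("..")
    if not separator:
        raise ValueError("missing '..' separator")
    # fromisoformat is only a stand-in for real xs:dateTime parsing.
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    if not start < end:
        raise ValueError("first instant must be earlier than the second")
    return (start, end)


# (a) D as a plain set: says which IRIs are recognized, nothing more.
RECOGNIZED_IRIS: Set[str] = {DATERANGE_IRI}

# (b) D as a map: each IRI is tied to a datatype, reduced here to its
# lexical-to-value function.
DATATYPE_MAP: Dict[str, Callable[[str], object]] = {
    DATERANGE_IRI: parse_daterange,
}


def literal_value(
    lexical: str,
    datatype_iri: str,
    datatype_map: Dict[str, Callable[[str], object]],
) -> Optional[object]:
    """Return the literal's value, or None if the IRI is not in the map."""
    lexical_to_value = datatype_map.get(datatype_iri)
    if lexical_to_value is None:
        # Unknown datatype IRI: not an error, but no known value either.
        return None
    return lexical_to_value(lexical)
```

With only RECOGNIZED_IRIS, a consumer knows the IRI is a datatype but has
nothing to compute with; with DATATYPE_MAP, two consumers sharing the same
map would arrive at the same value for the same literal.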

       -- Sandro



On 04/25/2013 11:05 AM, Antoine Zimmermann wrote:
>
> On 25/04/2013 15:37, Sandro Hawke wrote:
>> On 04/24/2013 10:06 AM, Antoine Zimmermann wrote:
>>> It seems to me that this problem is due to the removal of the notion
>>> of datatype map. In 2004, applications could implement the
>>> D-entailment they liked, with D being a partial mapping from IRI to
>>> datatypes.
>>> Now, there are just IRIs in D. The association between the IRI and the
>>> datatype it should denote is completely unspecified. The only
>>> indication that the application can have to implement a datatype map
>>> is that XSD URIs must denote the corresponding XSD datatypes.
>>>
>>> I have trouble understanding why datatype maps should be removed. I
>>> don't remember any discussion saying that they should be changed to a
>>> set. This change, which now creates issues, suddenly appears in the RDF
>>> Semantics ED, with no apparent indication that it was motivated by
>>> complaints about the 2004 design.
>>>
>>> Currently, I see a downside of having a plain set, as it does not
>>> specify which datatype each IRI corresponds to, while I do not see
>>> the positive side of having a plain set. Can someone provide
>>> references to evidence that this change is required or has more
>>> advantages than it has drawbacks?
>>>
>>
>> You seem to have a very different usage scenario in mind than I do.
>
> I do not have any scenario or use case in mind. In RDF 1.0, given an
> entailment regime and a set of triples, it was possible to determine
> which entailments are valid and which are not with respect to the
> given regime, regardless of anybody's usage scenario. In particular,
> given a datatype map D, anybody who is given a set of triples and uses
> the D-entailment regime would derive exactly the same triples, because
> D says how to interpret the datatype IRIs. It is not related to
> scenarios or use cases.
>
> In the current RDF Semantics, if you have a D, you just know what IRIs 
> are recognised as datatypes, but you have no indication about what 
> datatypes they denote. So, say D = {http://example.com/dt}, it is not 
> possible to know what the following triple entails:
>
>  []  <http://ex.com/p>  "a"^^<http://example.com/dt> .
>
> To be able to entail anything from it, you would need to know which
> datatype the IRI maps to. That's why we need somewhere, somehow, a
> mapping. And the mapping affects the entailment regime, so it
> makes sense to have it as a parameter of the regime.
>
> This is very different from the case where an application is making a
> certain usage of an IRI. For instance, displaying instances of
> foaf:Person in a certain way in a webpage does not change anything about
> the conclusions you can normatively draw from the set of triples in
> any entailment regime.
>
>
>> My primary use case (and I'm sorry I sometimes forget there are others)
>> is the situation where n independent actors publish data in RDF, on
>> the web, to be consumed by m independent actors.   The n publishers each
>> makes a choice about which vocabulary to use; the m consumers each get
>> to see what vocabularies are used and then have to decide which IRIs to
>> recognize.  There are market forces at work, as publishers want to be as
>> accurate and expressive as possible, but they also want to stick to IRIs
>> that will be recognized.  Consumers want to make use of as much data as
>> possible, but every new IRI they recognize is more work, sometimes a lot
>> more work, so they want to keep the recognized set small.
>>
>> In this kind of situation, datatype IRIs are just like every other IRI;
>> all the "standardization" effects are the same.
>
> That would be true if we did not have the D-entailment machinery. 
> Applications can apply specific treatments to specific IRIs, including 
> datatype IRIs (for instance, display dates using French conventions).
> But if we introduce the D-entailment regime, it means we want to 
> impose more precise constraints on how to interpret the IRIs (that is, 
> more than just "I recognise this set of IRIs").
>
>>   It's great for both
>> producers and consumers if we can pick a core set of IRIs that producers
>> can assume consumers will recognize.   Things also work okay if a closed
>> group of producers and consumers agree to use a different set. But one
>> of the great strengths of RDF is that the set can be extended without a
>> need for prior agreement.  A producer can simply start to use some new
>> IRI, and consumers can dereference it, learn what it means, and change
>> their code to recognize it.   Of course, it's still painful (details,
>> details), but it's probably not as painful as switching to a new data
>> format with a new media type.   In fact, because it can be done
>> independently for each class, property, individual, and datatype, and
>> data can be presented many ways at once, I expect it to be vastly less
>> painful.
>
> What you say is perfectly true and I agree with it wholeheartedly. 
> However, I do not think it is relevant to the D-entailment debate (or 
> maybe only marginally).
>
>
>> So, given this usage scenario, I can't see how D helps anybody except as
>> a shorthand for saying "the IRIs which are recognized as datatype
>> identifiers".
>
> In 2004, it says more: it says "These are the datatype IRIs of my
> custom D-entailment regime, and these non-XSD datatype IRIs are
> interpreted in this way, according to these datatypes". It could be done
> independently of the D-entailment machinery, in the internal
> specifics of an application, but having it in the standard allows
> one to refer to the normative mechanism.
>
>>
>> Pat, does this answer the question of how RDF gets extended to a new
>> datatype?    I'm happy to try to work this through in more detail, if
>> anyone's interested.
>
> So, to summarise what I understand about your position, you say that
> the D-entailment machinery isn't that useful at all, or is useful only
> in a weak version. Fair enough. As I said during the meeting, I'm not
> strongly resisting the change, but in general I am reluctant to
> make any change to a standard that is not motivated by clear evidence
> that it improves the existing situation. If any criticism arises from
> our design of D-entailment, it is far easier to justify a no-change
> ("we want to keep backward compatibility, persistence of definitions,
> avoid changes to implementations, etc") than a change.
>
>
> AZ.
>
>>
>>       -- Sandro
>>
>>
>>>
>>> AZ.
>>>
>>> On 24/04/2013 05:09, Pat Hayes wrote:
>>>> I think we still have a datatype issue that needs a little thought.
>>>>
>>>> The D in D-entailment is a parameter. Although RDF is usually treated
>>>> as having its own special datatypes and the compatible XSD types as
>>>> being the standard D, it is quite possible to use RDF with a larger D
>>>> set, so that as new datatypes come along (eg geolocation datatypes,
>>>> or time-interval datatypes, or physical unit datatypes, to mention
>>>> three that I know have been suggested) and, presumably, get canonized
>>>> by appropriate standards bodies (maybe not the W3C, though) for use
>>>> by various communities, they can be smoothly incorporated into RDF
>>>> data without a lot of fuss and without re-writing the RDF specs.
>>>>
>>>> Do we want to impose any conditions on this process? How can a reader
>>>> of some RDF know which datatypes are being recognized by this RDF?
>>>> What do we say about how to interpret a literal whose datatype IRI
>>>> you don't recognize? Should it be OK to throw an error at that point,
>>>> or should it *not* be OK to do that? Should we require that RDF
>>>> extensions with larger D's only recognize IRIs that have been
>>>> standardly specified in some way? How would we say this?
>>>>
>>>> The current semantic story is that a literal
>>>> "foo"^^unknown:datatypeIRI  is (1) syntactically OK (2) not an error
>>>> but (3) has no special meaning and is treated just like an unknown
>>>> IRI, ie it presumably denotes something, but we don't know what. Is
>>>> this good enough?
>>>>
>>>> Pat
>>>>
>>>> ------------------------------------------------------------
>>>> IHMC                    (850)434 8903 or (650)494 3973
>>>> 40 South Alcaniz St.    (850)202 4416   office
>>>> Pensacola               (850)202 4440   fax
>>>> FL 32502                (850)291 0667   mobile
>>>> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
Received on Thursday, 25 April 2013 22:41:47 UTC