Re: How does RDF get extended to new datatypes?

On Apr 26, 2013, at 2:36 AM, Antoine Zimmermann wrote:

>> On 26/04/2013 00:41, Sandro Hawke wrote:
>> I think you're saying that in the 2004 semantics, one can't just say
>> 
>>    (a) In addition to some XSD stuff, this system also implements
>>    datatype http://example.org/daterange
> 
> "http://example.org/daterange" is an IRI, not a datatype. So what datatype does this system implement? Intuitively, this should mean that it implements the datatype identified by "http://example.org/daterange". But what is this datatype? It's not an XSD datatype, so the standards do not say. There is no datatype map given, so my RDF-2004 assumptions do not allow me to decide. Still, my Linked-Data assumptions tell me I just have to look it up and figure it out. All right, let's do that. This URL redirects to http://example.iana.org/, which does not tell me anything useful.
> So your entailment regime is incompletely defined.

True, in this case. But suppose we are following the 2004 spec, and we read some RDF which has a literal in it typed using this IRI. All we have is the IRI. How do we discover what datatype map is supposed to be used with this datatype? The spec provides no clue about how to discover this. And what does it even *mean*? It means simply that we need to know what datatype this IRI is supposed to... well, to denote, in fact. That is, we have an IRI and we need to know what it is being used to denote, to refer to. That is exactly what the 2013 wording assumes: there is an IRI which evidently is being used to identify a datatype, and we are supposed to find out which datatype it refers to. (How to find out is not specified: perhaps by using linked data principles, perhaps just by being inside the relevant user community.) But phrasing this as "finding out which datatype map it is being used with" doesn't add anything useful to the discussion.
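For what it's worth, the 2013 reading can be sketched in a few lines of Python. Everything here is illustrative, not taken from any spec: D is just a set of recognized IRIs, and the processor must already know, out of band, which datatype each recognized IRI denotes. A literal typed with an IRI it does not recognize still denotes *something*, just nothing the processor can say anything about.

```python
# Hypothetical sketch: D as a bare set of recognized datatype IRIs, with the
# IRI-to-datatype association supplied out of band. All names are illustrative.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# The processor's out-of-band knowledge: IRI -> lexical-to-value mapping.
KNOWN_DATATYPES = {
    XSD_STRING: str,
    XSD_INTEGER: int,
}

def interpret_literal(lexical: str, datatype_iri: str, recognized: set[str]):
    """Return the literal's value if the IRI is recognized and the processor
    knows which datatype it denotes; otherwise the literal denotes something,
    but we cannot say what (modelled here as an opaque placeholder)."""
    if datatype_iri in recognized and datatype_iri in KNOWN_DATATYPES:
        return KNOWN_DATATYPES[datatype_iri](lexical)
    return ("unknown-thing", lexical, datatype_iri)
```

The point of the sketch is that nothing in `recognized` itself tells the processor how to fill in `KNOWN_DATATYPES`; that association has to come from somewhere else, which is exactly the question at issue.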

But, one might respond, what of the case (purely imaginary, but possible) in which the same IRI is used by one source to indicate one datatype, and by a different source to indicate a different datatype? I guess this could conceivably happen, but it could have happened in 2004 as well, and then it would have been described as two sources using different datatype maps. But again, adding "datatype map" to the discussion does not change or improve the situation, or provide any clarification about how to proceed; it simply describes the same situation in more complicated language. (Worse, in fact, the 2004 wording seems to suggest that this situation is acceptable, when in fact it is not, and should be strongly deprecated, although I confess I am having trouble finding a form of words to say this.) 
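To make Sandro's (b) concrete, here is a rough Python sketch of the lexical-to-value mapping of such a daterange datatype, as he described it: the lexical space is two xs:dateTime lexical forms joined by "..", and the value space is pairs of time instants with the first earlier than the second. The datatype is purely imaginary and all names are hypothetical.

```python
# Sketch of the imaginary http://example.org/daterange datatype's
# lexical-to-value mapping: "start..end" maps to a pair (start, end)
# of time instants with start < end.
from datetime import datetime

def daterange_lexical_to_value(lexical: str) -> tuple[datetime, datetime]:
    """Map a lexical form such as '2013-01-01T00:00:00..2013-12-31T23:59:59'
    to its value, a pair of instants with the first < the second."""
    parts = lexical.split("..")
    if len(parts) != 2:
        raise ValueError("not in the lexical space: expected exactly one '..'")
    start, end = (datetime.fromisoformat(p) for p in parts)
    if not start < end:
        raise ValueError("not in the lexical space: first instant must precede second")
    return (start, end)
```

A full datatype definition would of course also have to spell out the value space and the identity conditions on values, not just the mapping, which is Sandro's point about how much (b) asks of the definer.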

Pat

> 
>> 
>> Instead, by the 2004 spec, one has to say:
>> 
>>    (b) In addition to some XSD stuff, this system also implements
>>    datatype http://example.org/daterange as meaning the datatype such
>>    that the value space is all pairs of time instants, with the
>>    first element < the second element, and the lexical space which is
>>    the concatenation of two elements from the lexical space of
>>    xs:dateTime, separated by a "..", and the mapping between the two is
>>    such that....etc, etc.
>> 
>> Is that right?
> 
> That's pretty much it. This could take another form, and informed Linked Data specialists would take care that the IRI dereferences to a description of the datatype, which would suffice as a way to indicate what the IRI maps to (that is, in practice, D *can* be specified by simply providing the set of IRIs and indicating that the actual datatype is described in the document to which the IRIs dereference).
> 
> 
>> And Pat's proposal would make it so people would be
>> saying (a) instead of (b)?
>> 
>>       -- Sandro
>> 
>> 
>> 
>> On 04/25/2013 11:05 AM, Antoine Zimmermann wrote:
>>> 
>>> On 25/04/2013 15:37, Sandro Hawke wrote:
>>>> On 04/24/2013 10:06 AM, Antoine Zimmermann wrote:
>>>>> It seems to me that this problem is due to the removal of the notion
>>>>> of datatype map. In 2004, applications could implement the
>>>>> D-entailment they liked, with D being a partial mapping from IRI to
>>>>> datatypes.
>>>>> Now, there are just IRIs in D. The association between the IRI and the
>>>>> datatype it should denote is completely unspecified. The only
>>>>> indication that the application can have to implement a datatype map
>>>>> is that XSD URIs must denote the corresponding XSD datatypes.
>>>>> 
>>>>> I have trouble understanding why datatype maps should be removed. I
>>>>> don't remember any discussion saying that they should be changed to a
>>>>> set. This change, which now creates issues, suddenly appeared in the
>>>>> RDF Semantics ED, with no apparent indication that it was motivated by
>>>>> complaints about the 2004 design.
>>>>> 
>>>>> Currently, I see a downside to having a plain set, as it does not
>>>>> specify which datatypes the IRIs correspond to, while I do not see
>>>>> the positive side of having a plain set. Can someone provide
>>>>> references to evidence that this change is required or has more
>>>>> advantages than drawbacks?
>>>>> 
>>>> 
>>>> You seem to have a very different usage scenario in mind than I do.
>>> 
>>> I do not have any scenario or use case in mind. In RDF 1.0, given an
>>> entailment regime and a set of triples, it was possible to determine
>>> which entailments are valid and which are non-entailments with respect
>>> to the given regime, regardless of anybody's usage scenario. In
>>> particular, given a datatype map D, anybody who is given a set of
>>> triples and uses the D-entailment regime would derive exactly the same
>>> triples, because D says how to interpret the datatype IRIs. It is not
>>> related to scenarios or use cases.
>>> 
>>> In the current RDF Semantics, if you have a D, you just know which
>>> IRIs are recognised as datatypes, but you have no indication of which
>>> datatypes they denote. So, say D = {http://example.com/dt}; it is not
>>> possible to know what the following triple entails:
>>> 
>>> [] <http://ex.com/p>  "a"^^<http://example.com/dt> .
>>> 
>>> To be able to entail anything from it, you would need to know which
>>> datatype the IRI maps to. That's why we need, somewhere, somehow, a
>>> mapping. And the mapping affects the entailment regime, so it makes
>>> sense to have it as a parameter of the regime.
>>> 
>>> This is very different from the case where an application is making a
>>> certain usage of an IRI. For instance, displaying instances of
>>> foaf:Person in a certain way on a webpage does not change the
>>> conclusions you can normatively draw from the set of triples in any
>>> entailment regime.
>>> 
>>> 
>>>> My primary use case (and I'm sorry I sometimes forget there are others)
>>>> is the situation where n independent actors publish data in RDF, on
>>>> the web, to be consumed by m independent actors.   The n publishers each
>>>> make a choice about which vocabulary to use; the m consumers each get
>>>> to see what vocabularies are used and then have to decide which IRIs to
>>>> recognize.  There are market forces at work, as publishers want to be as
>>>> accurate and expressive as possible, but they also want to stick to IRIs
>>>> that will be recognized.  Consumers want to make use of as much data as
>>>> possible, but every new IRI they recognize is more work, sometimes a lot
>>>> more work, so they want to keep the recognized set small.
>>>> 
>>>> In this kind of situation, datatype IRIs are just like every other IRI;
>>>> all the "standardization" effects are the same.
>>> 
>>> That would be true if we did not have the D-entailment machinery.
>>> Applications can apply specific treatments to specific IRIs, including
>>> datatype IRIs (for instance, display dates using French conventions).
>>> But if we introduce the D-entailment regime, it means we want to
>>> impose more precise constraints on how to interpret the IRIs (that is,
>>> more than just "I recognise this set of IRIs").
>>> 
>>>>  It's great for both
>>>> producers and consumers if we can pick a core set of IRIs that producers
>>>> can assume consumers will recognize.   Things also work okay if a closed
>>>> group of producers and consumers agree to use a different set. But one
>>>> of the great strengths of RDF is that the set can be extended without a
>>>> need for prior agreement.  A producer can simply start to use some new
>>>> IRI, and consumers can dereference it, learn what it means, and change
>>>> their code to recognize it.   Of course, it's still painful (details,
>>>> details), but it's probably not as painful as switching to a new data
>>>> format with a new media type.   In fact, because it can be done
>>>> independently for each class, property, individual, and datatype, and
>>>> data can be presented many ways at once, I expect it to be vastly less
>>>> painful.
>>> 
>>> What you say is perfectly true and I agree with it wholeheartedly.
>>> However, I do not think it is relevant to the D-entailment debate (or
>>> maybe only marginally).
>>> 
>>> 
>>>> So, given this usage scenario, I can't see how D helps anybody except as
>>>> a shorthand for saying "the IRIs which are recognized as datatype
>>>> identifiers".
>>> 
>>> In 2004, it says more: it says "These are the datatype IRIs of my
>>> custom D-entailment regime, and these non-XSD datatype IRIs are
>>> interpreted in this way, according to these datatypes". It could be
>>> done independently of the D-entailment machinery, in the internal
>>> specificities of an application, but having it in the standard allows
>>> one to refer to the normative mechanism.
>>> 
>>>> 
>>>> Pat, does this answer the question of how RDF gets extended to a new
>>>> datatype?    I'm happy to try to work this through in more detail, if
>>>> anyone's interested.
>>> 
>>> So, to summarise what I understand of your position, you say that
>>> the D-entailment machinery isn't all that useful, or only in a weak
>>> version of it. Fair enough. As I said during the meeting, I'm not
>>> strongly resisting the change, but in general, I am reluctant to
>>> make any change to a standard that is not motivated by clear evidence
>>> that it improves the existing situation. If any criticism arises from
>>> our design of D-entailment, it is far easier to justify a no-change
>>> ("we want to keep backward compatibility, persistence of definitions,
>>> avoid changes to implementations, etc.") rather than a change.
>>> 
>>> 
>>> AZ.
>>> 
>>>> 
>>>>      -- Sandro
>>>> 
>>>> 
>>>>> 
>>>>> AZ.
>>>>> 
>>>>> On 24/04/2013 05:09, Pat Hayes wrote:
>>>>>> I think we still have a datatype issue that needs a little thought.
>>>>>> 
>>>>>> The D in D-entailment is a parameter. Although RDF is usually treated
>>>>>> as having its own special datatypes and the compatible XSD types as
>>>>>> being the standard D, it is quite possible to use RDF with a larger D
>>>>>> set, so that as new datatypes come along (eg geolocation datatypes,
>>>>>> or time-interval datatypes, or physical unit datatypes, to mention
>>>>>> three that I know have been suggested) and, presumably, get canonized
>>>>>> by appropriate standards bodies (maybe not the W3C, though) for use
>>>>>> by various communities, they can be smoothly incorporated into RDF
>>>>>> data without a lot of fuss and without re-writing the RDF specs.
>>>>>> 
>>>>>> Do we want to impose any conditions on this process? How can a reader
>>>>>> of some RDF know which datatypes are being recognized by this RDF?
>>>>>> What do we say about how to interpret a literal whose datatype IRI
>>>>>> you don't recognize? Should it be OK to throw an error at that point,
>>>>>> or should it *not* be OK to do that? Should we require that RDF
>>>>>> extensions with larger D's only recognize IRIs that have been
>>>>>> standardly specified in some way? How would we say this?
>>>>>> 
>>>>>> The current semantic story is that a literal
>>>>>> "foo"^^unknown:datatypeIRI  is (1) syntactically OK (2) not an error
>>>>>> but (3) has no special meaning and is treated just like an unknown
>>>>>> IRI, ie it presumably denotes something, but we don't know what. Is
>>>>>> this good enough?
>>>>>> 
>>>>>> Pat
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Tuesday, 30 April 2013 17:58:51 UTC