Re: How does RDF get extended to new datatypes? from Antoine Zimmermann on 2013-04-25 (public-rdf-wg@w3.org from April 2013)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 25 Apr 2013 17:05:15 +0200
To: public-rdf-wg@w3.org
Message-ID: <5179462B.2030707@emse.fr>
On 25/04/2013 15:37, Sandro Hawke wrote:
> On 04/24/2013 10:06 AM, Antoine Zimmermann wrote:
>> It seems to me that this problem is due to the removal of the notion
>> of datatype map. In 2004, applications could implement whichever
>> D-entailment they liked, with D being a partial mapping from IRIs to
>> datatypes.
>> Now, D contains just IRIs. The association between an IRI and the
>> datatype it should denote is completely unspecified. The only
>> guidance an application has for implementing a datatype map is that
>> XSD URIs must denote the corresponding XSD datatypes.
>>
>> I have trouble understanding why datatype maps should be removed. I
>> don't remember any discussion saying that they should be changed to a
>> set. This change, which now creates issues, suddenly appeared in the
>> RDF Semantics ED, with no apparent indication that it was motivated by
>> complaints about the 2004 design.
>>
>> Currently, I see a downside to having a plain set, as it does not
>> specify which datatypes the IRIs correspond to, while I do not see
>> the upside of having a plain set. Can someone provide references to
>> evidence that this change is required, or that it has more
>> advantages than drawbacks?
>>
>
> You seem to have a very different usage scenario in mind than I do.

I do not have any scenario or use case in mind. In RDF 1.0, given an
entailment regime and a set of triples, it was possible to determine
which entailments are valid and which are not with respect to the
given regime, regardless of anybody's usage scenario. In particular,
given a datatype map D, anybody who is given a set of triples and uses
the D-entailment regime would derive exactly the same triples, because
D says how to interpret the datatype IRIs. This has nothing to do with
scenarios or use cases.

In the current RDF Semantics, if you have a D, you only know which IRIs
are recognised as datatype IRIs; you have no indication of which
datatypes they denote. Say D = {http://example.com/dt}: then it is not
possible to know what the following triple entails:

  []  <http://ex.com/p>  "a"^^<http://example.com/dt> .

To be able to entail anything from it, you would need to know which
datatype the IRI maps to. That's why we need a mapping somewhere,
somehow. And since the mapping affects the entailment regime, it makes
sense to have it as a parameter of the regime.
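To make this concrete, here is a minimal Python sketch. It is a toy
model, not the RDF Semantics itself: a datatype is reduced to its
lexical-to-value mapping (returning None for an ill-typed lexical
form), and the two candidate datatypes DIGITS and ANY_STRING for
http://example.com/dt are hypothetical. It assumes two applications
that recognise exactly the same set D but privately associate the IRI
with different datatypes; they then disagree about what
"a"^^<http://example.com/dt> denotes, whereas with a map as the
parameter the answer is fixed by D.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


# Toy model, for illustration only: a datatype is reduced to its
# lexical-to-value mapping, which returns None for an ill-typed form.
@dataclass(frozen=True)
class Datatype:
    name: str
    lex_to_value: Callable[[str], Optional[object]]


# Sentinel meaning "D alone does not determine an answer".
UNDETERMINED = object()

DT = "http://example.com/dt"  # datatype IRI from the example triple


def _digits_only(lex: str) -> Optional[int]:
    # Hypothetical datatype: only ASCII digit strings are well-formed.
    return int(lex) if lex.isascii() and lex.isdigit() else None


def _any_string(lex: str) -> Optional[str]:
    # Hypothetical datatype: every string is well-formed and denotes itself.
    return lex


DIGITS = Datatype("digits", _digits_only)
ANY_STRING = Datatype("any-string", _any_string)


def value_with_map(lex: str, dt_iri: str, dmap: Dict[str, Datatype]) -> object:
    """2004-style D: a partial map from IRIs to datatypes fixes the value."""
    return dmap[dt_iri].lex_to_value(lex)


def value_with_set(lex: str, dt_iri: str, dset: Set[str]) -> object:
    """Set-style D: says the IRI is recognised, not which datatype it denotes."""
    if dt_iri not in dset:
        raise KeyError(f"{dt_iri} is not in D")
    # lex is deliberately unused: nothing in a plain set can interpret it.
    return UNDETERMINED


D_SET = {DT}
# Hypothetical: two applications that recognise exactly D_SET but
# privately associate DT with different datatypes.
APP_1 = {DT: DIGITS}
APP_2 = {DT: ANY_STRING}

if __name__ == "__main__":
    print(value_with_set("a", DT, D_SET) is UNDETERMINED)  # True
    print(value_with_map("a", DT, APP_1))  # None: "a" is ill-typed here
    print(value_with_map("a", DT, APP_2))  # a
```

With a map, any two parties that share D get the same answer; with a
set, the answer depends on each application's private choice, which is
exactly the loss of determinism described above.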

This is very different from the case where an application makes a
certain use of an IRI. For instance, displaying instances of
foaf:Person in a certain way in a webpage does not change any of the
conclusions you can normatively draw from the set of triples in any
entailment regime.


> My primary use case (and I'm sorry I sometimes forget there are others)
> is the situation where n independent actors publish data in RDF, on
> the web, to be consumed by m independent actors.   The n publishers each
> makes a choice about which vocabulary to use; the m consumers each get
> to see what vocabularies are used and then have to decide which IRIs to
> recognize.  There are market forces at work, as publishers want to be as
> accurate and expressive as possible, but they also want to stick to IRIs
> that will be recognized.  Consumers want to make use of as much data as
> possible, but every new IRI they recognize is more work, sometimes a lot
> more work, so they want to keep the recognized set small.
>
> In this kind of situation, datatype IRIs are just like every other IRI;
> all the "standardization" effects are the same.

That would be true if we did not have the D-entailment machinery.
Applications can treat specific IRIs in specific ways, including
datatype IRIs (for instance, displaying dates using French conventions).
But introducing the D-entailment regime means we want to impose more
precise constraints on how the IRIs are interpreted (that is, more
than just "I recognise this set of IRIs").

>   It's great for both
> producers and consumers if we can pick a core set of IRIs that producers
> can assume consumers will recognize.   Things also work okay if a closed
> group of producers and consumers agree to use a different set.   But one
> of the great strengths of RDF is that the set can be extended without a
> need for prior agreement.  A producer can simply start to use some new
> IRI, and consumers can dereference it, learn what it means, and change
> their code to recognize it.   Of course, it's still painful (details,
> details), but it's probably not as painful as switching to a new data
> format with a new media type.   In fact, because it can be done
> independently for each class, property, individual, and datatype, and
> data can be presented many ways at once, I expect it to be vastly less
> painful.

What you say is perfectly true and I agree with it wholeheartedly. 
However, I do not think it is relevant to the D-entailment debate (or 
maybe only marginally).


> So, given this usage scenario, I can't see how D helps anybody except as
> a shorthand for saying "the IRIs which are recognized as datatype
> identifiers".

In 2004, D says more: it says "These are the datatype IRIs of my custom
D-entailment regime, and these non-XSD datatype IRIs are interpreted in
this way, according to these datatypes". This could be done outside
the D-entailment machinery, among the internal specifics of an
application, but having it in the standard allows one to refer to a
normative mechanism.
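A custom 2004-style regime of that kind can be sketched as a plain
dictionary. This is a hypothetical Python illustration, not a
normative encoding: only two XSD entries are included, each with a
simplified lexical-to-value function, and the geolocation IRI
http://example.com/dt/latlong with its "lat,long" lexical form is
invented (geolocation is one of the datatype families Pat mentions
below). Rejecting any attempt to rebind an XSD IRI is an assumed way
of enforcing that XSD IRIs denote the corresponding XSD datatypes.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# A datatype reduced to its lexical-to-value mapping (None = ill-typed).
LexToValue = Callable[[str], Optional[object]]


def _xsd_boolean(lex: str) -> Optional[bool]:
    # xsd:boolean lexical space: "true", "false", "1", "0".
    return {"true": True, "1": True, "false": False, "0": False}.get(lex)


def _xsd_integer(lex: str) -> Optional[int]:
    # Simplified xsd:integer: optional sign, then ASCII digits (no whitespace).
    body = lex[1:] if lex[:1] in ("+", "-") else lex
    return int(lex) if body.isascii() and body.isdigit() else None


# A tiny subset of the XSD entries every such D must respect.
XSD_CORE: Dict[str, LexToValue] = {
    XSD + "boolean": _xsd_boolean,
    XSD + "integer": _xsd_integer,
}


def make_datatype_map(custom: Mapping[str, LexToValue]) -> Dict[str, LexToValue]:
    """Build a custom D: the XSD core plus extra, non-XSD entries."""
    clashes = [iri for iri in custom if iri.startswith(XSD)]
    if clashes:
        # Assumed policy: XSD IRIs keep their XSD meaning.
        raise ValueError(f"cannot rebind XSD IRIs: {clashes}")
    return {**XSD_CORE, **custom}


# Hypothetical non-XSD datatype: "lat,long" in decimal degrees.
GEO = "http://example.com/dt/latlong"


def _latlong(lex: str) -> Optional[Tuple[float, float]]:
    parts = lex.split(",")
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    ok = -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
    return (lat, lon) if ok else None


D = make_datatype_map({GEO: _latlong})

if __name__ == "__main__":
    print(sorted(D))  # the datatype IRIs this regime recognises
    print(D[GEO]("45.43,4.39"))  # (45.43, 4.39)
    print(D[XSD + "integer"]("-42"))  # -42
```

In this sketch, extending the regime to a new datatype means adding an
entry to the custom mapping; the XSD part stays fixed.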

>
> Pat, does this answer the question of how RDF gets extended to a new
> datatype?    I'm happy to try to work this through in more detail, if
> anyone's interested.

So, to summarise what I understand of your position: you say that the
D-entailment machinery isn't very useful at all, or is useful only in a
weak form. Fair enough. As I said during the meeting, I'm not strongly
resisting the change, but in general I am reluctant to make any change
to a standard that is not motivated by clear evidence that it improves
the existing situation. If any criticism arises from our design of
D-entailment, it is far easier to justify not changing it ("we want to
keep backward compatibility, persistence of definitions, avoid changes
to implementations, etc.") than to justify a change.


AZ.

>
>       -- Sandro
>
>
>>
>> AZ.
>>
>> On 24/04/2013 05:09, Pat Hayes wrote:
>>> I think we still have a datatype issue that needs a little thought.
>>>
>>> The D in D-entailment is a parameter. Although RDF is usually treated
>>> as having its own special datatypes and the compatible XSD types as
>>> being the standard D, it is quite possible to use RDF with a larger D
>>> set, so that as new datatypes come along (eg geolocation datatypes,
>>> or time-interval datatypes, or physical unit datatypes, to mention
>>> three that I know have been suggested) and, presumably, get canonized
>>> by appropriate standards bodies (maybe not the W3C, though) for use
>>> by various communities, they can be smoothly incorporated into RDF
>>> data without a lot of fuss and without re-writing the RDF specs.
>>>
>>> Do we want to impose any conditions on this process? How can a reader
>>> of some RDF know which datatypes are being recognized by this RDF?
>>> What do we say about how to interpret a literal whose datatype IRI
>>> you don't recognize? Should it be OK to throw an error at that point,
>>> or should it *not* be OK to do that? Should we require that RDF
>>> extensions with larger D's only recognize IRIs that have been
>>> standardly specified in some way? How would we say this?
>>>
>>> The current semantic story is that a literal
>>> "foo"^^unknown:datatypeIRI  is (1) syntactically OK (2) not an error
>>> but (3) has no special meaning and is treated just like an unknown
>>> IRI, i.e. it presumably denotes something, but we don't know what. Is
>>> this good enough?
>>>
>>> Pat
>>>
>>> ------------------------------------------------------------
>>> IHMC                      (850)434 8903 or (650)494 3973
>>> 40 South Alcaniz St.      (850)202 4416   office
>>> Pensacola                 (850)202 4440   fax
>>> FL 32502                  (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us      http://www.ihmc.us/users/phayes
>>
>
>
>
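Pat's description of the current semantic story (quoted above) can be
sketched the same way. In this hypothetical Python illustration, a
literal whose datatype IRI is in the map gets its value; a literal
whose IRI is not recognised is not an error and is kept as an opaque
term that denotes something unspecified. The strict flag is an
invented option modelling the alternative Pat asks about, where an
unrecognised IRI is treated as an error, and the IRI standing in for
"unknown:datatypeIRI" is made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

LexToValue = Callable[[str], Optional[object]]

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Hypothetical IRI standing in for "unknown:datatypeIRI".
UNKNOWN = "http://example.org/unknown#datatypeIRI"


@dataclass(frozen=True)
class Opaque:
    """A literal that denotes something, but we don't know what.

    Equality here is only term identity: two different Opaque terms
    might still denote the same thing.
    """
    lexical: str
    datatype_iri: str


def _integer(lex: str) -> Optional[int]:
    # Simplified xsd:integer: unsigned ASCII digits only.
    return int(lex) if lex.isascii() and lex.isdigit() else None


EXAMPLE_D: Dict[str, LexToValue] = {XSD_INTEGER: _integer}


def interpret_literal(lex: str, dt_iri: str, dmap: Dict[str, LexToValue],
                      strict: bool = False) -> object:
    """Return what lex^^dt_iri denotes under dmap, as far as dmap says.

    Recognised IRI: the datatype's value (None if ill-typed; how
    ill-typed literals are handled is outside this sketch).
    Unrecognised IRI: by default not an error, so an Opaque term is
    returned; strict=True is a hypothetical option that raises instead.
    """
    if dt_iri in dmap:
        return dmap[dt_iri](lex)
    if strict:
        raise ValueError(f"unrecognised datatype IRI: {dt_iri}")
    return Opaque(lex, dt_iri)


if __name__ == "__main__":
    print(interpret_literal("42", XSD_INTEGER, EXAMPLE_D))  # 42
    print(interpret_literal("foo", UNKNOWN, EXAMPLE_D))
```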

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tel: +33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 25 April 2013 15:05:53 UTC