Re: long-range datatyping and rdfa/microdata

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 9 Jun 2011 21:05:08 -0500
Cc: public-rdf-wg@w3.org
Message-Id: <71715816-D99C-46A1-A754-03A7A63B0EBB@ihmc.us>
To: antoine.zimmermann@insa-lyon.fr

Your analysis is exactly on the mark.

On Jun 9, 2011, at 4:01 AM, Antoine Zimmermann wrote:

> Right, the fact that every single data value has to be explicitly typed is a problem in RDF.
> 
> Often, one would like to write:
> 
> ex:prop rdfs:range xsd:decimal .
> ex:sub ex:prop "42" .
> 
> and infer that "42" is a decimal number. However, what one gets from these two triples is that "42" is a sequence of 2 characters AND a decimal, which is inconsistent.

This is clear in this example. But in general, a property range is a class, not a datatype. What happens when the range is just a class and has no associated L2V mapping? Also, a property can have many ranges. What happens if two of them are datatype classes? Which one gets control over the interpretation of the literal string? Also, a range can be inferred from other data rather than explicitly stated in a triple. Does an inferred range also reinterpret literal strings?
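To make Antoine's example and these questions concrete, here is a minimal Python sketch of naive range-driven coercion. It models the behaviour people would like, not what RDF does: under the semantics discussed here, a plain literal such as "42" simply denotes the string. The `L2V` table, the `coerce` function, and the `ex:Person` class are hypothetical, and the sketch takes the set of ranges as given, so it says nothing about how inferred ranges would be computed.

```python
from decimal import Decimal

# Hypothetical lexical-to-value (L2V) mappings. Only datatypes have one;
# an ordinary class such as ex:Person has no L2V mapping at all.
L2V = {
    "xsd:decimal": Decimal,
    "xsd:string": str,
}


def coerce(lexical: str, ranges: set[str]):
    """Naively reinterpret a plain literal using its property's ranges.

    `ranges` is every range of the property, stated or inferred; this
    sketch does not compute inferred ranges itself.
    """
    datatype_ranges = sorted(r for r in ranges if r in L2V)
    if not datatype_ranges:
        # Range is just a class: nothing can reinterpret the string.
        return lexical
    if len(datatype_ranges) > 1:
        # Two datatype ranges compete for control of the literal.
        raise ValueError(f"conflicting datatype ranges: {datatype_ranges}")
    return L2V[datatype_ranges[0]](lexical)


print(repr(coerce("42", {"xsd:decimal"})))  # Decimal('42')
print(repr(coerce("42", {"ex:Person"})))    # '42'
```

Calling `coerce("42", {"xsd:decimal", "xsd:string"})` raises `ValueError`: nothing in the graph says which range should win, which is exactly the case raised in the questions above.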

> 
> Overcoming this in the RDF data model is really hard. The problem is that literals are universal identifiers, just like URIs. So "42", no matter where it appears, is always identifying the same thing, namely the sequence of characters '4' and '2'.
> If "42" could be interpreted as a decimal, then it would be a decimal everywhere, for everybody.
> 
> So the following would not be possible:
> 
> ex:prop rdfs:range xsd:decimal .
> ex:sub ex:prop "42" .
> ex:password rdfs:range xsd:string .
> ex:sub2 ex:password "42" .
> 
> Here, certainly some people would expect the first "42" to be denoting the number, while the second is just two characters. But this implicitly assumes that the denotation of literals is contextual: it would depend on which predicate is used in the triple. While it would be possible, in principle, to define a language where this makes sense, it does not fit at all with the RDF data model.

It *could* be made to fit, with some work. But the result is very fragile. For example, it is no longer the case that merging two graphs gives a well-formed graph, since they might provide clashing ranges for a single property, resulting in a graph that cannot be interpreted consistently. 
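The merging problem can be illustrated with a minimal sketch, assuming ground graphs represented as Python sets of (subject, predicate, object) string triples, so that merging is plain set union (graphs containing blank nodes would first need them standardised apart). `DATATYPES`, `range_clashes`, and the two example graphs are hypothetical; each graph gives `ex:prop` a different datatype range.

```python
from collections import defaultdict

RANGE = "rdfs:range"
# Hypothetical datatype map: two datatypes with incompatible value spaces.
DATATYPES = {"xsd:decimal", "xsd:string"}


def merge(*graphs):
    """Merge ground graphs (no blank nodes) by taking the union of triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def range_clashes(graph):
    """Map each property with more than one datatype range to those ranges.

    Under range-driven coercion, a plain literal used with such a property
    has no single interpretation.
    """
    ranges = defaultdict(set)
    for s, p, o in graph:
        if p == RANGE and o in DATATYPES:
            ranges[s].add(o)
    return {prop: dts for prop, dts in ranges.items() if len(dts) > 1}


# Each graph is tidy on its own ...
g1 = {("ex:prop", RANGE, "xsd:decimal"), ("ex:sub", "ex:prop", '"42"')}
g2 = {("ex:prop", RANGE, "xsd:string"), ("ex:sub2", "ex:prop", '"42"')}
print(range_clashes(g1), range_clashes(g2))  # {} {}

# ... but their merge reports ex:prop with both datatype ranges.
print(range_clashes(merge(g1, g2)))
```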

> 
> One way of addressing this issue would be to consider "42" as syntactic sugar for a typed literal with an "undefined type", which I could represent like this:
> 
> ex:sub ex:prop "42"^^[] .
> 
> But this would mean that the following graph serialisation:
> 
> ex:sub ex:prop "42" .
> ex:sub ex:prop "42" .
> 
> effectively contains 2 distinct triples, not 1.
> I doubt this is the direction we want to take.

Well, we could alleviate this by having a special 'blank datatype' syntax, say a prefixed ^, so that "42" is the string, but ^"42" is undefined, and takes its type from the property of any triple it occurs in. But this still does not deal with the complications outlined above. 
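The difference between "42"^^[] and ^"42" is one of term identity, which a minimal Python sketch can show. Both classes and the `parse` helper are hypothetical. A `BlankTypedLiteral` gets a fresh type identifier per occurrence, much like a blank node, so two identical lines yield two triples; an `UntypedLiteral` is equal to any other with the same lexical form, so set semantics collapses them into one. Neither class decides what the literal denotes, which is where the complications outlined above remain.

```python
import itertools
from dataclasses import dataclass, field

_fresh_ids = itertools.count()


@dataclass(frozen=True)
class BlankTypedLiteral:
    """"42"^^[]: each occurrence has its own unknown datatype."""
    lexical: str
    type_id: int = field(default_factory=lambda: next(_fresh_ids))


@dataclass(frozen=True)
class UntypedLiteral:
    """^"42": one term per lexical form; the type is left to the
    property of whichever triple it occurs in."""
    lexical: str


def parse(lines, make_literal):
    """Turn (subject, predicate, lexical) lines into a set of triples."""
    return {(s, p, make_literal(lex)) for s, p, lex in lines}


serialisation = [("ex:sub", "ex:prop", "42"), ("ex:sub", "ex:prop", "42")]
print(len(parse(serialisation, BlankTypedLiteral)))  # 2
print(len(parse(serialisation, UntypedLiteral)))     # 1
```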

The real problem is that all these coercion ideas work quite well when the graphs are simple and tidy, with a single range defined which is a datatype and so on; but they fall apart when the graph is less tidy or simple. So either we impose fairly drastic well-formedness conditions on graphs (which now have to be checked for legality almost every time any triple is added or removed) or else we have to allow that there will be graphs that simply do not make semantic sense. Either route seems kind of un-RDF-ish. 
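The first route, drastic well-formedness conditions, can be sketched in Python to show what checking on every change involves. The rule enforced here is an assumption made for illustration: every untyped literal (modelled by a hypothetical `UntypedLiteral` class standing in for ^"42") must have exactly one datatype range for its property. `CheckedGraph`, `violations`, and `DATATYPES` are likewise hypothetical, and the check rescans the whole graph after every add or remove.

```python
from collections import defaultdict
from dataclasses import dataclass

RANGE = "rdfs:range"
DATATYPES = {"xsd:decimal", "xsd:string"}  # hypothetical datatype map


@dataclass(frozen=True)
class UntypedLiteral:
    """A literal such as ^"42" whose type must come from a range."""
    lexical: str


def violations(triples):
    """Triples whose untyped literal lacks exactly one datatype range."""
    ranges = defaultdict(set)
    for s, p, o in triples:
        if p == RANGE and o in DATATYPES:
            ranges[s].add(o)
    return [(s, p, o) for s, p, o in triples
            if isinstance(o, UntypedLiteral) and len(ranges[p]) != 1]


class CheckedGraph:
    """A triple set that refuses any change leaving it ill-formed."""

    def __init__(self):
        self.triples = set()

    def _commit(self, candidate):
        # Rescan the whole candidate graph: the cost of every change.
        bad = violations(candidate)
        if bad:
            raise ValueError(f"ill-formed graph: {bad}")
        self.triples = candidate

    def add(self, triple):
        self._commit(self.triples | {triple})

    def remove(self, triple):
        self._commit(self.triples - {triple})


g = CheckedGraph()
g.add(("ex:prop", RANGE, "xsd:decimal"))
g.add(("ex:sub", "ex:prop", UntypedLiteral("42")))  # accepted
for bad_change in (
    lambda: g.add(("ex:prop", RANGE, "xsd:string")),      # second range
    lambda: g.remove(("ex:prop", RANGE, "xsd:decimal")),  # no range left
):
    try:
        bad_change()
    except ValueError as err:
        print("rejected:", err)
```

One side effect of this rule is that the order of updates matters: adding the `ex:sub` triple before its range triple would also be rejected.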

Pat

> 
> 
> AZ.
> 
> On 08/06/2011 18:02, Dan Brickley wrote:
>> Hi folks
>> 
>> Firstly, apologies I couldn't make today's call. I've spent my RDF'ing
>> time this week talking to a lot of people about schema.org,
>> rdfa/microdata etc.
>> 
>> I want to bring something up related to that: back in RDFCore WG we
>> called it "long range" data-typing, but didn't figure out a way to
>> make it work. I'd appreciate it if someone could articulate the
>> connection to the current discussion on literals, and suggest whether
>> there are ways we could make it work in 2011.
>> 
>> The idea is that many properties are deployed as if their values take
>> string form, but we know from the schema that the values can be
>> interpreted e.g. as integers or dates.
>> 
>> RDF's datatyping mechanism puts a lot of burden on instance data, and
>> in some contexts (e.g. website markup) this can be problematic. So for
>> example http://schema.org/docs/datamodel.html chooses Microdata over
>> RDFa and lists 'datatypes' as one of the complexity burdens of RDFa
>> markup.
>> 
>> In practice I don't think a lot of sites will enjoy marking up each
>> property value occurrence with a datatype, ... and so vocabulary
>> designers are tending not to make datatyping explicit.
>> 
>> So for example in FOAF we have foaf:age, which Peter Mika originally asked for.
>> 
>> http://xmlns.com/foaf/0.1/#term_age "The age property is a
>> relationship between a Agent and an integer string representing their
>> age in years. "
>> 
>> This can be used in RDFa like so: <p>blah blah <span
>> property="foaf:age">39</span> blah</p>.
>> 
>> If we try to persuade publishers to put datatype="xsd:integer"
>> alongside each age, ... we'll have a hard time. So is there anything
>> we can do at the schema level?  Mumble mumble range mumble...
>> 
>> Pat - can you remember why we couldn't make this work in the semantics
>> last time?
>> 
>> cheers,
>> 
>> Dan
>> 
>> (another possibility is to do something in RDFa's profile mechanism,
>> http://www.w3.org/TR/rdfa-core/#s_profiles )
>> 
> 
> 
> -- 
> Antoine Zimmermann
> Researcher at:
> Laboratoire d'InfoRmatique en Image et Systèmes d'information
> Database Group
> 7 Avenue Jean Capelle
> 69621 Villeurbanne Cedex
> France
> Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
> Lecturer at:
> Institut National des Sciences Appliquées de Lyon
> 20 Avenue Albert Einstein
> 69621 Villeurbanne Cedex
> France
> antoine.zimmermann@insa-lyon.fr
> http://zimmer.aprilfoolsreview.com/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 10 June 2011 02:05:41 GMT
