Re: long-range datatyping and rdfa/microdata from Dan Brickley on 2011-06-10 (public-rdf-wg@w3.org from June 2011)

From: Dan Brickley <danbri@danbri.org>
Date: Fri, 10 Jun 2011 09:03:11 +0200
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-ID: <BANLkTin7CCOHJ6C8U9TXEx7y+Z+9FExshw@mail.gmail.com>
On 10 June 2011 03:32, Pat Hayes <phayes@ihmc.us> wrote:
> On Jun 8, 2011, at 11:02 AM, Dan Brickley wrote:
>> Firstly, apologies I couldn't make today's call. I've spent my RDF'ing
>> time this week talking to a lot of people about schema.org,
>> rdfa/microdata etc.
>>
>> I want to bring something up related to that: back in RDFCore WG we
>> called it "long range" data-typing, but didn't figure out a way to
>> make it work.
>
> I often tell people who say disparaging or condescending things about standardization that it took the RDFCore WG longer to decide how to write the number three than it takes to make a baby. One of the main reasons for this was trying something like a dozen different ways to try to make this 'long-range' datatyping work, none of which turned out to be viable for one reason or another.

(Well, that state of affairs could also be evidence in support of
saying disparaging things about 'committee design'. But anyway... :)

>> I'd appreciate if someone could articulate the connection to current discussion on literals
>
> I don't think it has any particular connection with the current discussion.

Ok. Sometimes a little change nearby in spec-space can have
unanticipated consequences elsewhere. For example perhaps also OWL2 is
more tolerant of certain scruffy arrangements than OWL1 is, making
people more tolerant of having mixed properties (foaf:age sometimes
carrying strings, sometimes integers). Or perhaps SPARQL 1.1 has some
relevant improvements / changes too? It's worth doing a quick mental
scan at least. Assuming we can remember the design choices from
2003/4. At least they are all publicly archived, even if spread over
100s of mails...

>> , and suggest if there are ways we could make it work in 2011.
>
> I don't see why it should be any easier now than it was back then.

Yup, I think the core of the problem remains: what happens if the
typing information "falls off" and gets detached.

>> The idea is that many properties are deployed as if their values take
>> string form, but we know from the schema that the values can be
>> interpreted e.g. as integers or dates.
>>
>> RDF's datatyping mechanism puts a lot of burden on instance data, and
>> in some contexts (eg. Website markup) this can be problematic. So for
>> example http://schema.org/docs/datamodel.html chooses Microdata over
>> RDFa and lists 'datatypes' as one of the complexity burdens of RDFa
>> markup.
>>
>> In practice I don't think a lot of sites will enjoy marking up each
>> property value occurrence with a datatype, ... and so vocabulary
>> designers are tending not to make datatyping explicit.
>>
>> So for example in FOAF we have foaf:age, which Peter Mika originally asked for.
>>
>> http://xmlns.com/foaf/0.1/#term_age "The age property is a
>> relationship between a Agent and an integer string representing their
>> age in years. "
>>
>> This can be used in RDFa like so: <p>blah blah <span
>> property="foaf:age">39</span> blah</p>.
>>
>> If we try to persuade publishers to put datatype="xsd:integer"
>> alongside each age, ... we'll have a hard time. So is there anything
>> we can do at the schema level?  Mumble mumble range mumble...
>
> Why not just define foaf:age so that its value is a string representing an integer, rather than an actual number? That is what the documentation cited above actually says :-)

Yes, that is exactly what we do for now. It seems preferable to the
uglification of the instance data. But it leaves me with a creeping
concern that we will be missing out on goodies such as the ability for
SPARQL stores to answer questions like "find people said to have age <
30". And presumably similar mechanisms in OWL.
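
To make the concern concrete, here is a minimal sketch (plain Python,
not SPARQL) of how "age < 30" behaves when the values are compared as
strings versus after a cast to integer. The names and sample ages are
made up for illustration; how a given store actually treats untyped
literals is a separate question.

```python
# Hypothetical sample data: foaf:age values in the string form they are published in.
ages = {"alice": "9", "bob": "39", "carol": "100"}


def younger_than_as_strings(data, limit):
    # Compare lexical forms directly, i.e. character by character.
    return sorted(name for name, age in data.items() if age < str(limit))


def younger_than_as_integers(data, limit):
    # Cast each lexical form to an integer first, then compare numerically.
    return sorted(name for name, age in data.items() if int(age) < limit)


print(younger_than_as_strings(ages, 30))   # ['carol']
print(younger_than_as_integers(ages, 30))  # ['alice']
```

The string comparison picks "100" and misses "9", which is exactly
the kind of answer we would not want from a query over ages.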

One design here is simply to create a new documentation construct to
make a triple pattern in RDFS/OWL that corresponds to writing things
in English. Currently we say only in English "this property is written
as a string, but that string can be cast to an integer". In its
current form this arrangement is still unfair to non-English speakers.
An unsung value of machine readable schemas is that they provide
multi-lingual documentation. A tool could scan through the RDFS/OWL
for some vocabulary and generate an overview in any human natural
language. This might seem trivial but it's a wonderful thing, and
maybe more realistic than visions of intelligent agents scouring the
'net on our behalf. So I would like a pattern for "OK we write it as a
string, but it can be cast to an integer" to be written in the RDFS
somehow.

Note that this wouldn't ever change the values of foaf:age to be
integers. Rather it tells consumers "Ok, if you use some new property
of your choosing, you can usefully populate its values with the
cast-to-integer values instead"; or serve as a hint to databases /
aggregators to build an integer index based on this content.
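
A rough sketch of the "hint to databases" reading, in Python. The
CAST_HINTS table stands in for whatever schema-level property we
might invent; it is hypothetical and not part of RDFS or FOAF. The
original triples are left untouched and only a side index is built.

```python
FOAF_AGE = "http://xmlns.com/foaf/0.1/age"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical schema-level hint: property URI -> function casting the lexical form.
CAST_HINTS = {FOAF_AGE: int}


def build_index(triples, hints):
    """Return {property: sorted [(cast value, subject)]} for hinted properties."""
    index = {}
    for s, p, o in triples:
        cast = hints.get(p)
        if cast is None:
            continue  # no hint for this property, nothing to index
        try:
            value = cast(o)
        except ValueError:
            continue  # lexical form does not cast; the triple stays un-indexed
        index.setdefault(p, []).append((value, s))
    for entries in index.values():
        entries.sort()
    return index


# Made-up instance data; every object is still a plain string.
triples = [
    ("ex:dan", FOAF_AGE, "39"),
    ("ex:kid", FOAF_AGE, "9"),
    ("ex:odd", FOAF_AGE, "thirty"),
    ("ex:dan", FOAF_NAME, "Dan"),
]

index = build_index(triples, CAST_HINTS)
print(index[FOAF_AGE])  # [(9, 'ex:kid'), (39, 'ex:dan')]
```

Values that do not cast (such as "thirty") are simply skipped here; a
real consumer would have to decide whether to warn about them.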

>> Pat - can you remember why we couldn't make this work in the semantics
>> last time?
>
> The chief problem, as I recollect, was nonmonotonicity. If you just have a plain literal as a property value, it denotes a string. But if you add a triple assigning a datatype to the property range, that plain literal isn't plain any more, and it denotes something different. That breaks the underlying RDF semantic model.
>
> There are ways we can try to wiggle around this. We could for example say that
>
> :a :p "string" .
> :p rdfs:range :Type .
>
> *entails*
>
> :a :p "string"^^:Type .
>
> ie this is true **in addition** to the first triple, rather than replacing or re-interpreting it. I suspect this might not fly, however.

That's close to my sketch above, except I was thinking we'd make up a
new documentation property rather than put more work onto rdfs:range.
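
For reference, the rule Pat describes can be sketched as a forward
rule over a set of triples. This is only an illustration in Python,
with a made-up tuple encoding for literals, and it assumes every
rdfs:range value in the graph names a datatype. The point is that the
typed triple is added alongside the plain one, so the graph only
grows.

```python
RDFS_RANGE = "rdfs:range"


def plain(lexical):
    # Illustrative encoding of a plain literal.
    return ("plain", lexical)


def typed(lexical, datatype):
    # Illustrative encoding of a datatyped literal.
    return ("typed", lexical, datatype)


def apply_range_rule(graph):
    """Add a typed copy of each plain-literal triple whose property has a range."""
    ranges = {(s, o) for s, p, o in graph if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in graph:
        if isinstance(o, tuple) and o[0] == "plain":
            for prop, datatype in ranges:
                if prop == p:
                    inferred.add((s, p, typed(o[1], datatype)))
    # Union, not replacement: every original triple is still present.
    return graph | inferred


graph = {
    (":a", ":p", plain("string")),
    (":p", RDFS_RANGE, ":Type"),
}
closed = apply_range_rule(graph)
print(len(closed))  # 3
```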

> I think I may have some old notes buried somewhere on the various ideas that were tried. If the WG thinks this can is worth re-opening, I could try to dig them out.

I'd be interested to see those notes, at least.

Oh and here is an entirely different design, that handles datatyping
as pre-processing instead (ie. follows Richard's advice to bury the
problem in syntactic sugar):

RDFa 1.1 has the notion of a profile. This is a document that says
"ok, in this profile, we will use 'title' to mean
'http://purl.org/dc/elements/title' and 'name' to mean
'http://xmlns.com/foaf/0.1/name'", so that instance data in RDFa can
just use "property='name'" or "property='title'" and leave it to the
profile author to handle the long boring URIs.

So if the profile could also do the work of remembering the datatype
annotations, it could express that users of the profile who write
'age' are expanding to the RDF property
'http://xmlns.com/foaf/0.1/age', and that it is of datatype integer.
I've brought this up with RDFa folk, hope to find out if it works and
report back.
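
A minimal sketch of what I have in mind, in Python. The PROFILE table
and the expand() helper are hypothetical; in particular the datatype
column is the proposed extension, not something RDFa 1.1 profiles are
known to provide. Literal escaping is ignored to keep it short.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical profile: short term -> (property URI, optional datatype URI).
PROFILE = {
    "age": ("http://xmlns.com/foaf/0.1/age", XSD_INTEGER),
    "name": ("http://xmlns.com/foaf/0.1/name", None),
}


def expand(subject, term, text, profile=PROFILE):
    """Turn one property/value pair from markup into an N-Triples-style line."""
    uri, datatype = profile[term]
    if datatype is None:
        literal = '"%s"' % text
    else:
        # The profile, not the page author, supplies the datatype.
        literal = '"%s"^^<%s>' % (text, datatype)
    return "<%s> <%s> %s ." % (subject, uri, literal)


# Equivalent of <span property="age">39</span> about a made-up subject.
print(expand("http://example.org/dan", "age", "39"))
print(expand("http://example.org/dan", "name", "Dan"))
```

The publisher still writes only property="age"; the typing happens
during parsing, before any triples reach the RDF layer.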

cheers,

Dan
Received on Friday, 10 June 2011 07:03:39 UTC