Re: long-range datatyping and rdfa/microdata

On 2011-06-10, at 08:03, Dan Brickley wrote:
...
>>> The idea is that many properties are deployed as if their values take
>>> string form, but we know from the schema that the values can be
>>> interpreted e.g. as integers or dates.
>>> 
>>> RDF's datatyping mechanism puts a lot of burden on instance data, and
>>> in some contexts (eg. Website markup) this can be problematic. So for
>>> example http://schema.org/docs/datamodel.html chooses Microdata over
>>> RDFa and lists 'datatypes' as one of the complexity burdens of RDFa
>>> markup.
>>> 
>>> In practice I don't think a lot of sites will enjoy marking up each
>>> property value occurrence with a datatype, ... and so vocabulary
>>> designers are tending not to make datatyping explicit.
>>> 
>>> So for example in FOAF we have foaf:age, which Peter Mika originally asked for.
>>> 
>>> http://xmlns.com/foaf/0.1/#term_age "The age property is a
>>> relationship between a Agent and an integer string representing their
>>> age in years. "
>>> 
>>> This can be used in RDFa as so: <p>blah blah <span
>>> property="foaf:age">39</span> blah</p>.
>>> 
>>> If we try to persuade publishers to put datatype="xsd:integer"
>>> alongside each age, ... we'll have a hard time. So is there anything
>>> we can do at the schema level?  Mumble mumble range mumble...
>> 
>> Why not just define foaf:age so that its value is a string representing an integer, rather than an actual number? That is what the documentation cited above actually says :-)
> 
> Yes, that is exactly what we do for now. It seems preferable to the
> uglification of the instance data. But it leaves me with a creeping
> concern that we will be missing out on goodies such as the ability for
> SPARQL stores to answer questions like "find people said to have age <
> 30". And presumably similar mechanisms in OWL.

I don't know about OWL, but in SPARQL you can write:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?person
WHERE {
  ?person foaf:age ?age .
  FILTER(xsd:integer(?age) < 30)
}

Chances are that most stores don't optimise that (3store did, but it was unusual in a number of ways), but nothing stops you spotting that as a common pattern and building an index of 〈xsd:integer(object), subject〉 for the foaf:age predicate. 
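
To make the index idea concrete, here is a rough sketch in Python rather than anything a real store does. It assumes triples arrive as plain (subject, predicate, object) string tuples; the names build_age_index and people_younger_than are made up for illustration, and Python's int() only approximates the xsd:integer cast. Values that don't parse are skipped, which roughly mirrors how a FILTER drops a solution when the cast raises an error.

```python
from bisect import bisect_left

FOAF_AGE = "http://xmlns.com/foaf/0.1/age"


def build_age_index(triples):
    """Return a sorted list of (int(object), subject) pairs for foaf:age triples."""
    index = []
    for subject, predicate, obj in triples:
        if predicate != FOAF_AGE:
            continue
        try:
            # Assumption: int() stands in for the xsd:integer cast.
            index.append((int(obj), subject))
        except ValueError:
            # Values that cannot be cast are left out of the index.
            continue
    index.sort()
    return index


def people_younger_than(index, limit):
    """Answer 'age < limit' with a binary search instead of a full scan."""
    # (limit,) sorts before every (limit, subject) pair, so the cut point
    # is the position of the first entry whose age is >= limit.
    cut = bisect_left(index, (limit,))
    return [subject for _, subject in index[:cut]]


# Hypothetical sample data; ages are strings, as in the instance data.
triples = [
    ("ex:alice", FOAF_AGE, "39"),
    ("ex:bob", FOAF_AGE, "27"),
    ("ex:carol", FOAF_AGE, "unknown"),
    ("ex:dave", FOAF_AGE, "30"),
    ("ex:bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
]
index = build_age_index(triples)
younger = people_younger_than(index, 30)
```

The instance data keeps its string values; only the consumer's index holds integers.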

I'd rather put the burden on the people consuming the data than on the people generating it.

> One design here is simply to create a new documentation construct to
> make a triple pattern in RDFS/OWL that corresponds to writing things
> in English. Currently we say only in English "this property is written
> as a string, but that string can be cast to an integer". In its
> current form this deployment still is unfair on non-English speakers.
> An unsung value of machine readable schemas is that they provide
> multi-lingual documentation. A tool could scan through the RDFS/OWL
> for some vocabulary and generate an overview in any human natural
> language. This might seem trivial but it's a wonderful thing, and
> maybe more realistic than visions of intelligent agents scouring the
> 'net on our behalf. So I would like a pattern for "OK we write it as a
> string, but it can be cast to an integer" to be written in the RDFS
> somehow.
> 
> Note that this wouldn't ever change the values of foaf:age to be
> integers. Rather it tells consumers "Ok, if you use some new property
> of your choosing, you can usefully populate its values with the
> cast-to-integer values instead"; or serve as a hint to databases /
> aggregators to build an integer index based on this content.

Note, in the SQL world you typically have to manually construct indexes. It's not the end of the world.
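
As an illustration of what "manually" means here, the following is a sketch using SQLite through Python's sqlite3 module. The single triples table and the index name age_as_integer are assumptions made up for the example, not a recommended schema, and it relies on SQLite's support for indexes on expressions (3.9.0 or later). The sample data is kept numeric on purpose, since SQLite's CAST does not raise an error on non-numeric text the way the SPARQL cast does.

```python
import sqlite3

FOAF_AGE = "http://xmlns.com/foaf/0.1/age"

conn = sqlite3.connect(":memory:")

# Hypothetical one-table triple layout; objects are stored as text.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", FOAF_AGE, "39"),
        ("ex:bob", FOAF_AGE, "27"),
        ("ex:dave", FOAF_AGE, "30"),
    ],
)

# The hand-built index: integer-cast objects, restricted to foaf:age rows.
# The WHERE clause of a partial index cannot take bound parameters, so the
# predicate IRI is written into the statement as a literal.
conn.execute(
    "CREATE INDEX age_as_integer ON triples (CAST(object AS INTEGER)) "
    f"WHERE predicate = '{FOAF_AGE}'"
)

# The query repeats the indexed expression so the planner is able to use it.
rows = conn.execute(
    "SELECT subject FROM triples "
    f"WHERE predicate = '{FOAF_AGE}' AND CAST(object AS INTEGER) < 30"
).fetchall()
young = [subject for (subject,) in rows]
```

Whether the planner actually picks the index depends on the SQLite version and the query shape; the result is the same either way.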

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Friday, 10 June 2011 12:15:11 UTC