- From: Dan Brickley <danbri@danbri.org>
- Date: Fri, 10 Jun 2011 09:03:11 +0200
- To: Pat Hayes <phayes@ihmc.us>
- Cc: RDF WG <public-rdf-wg@w3.org>
On 10 June 2011 03:32, Pat Hayes <phayes@ihmc.us> wrote:
> On Jun 8, 2011, at 11:02 AM, Dan Brickley wrote:
>> Firstly, apologies I couldn't make today's call. I've spent my RDF'ing
>> time this week talking to a lot of people about schema.org,
>> rdfa/microdata etc.
>>
>> I want to bring something up related to that: back in RDFCore WG we
>> called it "long range" data-typing, but didn't figure out a way to
>> make it work.
>
> I often tell people who say disparaging or condescending things about standardization that it took the RDFCore WG longer to decide how to write the number three than it takes to make a baby. One of the main reasons for this was trying something like a dozen different ways to try to make this 'long-range' datatyping work, none of which turned out to be viable for one reason or another.

(Well, that state of affairs could also be evidence in support of saying disparaging things about 'committee design'. But anyway... :)

>> I'd appreciate if someone could articulate the connection to current discussion on literals
>
> I don't think it has any particular connection with the current discussion.

Ok. Sometimes a little change nearby in spec-space can have unanticipated consequences elsewhere. For example, perhaps OWL 2 is more tolerant of certain scruffy arrangements than OWL 1 was, making people more comfortable with mixed properties (foaf:age sometimes carrying strings, sometimes integers). Or perhaps SPARQL 1.1 has some relevant improvements / changes too? It's worth doing a quick mental scan at least, assuming we can remember the design choices from 2003/4. At least they are all publicly archived, even if spread over 100s of mails...

>> , and suggest if there are ways we could make it work in 2011.
>
> I don't see why it should be any easier now than it was back then.

Yup, I think the core of the problem remains: what happens if the typing information "falls off" and gets detached.
>> The idea is that many properties are deployed as if their values take
>> string form, but we know from the schema that the values can be
>> interpreted e.g. as integers or dates.
>>
>> RDF's datatyping mechanism puts a lot of burden on instance data, and
>> in some contexts (eg. Website markup) this can be problematic. So for
>> example http://schema.org/docs/datamodel.html chooses Microdata over
>> RDFa and lists 'datatypes' as one of the complexity burdens of RDFa
>> markup.
>>
>> In practice I don't think a lot of sites will enjoy marking up each
>> property value occurrence with a datatype, ... and so vocabulary
>> designers are tending not to make datatyping explicit.
>>
>> So for example in FOAF we have foaf:age, which Peter Mika originally asked for.
>>
>> http://xmlns.com/foaf/0.1/#term_age "The age property is a
>> relationship between a Agent and an integer string representing their
>> age in years. "
>>
>> This can be used in RDFa as so: <p>blah blah <span
>> property="foaf:age">39</span> blah</p>.
>>
>> If we try to persuade publishers to put datatype="xsd:integer"
>> alongside each age, ... we'll have a hard time. So is there anything
>> we can do at the schema level? Mumble mumble range mumble...
>
> Why not just define foaf:age so that its value is a string representing an integer, rather than an actual number?

That is what the documentation cited above actually says :-) Yes, that is exactly what we do for now. It seems preferable to uglifying the instance data. But it leaves me with a creeping concern that we will miss out on goodies such as the ability of SPARQL stores to answer questions like "find people said to have age < 30", and presumably similar mechanisms in OWL.

One design here is simply to create a new documentation construct: a triple pattern in RDFS/OWL that corresponds to what we currently write in English. Today we say only in English "this property is written as a string, but that string can be cast to an integer".
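To make the "age < 30" concern concrete, here is a minimal Python sketch of what a consumer has to do when foaf:age values are published as plain strings: cast each value itself and skip the ones that do not parse. The triple-as-tuple representation, the sample data and the helper names are assumptions for illustration, not any store's API. (In SPARQL 1.1 roughly the same effect comes from a FILTER using the xsd:integer() cast, where a failed cast is an error and so the solution is dropped.)

```python
import re

# Assumed representation: triples are (subject, predicate, object) tuples and
# every foaf:age value is an untyped string, as it would be from plain RDFa.
FOAF_AGE = "http://xmlns.com/foaf/0.1/age"

# Lexical form of xsd:integer: optional sign followed by ASCII digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

triples = [
    ("_:alice", FOAF_AGE, "39"),
    ("_:bob", FOAF_AGE, "27"),
    ("_:carol", FOAF_AGE, "about 40"),  # a string that does not cast
]


def as_integer(lexical):
    """Cast an xsd:integer-style string to int; return None if it does not parse."""
    text = lexical.strip()  # roughly mirrors xsd whitespace collapsing
    if INTEGER_LEXICAL.fullmatch(text) is None:
        return None
    return int(text)


def younger_than(graph, limit):
    """Subjects whose foaf:age string casts to an integer below `limit`."""
    result = []
    for s, p, o in graph:
        if p != FOAF_AGE:
            continue
        age = as_integer(o)
        if age is not None and age < limit:
            result.append(s)
    return result


print(younger_than(triples, 30))  # ['_:bob']
```

The regex is used instead of calling int() directly because Python's int() also accepts forms such as "3_9" that are not valid xsd:integer lexical forms.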
In its current form this deployment is still unfair on non-English speakers. An unsung value of machine-readable schemas is that they provide multilingual documentation. A tool could scan through the RDFS/OWL for some vocabulary and generate an overview in any human natural language. This might seem trivial, but it's a wonderful thing, and maybe more realistic than visions of intelligent agents scouring the 'net on our behalf.

So I would like a pattern for "OK, we write it as a string, but it can be cast to an integer" to be written in the RDFS somehow. Note that this wouldn't ever change the values of foaf:age to be integers. Rather, it tells consumers "OK, if you use some new property of your choosing, you can usefully populate its values with the cast-to-integer values instead", or it serves as a hint to databases / aggregators to build an integer index based on this content.

>> Pat - can you remember why we couldn't make this work in the semantics
>> last time?
>
> The chief problem, as I recollect, was nonmonotonicity. If you just have a plain literal as a property value, it denotes a string. But if you add a triple assigning a datatype to the property range, that plain literal isn't plain any more, and it denotes something different. That breaks the underlying RDF semantic model.
>
> There are ways we can try to wiggle around this. We could for example say that
>
> :a :p "string" .
> :p rdfs:range :Type .
>
> *entails*
>
> :a :p "string"^^:Type .
>
> ie this is true **in addition** to the first triple, rather than replacing or re-interpreting it. I suspect this might not fly, however.

That's close to my sketch above, except I was thinking we'd make up a new documentation property rather than put more work onto rdfs:range.

> I think I may have some old notes buried somewhere on the various ideas that were tried. If the WG thinks this can is worth re-opening, I could try to dig them out.

I'd be interested to see those notes, at least.
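Pat's "in addition, not instead" reading can be sketched as a forward-chaining rule. This Python sketch rests on assumptions: triples are tuples, literals use a hypothetical Literal class whose datatype is None for plain literals, and the trigger predicate defaults to rdfs:range but could be given a hypothetical documentation property instead, which is closer to the variant suggested above. What it shows is monotonicity: the output always contains the input graph, and the typed triple is only added alongside the plain one.

```python
from dataclasses import dataclass
from typing import Optional

RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal type; datatype None means a plain literal."""
    lexical: str
    datatype: Optional[str] = None


def entail_typed_literals(graph, trigger=RDFS_RANGE):
    """Return a new set holding every triple of `graph`, plus a typed copy of
    each plain literal whose property carries a `trigger` declaration.
    Nothing is removed or rewritten, so adding triples never retracts any."""
    declared = {}
    for s, p, o in graph:
        if p == trigger and isinstance(o, str):  # o is a type IRI
            declared.setdefault(s, set()).add(o)

    result = set(graph)
    for s, p, o in graph:
        if isinstance(o, Literal) and o.datatype is None:
            for dt in declared.get(p, ()):
                result.add((s, p, Literal(o.lexical, dt)))
    return result


# Usage with the foaf:age example from this thread.
FOAF_AGE = "http://xmlns.com/foaf/0.1/age"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

graph = {
    ("_:a", FOAF_AGE, Literal("39")),
    (FOAF_AGE, RDFS_RANGE, XSD_INTEGER),
}
closure = entail_typed_literals(graph)
print(graph <= closure)  # True: the plain "39" triple is still there
print(len(closure))      # 3
```

The sketch fires for every range declaration on the property, including ranges that are classes rather than datatypes; it makes no attempt to tell the two apart.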
Oh, and here is an entirely different design, which handles datatyping as pre-processing instead (i.e. it follows Richard's advice to bury the problem in syntactic sugar):

RDFa 1.1 has the notion of a profile. This is a document that says "ok, in this profile, we will use 'title' to mean 'http://purl.org/dc/elements/title' and 'name' to mean 'http://xmlns.com/foaf/0.1/name'", so that instance data in RDFa can just use "property='name'" or "property='title'" and leave it to the profile author to handle the long, boring URIs. So if the profile could also do the work of remembering the datatype annotations, it could express that users of the profile who write 'age' are expanding to the RDF property 'http://xmlns.com/foaf/0.1/age', and that its values are of datatype integer.

I've brought this up with RDFa folk; I hope to find out whether it works and will report back.

cheers,

Dan
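The profile idea amounts to a lookup table consulted at parse time. This Python sketch models it with a hypothetical dictionary mapping each short term to a property IRI and an optional datatype; it is not the RDFa 1.1 profile document format, and expand() is an illustrative helper that emits one N-Triples line per property occurrence. The page author still writes property='age' with no datatype attribute; the typing comes from the profile.

```python
# Hypothetical profile: short term -> (property IRI, datatype IRI or None).
# This models the idea only; it is not the RDFa 1.1 profile document format.
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

PROFILE = {
    "title": ("http://purl.org/dc/elements/title", None),
    "name": ("http://xmlns.com/foaf/0.1/name", None),
    "age": ("http://xmlns.com/foaf/0.1/age", XSD_INTEGER),
}


def ntriples_string(text):
    """Quote a lexical form for N-Triples, escaping backslash, quote, LF, CR."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def expand(subject_iri, term, text, profile=PROFILE):
    """Expand one property='term' occurrence into a single N-Triples line,
    attaching the profile's datatype (if any) to the literal."""
    property_iri, datatype = profile[term]
    literal = ntriples_string(text)
    if datatype is not None:
        literal += f"^^<{datatype}>"
    return f"<{subject_iri}> <{property_iri}> {literal} ."


# <span property="age">39</span> on a page about http://example.org/#me
print(expand("http://example.org/#me", "age", "39"))
```

Because the datatype lives in the profile, a term with no datatype (such as 'name') still yields a plain literal, and the typing information only "falls off" if the profile itself is lost.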
Received on Friday, 10 June 2011 07:03:39 UTC