Re: Datasets and contextual/temporal semantics from Dan Brickley on 2011-10-13 (public-rdf-wg@w3.org from October 2011)

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 13 Oct 2011 15:21:13 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: Richard Cyganiak <richard@cyganiak.de>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <CAFNgM+YE1Ld6iZdjYVQCGEuDw-L44PB1PAjt=e4XYJ389vORkQ@mail.gmail.com>
On 13 October 2011 14:29, Pat Hayes <phayes@ihmc.us> wrote:
> On Oct 13, 2011, at 6:10 AM, Richard Cyganiak wrote:

> Indeed, and that was DELIBERATE. A contextual logic (in the sense you are using it) simply does not work as a Web logic. For some discussion of this point, see  http://www.ihmc.us/users/phayes/IKL/GUIDE/GUIDE.html#LogicForInt . In fact, a contextual logic does not work for ontologies in general. If the truth of an assertion depends on the context in which it is asserted, and if this context is not available when it is read, then it is USELESS. Or maybe worse than useless.

Are you suggesting it is really practical and feasible for every
assertion to be so explicit as to never need a 'best-before' date?
Particularly in a language as nuance-free as RDF, I find this hard
to believe. We can go down the slippery slope towards only ever describing
events, since their descriptions don't go stale, but in an open world
(where relevant facts may always be missing), the utility of having a
big pile of event descriptions is often questionable.

>> Many of our problems stem from that.
>>
>> I'll give examples.
>>
>>   :G2010 {:alice :age 29.}
>>   :G2011 {:alice :age 30.}
>>
>> Individually, each of those graphs are true (at a certain point in time). If taken together, an inconsistency is inferred (assuming :age is a functional property):
>>
>>   :alice :age 29, 30.
>>
>> By merging the two graphs, we have discarded the contextual information.
>
> In RDF, that "contextual information" was never there in the first place. This is BAD RDF.

You may as well call the Web "bad"; but it's not going away. Nor
is simple factual data published in Web pages --- a big use case for
our stuff.

Practical example: (repeating something just aired during the F2F/telecon)

 * in early FOAF stuff we tried to urge people towards
decontextualised data that won't go stale. So for example here, to
describe date of birth / events, rather than 'age'.
 * FOAF now has age. Why? Because Peter Mika asked for it, because
he was involved with sites (e.g. MySpace) that publish the 'age'
of users in HTML.
 * Should we be mailing MySpace and telling them to publish date/year
of birth instead of age? Maybe it'd be good for The Youth to be forced
to do more mental arithmetic? But standards != advocacy; we can't fix
the world from a committee.
 * with the rise of RDFa (and microformats, microdata etc) many
factual assertions will come from such (database-driven) sites.

So "bad RDF" is perhaps not the most helpful perspective here.

Is there any value in going from sites publishing stuff like
  <p>Dan is 39</p>
to
<p typeof="Person"><a href="http://danbri.org/" rel="homepage"
property="firstName">Dan</a> is <span property="age">39</span></p> ?

... I think so. But it puts work onto the consumer of the data: we
need to remember where we got it. And maybe a whole pile of other info
too. Anyone doing data aggregation is familiar with such requirements,
even if they are hard to express in logical languages. This doesn't
make either bad; but we have work to do bridging between the logical
and data-hacking perspectives.

And maybe this also puts some work onto the RDF community: that we
should make some experiments (yes, research + hacking, not standards)
around annotating properties, to indicate that our property 'age' is
more """volatile""" than our property 'dateOfBirth'. And perhaps even
specifically that 'age' goes stale relatively quickly (in whatever
level of detail suits application demands). For some
as-yet-undocumented notion of """volatile""".
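
To make that a bit more concrete, here is a rough Python sketch of what
such an annotation might amount to on the consuming side. The property
names 'age' and 'dateOfBirth' come from the discussion above; the
SHELF_LIFE table, its durations and the is_stale helper are invented for
illustration and do not correspond to any existing vocabulary. A one-year
shelf life for 'age' is deliberately crude (an age can go stale the day
after it is observed).

```python
from datetime import date, timedelta

# Hypothetical volatility annotations: how long a value for a property is
# assumed to stay trustworthy after it was observed. None means the value
# is treated as not going stale. Properties missing from the table are
# also treated as non-volatile (an assumption of this sketch).
SHELF_LIFE = {
    "age": timedelta(days=365),
    "dateOfBirth": None,
}

def is_stale(prop, observed_on, today):
    """Return True if a value for `prop` seen on `observed_on` may be out of date."""
    shelf_life = SHELF_LIFE.get(prop)
    if shelf_life is None:
        return False
    return today - observed_on > shelf_life

# An 'age' scraped in January 2010 is suspect by October 2011;
# a 'dateOfBirth' from the same page is not.
age_is_stale = is_stale("age", date(2010, 1, 1), date(2011, 10, 13))
dob_is_stale = is_stale("dateOfBirth", date(2010, 1, 1), date(2011, 10, 13))
```

The point is only that the annotation lives beside the property, so an
aggregator can apply it without the publisher changing what they publish.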

>> This shows that the graph merge operation is *not truth-preserving* – not *valid* in the formal sense – *if* the merged graphs have different contexts.
>
> No, it shows that they don't have contexts. Graph merging is truth preserving, precisely because RDF is *not* a contextual logic.

RDF is not a contextual logic; it is and should remain a simple-minded
language that can be used to make fairly basic assertions about a/the
world. RDF's cartoon universe has no notion of time nor change.
However the people using RDF have to build systems that bridge this
simplified perspective back into our real lives, software
applications, ever-changing datasets etc., where time and change are
constantly messing with us.

This is (as I think Richard articulated quite nicely) at the heart of
our problem. RDF's worldview is super-super-simplified. To live with
this simplicity, we need some tricks, techniques and so on. What we
have to figure out is which of those tricks and techniques are
(something like) data-hacking folklore and which can be specified
using the other instruments of W3C committee-dom, namely testcases,
computer languages, semantics specs and so on.

It will do us no good at all to just stand here and say "don't use
properties like 'age' ...". What we can say is "if you use properties
like 'age', ... consider managing and sharing your data with the
following conventions.".

This theme btw underpins some of my concerns with Sandro's advocacy
for a simple "I got these triples from this IRI" version of
WebArch-for-SemWeb. In too many real-world scenarios, we'll want to
keep a whole packet of information telling us where a bunch of data
came from. And it might have come from the same basic IRI several
times under varying circumstances. (Specs like
http://www.w3.org/TR/HTTP-in-RDF10/ are a good start at keeping that
"how I transacted with the Web, and what I got back" data diary.)

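As a rough illustration of the "packet of information" idea, here is a
minimal Python sketch of such a data diary. Everything in it (the
Retrieval record, its fields, the record and retrievals_from helpers, the
example.org IRI) is hypothetical; it is not the HTTP-in-RDF vocabulary,
just the smallest structure that remembers that the same IRI was fetched
more than once and gave different answers.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Retrieval:
    """One hypothetical diary entry: what was fetched, when, and what came back."""
    source_iri: str
    fetched_at: datetime
    status: int
    triples: frozenset  # of (subject, predicate, object) tuples

diary = []

def record(source_iri, fetched_at, status, triples):
    """Append one retrieval to the diary and return the new entry."""
    entry = Retrieval(source_iri, fetched_at, status, frozenset(triples))
    diary.append(entry)
    return entry

def retrievals_from(source_iri):
    """All recorded retrievals of one IRI, oldest first."""
    matching = [e for e in diary if e.source_iri == source_iri]
    return sorted(matching, key=lambda e: e.fetched_at)

# The same IRI, fetched a year apart, yields different triples.
record("http://example.org/alice", datetime(2011, 10, 13), 200, [(":alice", ":age", 30)])
record("http://example.org/alice", datetime(2010, 10, 13), 200, [(":alice", ":age", 29)])
history = retrievals_from("http://example.org/alice")
```

Keeping the entries separate, rather than merging their triples on
arrival, is what leaves the later choice of flattening open.
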
All this doesn't mean that data can only ever be considered in
contexts. Just that we need to get better, much better, at providing
all kinds of hints to help application developers and consuming apps
flatten things down from contextualised and quoted representations,
into simple flat truthy assertions. We will make different flattenings
under different circumstances, depending on risk scenarios, data
availability and other worldly constraints. This is natural and
healthy, and leaves RDF as simple propositional content while
admitting that there is (e.g. via SPARQL) a rich set of data
management practices around it that absolutely do need to deal
pragmatically with time, change and provenance.
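
One possible flattening, sketched in Python under stated assumptions: each
observation carries the date it was made, and for properties we have
decided to treat as functional (such as :age in Richard's example) the
most recent observation wins. The flatten_latest function, the tuple
layout and the dates attached to :G2010 and :G2011 are all invented for
illustration; other applications would quite reasonably flatten
differently.

```python
from datetime import date

def flatten_latest(observations, functional_props):
    """Flatten dated observations into plain (s, p, o) triples.

    observations: iterable of (observed_on, subject, predicate, obj).
    For predicates in functional_props only the most recently observed
    value per subject is kept; everything else is kept as-is.
    """
    latest = {}
    flat = set()
    # Oldest first, so later observations overwrite earlier ones.
    for observed_on, s, p, o in sorted(observations, key=lambda ob: ob[0]):
        if p in functional_props:
            latest[(s, p)] = o
        else:
            flat.add((s, p, o))
    flat.update((s, p, o) for (s, p), o in latest.items())
    return flat

observations = [
    (date(2011, 1, 1), ":alice", ":age", 30),  # as if from :G2011
    (date(2010, 1, 1), ":alice", ":age", 29),  # as if from :G2010
]
flat = flatten_latest(observations, functional_props={":age"})
```

A naive merge of the two graphs would keep both ages; this policy keeps
only the newer one, at the cost of silently dropping the older claim.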

cheers,

Dan
Received on Thursday, 13 October 2011 14:21:52 UTC