Re: Graphs and Being and Time from Dan Brickley on 2011-02-23 (public-rdf-wg@w3.org from February 2011)

From: Dan Brickley <danbri@danbri.org>
Date: Wed, 23 Feb 2011 21:43:59 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: public-rdf-wg@w3.org
Message-ID: <AANLkTi=L1PKsK0SWBmds-yERunZuAqRauhqyyOw_6EoN@mail.gmail.com>

On 23 February 2011 21:13, Pat Hayes <phayes@ihmc.us> wrote:
[...]
> What we need is the notion of a 'graph token' (or some other terminology: see below for more on terminology), meaning an actual representation of an RDF graph.
[...]

I like this thinking. A natural term here might be 'document'. I've
tended to approach things along lines that "documents [in certain
social/technical/protocol settings] express claims about the world".
Documents have concrete formats and security-related characteristics,
e.g. they can be hashed. And beyond this, document types have
constraints that relate to communication needs rather than abstract
models of the world. In pure RDF/OWL we can say in general that people
have 'given names'; however when thinking about document types, things
are more imperative --- something is a properly shaped ShippingOrder
if it is a document that carries the right kinds of claim, e.g.
including some claims about the "given name" of its creator. A common
developer frustration with practical RDF and OWL is that they are so
passive and declarative and non-committal: our schemas and ontologies
are essentially dictionaries, and don't force data publishers to stick
to any particular content structures. You can always omit data, always
add data, ... RDF only really cares about the truth and not
contradicting yourself. Document-centric schema languages, by
contrast, have lots more ways of screwing up: missing data, missing
patterns of data, ... failings of communication rather than failings
of description. So these are legitimate, well grounded developer
expectations which RDF at the moment just bypasses. Instead of
relatively predictable doc format or OO structures, we offer only a
chaotic bundle of triples, unconstrained by any human or machine
protocol except in that they are gently nudged towards being truthful.
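
To make the ShippingOrder point concrete, here is a minimal sketch of a
document-shape check over a plain set of triples. The ex: vocabulary and
the function name are hypothetical, invented only for illustration; no
real schema or RDF library is implied.

```python
# Hypothetical vocabulary: illustrative names, not from any real schema.
RDF_TYPE = "rdf:type"
SHIPPING_ORDER = "ex:ShippingOrder"
CREATOR = "ex:creator"
GIVEN_NAME = "ex:givenName"


def is_shipping_order_document(triples):
    """Return True if the triples carry the claims we require of a
    ShippingOrder document: an order, its creator, and the creator's
    given name. This is an assumed shape, chosen for illustration."""
    triples = set(triples)
    orders = {s for (s, p, o) in triples
              if p == RDF_TYPE and o == SHIPPING_ORDER}
    for order in orders:
        creators = {o for (s, p, o) in triples
                    if s == order and p == CREATOR}
        for creator in creators:
            if any(s == creator and p == GIVEN_NAME
                   for (s, p, o) in triples):
                return True
    return False


complete = [
    ("ex:order1", RDF_TYPE, SHIPPING_ORDER),
    ("ex:order1", CREATOR, "ex:alice"),
    ("ex:alice", GIVEN_NAME, "Alice"),
]
# Perfectly good RDF, but it omits the given name the document type needs.
incomplete = complete[:2]
```

Both lists are acceptable as descriptions of the world; only the first
passes as a ShippingOrder document in this sketch. The failure is one of
communication, not of truth.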

As Edd Dumbill put it (http://times.usefulinc.com/#13:13 via
http://danbri.org/words/page/27?sioc_type=user&sioc_id=22 )

"Processing RDF is therefore a matter of poking around in this graph.
Once a program has read in some RDF, it has a ball of spaghetti on its
hands. You may like to think of RDF in the same way as a hashtable
data structure — you can stick whatever you want in there, in whatever
order you want."

I think this goes to the heart of developer frustration with RDF.

We have had an awkward narrative gap since 1997 regarding how RDF
relates to XML and other document-oriented schema systems. By making
the notion of a document more explicit in the RDF specs, I think we
get a bit closer to bridging that gap and explaining how the two
approaches can be complementary. RDF lets us define abstract
structures for modeling the world; specific uses of RDF in document
formats encode a set of claims about the world. And other non-RDF doc
formats often do the same, often using schema languages that focus on
the document side rather than the abstract model. Following this line
of thinking we can rebuild some of the machinery of DTDs and XML
schema over the top of RDF's machinery, by defining types of RDF
document in terms of the patterns of claims they encode.
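
One rough way to picture "types of RDF document defined by patterns of
claims" is a small registry mapping each document type to the properties
it must mention. The type names and ex: properties below are assumptions
made up for this sketch, and real patterns would be richer than a set of
required predicates.

```python
# Hypothetical registry: document type -> predicates it must use.
DOCUMENT_TYPES = {
    "addressbook": {"ex:givenName", "ex:mbox"},
    "calendar": {"ex:dtstart", "ex:summary"},
    "point-of-interest": {"ex:lat", "ex:long"},
}


def classify(triples):
    """Return the sorted names of every document type whose required
    predicates all appear somewhere in the given triples."""
    predicates = {p for (_, p, _) in triples}
    return sorted(name for name, required in DOCUMENT_TYPES.items()
                  if required <= predicates)


event = [
    ("ex:e1", "ex:dtstart", "2011-02-23"),
    ("ex:e1", "ex:summary", "RDF WG call"),
]
kinds = classify(event)
```

A document may match several types at once, or none, which is why the
function returns a list rather than a single answer.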

OK, that's a bit much to 'standardise' but it's my sense of where
we're collectively heading anyway. Just as practical RDF systems
always invent some way of keeping track of where claims/triples came
from, they often have some ad hoc mechanisms for poking around in some
RDF graph instance to see what kind of a description it is: is it an
addressbook description, info about a calendar, some chunk of a
thesaurus or perhaps some claims about a geographical point of
interest? By making the notion of document explicit in RDF, we open
things up for these unarticulated but useful concepts to be used in
code, documentation and data. Without it, we have this kind of awkward
situation where we have documents that try to ignore the 'bunch of
bytes' side of their character and pretend to be purely mathematical
entities. RDF in practice has these two sides to its personality, and
I think we'll do better explaining them if we name and describe them
in the specs...
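
The two sides can be shown with a toy example: two documents that differ
as bytes (and so hash differently) yet encode the same set of triples.
The one-triple-per-line format and the parse_lines helper are
assumptions for this sketch; this is not N-Triples or any real RDF
syntax.

```python
import hashlib


def parse_lines(document: bytes):
    """Parse a toy format, one 'subject predicate object' per line,
    into an unordered set of triples (the abstract side)."""
    triples = set()
    for line in document.decode("utf-8").splitlines():
        line = line.strip()
        if line:
            s, p, o = line.split(None, 2)
            triples.add((s, p, o))
    return frozenset(triples)


# Same claims, different order and whitespace: two distinct documents.
doc_a = (b"ex:alice ex:givenName Alice\n"
         b"ex:alice ex:mbox mailto:alice@example.org\n")
doc_b = (b"ex:alice ex:mbox mailto:alice@example.org\n\n"
         b"ex:alice   ex:givenName Alice\n")

same_graph = parse_lines(doc_a) == parse_lines(doc_b)
same_bytes = (hashlib.sha256(doc_a).hexdigest()
              == hashlib.sha256(doc_b).hexdigest())
```

Here same_graph is true while same_bytes is false: the hash belongs to
the document, not to the mathematical graph.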

cheers,

Dan

Received on Wednesday, 23 February 2011 20:44:34 UTC