draft use case for named graphs from FOAF work

From: Dan Brickley <danbri@danbri.org> · Date: Wed, 16 Mar 2011 17:17:20 +0100

This is a draft of a FOAF use case regarding named graphs. In its current
form it is mostly 'here is some stuff we tried', but it points towards
details that could be worked out.

http://www.w3.org/2011/rdf-wg/track/actions/4

I think this closes my immediate action, i.e. 'Draft a use case for
named graphs from FOAF work'

cheers,

Dan

--------- 8< -------------

This is a *draft* of a use case from the FOAF project regarding named
graphs. At this stage it outlines past experience, rather than
constituting specific requirements for new standards.

Background

Since 2000, a network of cross-referenced "FOAF files" has been published
on the public Web. These are typically in RDF/XML, and typically describe
people and associated entities (groups, documents, images). Initially these
were hand-crafted, then more were generated using utilities such as
foaf-a-matic, and since 2003 many more have been automatically published
from social network sites such as LiveJournal, My Opera and Hi5.

Since the earliest FOAF aggregators, it has been important to indicate
the source, provenance and authorship of FOAF-based RDF documents.
Several mechanisms have been explored.

Signed RDF

Provenance via PGP-signed documents, rdfweb-dev list (now foaf-dev),
Mon Aug 21 01:07:26 UTC 2000
http://lists.foaf-project.org/pipermail/foaf-dev/2000-August/004213.html

In these experiments, two things were explored:

 1. representing the 'who signed whose key' from the PGP 'web of trust' in RDF.
 2. representing in RDF the claim that some (typically but not
necessarily RDF-based) document had been signed.

The latter practice was explored in more detail. Neither became widely adopted.

The central idiom used (see the WOT vocabulary at http://xmlns.com/wot/0.1/)
was for a document to include a wot:assurance link pointing from its own
URI to the URI of the document that PGP emitted on signing.

So for example, Dan's doc http://danbri.org/foaf.rdf might have a
wot:assurance link to http://danbri.org/foaf.rdf.asc

Edd Dumbill documents this early work at
http://usefulinc.com/foaf/signingFoafFiles and in his second IBM
developerWorks article, at
http://www.ibm.com/developerworks/xml/library/x-foaf2.html

Edd Dumbill
01 Aug 2002
XML Watch: Support online communities with FOAF
How the Friend-of-a-Friend vocabulary addresses issues of
accountability and privacy

... for example:

<foaf:Person>
  <foaf:name>Edd Dumbill</foaf:name>
  <foaf:mbox rdf:resource="mailto:edd@xml.com" />
  <!-- personal, PGP signed, details here -->
  <rdfs:seeAlso>
    <rdf:Description rdf:about="http://example.org/edd/personal.rdf">
       <wot:assurance rdf:resource="http://example.org/edd/personal.rdf.asc" />
    </rdf:Description>
  </rdfs:seeAlso>
</foaf:Person>

Edd's article also explores the requirements around storing this kind of data.

Note that at this time, RDF query had not been standardised, and most
systems offered only basic triple-pattern matching. Furthermore, typical
RDF storage systems did not include any quads mechanism. Edd's article
shows how he implemented a provenance system on top of Dave Beckett's
Redland DB (indicating authorship / source of triples, results of PGP
checking etc.). Subsequent changes to Redland made it possible to use its
built-in provenance mechanism, and the standardisation of SPARQL allowed
queries to be expressed that constrain the source of triples.
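
To make the difference between plain triple-pattern matching and a quads
mechanism concrete, here is a minimal sketch in Python. It is not Redland's
API and not Edd's implementation. The Quad type, the match() helper and the
example.org URLs are all invented for illustration. The only idea it shows is
a fourth "source" slot that a pattern can constrain.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"


class Quad(NamedTuple):
    """A triple plus the URI of the document it was read from (hypothetical type)."""
    s: str
    p: str
    o: str
    source: str


def match(quads, s=None, p=None, o=None, source=None):
    """Return the quads that fit the pattern; None acts as a wildcard."""
    pattern = (s, p, o, source)
    return [
        q for q in quads
        if all(want is None or want == got for want, got in zip(pattern, q))
    ]


# Two documents make different claims about the same person (made-up data).
store = [
    Quad("http://example.org/edd#me", FOAF + "name", "Edd Dumbill",
         "http://example.org/edd/foaf.rdf"),
    Quad("http://example.org/edd#me", FOAF + "name", "Ed D.",
         "http://example.org/other/foaf.rdf"),
]

# Plain triple-pattern matching: every claim comes back, whoever made it.
all_names = match(store, p=FOAF + "name")

# Constrained by source: only what one particular document says.
own_names = match(store, p=FOAF + "name",
                  source="http://example.org/edd/foaf.rdf")
```

Results of PGP checking could be recorded the same way, as further
statements about the source URI, though that is left out of this sketch.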

Recent developments: WebID / FOAF+SSL

The FOAF "Web of Trust" vocabulary is an early experiment and is ripe
for revision, particularly to address new work around FOAF+SSL (aka the
WebID protocol), which uses the X.509 family of technology rather than
OpenPGP / GPG. Some wiki-style notes towards this revision are
at http://openetherpad.org/fXuqiu8nem

W3C has chartered a WebID incubator group at
http://www.w3.org/2005/Incubator/webid/

Related more general discussions continue on the foaf-protocols list,
see http://lists.foaf-project.org/mailman/listinfo/foaf-protocols

Drafting Requirements

It may be that we have no specific technical requirements. This draft
is a move towards writing down some detailed scenarios
to determine what we need.

Firstly, SPARQL in current form should be enough for us to ask
questions that relate to information provenance.

Simple case:

"How old is Dan, according to people who are his colleagues" can be
expressed in SPARQL.

The PGP/crypto-related techniques above can be used to add assurance
to certain layers of data from the SPARQL store.

A W3C spec for Named Graphs beyond SPARQL might allow us to serialize
complex datasets that interconnect claims with supporting evidence
about those claims.

Working through the "How old is Dan, according to people who are his
colleagues" case:

1. We want values ?y for the foaf:age property of the thing ?x whose
foaf:homepage is http://danbri.org/, as asserted by people who have a
foaf:workplaceHomepage ?h with the same value as the current true value
of that property for ?x.

2. We want to serialize to some standard form our entire repository of
relevant information, including who-said-what metadata, such that it
could be reconstituted
elsewhere and the same query be successfully run.

3. We want where possible to cryptographically assure this kind of
activity [vague,...].
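
As a rough way of pinning down point 1, the selection logic can be sketched
in Python over a list of (subject, property, value, source) tuples. Several
things here are assumptions rather than part of the draft: that each source
document names its author with foaf:maker, that the "current true value" of
Dan's foaf:workplaceHomepage lives in one graph we choose to trust (TRUSTED),
that a publisher's own workplace claim is believed, and all example.org URLs
and age values, which are invented test data.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DAN_HOME = "http://danbri.org/"
TRUSTED = "http://example.org/trusted"  # hypothetical graph treated as ground truth


def objects(quads, g, s, p):
    """Values of property p for subject s, as stated in graph g."""
    return {o for (s2, p2, o, g2) in quads if (g2, s2, p2) == (g, s, p)}


def subjects(quads, g, p, o):
    """Subjects having property p with value o, as stated in graph g."""
    return {s for (s, p2, o2, g2) in quads if (g2, p2, o2) == (g, p, o)}


def ages_according_to_colleagues(quads):
    # Dan's workplace homepage(s), taken from the trusted graph.
    true_work = set()
    for dan in subjects(quads, TRUSTED, FOAF + "homepage", DAN_HOME):
        true_work |= objects(quads, TRUSTED, dan, FOAF + "workplaceHomepage")

    results = []
    for g in sorted({q[3] for q in quads} - {TRUSTED}):
        # Who made this document, according to the document itself.
        makers = objects(quads, g, g, FOAF + "maker")
        # A maker counts as a colleague if their stated workplace matches Dan's.
        if not any(objects(quads, g, m, FOAF + "workplaceHomepage") & true_work
                   for m in makers):
            continue
        # Collect the age this document gives for whoever has Dan's homepage.
        for x in sorted(subjects(quads, g, FOAF + "homepage", DAN_HOME)):
            for y in sorted(objects(quads, g, x, FOAF + "age")):
                results.append((g, y))
    return results


ALICE_DOC = "http://example.org/alice/foaf.rdf"
BOB_DOC = "http://example.org/bob/foaf.rdf"
WORK = "http://example.org/work/"

quads = [
    ("_:dan0", FOAF + "homepage", DAN_HOME, TRUSTED),
    ("_:dan0", FOAF + "workplaceHomepage", WORK, TRUSTED),
    # Alice shares Dan's workplace, so her claim about his age is kept.
    (ALICE_DOC, FOAF + "maker", "http://example.org/alice#me", ALICE_DOC),
    ("http://example.org/alice#me", FOAF + "workplaceHomepage", WORK, ALICE_DOC),
    ("_:d1", FOAF + "homepage", DAN_HOME, ALICE_DOC),
    ("_:d1", FOAF + "age", "38", ALICE_DOC),
    # Bob works elsewhere, so his claim is filtered out.
    (BOB_DOC, FOAF + "maker", "http://example.org/bob#me", BOB_DOC),
    ("http://example.org/bob#me", FOAF + "workplaceHomepage",
     "http://example.org/elsewhere/", BOB_DOC),
    ("_:d2", FOAF + "homepage", DAN_HOME, BOB_DOC),
    ("_:d2", FOAF + "age", "21", BOB_DOC),
]

answers = ages_according_to_colleagues(quads)
```

With this made-up data, only Alice's document contributes an answer. Whether
a colleague's workplace claim should be checked against their own document or
against the trusted graph is exactly the kind of detail the scenarios still
need to settle.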

A version of the original FOAF requirements draft, from 2000 is
available at http://www.foaf-project.org/original-intro and has some
relevant use cases:

" While RDF is defined in terms of a rather abstract information
model, our needs are rather practical. We want to be able to ask the
Web sensible questions and common kinds of thing (documents,
organisations, people) and get back sensible results.
"Find me today's web page recommendations made by people who work for
Medical organisations".
"Find me recent publications by people I've co-authored documents with."
"Show me critiques of this web page, and the home pages of the author
of that critique""

Although old, these original use cases are not yet fully met by the
current Semantic Web landscape. They may be worth rethinking, but for
now these are
offered as the draft RDF WG FOAF 'named graphs' use case: keeping
track of several sources that combined can answer such queries.

* The very earliest FOAF aggregators made the simplifying assumption
that we could believe all triples, and that they could be harmlessly
merged.
* This soon proved ineffective. The next phase of FOAF aggregators
partitioned triples by source, but also tended to believe each publisher
(often using claims expressed in terms of the foaf:PersonalProfileDocument
class).
* Recent Social Web trends (OpenID/OAuth), FOAF+SSL and the original
wot:assurance work point towards stronger checking of document claims
and provenance.

The expectation here is that an RDF WG named graph mechanism will make it
possible for aggregates of FOAF-related RDF to be shared in a standard
form, in addition to what we can do now, which is to expose them via
SPARQL.
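
To illustrate what "shared in standard form" might mean in practice, here is
a small Python sketch that writes who-said-what quads out as one line each,
in an N-Quads-like shape, and reads them back. The choice of format is an
assumption made for the example and is not something this draft proposes. The
Lit class, dump() and load() are invented names, and the sketch handles only
URIs and plain single-line literals (no blank nodes, datatypes or language
tags).

```python
import re

FOAF = "http://xmlns.com/foaf/0.1/"


class Lit(str):
    """Marks an object value as a plain literal rather than a URI."""


def dump(quads):
    """Serialize (s, p, o, source) tuples, one quad per line."""
    lines = []
    for s, p, o, g in quads:
        if isinstance(o, Lit):
            # Escape backslashes first, then double quotes.
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        else:
            obj = "<" + o + ">"
        lines.append("<%s> <%s> %s <%s> ." % (s, p, obj, g))
    return "\n".join(lines) + "\n"


LINE = re.compile(
    r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|"((?:[^"\\]|\\.)*)") <([^>]*)> \.$'
)


def load(text):
    """Parse the output of dump() back into a list of quads."""
    quads = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError("unsupported line: %r" % line)
        s, p, o_uri, o_lit, g = m.groups()
        if o_uri is not None:
            o = o_uri
        else:
            # Undo the backslash escaping applied in dump().
            o = Lit(re.sub(r"\\(.)", r"\1", o_lit))
        quads.append((s, p, o, g))
    return quads


# Made-up aggregate: two statements read from one source document.
DOC = "http://example.org/alice/foaf.rdf"
aggregate = [
    (DOC, FOAF + "maker", "http://example.org/alice#me", DOC),
    ("http://example.org/alice#me", FOAF + "nick", Lit('Alice "al" A.'), DOC),
]

text = dump(aggregate)
restored = load(text)
```

If the round trip preserves every quad, a query that constrains the source of
triples should give the same answers on the restored copy as on the original
store.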

TODO: work through the dan/age or 2000-era cases with full test cases.