Re: RDFa Security and Persistence from Dan Brickley on 2009-02-19 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 19 Feb 2009 04:04:36 +0100
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: Ian Hickson <ian@hixie.ch>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <499CCC44.9030705@danbri.org>

On 19/2/09 03:43, Manu Sporny wrote:

> http://rdfa.info/wiki/developer-faq#How_does_one_prevent_bad_triples_from_corrupting_a_local_triple_store.3F

I've just tweaked this to add:

"Also note that the phrase "triple store" is somewhat dated. Practically
all RDF storage systems (since RDFCore in 2004 and a few years before)
have effectively been "quad stores". While the core RDF specs are
described in terms of triples, database systems for managing triples
have almost always kept track of the source (or "provenance") of each
piece of data. For this reason when RDF's data access / query language,
SPARQL, was created, it included within the language a mechanism for
querying this extra information. This ability to explicitly represent
(and query) the source of each RDF data graph gives some extra machinery
for dealing with trust. We might, for example, pose a SPARQL query that
was targeted only at graphs tagged as trusted. RDF stores are no longer
a simplistic melting pot in which data from multiple sources gets
indecipherably tangled."
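
To make the "quad store" idea concrete, here is a minimal sketch in
plain Python. It is not the API of any real RDF toolkit: Quad,
QuadStore, match() and the example URIs are all made up for
illustration. It only shows that once each triple carries its source
graph, restricting a lookup to trusted graphs is a simple filter.

```python
from collections import namedtuple

# Hypothetical names throughout: Quad, QuadStore and the example URIs are
# illustrative only and do not come from any particular RDF toolkit.
Quad = namedtuple("Quad", ["subject", "predicate", "object", "graph"])


class QuadStore:
    """A toy in-memory store that remembers the source graph of each triple."""

    def __init__(self):
        self._quads = set()

    def add(self, subject, predicate, obj, graph):
        """Record a triple together with the graph (source) it came from."""
        self._quads.add(Quad(subject, predicate, obj, graph))

    def match(self, subject=None, predicate=None, obj=None, graphs=None):
        """Return matching quads, sorted; None acts as a wildcard.

        If `graphs` is given, only quads whose source graph is in that
        collection are returned.
        """
        results = []
        for quad in self._quads:
            if subject is not None and quad.subject != subject:
                continue
            if predicate is not None and quad.predicate != predicate:
                continue
            if obj is not None and quad.object != obj:
                continue
            if graphs is not None and quad.graph not in graphs:
                continue
            results.append(quad)
        return sorted(results)


# Example: two sources disagree about a price; only one is trusted.
store = QuadStore()
store.add("ex:book1", "ex:price", "10", "http://shop-a.example/")
store.add("ex:book1", "ex:price", "99", "http://rival.example/")

trusted_graphs = {"http://shop-a.example/"}
for quad in store.match(subject="ex:book1", graphs=trusted_graphs):
    print(quad.object, "according to", quad.graph)
```

In SPARQL the same restriction would be written with the GRAPH keyword
rather than a Python filter; the sketch above just shows the shape of
the data that makes such a query possible.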

For a practical account from someone who added provenance (i.e. quads)
to an early RDF triple storage system that didn't (at the time) keep
track of it, see http://www.ibm.com/developerworks/xml/library/x-rdfprov.html
by Edd Dumbill.

Excerpting here:

"""21 Jul 2003
When you start aggregating data from around the Web, keeping track of
where it came from is vital. In this article, Edd Dumbill looks into the
contexts feature of the Redland Resource Description Format (RDF)
application framework and creates an RDF Site Summary (RSS) 1.0
aggregator as a demonstration.

In Listing 6 of my second article on FOAF (see Resources), I
demonstrated FOAFbot, a community support agent I wrote that aggregates
people's FOAF files and answers questions about them. FOAFbot has the
ability to record who said what about whom. When asked what my name was,
FOAFbot responded:

    edd@xml.com's name is 'Edd Dumbill',
    according to Dave Beckett, Edd Dumbill,
    Jo Walsh, Kip Hampton, Matt Biddulph,
    Dan Brickley, and anonymous source Anon47

The idea behind FOAFbot is that if you can verify that a fact is
recorded by several different people (whom you trust), you are more
likely to believe it to be true.

Here's another use for tracking provenance of such metadata. One of the
major abuses of search engines early on in their history was meta tag
spamming. Web sites would put false metadata into their pages to boost
their search engine ranking. For this reason, search engines stopped
paying attention to meta tags because they were most likely lies.
Instead, search engines such as Google found other more sophisticated
metrics to rank page relevance.

Looking toward the future of the Web, it will become vital to avoid
abuses such as meta tag spamming. Tim Berners-Lee's vision for a
Semantic Web (see Resources) aims for a Web where most data is
machine-readable, in order to automate much of the information
processing currently done by humans.

The potential difficulties of metadata abuse are even larger on the
Semantic Web: A Web site would no longer be restricted to making claims
only about itself. It could also make claims about other sites. It would
be possible, for instance, for one bookstore to make false claims about
the prices offered by a competitor.

I won't go into detail on the various security and trust mechanisms that
will prevent this sort of semantic vandalism, but I will focus on the
foundation that will make them possible: tracking provenance.[...]"""
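
The "who said what about whom" idea in the excerpt can be sketched in a
few lines of Python. This is not FOAFbot's actual code: the function
names, the threshold parameter, and the "Someone Else" claim are
assumptions of mine. It only illustrates counting how many trusted
sources assert the same statement.

```python
from collections import defaultdict


def corroboration(quads, trusted):
    """Map each (subject, predicate, object) claim to the sorted list of
    trusted sources that assert it. Untrusted sources are ignored."""
    support = defaultdict(set)
    for subject, predicate, obj, source in quads:
        if source in trusted:
            support[(subject, predicate, obj)].add(source)
    return {claim: sorted(sources) for claim, sources in support.items()}


def believed(quads, trusted, threshold=2):
    """Return the claims asserted by at least `threshold` trusted sources.

    The threshold is a hypothetical policy knob, not something taken
    from the article.
    """
    return {
        claim
        for claim, sources in corroboration(quads, trusted).items()
        if len(sources) >= threshold
    }


# Illustrative data: the first two rows echo the excerpt, the third is a
# made-up conflicting claim from an anonymous source.
example_quads = [
    ("mailto:edd@xml.com", "foaf:name", "Edd Dumbill", "Dave Beckett"),
    ("mailto:edd@xml.com", "foaf:name", "Edd Dumbill", "Jo Walsh"),
    ("mailto:edd@xml.com", "foaf:name", "Someone Else", "Anon47"),
]
trusted_sources = {"Dave Beckett", "Jo Walsh"}

for claim in sorted(believed(example_quads, trusted_sources)):
    print(claim)
```

Whether "two trusted sources" is a sensible rule is a policy question;
the point is only that the policy becomes expressible once the store
keeps the source alongside each statement.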

Edd's article also mentions: "(Incidentally, I owe a debt of gratitude to
Dave Beckett, the creator of Redland. When I was writing FOAFbot last
year, Redland did not have support for contexts, so I ended up
implementing them in a very roundabout fashion. In response to my
requests, Dave added support for contexts into his toolkit.)"

That is worth repeating here, as it shows how RDF toolkits have matured
over the years in response to just the kind of practical concerns Ian
(and Edd) raise around spam, trust and aggregation.

Hope this helps with the use cases...

cheers,

Dan

Received on Thursday, 19 February 2009 03:05:18 UTC