Re: RDFa Security and Persistence

On 19/2/09 03:43, Manu Sporny wrote:

> http://rdfa.info/wiki/developer-faq#How_does_one_prevent_bad_triples_from_corrupting_a_local_triple_store.3F

I've just tweaked this to add:

"Also note that the phrase "triple store" is somewhat dated. Practically 
all RDF storage systems (since RDFCore in 2004 and a few years before) 
have effectively been "quad stores". While the core RDF specs are 
described in terms of triples, database systems for managing triples 
have almost always kept track of the source (or "provenance") of each 
piece of data. For this reason when RDF's data access / query language, 
SPARQL, was created, it included within the language a mechanism for 
querying this extra information. This ability to explicitly represent 
(and query) the source of each RDF data graph gives some extra machinery 
for dealing with trust. We might, for example, pose a SPARQL query that 
was targeted only at graphs tagged as trusted. RDF stores are no longer 
a simplistic melting pot in which data from multiple sources gets 
indecipherably tangled."
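
To make the "quad" idea concrete, here is a minimal sketch in plain Python. 
It is a toy model, not the API of any real RDF toolkit: the QuadStore class, 
its method names and the example URIs are all invented for illustration. 
Restricting a lookup to a set of trusted graphs is roughly what SPARQL's 
GRAPH keyword lets you express in a query.

```python
from collections import namedtuple

# Hypothetical names throughout: a toy model, not the API of any real RDF toolkit.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])


class QuadStore:
    """Keeps each statement together with the graph (source) it came from."""

    def __init__(self):
        self._quads = set()

    def add(self, subject, predicate, obj, graph):
        self._quads.add(Quad(subject, predicate, obj, graph))

    def match(self, subject=None, predicate=None, obj=None, graphs=None):
        """Return matching quads; None is a wildcard, `graphs` limits the sources."""
        return [
            q for q in self._quads
            if (subject is None or q.subject == subject)
            and (predicate is None or q.predicate == predicate)
            and (obj is None or q.obj == obj)
            and (graphs is None or q.graph in graphs)
        ]

    def remove_graph(self, graph):
        """Drop everything a single source said, leaving other sources untouched."""
        self._quads = {q for q in self._quads if q.graph != graph}


# Two sources make conflicting claims about the same (made-up) book.
store = QuadStore()
store.add("ex:book1", "ex:price", "10.00", "http://shop-a.example/data")
store.add("ex:book1", "ex:price", "99.00", "http://rival.example/claims")

# Only consult graphs we have decided to trust.
trusted = {"http://shop-a.example/data"}
trusted_prices = sorted(q.obj for q in store.match("ex:book1", "ex:price", graphs=trusted))
all_prices = sorted(q.obj for q in store.match("ex:book1", "ex:price"))
```

Because every statement carries its source, a bad source can also be removed 
wholesale with remove_graph() without disturbing data from anyone else.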

For a practical account from someone who added provenance, i.e. quads, to 
an early RDF triple storage system that didn't (at the time) keep track 
of it, see http://www.ibm.com/developerworks/xml/library/x-rdfprov.html 
by Edd Dumbill.

Excerpting here:

"""21 Jul 2003
When you start aggregating data from around the Web, keeping track of 
where it came from is vital. In this article, Edd Dumbill looks into the 
contexts feature of the Redland Resource Description Format (RDF) 
application framework and creates an RDF Site Summary (RSS) 1.0 
aggregator as a demonstration.

In Listing 6 of my second article on FOAF (see Resources), I 
demonstrated FOAFbot, a community support agent I wrote that aggregates 
people's FOAF files and answers questions about them. FOAFbot has the 
ability to record who said what about whom. When asked what my name was, 
FOAFbot responded:

    edd@xml.com's name is 'Edd Dumbill',
    according to Dave Beckett, Edd Dumbill,
    Jo Walsh, Kip Hampton, Matt Biddulph,
    Dan Brickley, and anonymous source Anon47

The idea behind FOAFbot is that if you can verify that a fact is 
recorded by several different people (whom you trust), you are more 
likely to believe it to be true.

Here's another use for tracking provenance of such metadata. One of the 
major abuses of search engines early on in their history was meta tag 
spamming. Web sites would put false metadata into their pages to boost 
their search engine ranking. For this reason, search engines stopped 
paying attention to meta tags because they were most likely lies. 
Instead, search engines such as Google found other more sophisticated 
metrics to rank page relevance.

Looking toward the future of the Web, it will become vital to avoid 
abuses such as meta tag spamming. Tim Berners-Lee's vision for a 
Semantic Web (see Resources) aims for a Web where most data is 
machine-readable, in order to automate much of the information 
processing currently done by humans.

The potential difficulties of metadata abuse are even larger on the 
Semantic Web: A Web site would no longer be restricted to making claims 
only about itself. It could also make claims about other sites. It would 
be possible, for instance, for one bookstore to make false claims about 
the prices offered by a competitor.

I won't go into detail on the various security and trust mechanisms that 
will prevent this sort of semantic vandalism, but I will focus on the 
foundation that will make them possible: tracking provenance.[...]"""
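
The FOAFbot idea in the excerpt (believe a fact more readily when several 
trusted people assert it) can be sketched in a few lines of Python. This is 
only an illustration under stated assumptions, not how FOAFbot or Redland 
actually work: the function names, the min_sources threshold, and the 
example identifiers and source names are all hypothetical.

```python
from collections import defaultdict


def corroboration(quads, trusted_sources):
    """Map each (subject, predicate, object) to the trusted sources asserting it."""
    support = defaultdict(set)
    for subject, predicate, obj, source in quads:
        if source in trusted_sources:  # untrusted sources are simply ignored
            support[(subject, predicate, obj)].add(source)
    return dict(support)


def believed(quads, trusted_sources, min_sources=2):
    """Facts asserted by at least `min_sources` distinct trusted sources."""
    return {
        fact
        for fact, sources in corroboration(quads, trusted_sources).items()
        if len(sources) >= min_sources
    }


# Made-up data: two trusted sources agree, one untrusted source disagrees.
quads = [
    ("ex:person1", "foaf:name", "Edd Dumbill", "source:alice"),
    ("ex:person1", "foaf:name", "Edd Dumbill", "source:bob"),
    ("ex:person1", "foaf:name", "Someone Else", "source:spammer"),
    ("ex:person1", "foaf:nick", "edd", "source:alice"),
]
trusted = {"source:alice", "source:bob"}

facts = believed(quads, trusted)
```

Here only the name claim backed by two trusted sources survives; the nickname 
has a single supporter and the spammer's claim is never counted.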

Edd's article also notes: "(Incidentally, I owe a debt of gratitude to 
Dave Beckett, the creator of Redland. When I was writing FOAFbot last 
year, Redland did not have support for contexts, so I ended up 
implementing them in a very roundabout fashion. In response to my 
requests, Dave added support for contexts into his toolkit.)" This is 
worth repeating here, as it shows how RDF toolkits have matured over the 
years in response to just the kind of practical concerns that Ian (and 
Edd) raise around spam, trust and aggregation.

Hope this helps with the use cases...

cheers,

Dan

Received on Thursday, 19 February 2009 03:05:18 UTC