- From: Dan Brickley <danbri@danbri.org>
- Date: Thu, 19 Feb 2009 04:04:36 +0100
- To: Manu Sporny <msporny@digitalbazaar.com>
- Cc: Ian Hickson <ian@hixie.ch>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
On 19/2/09 03:43, Manu Sporny wrote:
> http://rdfa.info/wiki/developer-faq#How_does_one_prevent_bad_triples_from_corrupting_a_local_triple_store.3F

I've just tweaked this to add:

"Also note that the phrase "triple store" is somewhat dated. Practically all RDF storage systems (since RDFCore in 2004 and a few years before) have effectively been "quad stores". While the core RDF specs are described in terms of triples, database systems for managing triples have almost always kept track of the source (or "provenance") of each piece of data. For this reason when RDF's data access / query language, SPARQL, was created, it included within the language a mechanism for querying this extra information. This ability to explicitly represent (and query) the source of each RDF data graph gives some extra machinery for dealing with trust. We might, for example, pose a SPARQL query that was targeted only at graphs tagged as trusted. RDF stores are no longer a simplistic melting pot in which data from multiple sources gets indecipherably tangled."

For a practical account from someone who added provenance, i.e. quads, to an early RDF triple storage system that didn't (at the time) keep track of it, see
http://www.ibm.com/developerworks/xml/library/x-rdfprov.html
by Edd Dumbill. Excerpting here:

"""
21 Jul 2003

When you start aggregating data from around the Web, keeping track of where it came from is vital. In this article, Edd Dumbill looks into the contexts feature of the Redland Resource Description Format (RDF) application framework and creates an RDF Site Summary (RSS) 1.0 aggregator as a demonstration.

In Listing 6 of my second article on FOAF (see Resources), I demonstrated FOAFbot, a community support agent I wrote that aggregates people's FOAF files and answers questions about them. FOAFbot has the ability to record who said what about whom.

When asked what my name was, FOAFbot responded:

    edd@xml.com's name is 'Edd Dumbill', according to Dave Beckett, Edd Dumbill, Jo Walsh, Kip Hampton, Matt Biddulph, Dan Brickley, and anonymous source Anon47

The idea behind FOAFbot is that if you can verify that a fact is recorded by several different people (whom you trust), you are more likely to believe it to be true.

Here's another use for tracking provenance of such metadata. One of the major abuses of search engines early on in their history was meta tag spamming. Web sites would put false metadata into their pages to boost their search engine ranking. For this reason, search engines stopped paying attention to meta tags because they were most likely lies. Instead, search engines such as Google found other more sophisticated metrics to rank page relevance.

Looking toward the future of the Web, it will become vital to avoid abuses such as meta tag spamming. Tim Berners-Lee's vision for a Semantic Web (see Resources) aims for a Web where most data is machine-readable, in order to automate much of the information processing currently done by humans.

The potential difficulties of metadata abuse are even larger on the Semantic Web: A Web site would no longer be restricted to making claims only about itself. It could also make claims about other sites. It would be possible, for instance, for one bookstore to make false claims about the prices offered by a competitor.

I won't go into detail on the various security and trust mechanisms that will prevent this sort of semantic vandalism, but I will focus on the foundation that will make them possible: tracking provenance.
[...]
"""

Edd's article also mentions

"(Incidentally, I owe a debt of gratitude to Dave Beckett, the creator of Redland. When I was writing FOAFbot last year, Redland did not have support for contexts, so I ended up implementing them in a very roundabout fashion. In response to my requests, Dave added support for contexts into his toolkit.)"

... worth repeating here, as it shows the way RDF toolkits have matured over the years in response to just the kind of practical concern Ian (and Edd) raises, around spam, trust and aggregation.

Hope this helps with the use cases...

cheers,

Dan
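
To make the quad-store point in this message concrete, here is a minimal sketch in plain Python. It is not code from this thread, from Redland, or from FOAFbot; the `QuadStore` class, the graph URLs, and the `TRUSTED` set are all hypothetical names invented for illustration. It only shows the general idea: each statement carries a fourth field naming its source graph, so a query can be restricted to trusted sources, and a FOAFbot-style "according to" answer can be assembled.

```python
from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # provenance: the source the statement was loaded from


class QuadStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self):
        self._quads = set()

    def add(self, subject, predicate, obj, graph):
        self._quads.add(Quad(subject, predicate, obj, graph))

    def match(self, subject=None, predicate=None, obj=None, graphs=None):
        """Yield quads matching the pattern; None acts as a wildcard.

        `graphs`, if given, restricts results to those source graphs.
        """
        for q in self._quads:
            if subject is not None and q.subject != subject:
                continue
            if predicate is not None and q.predicate != predicate:
                continue
            if obj is not None and q.obj != obj:
                continue
            if graphs is not None and q.graph not in graphs:
                continue
            yield q

    def sources_for(self, subject, predicate):
        """Map each asserted value to the sorted list of graphs asserting it."""
        by_value = defaultdict(set)
        for q in self.match(subject=subject, predicate=predicate):
            by_value[q.obj].add(q.graph)
        return {value: sorted(gs) for value, gs in by_value.items()}


# Assumed example data: none of these URLs come from the original message.
EDD = "mailto:edd@xml.com"
DAVE = "http://example.org/dave/foaf.rdf"
JO = "http://example.org/jo/foaf.rdf"
SPAM = "http://spam.example.net/foaf.rdf"
TRUSTED = {DAVE, JO}

store = QuadStore()
store.add(EDD, "foaf:name", "Edd Dumbill", DAVE)
store.add(EDD, "foaf:name", "Edd Dumbill", JO)
store.add(EDD, "foaf:name", "Cheap Pills", SPAM)  # a "bad triple"

# Every name anyone has asserted, regardless of source.
all_names = sorted({q.obj for q in store.match(EDD, "foaf:name")})

# Only names asserted in graphs we have chosen to trust.
trusted_names = sorted(
    {q.obj for q in store.match(EDD, "foaf:name", graphs=TRUSTED)}
)

print(all_names)
print(trusted_names)
print(store.sources_for(EDD, "foaf:name"))
```

The bad statement is still stored, but because its source is recorded it can be filtered out at query time or dropped wholesale later. In SPARQL the corresponding restriction would be expressed over named graphs (for example with the `GRAPH` keyword) rather than with a Python set.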
Received on Thursday, 19 February 2009 03:05:18 UTC