Hi Søren & Antoine,

I do not think this needs named graphs, and it isn't really a quadruple (at least logically).
Sure, Søren's database table example has four columns, but two of them (Source stations.xml and Subject #322301) together try to describe a resource.

Why not identify this resource as "stations.xml#322301" in a single column?
This comes closer to a URI reference of the Subject.

But what kind of source is identified by "stations.xml"?
There may be millions of files named "stations.xml" all over the world (and how many of them are on Søren's file server?).
In Søren's document [1] we can read on page 7, as an example, that "Austria reports two new Airbase stations. ... They could store the information in an XML file called stations.xml".
I guess this is some agency from Austria, so the resource may be described as something like "someAgency.at/airBaseStations.xml#322301".

Furthermore, they report new stations, so there might be multiple historical versions of stations.xml.
Resource now becomes "someAgency.at/2009/airBaseStations.xml#322301".

Well, this is not yet a URI, the protocol is missing. So I just name it:

"http://someAgency.at/2009/airBaseStations.xml#322301"

(For several reasons I would prefer to name it "http://someAgency.at/2009/airBaseStations/322301", but this is not important here.)
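
To make the point concrete, here is a minimal Python sketch of how the agency, the version, and the local station id can all be read back from such a single identifier. The URI is the invented one from this mail with "http" assumed as the protocol; it does not resolve.

```python
from urllib.parse import urldefrag, urlsplit

# Invented identifier from this discussion (assumes http; it does not resolve).
station = "http://someAgency.at/2009/airBaseStations.xml#322301"

# Split the fragment (the station id) from the document URI.
document, local_id = urldefrag(station)
parts = urlsplit(document)

print(parts.scheme)  # the protocol
print(parts.netloc)  # the providing agency
print(parts.path)    # the version and the file
print(local_id)      # the station id within the file
```

The same decomposition would work for the path-style variant; there the station id would simply be the last path segment instead of the fragment.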

Well, this does not resolve, I just invented it.
Actually, a URI reference of an RDF resource does not need to resolve.
This is something we expect from linking open data, but not from RDF in general.

As Søren is writing about plans for 2010, we might say that this Austrian agency should make plans to publish such stations using resolving URIs in 2010.

You may say this plan is not realistic, but we should express it.
Otherwise we get stuck in the inherited architecture.
And maybe it is realistic: I discussed this with some people from umweltbundesamt.at just one week ago ...

What I want to say is: if you identify the resource by a single URI reference, then the table example does not need four columns, but only three.
And now the Subject identifier even includes the providing agency and a version, just like any URI reference in the Semantic Web should.
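
As a rough sketch of this folding step in Python: the base URI is the invented one from above, and the predicate and object values are placeholders made up for illustration, not taken from Søren's table.

```python
from urllib.parse import urljoin

# Hypothetical base for the Austrian agency's 2009 data (does not resolve).
BASE = "http://someAgency.at/2009/"

# Four-column rows: (source, subject, predicate, object).
# Predicate and object values are placeholders for illustration only.
rows = [
    ("stations.xml", "#322301", "rdf:type", "Station"),
    ("stations.xml", "#322301", "name", "Example station"),
]

def to_triple(source, subject, predicate, obj):
    """Fold the source and local subject columns into one URI reference."""
    return (urljoin(BASE, source + subject), predicate, obj)

triples = [to_triple(*row) for row in rows]

for triple in triples:
    print(triple)
```

Each row now has three columns, and the first one still tells you which file, agency, and year the statement came from.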

The named graph pattern in this case makes everything more complicated than it needs to be.

Something related about EEA-GEMET-JRC:
Søren says they established the API and replication two years ago because they did not know better at that time.
Today Søren knows better. Why then do you make plans based on this deprecated architecture?
Such plans will get you embroiled in redundancy and replication deeper and deeper. What a mess!
Why don't you make plans to establish a linked data architecture for a federated vocabulary EEA-GEMET-JRC?

Best wishes & regards,

[1] http://svn.eionet.europa.eu/repositories/Reportnet/docs/Plans%20for%20SEIS%20and%20Reportnet%202010.odt

Antoine Isaac wrote:
Hello Søren,

I like these questions. They force me to sharpen my arguments, and they give ideas to improve our plans.

Great! Thanks for the answers, it's very interesting to hear.

Yes, that's correct. We have a source on every triple. They serve two purposes. We know which triples to throw out when we reharvest a source. And they can be used to determine the trustworthiness of a statement the same way a user with a web browser looks at the webpage's URL. I'm not very pleased with the last purpose, but it seems to be the only mechanism people can't lie about. I've heard about named graphs, but haven't figured out their purpose yet.

Well, they're kind of multi-purpose. Practically, they transform triples into quadruples, and thus make it possible to track the source of statements.
The problem is that they're not part of the official set of semantic web standards, even though they are mentioned in SPARQL [1] and are implemented in one form or another in almost all RDF stores.

About the more general provenance issue, you might be interested in following the work of the W3C provenance group, which has just been created [2] (nothing there, yet). I'd expect them to formalize some interesting practices on those aspects...

We have our own database structure, we add inferred triples and as long as we only have subPropertyOf and subClassOf, it is manageable. We don't get an explosion of triples. But we know we're getting to the end of its capacities and yesterday we launched a study of Virtuoso and Jena. If they don't work, we'll look at some more.

I hope you'll find the right one! By the way, if you've not found them yet, there are some benchmarks available, like [3]. They are very context-specific, though; I'd be curious to know whether such stuff is actually useful to real implementors.

I would actually prefer to be able to launch a distributed SPARQL query that automatically understood sameAs statements across servers, but I've not seen anybody advertising SPARQL as being able to do that.

Yes, for the moment it would be up to SPARQL endpoints to perform the appropriate distribution and aggregation of results. Needless to say, the efforts in that field (I know that [4] are working on this) are at a very early stage.

Additionally, some of our member organisations are writing their RDF in Notepad. I don't think I could get them to set up a SPARQL service. We're also aware of the principle of following a resource URL with your web browser to see a factsheet of the resource, but it doesn't really work because we're mainly interested in bulk operations on resources.




[1] http://www.w3.org/TR/rdf-sparql-query/#rdfDataset
[2] http://www.w3.org/2005/Incubator/prov/
[3] http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/
[4] http://www.openrdf.org/

Thomas Bandholtz, thomas.bandholtz@innoq.com, http://www.innoq.com 
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491