Hi Søren & Antoine,
I do not think this needs named graphs, and it isn't really a quadruple
(at least logically).
Sure, Søren's database table example has four columns, but two
of them (Source "stations.xml" and Subject "#322301")
together try to describe one resource.
Why not identify this resource as "stations.xml#322301" in a
single Subject column? This comes closer to a URI reference for the Subject.
But, what kind of source is identified by "stations.xml"?
There may be millions of files named "stations.xml" all over the world
(and how many of them on Søren's file server?)
In Søren's document we can read on page 7, as an example, that
"Austria reports two new Airbase stations. .. They could store the
information in an XML file called stations.xml".
I guess this is some agency from Austria, so the resource may
be described as something like "someAgency.at/airBaseStations.xml#322301".
Furthermore, they report new stations, so there might be
multiple historical versions of stations.xml.
Resource now becomes "someAgency.at/2009/airBaseStations.xml#322301".
Well, this is not yet a URI, the protocol is missing. So I just name it
"http://someAgency.at/2009/airBaseStations.xml#322301".
(For some reasons I would prefer to name it
"http://someAgency.at/2009/airBaseStations/322301", but this is not
essential here.)
Of course this does not resolve; I just invented it.
Actually, a URI reference of an RDF resource does not need to resolve.
This is something we expect from linked open data, but not from RDF in
general.
As Søren is writing about plans for 2010, we might say that
this Austrian agency should make plans to publish such stations using
resolvable URIs in 2010.
You may say this plan is not realistic, but we should express it.
Otherwise we get stuck in the inherited architecture.
And maybe it is realistic; I discussed with some people from
umweltbundesamt.at just one week ago ...
What I want to say is: if you identify the resource by a single URI
reference, then the table example would not need four columns, but only
three.
And now the Subject identifier even includes the providing agency and a
version, just like any URI reference in the Semantic Web should.
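To make this concrete, here is a small sketch in Python of collapsing such a four-column row into a plain triple. The URIs and the helper function name are invented for illustration, following the example above:

```python
# Sketch: fold the Source column into the Subject column, turning a
# four-column (quad) row into an ordinary three-column triple.
# All URIs and the helper name are invented for illustration.

def quad_to_triple(source, subject, predicate, obj):
    """Merge the source URI and the local subject id into one URI reference."""
    subject_uri = source + "/" + subject.lstrip("#")
    return (subject_uri, predicate, obj)

row = ("http://someAgency.at/2009/airBaseStations",  # Source
       "#322301",                                    # Subject (local id)
       "rdfs:label",                                 # Predicate
       "Some station")                               # Object
print(quad_to_triple(*row))
# -> ('http://someAgency.at/2009/airBaseStations/322301', 'rdfs:label', 'Some station')
```

With the source folded into the subject, the remaining three columns are an ordinary triple again.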
The named graph pattern in this case makes everything more complicated
than it is.
Something related about EEA-GEMET-JRC:
Søren says they established an API and replication two years ago because
they did not know better at that time.
Today Søren knows better. Why then make plans based on this outdated
approach?
Such plans will get you embroiled deeper and deeper in redundancy and
replication, what a mess!
Why don't you make plans to establish a linked data architecture for a
federated vocabulary EEA-GEMET-JRC?
Best wishes & regards,
Antoine Isaac wrote:
I like these questions. They force me to
sharpen my arguments, and they give ideas to improve our plans.
Great! Thanks for the answers, it's very interesting to hear.
Yes, that's correct. We have a source on
every triple. They serve two purposes. We know which triples to throw
out when we do a reharvesting of a source. And they can be used to
determine trustworthiness of a statement the same way a user with a
webbrowser looks at the webpage's URL. I'm not very pleased with the
last purpose, but it seems to be the only mechanism people can't lie
about. I've heard about named graphs, but haven't figured out their
role yet.
Well, they're kind of multi-purpose. Practically, they transform
triples into quadruples, and thus allow tracking the source of each
statement.
The problem is that they're not part of the official set of Semantic
Web standards, even though they are mentioned in SPARQL and are
implemented in one form or another in almost all RDF stores.
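The reharvesting pattern Søren mentions (throw out all triples from a source, then load the fresh ones) could be sketched like this. The store layout, class, and method names are invented, not anyone's actual implementation:

```python
# Minimal sketch of source-tagged triples: each entry carries its source,
# so reharvesting a source means deleting everything with that tag and
# inserting the fresh harvest. Class and method names are invented.

class QuadStore:
    def __init__(self):
        self.quads = set()  # entries: (source, subject, predicate, object)

    def add(self, source, s, p, o):
        self.quads.add((source, s, p, o))

    def reharvest(self, source, fresh_triples):
        # Throw out everything previously harvested from this source...
        self.quads = {q for q in self.quads if q[0] != source}
        # ...and load the new harvest under the same source tag.
        for s, p, o in fresh_triples:
            self.add(source, s, p, o)

store = QuadStore()
store.add("stations.xml", "#322301", "rdf:type", "AirbaseStation")
store.add("stations.xml", "#322301", "rdfs:label", "Old name")
store.reharvest("stations.xml", [("#322301", "rdfs:label", "New name")])
print(sorted(store.quads))
# -> [('stations.xml', '#322301', 'rdfs:label', 'New name')]
```

The source tag here plays exactly the role of the fourth column (or the named graph) in the discussion above.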
About the more general provenance issue, you might be interested in
following the work of the W3C provenance group, which has just been
created (nothing there yet). I'd expect them to formalize some
interesting practices on those aspects...
We have our own database structure, we add
inferred triples and as long as we only have subPropertyOf and
subClassOf, it is manageable. We don't get an explosion of triples. But
we know we're getting to the end of its capacities and yesterday we
launched a study of Virtuoso and Jena. If they don't work, we'll look
at some more.
I hope you'll find the right one! By the way, if you've not found them
yet, there are some benchmarks available around. They are
very context-specific, though; I'd be curious to know whether such
stuff is actually useful to real implementors.
I would actually prefer to be able to launch a distributed SPARQL query
that automatically understood sameAs statements across servers, but
I've not seen anybody advertising SPARQL as being able to do that.
Yes, for the moment it would be up to SPARQL endpoints to perform the
appropriate distribution and aggregation of results. Needless to say,
the efforts in that field (I know some groups are working on this) are
at a very early stage.
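Until endpoints support this natively, a client could merge results itself by treating owl:sameAs pairs as an equivalence relation. A naive sketch with invented toy data (not standard SPARQL behaviour):

```python
# Naive client-side sketch: merge bindings fetched from several endpoints
# by collapsing owl:sameAs-equivalent URIs to one canonical URI.
# Endpoints, URIs, and result data are invented toy examples.

def canonical(uri, parent):
    """Follow sameAs links to a representative URI (union-find style)."""
    while parent.get(uri, uri) != uri:
        uri = parent[uri]
    return uri

def build_sameas_index(pairs):
    parent = {}
    for a, b in pairs:
        ra, rb = canonical(a, parent), canonical(b, parent)
        if ra != rb:
            parent[ra] = rb
    return parent

same_as = [("http://ex.at/station/322301", "http://eea.eu/station/AT-322301")]
parent = build_sameas_index(same_as)

# Results as if they came back from two different SPARQL endpoints.
results_a = {"http://ex.at/station/322301": {"label": "Station 322301"}}
results_b = {"http://eea.eu/station/AT-322301": {"code": "AT0001"}}

merged = {}
for results in (results_a, results_b):
    for uri, props in results.items():
        merged.setdefault(canonical(uri, parent), {}).update(props)

print(merged)
# -> one record under a single canonical URI, with both 'label' and 'code'
```

This is only a workaround; the sameAs closure really belongs on the server side, as the mail says.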
Additionally, some of our member
organisations are writing their RDF in notepad. I don't think I could
get them to set up a SPARQL service. We're also aware of the principle
of following a resource URL with your webbrowser to see a factsheet of
the resource, but it doesn't really work because we're mainly
interested in bulk operations on resources.
Thomas Bandholtz, email@example.com, http://www.innoq.com
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49 228 9288491