Re: URIs and Named Graphs from Onno Paap on 2007-07-09 (semantic-web@w3.org from July 2007)

From: Onno Paap <onno.paap@gmail.com>
Date: Mon, 9 Jul 2007 10:59:57 +0200
To: "Hans Teijgeler" <hans.teijgeler@quicknet.nl>, "Alan Ruttenberg" <alanr@mumble.net>, SW-forum <semantic-web@w3.org>, "Benjamins, Robin" <rxbenjam@bechtel.com>
Message-ID: <387b739f0707090159n6fe6ced2u30862f27c453b2a1@mail.gmail.com>
Hans Teijgeler asked me to address Alan Ruttenberg's question:

>SPARQL allows queries that span multiple named graphs - why would you need
>to have different end points for the different partitions, which I
>understand to be equivalent to named graphs?

I will take an example from the SPARQL spec, purely to illustrate our
question about named graphs.
So please don't solve it using different data constructs or the like.

Suppose in one quad store with endpoint address
http://example.org
there are these named graphs:

# Graph: http://example.org/bob
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "Bob" .
_:a foaf:mbox <mailto:bob@newcorp.example.org> .

# Graph: http://example.org/archive_200706/bob
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "Bob" .
_:a foaf:mbox <mailto:bob@oldcorp.example.org> .

Suppose there is another quad store, with endpoint address
http://xyz-corp.com
which holds a resource locator like this:

_:s  foaf:name     "Alice" .
_:s  foaf:mbox     <mailto:alice@work.example> .
_:s  foaf:knows    <http://example.org#a> .

The problem is that I cannot formulate a SPARQL query like this:

Query directed at quad store endpoint address: http://example.org
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?knows_mbox
FROM NAMED <http://example.org/bob>
WHERE
{
  <http://example.org#a> foaf:mbox ?knows_mbox .
}

because the FROM NAMED <http://example.org/bob> clause in the SPARQL query
cannot be created from the data at http://xyz-corp.com: the name of the
named graph is not known on that side.
The question is: how can we also embed the named graph name in the
resource locator on the http://xyz-corp.com side?
If it weren't illegal, it could be something like:
(illegal) _:s  foaf:knows    <http://example.org#http://example.org/bob#a> .
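
To illustrate why the double-hash form cannot work, here is a minimal sketch
in Python (standard library only; the variable names are illustrative and not
part of any of the stores discussed). RFC 3986 does not allow a '#' inside a
fragment, and a lenient parser simply splits at the first '#', so the graph
name and the local name end up merged into a single fragment that is never
sent to the server.

```python
from urllib.parse import urldefrag

# The (illegal) composite locator from the message above.
composite = "http://example.org#http://example.org/bob#a"

# urldefrag splits at the FIRST '#': everything after it is the fragment.
base, fragment = urldefrag(composite)

print(base)      # the only part an HTTP client would put in a GET request
print(fragment)  # graph name and local name, merged into one string
```

Any convention for separating the two parts again would have to be applied
by the client, since the server never sees the fragment.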

The solution would be to drop named graphs and use different endpoints
instead.
But with many different archives or contexts, that is costly and hard to
support.
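
For comparison, here is a hedged sketch of a single-endpoint request in which
the graph name does not have to be known in advance: the query uses a
GRAPH ?g pattern and returns the graph name as a result. It is written in
Python with the standard library only. The function name is hypothetical, and
it assumes that the endpoint accepts the SPARQL Protocol "query" parameter
and that its default dataset exposes the named graphs when no FROM NAMED
clause is given. That last point is store-dependent.

```python
from urllib.parse import urlencode

def build_graph_agnostic_request(endpoint, resource_uri):
    """Build a GET URL for a query that discovers the graph name itself."""
    # GRAPH ?g matches the pattern in every named graph of the dataset
    # and binds ?g to the name of each graph that contains a match.
    query = (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?g ?knows_mbox\n"
        "WHERE {\n"
        f"  GRAPH ?g {{ <{resource_uri}> foaf:mbox ?knows_mbox . }}\n"
        "}"
    )
    # urlencode percent-encodes the query text, including the '#'.
    return endpoint + "?" + urlencode({"query": query})

# Endpoint and resource locator taken from the example above.
url = build_graph_agnostic_request("http://example.org", "http://example.org#a")
print(url)
```

Each result row would carry the graph name in ?g, so the caller could tell an
archive graph from the active one after the fact. This does not by itself
let the xyz-corp.com side state which graph it meant.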

So the named graph concept would seem to be unusable in a federation of
quad stores when the graph name serves as an archive or context key?

Onno Paap


On 7/8/07, Hans Teijgeler <hans.teijgeler@quicknet.nl> wrote:
>
> Hi Alan,
>
> Of course I want to answer your questions!
> I'll respond below.
>
> Thanks for your help!
> Hans
>
> -----Original Message-----
> From: Alan Ruttenberg [mailto:alanr@mumble.net]
> Sent: Sunday, July 08, 2007 0:05
> To: Hans Teijgeler
> Cc: SW-forum; Paap, Onno
> Subject: Re: URIs and Named Graphs
>
> Hello Hans,
>
> I'm trying to understand your scenario, and have some questions.
>
> - Why would you have hundreds of quad stores instead of a single larger
> quad store with more qualifications on the queries?
> <HT> That has to do with data ownership. On a project we have a hierarchy
> of data consolidation and integration, from many individual applications
> (e.g. process simulation, stress analysis, pump sizing) via the
> responsible discipline group on a project (e.g. Process Engineering) to
> the consolidating project store, and then to the quad store of the plant
> owner/operator. The latter may have many projects underway, and all
> projects use and produce data that needs to be integrated. These apps,
> disciplines and (sub)projects often are spread around the globe.
> The data at a lower level in the hierarchy are usually not accessible to
> the higher levels, because they are "work in progress" and should not be
> used by anybody else. Data, and their custodianship, can be "handed over"
> to the next level of consolidation. That hand-over involves a physical
> relocation from one quad store to another. The URI changes, the fragment
> identifier stays the same or gets a suffix, separated by a middle dot
> (00B7). We keep track of that location by storing that in, you guessed
> it, a partition "redirects" (one of the nine).
> One other reason for the many quad stores is the situation that the
> suppliers on a project (often in the hundreds) need to share their data
> with many other customers.
>
> - What motivates the use of URI#fragIDs in the first place?
> <HT> Our data model is close to 5NF, very generic and fine-grained. On a
> refinery we have zillions of physical objects that all have their lifetime
> information recorded. Any chunk of information is attributed to a
> "temporal part" of the object involved, so we have zillions of objects
> with zillions of temporal parts.
>
> - SPARQL allows queries that span multiple named graphs - why would you
> need to have different end points for the different partitions, which I
> understand to be equivalent to named graphs?
> <HT> I don't know whether or not that is equivalent. If we have the
> partition "sent messages" we will have many triples in it, forming many
> graphs that may or may not have any connection to other graphs in that
> same partition. But all triples will have a URI for "sent messages" as
> their 4th column. Would you still call that a named graph?
> I will ask Onno Paap (on cc), our "head techie", to respond on your
> "SPARQL vs multiple named graphs" remark. That is out of my league :-(
>
> - What would you expect the behavior to be for something like
> URI#partition#fragID? The only behavior I am aware of for hashes is in the
> context of http GET, where only the URI before the hash is sent as the
> target of the GET. Do you depend on this behavior already? If so, to
> accomplish what?
> <HT> No, I was just wondering how we can fetch a particular fragment ID
> from a particular "partition", and then dereference it.
>
> - What do browsers have to do with the scenario?
> <HT> Not much, only indirectly, granted.
>
> You might not want to answer these questions - in that case consider them
> as an indication of whether you are adequately communicating your problem
> to an audience familiar with SW technologies, as I consider myself to be.
> <HT> I hope I have improved on my communication skills :-)
>
> Regards,
> Alan
>
>
> On Jul 5, 2007, at 7:57 AM, Hans Teijgeler wrote:
>
> > Hi,
> >
> > We ran into a problem for which I ask advice from this esteemed forum.
> >
> > First some background information: we use the SW technologies in
> > conjunction with a generic data model to create a distributed data
> > base for each engineering project, involving large numbers (in the
> > hundreds) of quad stores per project.
> >
> > To give an example of using a data model "underneath" OWL: normally
> > you may see things like an <owl:Class rdf:ID="Car"/>.
> > For us that would be: <part2:ClassOfInanimatePhysicalObject
> > rdf:ID="Car"/> where ClassOfInanimatePhysicalObject is an entity type
> > in our data model and an owl:Class.
> > If an application has data that must be shared, that data is mapped at
> > the source from its proprietary format to ISO 15926-7 format, and
> > stored in a quad store that we call a "Façade".
> > Only "owned" data are stored, other data are fetched with SPARQL from
> > other Façades.
> > Data can be "handed over" to another Façade, thus also handing over
> > custody for that data.
> > Quad stores that participate in a given project are known to a "CPF"
> > server (Confederation of Participating Façades), where we distribute
> > SPARQL queries, consolidate query results, whilst controlling access
> > rights.
> >
> > For the Façades we use RAP, and want to use the 4th column of their
> > Named Graphs for dividing the quad store into partitions like 'active
> > data', 'archive', and the like. Actually we have nine such partitions,
> > but I won't annoy you with the details.
> >
> > We use URI#fragID's all over the place.
> >
> > The question is how we can dereference any such fragment identifiers
> > inside a particular partition without having to have nine endpoints
> > (which is costly and harder to manage).
> >
> > It would be nice if we could use composite fragment identifiers like
> > URI#partition#fragID, but the second hash # would not be allowed. If
> > we would use something like URI#partition__fragID that would be
> > well-formed, but hardly usable with generic browsers (I guess).
> >
> > Please shed some light on this.
> >
> > Regards,
> > Hans
> >
> > ____________________
> > OntoConsult
> > Hans Teijgeler
> > ISO 15926 specialist
> > Netherlands
> > +31-72-509 2005
> > www.InfowebML.ws
> > hans.teijgeler@quicknet.nl
> >
>
Received on Monday, 9 July 2007 12:57:52 UTC