Re: URIs and Named Graphs from Alan Ruttenberg on 2007-07-24 (semantic-web@w3.org from July 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Tue, 24 Jul 2007 03:02:07 -0400
To: Onno Paap <onno.paap@gmail.com>
Cc: Hans Teijgeler <hans.teijgeler@quicknet.nl>, SW-forum Web <semantic-web@w3.org>, "Benjamins, Robin" <rxbenjam@bechtel.com>, Eric Prud'hommeaux <eric@w3.org>
Message-Id: <A1325951-99FB-4402-8EEB-A61C2929E48E@gmail.com>
Hello Hans, Onno

Apologies for the delay in responding - a combination of other  
obligations and my spam filter misfiling Onno's response.

Some thoughts:

1) Indeed, there seems to be no way in SPARQL to
identify a resource as a pair of endpoint and named graph for the
purpose of narrowing a query. It appears to me that one can either rely
on the endpoint's default graph, and within that refer to named
graphs, or create named graphs from documents (i.e. single graphs)
that are dereferenced. Perhaps this should be addressed in the
spec. I am cc'ing Eric Prud'hommeaux in case he has some thoughts about
this.

2) Note, however, that a FROM clause can take a URL which is itself a
SPARQL query against an endpoint. So you can have a query which
http://xyz-corp.com processes:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?knows_mbox
FROM <http://example.org:/sparql/?query=PREFIX%20foaf%3A%20%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0ACONSTRUCT%20%7B%20%3Chttp%3A%2F%2Fexample.org%23a%3E%20foaf%3Ambox%20%3Fknows_mbox%20.%20%7D%0AFROM%20NAMED%20%3Chttp%3A%2F%2Fexample.org%2Fbob%3E%0AWHERE%0A%7B%0A%20%20%3Chttp%3A%2F%2Fexample.org%23a%3E%20foaf%3Ambox%20%3Fknows_mbox%20.%0A%7D&format=application%2Frdf%2Bxml>
WHERE
{
   <http://example.org#a> foaf:mbox ?knows_mbox .
}



Arguably, the SPARQL spec might have made the syntax for this a  
little more friendly, by allowing a full SPARQL query in the usual  
syntax inside the FROM<>.
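
As a minimal sketch of how such a FROM IRI could be assembled (this is
not something the SPARQL spec provides): the Python below percent-encodes
the inner CONSTRUCT query from the example in (2) into the query string
of an endpoint URL. The /sparql/ path and the format parameter are copied
from that example URL; the helper names encoded_from_iri and outer_query
are made up for illustration, and whether a given endpoint accepts such a
URL is an assumption.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Endpoint path as it appears in the example URL; treat it as an assumption.
ENDPOINT = "http://example.org/sparql/"

# Inner query, decoded from the percent-encoded example in the message.
INNER_QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://example.org#a> foaf:mbox ?knows_mbox . }
FROM NAMED <http://example.org/bob>
WHERE
{
  <http://example.org#a> foaf:mbox ?knows_mbox .
}"""


def encoded_from_iri(endpoint: str, query: str,
                     result_format: str = "application/rdf+xml") -> str:
    """Return an IRI asking `endpoint` to run `query`, for use in FROM <...>."""
    # safe="" percent-encodes every reserved character, including
    # '/', ':', '#', '<' and '>', so the result can sit inside <...>.
    return (f"{endpoint}?query={quote(query, safe='')}"
            f"&format={quote(result_format, safe='')}")


def outer_query(from_iri: str) -> str:
    """Build the query that the other endpoint would process."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?knows_mbox\n"
        f"FROM <{from_iri}>\n"
        "WHERE { <http://example.org#a> foaf:mbox ?knows_mbox . }"
    )


iri = encoded_from_iri(ENDPOINT, INNER_QUERY)
print(outer_query(iri))

# Decoding the query string gives back the inner query unchanged.
decoded = parse_qs(urlsplit(iri).query)
```

Keeping the inner query as ordinary text and encoding it only at the last
step goes some way towards the friendlier syntax wished for above, at
least on the client side.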

3) You write: "The solution would be not to use named graphs but to  
use different endpoints instead. But when making different archives  
or contexts that is a support unfriendly  and costly task."

Supposing you did this, do you expect that you would ask
http://xyz-corp.com to run a query that involves something like
FROM <http://example.org/bob>, where http://example.org/bob is a new
endpoint? If so, this query would naively involve transferring all the
triples from that endpoint to http://xyz-corp.com, which seems undesirable.

4) Some of your example is still not entirely clear. In Onno's
example, do I understand correctly that the line of thinking is something
along the lines of "I want to know the mbox of people that Alice knows",
and that you want to ask this query against the source of the "Alice
knows ?x" endpoint, namely http://xyz-corp.com?
If so, where does the extra information come from that the endpoint to
run this query against is http://example.org/ and the named graph is
http://example.org/bob: some external source of information?
I think I am asking you to lay out more explicitly the series of
events leading up to the query you want to make. I'm a little
confused because you say "Query directed at quad store endpoint
address: http://example.org", but then "how can we also embed the
named graph name in the resource locator at the http://xyz-corp.com side?"

5) Are you open to the possibility that your HTTP server at
http://example.org/ could be augmented with a CGI script that accepts a
GET of a URI with a specific syntax, processes this as a local query to
the quad store, and returns the result as RDF? I'm not sure you couldn't
do the same thing with (2), but this might allow for some more
flexibility in syntax.
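
One way the GET-to-query mapping in (5) might look, as a sketch only. The
URI syntax (a /describe path with graph and resource parameters) and the
function name local_query_for are assumptions; nothing in this thread
fixes them. The function only builds the query text that a CGI script
could hand to the local quad store; running the query and serializing
the result as RDF are left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Characters that may not appear inside a SPARQL IRI reference <...>:
# the listed punctuation plus control characters and space (0x00-0x20).
_FORBIDDEN = set('<>"{}|^`\\') | {chr(c) for c in range(0x21)}


def local_query_for(request_url: str) -> str:
    """Turn a GET URL of the assumed syntax into a query on one named graph."""
    params = parse_qs(urlsplit(request_url).query)
    graph = params["graph"][0]        # named graph to look in
    resource = params["resource"][0]  # resource whose triples are wanted
    for iri in (graph, resource):
        # Refuse values that would break out of the <...> brackets.
        if _FORBIDDEN & set(iri):
            raise ValueError(f"not usable as an IRI reference: {iri!r}")
    return (
        f"CONSTRUCT {{ <{resource}> ?p ?o . }}\n"
        f"WHERE {{ GRAPH <{graph}> {{ <{resource}> ?p ?o . }} }}"
    )


# Hypothetical request carrying both the named graph and the resource.
request = "http://example.org/describe?" + urlencode(
    {"graph": "http://example.org/bob", "resource": "http://example.org#a"}
)
print(request)
print(local_query_for(request))
```

The point of the design is that the named graph travels in the query
string of an ordinary URL, so the http://xyz-corp.com side would only
need to store that one URL.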

6) Hans asks: "What if we archive an identifier, say #MPO-12345, and  
change it into
#archive:MPO-12345 Would that fly in an RDF environment?".

Syntactically this works. I still don't understand the full  
"dereferencing/query" story so I'm not sure whether this will address  
your problem or not.
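
To make the dereferencing side of this concrete, here is a small Python
sketch of what a client does with such identifiers: as noted earlier in
the thread, only the part before the hash is sent as the target of the
GET. The base URI http://example.org/data is hypothetical; only the
fragment forms come from the thread.

```python
from urllib.parse import urldefrag


def get_target_and_fragment(uri: str) -> tuple[str, str]:
    """Split a URI into the part sent in an HTTP GET and the fragment."""
    target, fragment = urldefrag(uri)
    return target, fragment


# Hypothetical document URI; the fragments are the forms discussed above.
base = "http://example.org/data"

for frag in ("MPO-12345", "archive:MPO-12345", "partition#fragID"):
    target, fragment = get_target_and_fragment(f"{base}#{frag}")
    # The server only ever sees `target`; the fragment stays on the client.
    print(f"GET {target}   (fragment kept by client: {fragment!r})")
```

In all three cases the server receives the same request, so an archive
prefix inside the fragment cannot by itself select a partition on the
server side.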

-Alan


On Jul 9, 2007, at 4:59 AM, Onno Paap wrote:

> Hans Teijgeler asked me to address Alan Ruttenberg's question:
>
> >SPARQL allows queries that span multiple named graphs - why would you
> >need to have different end points for the different partitions, which I
> >understand to be equivalent to named graphs?
>
> I will take an example from the SPARQL spec, just to make the point
> about our question on named graphs.
> So please don't solve it using different data constructs or the like.
>
> Suppose in one quad store with endpoint address
> http://example.org
> there are these named graphs:
>
> # Graph: http://example.org/bob
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> _:a foaf:name "Bob" .
> _:a foaf:mbox <mailto:bob@newcorp.example.org> .
>
> # Graph: http://example.org/archive_200706/bob
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> _:a foaf:name "Bob" .
> _:a foaf:mbox <mailto:bob@oldcorp.example.org> .
>
> Suppose that in another quad store, with endpoint address
> http://xyz-corp.com
> there is a resource locator like:
>
> _:s  foaf:name     "Alice" .
> _:s  foaf:mbox     <mailto:alice@work.example> .
> _:s  foaf:knows    <http://example.org#a> .
>
> The problem is that I cannot formulate a SPARQL query like this:
>
> Query directed at quad store endpoint address: http://example.org
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?knows_mbox
> FROM NAMED <http://example.org/bob>
> WHERE
> {
>   <http://example.org#a> foaf:mbox ?knows_mbox .
> }
>
> because that FROM NAMED <http://example.org/bob> clause in the SPARQL
> query cannot be created from the data at http://xyz-corp.com since
> the named graph name is not known at that side.
>
> Question is: how can we also embed the named graph name in the
> resource locator at the http://xyz-corp.com side?
> If it wasn't illegal it could be something like:
> (illegal) _:s  foaf:knows    <http://example.org#http://example.org/bob#a> .
> The solution would be not to use named graphs but to use different
> endpoints instead.
> But when making different archives or contexts that is a support
> unfriendly and costly task.
>
> The named graph concept would seem to be not usable in
> a federation of quad stores, when used as an archive or context key?
>
> Onno Paap
>
>
> On 7/8/07, Hans Teijgeler <hans.teijgeler@quicknet.nl> wrote:
>
> Hi Alan,
>
> Of course I want to answer your questions!
> I'll respond below.
>
> Thanks for your help!
> Hans
>
> -----Original Message-----
> From: Alan Ruttenberg [mailto:alanr@mumble.net]
> Sent: Sunday, July 08, 2007 0:05
> To: Hans Teijgeler
> Cc: SW-forum; Paap, Onno
> Subject: Re: URIs and Named Graphs
>
> Hello Hans,
>
> I'm trying to understand your scenario, and have some questions.
>
> - Why would you have hundreds of quad stores instead of a single larger
> quad store with more qualifications on the queries?
> <HT> That has to do with data ownership. On a project we have a hierarchy
> of data consolidation and integration, from many individual applications
> (e.g. process simulation, stress analysis, pump sizing) via the
> responsible discipline group on a project (e.g. Process Engineering) to
> the consolidating project store, and then to the quad store of the plant
> owner/operator. The latter may have many projects underway, and all
> projects use and produce data that needs to be integrated. These apps,
> disciplines and (sub)projects often are spread around the globe.
> The data at a lower level in the hierarchy are usually not accessible to
> the higher levels, because they are "work in progress" and should not be
> used by anybody else. Data, and their custodianship, can be "handed over"
> to the next level of consolidation. That hand-over involves a physical
> relocation from one quad store to another. The URI changes; the fragment
> identifier stays the same or gets a suffix, separated by a middle dot
> (U+00B7). We keep track of that location by storing it in, you guessed
> it, a partition "redirects" (one of the nine).
> One other reason for the many quad stores is that the suppliers on a
> project (often in the hundreds) need to share their data with many other
> customers.
>
> - What motivates the use of URI#fragIDs in the first place?
> <HT> Our data model is close to 5NF, very generic and fine-grained. On a
> refinery we have zillions of physical objects that all have their
> lifetime information recorded. Any chunk of information is attributed to
> a "temporal part" of the object involved, so we have zillions of objects
> with zillions of temporal parts.
>
> - SPARQL allows queries that span multiple named graphs - why would you
> need to have different end points for the different partitions, which I
> understand to be equivalent to named graphs?
> <HT> I don't know whether or not that is equivalent. If we have the
> partition "sent messages" we will have many triples in it, forming many
> graphs that may or may not have any connection to other graphs in that
> same partition. But all triples will have a URI for "sent messages" as
> their 4th column. Would you still call that a named graph?
> I will ask Onno Paap (on cc), our "head techie", to respond on your
> "SPARQL vs multiple named graphs" remark. That is out of my league :-(
>
> - What would you expect the behavior to be for something like
> URI#partition#fragID? The only behavior I am aware of for hashes is in
> the context of http GET, where only the URI before the hash is sent as
> the target of the GET. Do you depend on this behavior already? If so, to
> accomplish what?
> <HT> No, I was just wondering how we can fetch a particular fragment ID
> from a particular "partition", and then dereference it.
>
> - What do browsers have to do with the scenario?
> <HT> Not much, only indirectly, granted.
>
> You might not want to answer these questions - in that case consider
> them as an indication of whether you are adequately communicating your
> problem to an audience familiar with SW technologies, as I consider
> myself to be.
> <HT> I hope I have improved on my communication skills :-)
>
> Regards,
> Alan
>
>
> On Jul 5, 2007, at 7:57 AM, Hans Teijgeler wrote:
>
> > Hi,
> >
> > We ran into a problem for which I ask advice from this esteemed forum.
> >
> > First some background information: we use the SW technologies in
> > conjunction with a generic data model to create a distributed data
> > base for each engineering project, involving large numbers (in the
> > hundreds) of quad stores per project.
> >
> > To give an example of using a data model "underneath" OWL: normally
> > you may see things like an <owl:Class rdf:ID="Car"/>.
> > For us that would be: <part2:ClassOfInanimatePhysicalObject
> > rdf:ID="Car"/> where ClassOfInanimatePhysicalObject is an entity type
> > in our data model and an owl:Class.
> > If an application has data that must be shared, that data is mapped at
> > the source from its proprietary format to ISO 15926-7 format, and
> > stored in a quad store that we call a "Façade".
> > Only "owned" data are stored; other data are fetched with SPARQL from
> > other Façades.
> > Data can be "handed over" to another Façade, thus also handing over
> > custody for that data.
> > Quad stores that participate in a given project are known to a "CPF"
> > server (Confederation of Participating Façades), where we distribute
> > SPARQL queries, consolidate query results, whilst controlling access
> > rights.
> >
> > For the Façades we use RAP, and want to use the 4th column of their
> > Named Graphs for dividing the quad store into partitions like 'active
> > data', 'archive', and the like. Actually we have nine such partitions,
> > but I won't annoy you with the details.
> >
> > We use URI#fragID's all over the place.
> >
> > The question is how we can dereference any such fragment identifiers
> > inside a particular partition without having to have nine endpoints
> > (which is costly and harder to manage).
> >
> > It would be nice if we could use composite fragment identifiers like
> > URI#partition#fragID, but the second hash # would not be allowed. If
> > we would use something like URI#partition__fragID that would be
> > well-formed, but hardly usable with generic browsers (I guess).
> >
> > Please shed some light on this.
> >
> > Regards,
> > Hans
> >
> > ____________________
> > OntoConsult
> > Hans Teijgeler
> > ISO 15926 specialist
> > Netherlands
> > +31-72-509 2005
> > www.InfowebML.ws
> > hans.teijgeler@quicknet.nl
> >
Received on Tuesday, 24 July 2007 07:02:09 UTC