RE: URIs and Named Graphs from Hans Teijgeler on 2007-07-24 (semantic-web@w3.org from July 2007)

From: Hans Teijgeler <hans.teijgeler@quicknet.nl>
Date: Tue, 24 Jul 2007 10:38:27 +0200
To: "'Alan Ruttenberg'" <alanruttenberg@gmail.com>, "'Onno Paap'" <onno.paap@gmail.com>
Cc: "'SW-forum Web'" <semantic-web@w3.org>, "'Benjamins, Robin'" <rxbenjam@bechtel.com>, "'Eric Prud'hommeaux'" <eric@w3.org>
Message-ID: <000601c7cdce$0521b280$6c7ba8c0@hans>
Hi Alan,

Thanks for your response!
Onno is on a short vacation. Upon return he will undoubtedly respond.

My response to your 6):
We compartmentalize our RAP quad store as follows:
- data		the actual payload that is accessible for anyone  
			with the proper access rights
- populations	archive of incoming RDF/XML files converted to  
			triples that are subsequently loaded in the 'data'  
			section whilst avoiding duplicates
- handovers		archive of triples, mapped to RDF/XML files, that  
			are physically handed over to another store; 
			this hand-over includes transfer of custodianship
- sentmsg		archive of messages, including documents, sent to  
			other stores (see Note 1)
- recdmsg		archive of messages, including documents, that have
			been received from other stores (see Note 2)
Note 1 - Since these documents are in fact snapshots of the data in our
	   own store and in other stores, and since it is predictable
	   that this data will not remain available for the next 50 years
	   or so (the lifetime of a plant), we felt the need to archive
	   the message with its documents. Since this would constitute a
	   duplication of data, we store it in a section that is separated
	   from the normal 'data' section, to which access is more
	   constrained. So, for example, a node in the 'data' section
	   could be myEndpoint#MPO-12345, and if it is used in a sent
	   document it would be stored in the 'sentmsg' section as
	   myEndpoint#sentmsg:MPO-12345.
Note 2 - Actually the recipient only gets a skeleton message with a key  
	   to pull the entire graph of the full message and its documents 
	   from the 'sentmsg' section of the sending store.
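
A minimal sketch of the naming convention from Note 1, assuming the section name is simply prefixed to the fragment identifier with a colon. The base URL http://example.org/myEndpoint and the function names are hypothetical, and treating an unprefixed fragment as belonging to 'data' is an assumption, not something stated above.

```python
from urllib.parse import urldefrag

# Section names as listed in this message.
SECTIONS = {"data", "populations", "handovers", "sentmsg", "recdmsg"}


def to_section_uri(node_uri, section):
    """Map a node URI from the 'data' section to its name in another section."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    base, fragment = urldefrag(node_uri)
    if not fragment:
        raise ValueError("node URI has no fragment identifier")
    if section == "data":
        # Assumption: 'data' nodes carry no section prefix.
        return f"{base}#{fragment}"
    return f"{base}#{section}:{fragment}"


def split_section_uri(uri):
    """Return (base, section, local_id) for a possibly prefixed node URI."""
    base, fragment = urldefrag(uri)
    prefix, sep, local_id = fragment.partition(":")
    if sep and prefix in SECTIONS:
        return base, prefix, local_id
    return base, "data", fragment


archived = to_section_uri("http://example.org/myEndpoint#MPO-12345", "sentmsg")
print(archived)
print(split_section_uri(archived))
```

The prefix only changes the fragment, so the part of the URI that an HTTP client would request stays the same for every section.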

Regards,
Hans

PS That SPARQL query you quoted under 2) is kind of hard to read :-) 



-----Original Message-----
From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
Behalf Of Alan Ruttenberg
Sent: Tuesday, July 24, 2007 9:02
To: Onno Paap
Cc: Hans Teijgeler; SW-forum Web; Benjamins, Robin; Eric Prud'hommeaux
Subject: Re: URIs and Named Graphs


Hello Hans, Onno

Apologies for the delay in responding - a combination of other obligations
and my spam filter misfiling Onno's response.

Some thoughts:

1) Indeed it is the case that there seems to be no way in SPARQL to identify
a resource as a pair of endpoint and named graph for the purpose of narrowing
a query. It appears to me that one can either rely on the endpoint's default
graph, and within that refer to named graphs, or create named graphs from
documents (i.e. single graphs) that are dereferenced. Perhaps this should be
addressed in the spec. I am ccing Eric Prud'hommeaux in case he has some
thoughts about this.

2) Note however that a FROM clause can take a URL which is a SPARQL
query against an endpoint. So you can have a query which http://xyz-corp.com
processes:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?knows_mbox
FROM <http://example.org:/sparql/?query=PREFIX%20foaf%3A%20%3Chttp%3A%
2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0ACONSTRUCT%20%7B%20%3Chttp%3A%2F%
2Fexample.org%23a%3E%20foaf%3Ambox%20%3Fknows_mbox%20.%20%7D%0AFROM%
20NAMED%20%3Chttp%3A%2F%2Fexample.org%2Fbob%3E%0AWHERE%0A%7B%0A%20%20%
3Chttp%3A%2F%2Fexample.org%23a%3E%20foaf%3Ambox%20%3Fknows_mbox%20.%0A
%7D&format=application%2Frdf%2Bxml>
WHERE
{
   <http://example.org#a> foaf:mbox ?knows_mbox .
}



Arguably, the SPARQL spec might have made the syntax for this a little more
friendly, by allowing a full SPARQL query in the usual syntax inside the
FROM<>.
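
The encoded URL in (2) does not have to be written by hand. Below is a minimal sketch that percent-encodes the inner CONSTRUCT query and embeds it in the FROM clause of the outer query. The endpoint path http://example.org/sparql/ and the function name are assumptions for illustration; the `format` parameter is taken from the URL quoted above and may not be understood by every endpoint. Whether the receiving endpoint will actually dereference such a FROM URL depends on the implementation.

```python
from urllib.parse import quote


def build_from_url(endpoint, query, result_format="application/rdf+xml"):
    """Return a URL that asks `endpoint` to run `query` via HTTP GET."""
    # safe="" makes quote() encode every reserved character, including "/".
    return (
        f"{endpoint}?query={quote(query, safe='')}"
        f"&format={quote(result_format, safe='')}"
    )


# The inner query, decoded from the URL in the message.
INNER_QUERY = "\n".join([
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
    "CONSTRUCT { <http://example.org#a> foaf:mbox ?knows_mbox . }",
    "FROM NAMED <http://example.org/bob>",
    "WHERE",
    "{",
    "  <http://example.org#a> foaf:mbox ?knows_mbox .",
    "}",
])

from_url = build_from_url("http://example.org/sparql/", INNER_QUERY)

# The outer query that http://xyz-corp.com would process.
outer_query = "\n".join([
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>",
    "SELECT ?knows_mbox",
    f"FROM <{from_url}>",
    "WHERE",
    "{",
    "  <http://example.org#a> foaf:mbox ?knows_mbox .",
    "}",
])

print(outer_query)
```

Keeping the inner query as readable text and encoding it at the last moment avoids maintaining the percent-encoded form directly.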

3) You write: "The solution would be not to use named graphs but to use
different endpoints instead. But when making different archives or contexts
that is a support unfriendly  and costly task."

Supposing you did this, do you suppose that you will be asking
http://xyz-corp.com to do a query that involves something like
FROM <http://example.org/bob/> where http://example.org/bob is a new
endpoint? If so this query would naively involve transferring all the
triples from that endpoint to http://xyz-corp.com, which seems undesirable.

4) Some of your example is still not entirely clear. In Onno's example, do
I understand correctly that the line of thinking is something along the
lines of "I want to know the mbox of people that Alice knows", and you want
to ask this query against the source of the "Alice knows ?x" endpoint,
namely http://xyz-corp.com ?
If so, where does the extra information that the endpoint to do this query
against is http://example.org/ and the named graph http://example.org/bob
come from - some external source of information?
I think I am asking you to more explicitly lay out the series of events
leading up to the query you want to make. I'm a little confused because you
say "Query directed at quad store endpoint
address: http://example.org", but then "how can we also embed the named
graph name in the resource locator at the http://xyz-corp.com side?"

5) Are you open to the possibility that your http server at
http://example.org/ could be augmented with a CGI script that accepts a GET
of a URI of specific syntax, processes this as a local query to the quad
store, and returns the result as RDF? I'm not sure you couldn't do the same
thing with (2) but this might allow for some more flexibility in syntax.
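
One way the idea in (5) could look is sketched below, only to make the suggestion concrete. The request syntax (a `graph` and an `id` query parameter on a /lookup path) is invented for this illustration and is not something the thread specifies. The function translates the GET URL into a local SPARQL query restricted to one named graph; running it against the quad store and serializing the result is left out.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Characters that may not appear inside a SPARQL <...> IRI reference.
_IRI_FORBIDDEN = re.compile(r'[\s<>"{}|\\^`]')


def local_query_for(request_url):
    """Translate a GET URL into a CONSTRUCT query on one named graph."""
    params = parse_qs(urlsplit(request_url).query)
    try:
        graph = params["graph"][0]  # hypothetical parameter name
        node = params["id"][0]      # hypothetical parameter name
    except KeyError as missing:
        raise ValueError(f"missing parameter: {missing}") from None
    for iri in (graph, node):
        # Refuse input that could break out of the <...> delimiters.
        if _IRI_FORBIDDEN.search(iri):
            raise ValueError(f"not usable as an IRI: {iri!r}")
    return (
        f"CONSTRUCT {{ <{node}> ?p ?o }}\n"
        f"WHERE {{ GRAPH <{graph}> {{ <{node}> ?p ?o }} }}"
    )


request = (
    "http://example.org/lookup"
    "?graph=http%3A%2F%2Fexample.org%2Fbob"
    "&id=http%3A%2F%2Fexample.org%23a"
)
print(local_query_for(request))
```

With such a scheme the named graph travels in the URL itself, so the remote store needs nothing beyond the URL to select the partition.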

6) Hans asks: "What if we archive an identifier, say #MPO-12345, and change
it into #archive:MPO-12345 Would that fly in an RDF environment?".

Syntactically this works. I still don't understand the full
"dereferencing/query" story so I'm not sure whether this will address your
problem or not.
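
The syntactic point can be checked against the fragment grammar of RFC 3986, which allows ":" inside a fragment but not a second "#". The sketch below covers only the URI grammar (ASCII); it says nothing about how a given store or browser treats such identifiers, and the function name is made up for the example.

```python
import re

# RFC 3986: fragment = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
_FRAGMENT = re.compile(r"(?:[A-Za-z0-9._~!$&'()*+,;=:@/?-]|%[0-9A-Fa-f]{2})*")


def is_valid_fragment(fragment):
    """True if `fragment` (without the leading '#') fits the URI grammar."""
    return _FRAGMENT.fullmatch(fragment) is not None


for candidate in ("archive:MPO-12345", "partition#MPO-12345", "partition__MPO-12345"):
    print(candidate, is_valid_fragment(candidate))
```

Of the three candidates discussed in this thread, only the one containing a second hash is rejected.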

-Alan


On Jul 9, 2007, at 4:59 AM, Onno Paap wrote:

> Hans Teijgeler asked me to address Alan Ruttenberg's question:
>
> >SPARQL allows queries that span multiple named graphs - why would you
> >need to have different end points for the different partitions, which I
> >understand to be equivalent to named graphs?
>
> I will take an example from the SPARQL spec, which is just to make the 
> point about our question on named graphs.
> So please don't solve it using different data constructs or the like.
>
> Suppose in one quad store with endpoint address http://example.org 
> there are these named graphs:
>
> # Graph: http://example.org/bob
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> _:a foaf:name "Bob" .
> _:a foaf:mbox <mailto:bob@newcorp.example.org> .
>
> # Graph: http://example.org/archive_200706/bob
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> _:a foaf:name "Bob" .
> _:a foaf:mbox <mailto:bob@oldcorp.example.org> .
>
> Suppose there is another quad store with endpoint address:
> http://xyz-corp.com
> in which there is a resource locator like:
>
> _:s  foaf:name     "Alice" .
> _:s  foaf:mbox     <mailto:alice@work.example> .
> _:s  foaf:knows    <http://example.org#a> .
>
> The problem is that I cannot formulate a SPARQL query like this:
>
> Query directed at quad store endpoint address: http://example.org
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?knows_mbox
> FROM NAMED <http://example.org/bob>
> WHERE {
>   <http://example.org#a> foaf:mbox ?knows_mbox .
> }
>
> because that FROM NAMED <http://example.org/bob> clause in the SPARQL 
> query cannot be created from the data at http://xyz-corp.com since the 
> named graph name is not known at that side.
>
> Question is: how can we also embed the named graph name in the 
> resource locator at the http://xyz-corp.com side?
> If it wasn't illegal it could be something like:
> (illegal) _:s  foaf:knows    <http://example.org#http://example.org/bob#a> .
> The solution would be not to use named graphs but to use different 
> endpoints instead.
> But when making different archives or contexts that is a support 
> unfriendly and costly task.
>
> The named graph concept would seem to be not usable in a federation of 
> quad stores, when used as an archive or context key?
>
> Onno Paap
>
>
> On 7/8/07, Hans Teijgeler <hans.teijgeler@quicknet.nl> wrote:
>
> Hi Alan,
>
> Of course I want to answer your questions!
> I'll respond below.
>
> Thanks for your help!
> Hans
>
> -----Original Message-----
> From: Alan Ruttenberg [mailto:alanr@mumble.net]
> Sent: Sunday, July 08, 2007 0:05
> To: Hans Teijgeler
> Cc: SW-forum; Paap, Onno
> Subject: Re: URIs and Named Graphs
>
> Hello Hans,
>
> I'm trying to understand your scenario, and have some questions.
>
> - Why would you have hundreds of quad stores instead of a single 
> larger quad store with more qualifications on the queries?
> <HT> That has to do with data ownership. On a project we have a 
> hierarchy of data consolidation and integration, from many individual 
> applications (e.g.
> process simulation, stress analysis, pump sizing) via the responsible 
> discipline group on a project (e.g. Process Engineering) to the 
> consolidating project store, and then to the quad store of the plant 
> owner/operator. The latter may have many projects underway, and all 
> projects use and produce data that needs to be integrated. These apps, 
> disciplines and (sub)projects often are spread around the globe.
> The data at a lower level in the hierarchy are usually not accessible 
> to the higher levels, because they are "work in progress" and should 
> not be used by anybody else. Data, and their custodianship, can be
> "handed over" to the next level of consolidation. That hand-over
> involves a physical relocation from one quad store to another. The URI
> changes, the fragment identifier stays the same or gets a suffix,
> separated by a middle dot (00B7). We keep track of that location by
> storing that in, you guessed it, a partition "redirects" (one of the
> nine).
> One other reason for the many quad stores is the situation that the 
> suppliers on a project (often in the hundreds) need to share their 
> data with many other customers.
>
> - What motivates the use of URI#fragIDs in the first place?
> <HT> Our data model is close to 5NF, very generic and fine-grained.
> On a refinery we have zillions of physical objects that all have their
> lifetime information recorded. Any chunk of information is attributed 
> to a "temporal part" of the object involved, so we have zillions of 
> objects with zillions of temporal parts.
>
> - SPARQL allows queries that span multiple named graphs - why would 
> you need to have different end points for the different partitions, 
> which I understand to be equivalent to named graphs?
> <HT> I don't know whether or not that is equivalent. If we have the 
> partition "sent messages" we will have many triples in it, forming 
> many graphs that may or may not have any connection to other graphs in 
> that same partition. But all triples will have a URI for "sent 
> messages" as their 4th column. Would you still call that a named graph?
> I will ask Onno Paap (on cc), our "head techie", to respond to your
> "SPARQL vs multiple named graphs" remark. That is out of my league :-(
>
> - What would you expect the behavior to be for something like 
> URI#partition#fragID? The only behavior I am aware of for hashes is in 
> the context of http GET, where only the URI before the hash is sent as 
> the target of the GET. Do you depend on this behavior already? If so, 
> to accomplish what?
> <HT> No, I was just wondering how we can fetch a particular fragment 
> ID from a particular "partition", and then dereference it.
>
> - What do browsers have to do with the scenario?
> <HT> Not much, only indirectly, granted.
>
> You might not want to answer these questions - in that case consider 
> them as an indication of whether you are adequately communicating your 
> problem to an audience familiar with SW technologies, as I consider 
> myself to be.
> <HT> I hope I have improved on my communication skills :-)
>
> Regards,
> Alan
>
>
> On Jul 5, 2007, at 7:57 AM, Hans Teijgeler wrote:
>
> > Hi,
> >
> > We ran into a problem for which I ask advice from this esteemed forum.
> >
> > First some background information: we use the SW technologies in 
> > conjunction with a generic data model to create a distributed data 
> > base for each engineering project, involving large numbers (in the
> > hundreds) of quad stores per project.
> >
> > To give an example of using a data model "underneath" OWL: normally 
> > you may see things like an <owl:Class rdf:ID="Car"/>.
> > For us that would be: <part2:ClassOfInanimatePhysicalObject
> > rdf:ID="Car"/> where ClassOfInanimatePhysicalObject is an entity type
> > in our data model and an owl:Class.
> > If an application has data that must be shared, that data is mapped at
> > the source from its proprietary format to ISO 15926-7 format, and
> > stored in a quad store that we call a "Façade".
> > Only "owned" data are stored, other data are fetched with SPARQL from
> > other Façades.
> > Data can be "handed over" to another Façade, thus also handing over 
> > custody for that data.
> > Quad stores that participate in a given project are known to a "CPF"
> > server (Confederation of Participating Façades), where we distribute
> > SPARQL queries, consolidate query results, whilst controlling access
> > rights.
> >
> > For the Façades we use RAP, and want to use the 4th column of their 
> > Named Graphs for dividing the quad store into partitions like 'active
> > data', 'archive', and the like. Actually we have nine such partitions,
> > but I won't annoy you with the details.
> >
> > We use URI#fragID's all over the place.
> >
> > The question is how we can dereference any such fragment identifiers 
> > inside a particular partition without having to have nine endpoints 
> > (which is costly and harder to manage).
> >
> > It would be nice if we could use composite fragment identifiers like 
> > URI#partition#fragID, but the second hash # would not be allowed. If 
> > we would use something like URI#partition__fragID that would be 
> > well-formed, but hardly usable with generic browsers (I guess).
> >
> > Please shed some light on this.
> >
> > Regards,
> > Hans
> >
> > ____________________
> > OntoConsult
> > Hans Teijgeler
> > ISO 15926 specialist
> > Netherlands
> > +31-72-509 2005
> > www.InfowebML.ws
> > hans.teijgeler@quicknet.nl
> >
>


Received on Tuesday, 24 July 2007 08:41:56 UTC