RE: URIs and Named Graphs from Hans Teijgeler on 2007-07-11 (semantic-web@w3.org from July 2007)

From: Hans Teijgeler <hans.teijgeler@quicknet.nl>
Date: Wed, 11 Jul 2007 14:50:46 +0200
To: "'Ioachim Drugus'" <sw@semanticsoft.net>
Cc: "'Alan Ruttenberg'" <alanr@mumble.net>, "'SW-forum'" <semantic-web@w3.org>, "'Benjamins, Robin'" <rxbenjam@bechtel.com>, "'Onno Paap'" <onno.paap@gmail.com>
Message-ID: <000a01c7c3ba$1dc8a2d0$6c7ba8c0@hans>
Ioachim,

I regret to have to say that this wasn't very helpful:
1) reification is out of the question
2) even if that filter worked, I'd hate to see 100 million or one trillion
triples returned before I could fetch what I actually needed.
3) with 100,000 classes that would be rather problematic :-)

Thanks anyway for your effort!

Regards,
Hans

-----Original Message-----
From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
Behalf Of Ioachim Drugus
Sent: Tuesday, July 10, 2007 22:36
To: Onno Paap
Cc: Hans Teijgeler; Alan Ruttenberg; SW-forum; Benjamins, Robin
Subject: Re: URIs and Named Graphs


Here are 3 ideas on how to solve this.

This is a very complex project, and given its realities you may already have
considered each of these ideas and discarded it on its own.
Still, you might want to combine two or all three to obtain a good solution.
Our group has focused on developing generic tools until now, so our
solutions are generic as well (we don't know the project and have not worked
with quads before).

I would assign all the resources in one Façade to one *class* (it seems that
with quads this is called the *index* and is stored in the fourth column of
a table of resources; I looked here:
http://www.semantic-conference.com/2007/sessions/ps4.html).

1. Reification

In RDF you can reify a triple. Suppose you have the triple

exproducts:item10245   exterms:weight   "2.4"^^xsd:decimal .

Its reification is

_:tripleid   rdf:type        rdf:Statement .
_:tripleid   rdf:subject     exproducts:item10245 .
_:tripleid   rdf:predicate   exterms:weight .
_:tripleid   rdf:object      "2.4"^^xsd:decimal .

Once a triple is reified you can also attach other properties to it, say,
the creator:

_:tripleid   rdf:type        rdf:Statement .
_:tripleid   rdf:subject     exproducts:item10245 .
_:tripleid   rdf:predicate   exterms:weight .
_:tripleid   rdf:object      "2.4"^^xsd:decimal .
_:tripleid   dc:creator      exstaff:85740 .

You can add a new property in the same way:

_:tripleid   :class          "Business Analyze"^^xsd:string .

Now the reified triple also carries its class.

True, you would then have to change the logic of the SPARQL queries. But
this looks like the easy part.
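
To make the changed query logic concrete, here is a minimal sketch of a
query that selects the original triples through their reifications. The
namespace bound to the default prefix is an assumption (the thread does not
say where :class lives), and the literal is matched as a typed xsd:string.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# Assumption: placeholder namespace for the :class property; the real one
# is not given in this thread.
PREFIX :    <http://example.org/facade-terms#>

SELECT ?subj ?pred ?obj
WHERE {
  # Each solution is one reified statement tagged with the wanted class.
  ?stmt rdf:type      rdf:Statement ;
        rdf:subject   ?subj ;
        rdf:predicate ?pred ;
        rdf:object    ?obj ;
        :class        "Business Analyze"^^xsd:string .
}
```

Note that under this scheme every original triple is described by at least
five statements (the four reification triples plus the class), so the store
grows accordingly.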

2. A less costly solution would be to keep the class where it currently is
in the Façades, and to add a custom function to the FILTER constraints of
SPARQL queries that allows filtering triples by their class. Say, with a
function inClass(?subj, ?pred, ?obj, "archive"), the query below would
return all triples of the class "archive":

SELECT *
WHERE {
  ?subj ?pred ?obj .
  FILTER (inClass(?subj, ?pred, ?obj, "archive"))
}
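
As a sketch only: SPARQL identifies extension functions by IRI, so a custom
function like this one would normally be called through a prefixed name. The
namespace and the function ex:inClass below are hypothetical; the store would
have to implement and register such a function itself.

```sparql
# Assumption: this namespace and ex:inClass are illustrative placeholders,
# not part of SPARQL or of any particular store.
PREFIX ex: <http://example.org/functions#>

SELECT ?subj ?pred ?obj
WHERE {
  # The pattern matches every triple in the queried graph ...
  ?subj ?pred ?obj .
  # ... and the filter then keeps only those the function accepts.
  FILTER ( ex:inClass(?subj, ?pred, ?obj, "archive") )
}
```

Unless the engine can push the test down into its index, the filter is
evaluated against each matched triple, so the cost grows with the size of the
store rather than with the size of the class.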

3. It is possible to make each class a named graph.
But then you need a custom function to identify the graph in which a triple
resides, and when a named graph of one repository is queried, the query
should return all the triples, from all repositories, that belong to the
class described by that named graph.
It will help to have a table of synonyms in which each named graph has a
name and a URI.
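
For the graph-identification part, standard SPARQL may already be enough:
the GRAPH keyword binds a variable to the name of the graph in which a
pattern matched. A minimal sketch, using the foaf data quoted further down
in this thread, and assuming the endpoint exposes its named graphs to the
query (how the dataset is composed when no FROM NAMED clause is given is up
to the store):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?g ?mbox
WHERE {
  # ?g is bound to the IRI of each named graph in which the inner
  # pattern matches.
  GRAPH ?g {
    ?person foaf:name "Bob" ;
            foaf:mbox ?mbox .
  }
}
```

Replacing ?g with a fixed graph IRI would restrict the match to a single
partition. This only covers one repository; collecting the matching triples
from all repositories remains a separate, federated step.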

Probably this could be done most easily with our Semantic Server, which can
maintain resources in repositories across the web, the resource metadata,
and the repository hosts and their metadata. But this would take some
reorganization of the whole project.

I hope this helps

Ioachim
http://semanticsoft.net:8080/semanticwebtools.html


Onno Paap wrote:
>
> Hans Teijgeler asked me to address Alan Ruttenberg's question:
>
> >SPARQL allows queries that span multiple named graphs - why would you
> >need to have different end points for the different partitions, which I
> >understand to be equivalent to named graphs?
>
> I will take an example from the SPARQL spec, which is just to make the 
> point about our question on named graphs.
> So please don't solve it using different data constructs or the like.
>
> Suppose that in one quad store with endpoint address http://example.org
> there are these named graphs:
>
> # Graph: http://example.org/bob
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> _:a foaf:name "Bob" .
> _:a foaf:mbox <mailto:bob@newcorp.example.org> .
>
> # Graph: http://example.org/archive_200706/bob
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> _:a foaf:name "Bob" .
> _:a foaf:mbox <mailto:bob@oldcorp.example.org> .
>
> Suppose that in another quad store, with endpoint address
> http://xyz-corp.com, there is a resource locator like:
>
> _:s  foaf:name     "Alice" .
> _:s  foaf:mbox     <mailto:alice@work.example> .
> _:s  foaf:knows    <http://example.org#a> .
>
> The problem is that I cannot formulate a SPARQL query like this:
>
> Query directed at quad store endpoint address: http://example.org
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?knows_mbox
> FROM NAMED <http://example.org/bob>
> WHERE {
>   <http://example.org#a> foaf:mbox ?knows_mbox .
> }
>
> because that FROM NAMED <http://example.org/bob> clause in the SPARQL
> query cannot be created from the data at http://xyz-corp.com, since the
> name of the named graph is not known at that side.
>
> The question is: how can we also embed the named graph name in the
> resource locator at the http://xyz-corp.com side?
> If it weren't illegal, it could be something like:
> (illegal) _:s  foaf:knows    <http://example.org#http://example.org/bob#a> .
>
> The solution would be not to use named graphs but to use different
> endpoints instead.
> But when making different archives or contexts, that is a
> support-unfriendly and costly task.
>
> So the named graph concept would seem not to be usable in a federation of
> quad stores when used as an archive or context key?
>
> Onno Paap
>
>  
> On 7/8/07, *Hans Teijgeler* <hans.teijgeler@quicknet.nl> wrote:
>
>     Hi Alan,
>
>     Of course I want to answer your questions!
>     I'll respond below.
>
>     Thanks for your help!
>     Hans
>
>     -----Original Message-----
>     From: Alan Ruttenberg [mailto:alanr@mumble.net]
>     Sent: Sunday, July 08, 2007 0:05
>     To: Hans Teijgeler
>     Cc: SW-forum; Paap, Onno
>     Subject: Re: URIs and Named Graphs
>
>     Hello Hans,
>
>     I'm trying to understand your scenario, and have some questions.
>
>     - Why would you have hundreds of quad stores instead of a single larger
>     quad store with more qualifications on the queries?
>     <HT> That has to do with data ownership. On a project we have a
>     hierarchy of data consolidation and integration, from many individual
>     applications (e.g. process simulation, stress analysis, pump sizing) via
>     the responsible discipline group on a project (e.g. Process Engineering)
>     to the consolidating project store, and then to the quad store of the
>     plant owner/operator. The latter may have many projects underway, and
>     all projects use and produce data that needs to be integrated. These
>     apps, disciplines and (sub)projects often are spread around the globe.
>     The data at a lower level in the hierarchy are usually not accessible to
>     the higher levels, because they are "work in progress" and should not be
>     used by anybody else. Data, and their custodianship, can be "handed
>     over" to the next level of consolidation. That hand-over involves a
>     physical relocation from one quad store to another. The URI changes, the
>     fragment identifier stays the same or gets a suffix, separated by a
>     middle dot (00B7). We keep track of that location by storing that in,
>     you guessed it, a partition "redirects" (one of the nine).
>     One other reason for the many quad stores is the situation that the
>     suppliers on a project (often in the hundreds) need to share their data
>     with many other customers.
>
>     - What motivates the use of URI#fragIDs in the first place?
>     <HT> Our data model is close to 5NF, very generic and fine-grained. On
>     a refinery we have zillions of physical objects that all have their
>     lifetime information recorded. Any chunk of information is attributed to
>     a "temporal part" of the object involved, so we have zillions of objects
>     with zillions of temporal parts.
>
>     - SPARQL allows queries that span multiple named graphs - why would you
>     need to have different end points for the different partitions, which I
>     understand to be equivalent to named graphs?
>     <HT> I don't know whether or not that is equivalent. If we have the
>     partition "sent messages" we will have many triples in it, forming many
>     graphs that may or may not have any connection to other graphs in that
>     same partition. But all triples will have a URI for "sent messages" as
>     their 4th column. Would you still call that a named graph?
>     I will ask Onno Paap (on cc), our "head techie", to respond to your
>     "SPARQL vs multiple named graphs" remark. That is out of my league :-(
>
>     - What would you expect the behavior to be for something like
>     URI#partition#fragID? The only behavior I am aware of for hashes is in
>     the context of http GET, where only the URI before the hash is sent as
>     the target of the GET. Do you depend on this behavior already? If so, to
>     accomplish what?
>     <HT> No, I was just wondering how we can fetch a particular fragment ID
>     from a particular "partition", and then dereference it.
>
>     - What do browsers have to do with the scenario?
>     <HT> Not much, only indirectly, granted.
>
>     You might not want to answer these questions - in that case consider
>     them as an indication of whether you are adequately communicating your
>     problem to an audience familiar with SW technologies, as I consider
>     myself to be.
>     <HT> I hope I have improved on my communication skills :-)
>
>     Regards,
>     Alan
>
>
>     On Jul 5, 2007, at 7:57 AM, Hans Teijgeler wrote:
>
>     > Hi,
>     >
>     > We ran into a problem for which I ask advice from this esteemed
>     > forum.
>     >
>     > First some background information: we use the SW technologies in
>     > conjunction with a generic data model to create a distributed data
>     > base for each engineering project, involving large numbers (in the
>     > hundreds) of quad stores per project.
>     >
>     > To give an example of using a data model "underneath" OWL: normally
>     > you may see things like an <owl:Class rdf:ID="Car"/>.
>     > For us that would be: <part2:ClassOfInanimatePhysicalObject
>     > rdf:ID="Car"/> where ClassOfInanimatePhysicalObject is an entity type
>     > in our data model and an owl:Class.
>     > If an application has data that must be shared, that data is mapped
>     > at the source from its proprietary format to ISO 15926-7 format, and
>     > stored in a quad store that we call a "Façade".
>     > Only "owned" data are stored; other data are fetched with SPARQL from
>     > other Façades.
>     > Data can be "handed over" to another Façade, thus also handing over
>     > custody for that data.
>     > Quad stores that participate in a given project are known to a "CPF"
>     > server (Confederation of Participating Façades), where we distribute
>     > SPARQL queries and consolidate query results, whilst controlling
>     > access rights.
>     >
>     > For the Façades we use RAP, and want to use the 4th column of their
>     > Named Graphs for dividing the quad store into partitions like 'active
>     > data', 'archive', and the like. Actually we have nine such partitions,
>     > but I won't annoy you with the details.
>     >
>     > We use URI#fragID's all over the place.
>     >
>     > The question is how we can dereference any such fragment identifiers
>     > inside a particular partition without having to have nine endpoints
>     > (which is costly and harder to manage).
>     >
>     > It would be nice if we could use composite fragment identifiers like
>     > URI#partition#fragID, but the second hash # would not be allowed. If
>     > we would use something like URI#partition__fragID that would be
>     > well-formed, but hardly usable with generic browsers (I guess).
>     >
>     > Please shed some light on this.
>     >
>     > Regards,
>     > Hans
>     >
>     > ____________________
>     > OntoConsult
>     > Hans Teijgeler
>     > ISO 15926 specialist
>     > Netherlands
>     > +31-72-509 2005
>     > www.InfowebML.ws
>     > hans.teijgeler@quicknet.nl
>     >


Received on Wednesday, 11 July 2007 12:51:38 UTC