Implementing statement grouping, contexts, quads and scopes (was: Re: Out of context, in context, out of subject ????) from Alberto Reggiori on 2002-06-21 (www-rdf-interest@w3.org from June 2002)

From: Alberto Reggiori <areggiori@webweaving.org>
Date: Fri, 21 Jun 2002 13:34:49 +0200
To: Didier <didier@phpapp.org>
CC: www-rdf-interest@w3.org, www-rdf-logic@w3.org
Message-ID: <3D130F59.88A418BC@webweaving.org>
Didier wrote:

> Jonathan Borden proposed a solution to layer something like OWL on RDF
> with unasserted triples. The problem is always the same: what we do introduce
> as new stuff in the semweb is 'context is different for everybody'. This
> stands also for wondering what semantic level (thinking of layering) a
> statement refers to (or its property refers to).

Too right, Didier! :-) Let me present my own practical experience of using
'statement groups' (or 'contexts', or whatever jargon you want to use for
them) in a real-world RDF application I have been writing over the last two
years. In very simple terms, I used the concept of a group to 'scope' triples
when they are inserted into, retracted from or retrieved from a store (by
scoping I mean something like the 'variable scope' of a usual programming
language). Each group is seen as a resource and can then be further described.
The same statement can be part of several different "groups"; each statement
in each group exists independently of the others. Retracting a statement from
one group does not affect the others; asserting the same statement twice in
the same group is an error. Each statement can be asserted, retracted, checked
and retrieved by specifying a group or none; if no group is specified the
"world set" is assumed. In practice, all this resulted in one additional piece
of information to be stored per statement, i.e. quads.
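
To make the rules above concrete, here is a minimal in-memory sketch in Python. It is not the code of my application: the class name, the method names and the WORLD identifier are all made up for illustration, and I assume here that the "world set" is simply a default group.

```python
from typing import Iterator, NamedTuple, Optional

# Hypothetical identifier for the default "world set" (an assumption).
WORLD = "urn:example:world"


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    group: str


class QuadStore:
    """Toy store where every statement is scoped by a statement group."""

    def __init__(self) -> None:
        self._quads: set = set()

    def assert_(self, s: str, p: str, o: str, group: str = WORLD) -> None:
        quad = Quad(s, p, o, group)
        # Asserting the same statement twice in the same group is an error.
        if quad in self._quads:
            raise ValueError(f"statement already asserted in group {group}")
        self._quads.add(quad)

    def retract(self, s: str, p: str, o: str, group: str = WORLD) -> None:
        # Only the copy in the given group goes away; other groups are untouched.
        self._quads.discard(Quad(s, p, o, group))

    def contains(self, s: str, p: str, o: str, group: str = WORLD) -> bool:
        return Quad(s, p, o, group) in self._quads

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None, group: str = WORLD) -> Iterator[Quad]:
        # None acts as a wildcard for subject, predicate and object.
        for quad in self._quads:
            if quad.group != group:
                continue
            if s is not None and quad.subject != s:
                continue
            if p is not None and quad.predicate != p:
                continue
            if o is not None and quad.object != o:
                continue
            yield quad


store = QuadStore()
store.assert_("urn:ex:s", "urn:ex:p", "urn:ex:o", group="urn:ex:group1")
store.assert_("urn:ex:s", "urn:ex:p", "urn:ex:o", group="urn:ex:group2")
store.retract("urn:ex:s", "urn:ex:p", "urn:ex:o", group="urn:ex:group1")
print(store.contains("urn:ex:s", "urn:ex:p", "urn:ex:o", group="urn:ex:group2"))
```

The same statement lives independently in group1 and group2, so retracting it from group1 leaves the copy in group2 in place.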

For example, in my application I need to aggregate and federate multilingual
DC/DCQ RDF descriptions of resources coming from several different education
repositories across Europe; each resource has a URL that could change and that
cannot be used to identify the actual resource. Instead, I use 'scoped' RDF
descriptions where each bNode and statement is put into a specific statement
group. In other words, the resulting RDF description is about a kind of "proxy"
resource that represents the HTML file being described. The actual identifier
of the statement group uses a specific notation to distinguish and aggregate
different descriptions, and makes it possible to track "who said what". One
additional requirement in the system is that each Web resource is part of a
Collection (Web collection/site) having several other attributes.

Here is the XML/RDF snippet showing how I represented the information flow
above in my application, e.g. the repository www.infoguide.dk posts a
description of a Web resource about 'Basketball' (in Danish):

<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:dcq="http://purl.org/dc/terms/"
                   xmlns:dct="http://purl.org/dc/dcmitype/">
<rdf:Description rdf:bagID="urn:rdf:etb:uni-c:www.infoguide.dk:973">
   <dc:title rdf:parseType="Resource">
      <rdf:value>Basketball</rdf:value>
      <dc:language rdf:parseType="Resource">
         <dcq:RFC1766>da</dcq:RFC1766>
      </dc:language>
   </dc:title>
   <dc:identifier
rdf:resource="http://www.laer-it.dk/fag/idr/eks/basket/basket.htm"></dc:identifier>

   <dc:language rdf:parseType="Resource">
      <dcq:RFC1766>da</dcq:RFC1766>
   </dc:language>
</rdf:Description>
<dct:Resource  rdf:about="urn:rdf:etb:uni-c:www.infoguide.dk:973">
   <dcq:created rdf:parseType="Resource">
     <dcq:W3CDTF>2001-11-05</dcq:W3CDTF>
   </dcq:created>
   <dcq:isPartOf
rdf:resource="urn:rdf:etb:uni-c:www.infoguide.dk"></dcq:isPartOf>
</dct:Resource>
</rdf:RDF>

When the record gets stored, the rdf:bagID gets mapped to an API operation that
triggers the insertion of the statements (coming from the first description)
into the specific 'urn:rdf:etb:uni-c:www.infoguide.dk:973' statement group,
which actually represents the "proxy" resource.
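
A rough Python sketch of that mapping step, just to illustrate the idea. It is not a real RDF/XML parser and not my actual API: it only handles the few constructs used in the record above (rdf:resource, rdf:parseType="Resource" and plain literals), and the function names and the bNode naming are invented for the example.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_bnode_ids = itertools.count(1)


def new_bnode() -> str:
    """Return a fresh blank node label (naming scheme is an assumption)."""
    return f"_:b{next(_bnode_ids)}"


def emit_properties(node, subject, group, out):
    """Turn the child elements of `node` into quads scoped by `group`."""
    for prop in node:
        # ElementTree tags look like "{namespace}local"; join them into a URI.
        predicate = prop.tag[1:].replace("}", "", 1)
        resource = prop.get(f"{{{RDF}}}resource")
        if resource is not None:
            out.append((subject, predicate, resource, group))
        elif prop.get(f"{{{RDF}}}parseType") == "Resource":
            bnode = new_bnode()
            out.append((subject, predicate, bnode, group))
            emit_properties(prop, bnode, group, out)
        else:
            out.append((subject, predicate, (prop.text or "").strip(), group))


def quads_from_record(xml_text):
    """Use the rdf:bagID of each description as its statement group."""
    out = []
    for desc in ET.fromstring(xml_text).findall(f"{{{RDF}}}Description"):
        group = desc.get(f"{{{RDF}}}bagID")
        if group is not None:
            emit_properties(desc, new_bnode(), group, out)
    return out


RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:bagID="urn:rdf:etb:uni-c:www.infoguide.dk:973">
   <dc:title rdf:parseType="Resource">
      <rdf:value>Basketball</rdf:value>
   </dc:title>
   <dc:identifier rdf:resource="http://www.laer-it.dk/fag/idr/eks/basket/basket.htm"/>
</rdf:Description>
</rdf:RDF>"""

for quad in quads_from_record(RECORD):
    print(quad)
```

Every statement of the description, including those hanging off the nested bNode, ends up with the bagID value as its fourth component.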
Now imagine that the same repository posts an Italian title for the same
resource and at the same time decides to make an Italian version of the HTML
document available as well, i.e. a second URL exists for the resource. Then a
new record would get posted as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:dcq="http://purl.org/dc/terms/"
                   xmlns:dct="http://purl.org/dc/dcmitype/">
<rdf:Description rdf:bagID="urn:rdf:etb:uni-c:www.infoguide.dk:973">
   <dc:title rdf:parseType="Resource">
      <rdf:value>Pallacanestro</rdf:value>
      <dc:language rdf:parseType="Resource">
         <dcq:RFC1766>it</dcq:RFC1766>
      </dc:language>
   </dc:title>
   <dc:identifier
rdf:resource="http://www.laer-it.dk/fag/idr/eks/basket/basket_it.htm"></dc:identifier>

   <dc:language rdf:parseType="Resource">
      <dcq:RFC1766>it</dcq:RFC1766>
   </dc:language>
</rdf:Description>
<dct:Resource  rdf:about="urn:rdf:etb:uni-c:www.infoguide.dk:973">
   <dcq:modified rdf:parseType="Resource">
     <dcq:W3CDTF>2001-11-08</dcq:W3CDTF>
   </dcq:modified>
</dct:Resource>
</rdf:RDF>

After some time, imagine that the user program finds that the resource with
identifier 'urn:rdf:etb:uni-c:www.infoguide.dk:973' is about 'sports', and
needs to display the title and the URL of the resource; then the following
query might be run on the underlying storage:

SELECT
        ?title,
        ?identifier
WHERE
                 ( ?URN, <rdf:type>, <dct:Resource>),
                 ( ?x, <dc:title>, ?tt, ?URN),
                 ( ?tt, <rdf:value>, ?title, ?URN),
                 ( ?x, <dc:identifier>, ?identifier, ?URN)
USING
                 rdf for <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
                 rdfs for <http://www.w3.org/2000/01/rdf-schema#>,
                 dc for <http://purl.org/dc/elements/1.1/>,
                 dcq for <http://purl.org/dc/terms/>,
                 dct for <http://purl.org/dc/dcmitype/>,
                 etb for <http://eun.org/etb/elements/>,
                 etbthes for <http://eun.org/etb/thesaurus/elements/>

The above is an example of an RDQL/SquishQL query, with an extension that uses
a fourth component (quads) to match specific triples in a specific statement
group. The result would be the table:

?title         ?identifier
-------------  ------------------------------------------------------
Basketball     http://www.laer-it.dk/fag/idr/eks/basket/basket.htm
Pallacanestro  http://www.laer-it.dk/fag/idr/eks/basket/basket_it.htm
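
For those who prefer code, here is a toy Python sketch of how such a quad pattern query could be evaluated over an in-memory list of quads. It is a naive nested-loop matcher, not my query engine; the bNode labels are invented, the prefixed names stand in for the full URIs, and I assume that a three-term pattern matches a statement in any group.

```python
def is_var(term):
    """Variables are written like '?title', as in the query."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, quad, bindings):
    """Extend `bindings` so that `pattern` matches `quad`, or return None."""
    new = dict(bindings)
    # zip stops at the shorter sequence, so a 3-term pattern ignores the group.
    for term, value in zip(pattern, quad):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, quads, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for quad in quads:
        extended = unify(patterns[0], quad, bindings)
        if extended is not None:
            yield from solve(patterns[1:], quads, extended)


G = "urn:rdf:etb:uni-c:www.infoguide.dk:973"
QUADS = [
    (G, "rdf:type", "dct:Resource", None),
    ("_:d1", "dc:title", "_:t1", G),
    ("_:t1", "rdf:value", "Basketball", G),
    ("_:d1", "dc:identifier",
     "http://www.laer-it.dk/fag/idr/eks/basket/basket.htm", G),
    ("_:d2", "dc:title", "_:t2", G),
    ("_:t2", "rdf:value", "Pallacanestro", G),
    ("_:d2", "dc:identifier",
     "http://www.laer-it.dk/fag/idr/eks/basket/basket_it.htm", G),
]
QUERY = [
    ("?URN", "rdf:type", "dct:Resource"),
    ("?x", "dc:title", "?tt", "?URN"),
    ("?tt", "rdf:value", "?title", "?URN"),
    ("?x", "dc:identifier", "?identifier", "?URN"),
]

rows = [(b["?title"], b["?identifier"]) for b in solve(QUERY, QUADS)]
for title, identifier in rows:
    print(title, identifier)
```

Because each posted description has its own bNode, the join on ?x keeps every title together with the URL from the same record, while ?URN restricts all the matches to the one statement group.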

The above example greatly simplifies the real scenario, where the end
application also needs to display language-sensitive (multilingual) titles and
so on.

Sorry if I have been annoying you with this long example, but what I want to
say here is that I have found "statement grouping" (or contexts) to be a
crucial and practical artifact for modelling complex layered RDF architectures.

Are there other people here trying similar things? Is the "grouping" thing :)
really needed?
Is my interpretation of grouping/context wrong?

> This also is similar to context as introduced in N3. You do need a solution
> to express what semantic context a statement belongs to ?

I am not up to speed on N3, but my gut feeling is that grouping triples in the
general sense is very similar to giving a 'context' in N3. I do not want to
argue whether concepts like 'context', 'statement group', 'dark triple' or
'provenance' are the same, or which syntax has which feature (or even that
XML/RDF has a problem with contexts!); I just want to underline the fact that
"quads" (s,p,o,c) turned out to be a recurrent concept that many systems end
up implementing at the end of the day :)

> I think that the triple:model property introduced by Michael Sintek and
> Stephen Decker is simple and extensible enough to represent what you need.
> Basic contexts may be represented with it. But you can also subProperty it.
> This is simple and efficient.

Everybody does that in a different way, I presume! :) I do it my way, and I
know people on this list doing the same in completely different ways...
Another good practical example of how to represent and use "context" is
described by Graham Klyne in
http://public.research.mimesweeper.com/RDF/RDFContexts.html

> What people need on the sem web: semantic grounding. Let's use subPropertying
> and well-chosen facets of contextualization to define properties that
> express contexts (dark contexts, assertional ones, terminological ones...).

Is there any RDF vocabulary for that? Can't we build an RDF Schema that covers
dark, white or grey facets of higher-order statements? :-))

> With this property, statements may belong to multiple contexts enabling also
> classes as instances, and other semantic level differences. The only problem
> is that you must reify. But is that a problem as we will use
> human-readable/codable languages that will be translated to a very very very
> ugly/verbose/awful RDF/XML syntax that will be machine readable/processable
> and, in those tasks, what is the most important, 'many-context-assignable'.

Yes, it is a well-known problem with the current XML/RDF syntax, and RDF Core
is working on that, I think :) On the other hand, I could not wait too long,
so in my application I skip all the reification triples while parsing input
records :)) (I do not store them); instead I use an internal flag per triple
to keep track of the rdf:Bag which the statement belongs to; the storage size
is much, much smaller and the information is stored anyway :-)
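
To give a feeling for the difference in size, here is a small Python sketch that compares the two ways of recording bag membership: full reification versus one extra field per statement. The function names and the statement node labels are invented for the example, and the counts depend on how one chooses to lay out the reified form.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reified(statements, bag):
    """Triples needed to record bag membership using RDF reification."""
    triples = [(bag, RDF + "type", RDF + "Bag")]
    for n, (s, p, o) in enumerate(statements, start=1):
        stmt = f"_:stmt{n}"  # hypothetical label for the reified statement
        triples += [
            (s, p, o),                                # the statement itself
            (stmt, RDF + "type", RDF + "Statement"),  # its reification
            (stmt, RDF + "subject", s),
            (stmt, RDF + "predicate", p),
            (stmt, RDF + "object", o),
            (bag, f"{RDF}_{n}", stmt),                # membership in the bag
        ]
    return triples


def flagged(statements, bag):
    """The same information as one extra component per statement (quads)."""
    return [(s, p, o, bag) for (s, p, o) in statements]


BAG = "urn:rdf:etb:uni-c:www.infoguide.dk:973"
STATEMENTS = [
    ("_:d1", "dc:title", "_:t1"),
    ("_:t1", "rdf:value", "Basketball"),
    ("_:d1", "dc:identifier",
     "http://www.laer-it.dk/fag/idr/eks/basket/basket.htm"),
]

print(len(reified(STATEMENTS, BAG)), "triples with reification")
print(len(flagged(STATEMENTS, BAG)), "quads with a per-statement flag")
```

In this layout each statement costs six triples plus one for the bag itself, against a single quad per statement with the flag.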

Can't reification be made optional?

my .03 euros

Alberto
Received on Friday, 21 June 2002 07:28:16 UTC