Re: Named Containers : a framework for aggregation and query from Seaborne, Andy on 2004-10-11 (public-rdf-dawg@w3.org from October to December 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 11 Oct 2004 15:55:13 +0100
To: Tom Adams <tom@tucanatech.com>
Cc: DAWG list <public-rdf-dawg@w3.org>
Message-ID: <416A9ED1.60904@hp.com>
Tom,

Comments inline - sorry for the delay:

Tom Adams wrote:
> Andy,
> 
> I like this a lot, I have some other general comments that may live 
> here, but probably not... I'll address them in reply to the appropriate 
> mails when I catch up on my mail.
> 
> 
>><snip/>
>>==== FROM
>>
>>This is as much about "protocol" as query but it's needed for the local
>>query case where there isn't a protocol layer.
> 
> 
> I'm not sure I like the idea of putting this in the protocol, perhaps 
> establishing a default context (case 3 below) would make this valid, 
> but generally I like being able to address the container using a query. 
> I'll need to think about this some more...

See the FROM discussions elsewhere.  My current suggestion is that FROM is a 
"hint" to the query processor and we do not define how the query processor must 
connect the URI to the data (graph, graph of named containers).  Seems to be a 
very system-dependent sort of thing.  Specifically, it isn't forced to be the URI 
of the graph - it could be the URI of a document with a graph in it, e.g. XHTML&RDF.

I do like having the FROM in the query - I also like being able to override it 
via the protocol: execute this query against that DB despite what it might name 
in FROM (e.g. for testing).
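
To make the override idea concrete, here is a rough sketch in Python.  The 
function name choose_data_source, its parameters and the precedence order are 
all made up for illustration; none of this is proposed syntax or any system's API.

```python
from typing import List, Optional


def choose_data_source(from_uris: List[str],
                       protocol_override: Optional[List[str]] = None,
                       system_default: Optional[List[str]] = None) -> List[str]:
    """Pick the URIs that describe the query data (hypothetical helper).

    Precedence assumed here: protocol override, then FROM, then whatever
    the system environment supplies.
    """
    if protocol_override:          # e.g. a test harness pointing at another DB
        return list(protocol_override)
    if from_uris:                  # FROM is only a hint, used when nothing overrides it
        return list(from_uris)
    return list(system_default or [])
```

How each returned URI is then connected to actual data is left open, as above.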

> 
>>FROM establishes the data for a query.  How URIs of named containers get
>>handled is up to the implementation but some systems will load URLs and
>>files, some will attach to databases and some will do nothing much
>>because the system environment handles getting to some collection of
>>named containers.  There is no requirement to load URLs across the web.
> 
> 
> We need to be sure we separate (or rather don't preclude it) the naming 
> of containers from the protocol that is used to communicate with them. 

+1

And indeed I am not suggesting any transfer of anything other than pure RDF 
graphs.  No defined way to move the collection around.

> In Kowari/TKS we currently use the scheme of the model URI to determine 
> the transport protocol, which has the consequence of binding model 
> names to not only protocols but also servers (host is included in the 
> URI).

Interesting - so moving graphs across servers changes their URIs?

I see details of naming as a system issue as there are several different 
approaches and I don't think we need to pick one.

In Jena, we have a cache and look there for anything mentioned so strictly we 
don't go straight to the URI.  Works better offline :-)
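
Roughly, the "look in the cache first" idea is something like the sketch below. 
This is not Jena code: locate_graph, the dict-based cache and the fetch callback 
are hypothetical names, and a graph is just a set of string triples here.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def locate_graph(uri: str,
                 cache: Dict[str, Set[Triple]],
                 fetch: Optional[Callable[[str], Set[Triple]]] = None) -> Set[Triple]:
    """Return the graph for `uri`, preferring a local cache (hypothetical)."""
    if uri in cache:               # cached copy wins, so this works offline
        return cache[uri]
    if fetch is None:              # nothing cached and no way to go and get it
        raise LookupError(f"no cached graph for {uri}")
    graph = fetch(uri)             # system-dependent: file, URL, database ...
    cache[uri] = graph             # remember it for next time
    return graph
```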

> 
> 
>>== Case 1: "FROM <u1> <u2>"
>>
>>Build a data context with two named containers named <u1> and <u2>.
> 
> 
> Do we make this an implicit AND?

Do an RDF merge to create an RDF graph: union (keeping bNodes distinct) and 
duplicate suppression.
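
As a minimal sketch of what I mean by merge (Python, with graphs as sets of 
string triples; the "_:" prefix for bNodes and the name rdf_merge are assumptions 
for illustration only, not any system's API):

```python
from itertools import count
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def rdf_merge(graphs: Iterable[Set[Triple]]) -> Set[Triple]:
    """Merge graphs: bNodes from different graphs are kept apart,
    and the set union drops duplicate triples."""
    fresh = count()
    merged: Set[Triple] = set()
    for graph in graphs:
        renamed: Dict[str, str] = {}   # per-graph mapping: old bNode label -> new label

        def rename(term: str) -> str:
            if not term.startswith("_:"):   # assumption: bNodes are written "_:label"
                return term
            if term not in renamed:
                renamed[term] = f"_:b{next(fresh)}"
            return renamed[term]

        # Predicates are never bNodes, so only subject and object are renamed.
        merged |= {(rename(s), p, rename(o)) for s, p, o in graph}
    return merged
```

Two graphs that both use the label _:a end up with two different bNodes in the 
result, while a ground triple that appears in both is kept once.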

Reading, I see that Kowari allows complex expressions in the "from" clause - 
using AND and OR as intersection and union.  It does duplicate suppression on 
the result, doesn't it?

> 
> 
>>== Case 2: "FROM <u1>"
>>
>>Build a data context with one named container.  Accessing the container
>>via SOURCE and accessing the aggregation sees the same RDF graph down to
>>the bNodes. If there is no SOURCE in the query, this is just querying
>>the graph identified by <u1> by however the system does it.
>>
>>== Case 3: No FROM in query.
>>
>>The implementation has to set the query data context.  This can be a
>>single graph or a collection of named containers.
>>
>>If there is no name information, SOURCE ?src ( ?x ?y ?z ) can either:
>>
>>3a/ fail - ?src can't be bound
>>
>>3b/ match as if it's a single graph but ?src is not bound.
>>
>>Note: it's not possible to create a mix of named and unnamed containers
>>in the query data.  That is intentional.  Implementations may choose to
>>allow this but there would be no test cases.  Same goes for ?src being a
>>bNode and having some vocabulary to describe the container or container
>>graph.
>>
>>I'd expect the case of no FROM, and getting the query context from
>>outside to be common in the local case.
>>
>>
>>== Case 4: "FROM <u> <u>" (same URI)
>>
>>This highlights the case where two URIs name the same graph; in more
>>general cases this would have to be done outside the query language FROM
>>statement.
>>
>>For the same URI case, this can go one of two ways:
>>
>>4a/ Creates a data context with two named containers that do not share
>>bNodes.  It's like reading in the file twice.
>>
>>4b/ Creates a data context with two named containers that name the same
>>graph.  bNodes are the same.
>>
>>4c/ Make it illegal.
>>
>>Because the same URI is used, it's possible to get indistinguishable
>>query results - that's an argument in favour of 4c.
>>
>>
>>==== Systems
>><snip/>
>>== Kowari/TKS
>>
>>The "from" keyword in Kowari allows the creation of a target graph
>>through the union and intersection of sets of statements.  If bNodes are
>>kept distinct, union is RDF-merge because Kowari works on sets: the
>>union will do the duplicate suppression (could someone confirm this
>>please?)
> 
> 
> Yes, you're right, Kowari does work on sets so duplicates are 
> redundant. bNodes are unique within a server, that is, no two models on 
> the server will contain the same identifier for different bNodes.
> 
> Kowari allocates node IDs for bNodes at a server level, and makes no 
> attempt to keep them globally unique.

We (Jena) do the same (we use java.rmi.server.UID).  Do you allow bNodes to be 
moved between graphs?  If you create a union are they the same bNodes in the 
union as the original?

Server-wide isn't ideal - we have come across cases where we want a "distributed 
graph", one graph across several machines (or even JVMs).  There is no standard 
transfer format that can do this because of bNodes.  We could convert to using UUIDs 
(but not as URIs - a separate naming space from all URIs) if this becomes a 
serious roadblock but we are loath to do anything non-standard.
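
To illustrate the difference, a small Python sketch.  Neither allocator is what 
Jena or Kowari actually does; ServerScopedLabels and uuid_label are hypothetical 
names, and the "_:" label form is just a convenient way to keep the labels 
outside the URI space.

```python
import itertools
import uuid


class ServerScopedLabels:
    """bNode labels unique only within one server (a simple counter)."""

    def __init__(self) -> None:
        self._next = itertools.count()

    def new_label(self) -> str:
        return f"_:b{next(self._next)}"


def uuid_label() -> str:
    """bNode label that needs no coordination between machines."""
    # Random UUID, kept in its own label space rather than minted as a URI.
    return f"_:u{uuid.uuid4().hex}"
```

Two independent ServerScopedLabels instances hand out the same first label, 
which is exactly the clash a graph spread over several machines would hit; 
UUID-based labels avoid it without the servers talking to each other.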


A query example would be wanting to use a bNode found in one query in the next. 
Very reasonable locally (doing .listProperties is a query after all).  Fails 
remotely.

> 
> 
>>In addition, the "in" keyword allows a pattern to be applied to a named
>>graph.  It appears that the graph name can't be a variable.
> 
> 
> This is a very handy feature, but no, the graph name (aka model URI) 
> cannot be a variable I don't believe. I'll give it a shot and see if we 
> can bind a value to it.

I'll be interested in the outcome of the experiment.

> 
> Cheers,
> Tom
Received on Monday, 11 October 2004 14:55:30 UTC