Re: Ideas for store for IFP smushing from Phil Dawes on 2004-08-09 (www-rdf-interest@w3.org from August 2004)

From: Phil Dawes <pdawes@users.sourceforge.net>
Date: Mon, 9 Aug 2004 15:43:39 +0000
To: Leo Sauermann <leo@gnowsis.com>
Cc: Phil Dawes <pdawes@users.sourceforge.net>, www-rdf-interest@w3.org
Message-ID: <16663.39851.213196.582517@gargle.gargle.HOWL>
Hi Leo,

Leo Sauermann writes:
 > 
 > I had a discussion with Joe Geldart about exactly this thing on Friday.
 > 
 > I think smushing is a good way to integrate data stores, and basing
 > it on IFPs should be a good technique.
 > But I would base it on OWL rules that say something like 
 > "owl:sameIndividualAs" when the IFP values are the same.
 > 

Do you mean leave the triples as-is, but add owl:sameIndividualAs
(SIA) statements to the store?

 > sure, the OWL approach would be harder to program as you would have to 
 > do the evaluation during read access of the database and not write 
 > access. Smushing on write is easier.
 > 

Do you have an algorithm for doing an SIA-inclusive query? I'd be
eager to see if it would meet my performance requirements. Note that
I'm merely exploring the :uname option - I'm not emotionally attached
to it in any way.
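
For comparison, a naive read-time evaluation could be sketched as
below. This is only an illustration under assumptions, not anybody's
actual algorithm: it assumes SIA statements are stored as ordinary
triples, and the names sia_closure and query_subject are made up.

```python
from collections import defaultdict, deque

SIA = "owl:sameIndividualAs"


def sia_closure(triples, start):
    """Return every node reachable from start via SIA links (either direction)."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == SIA:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


def query_subject(triples, subject):
    """All (predicate, object) pairs for subject or any SIA-equivalent node."""
    same = sia_closure(triples, subject)
    return {(p, o) for s, p, o in triples if s in same and p != SIA}


# Hypothetical data: two nodes declared to be the same individual.
data = [
    ("ex:a", "foaf:nick", "Fred"),
    ("ex:b", "foaf:mbox", "mailto:fred@example.com"),
    ("ex:a", SIA, "ex:b"),
]
result = query_subject(data, "ex:b")
```

The closure is recomputed on every read in this sketch, which is
exactly the kind of cost that smushing on write avoids.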

 > about properties and classes:
 > you cannot dump all URIs, that's not wise. You will lose properties and 
 > classes. It is impossible to differentiate between identification URIs 
 > and vocabulary URIs.

The plan is not to lose the URIs. Just make them an IFP via the
:uname property.

 > so I would suggest not to trash the uris but to add new data to existing 
 > URIs, when you have a match by IFP (although that may also have easter 
 > eggs, the most elegant way would be the OWL way)

Problem with this approach is that you either nominate a 'primary
uri', or duplicate all the statements. I am exploring the indirection
between resource and URI that allows you to have 1 resource with
multiple URIs. Representing the URIs as literals to the :uname
property neatly side-steps the triple-bloat problem.
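
To make the triple-bloat point concrete, here is a minimal sketch of
the indirection. It is not the prototype's code: the class UnameStore
and its methods are hypothetical, and it assumes triples are held over
internal ids while URIs live only in a separate id -> URI table.

```python
from collections import defaultdict


class UnameStore:
    """Toy store: triples use internal ids; URIs live in a side table."""

    def __init__(self):
        self._next_id = 0
        self.id_of = {}                 # URI -> internal id
        self.unames = defaultdict(set)  # internal id -> set of URIs (1:N)
        self.triples = set()            # (subject_id, predicate_id, object)

    def node_for(self, uri):
        """Return the internal id for a URI, allocating one if needed."""
        if uri not in self.id_of:
            self.id_of[uri] = self._next_id
            self.unames[self._next_id].add(uri)
            self._next_id += 1
        return self.id_of[uri]

    def add(self, s_uri, p_uri, obj, literal=True):
        """Add a triple; obj is a literal unless literal=False (then a URI)."""
        o = ("lit", obj) if literal else ("res", self.node_for(obj))
        self.triples.add((self.node_for(s_uri), self.node_for(p_uri), o))

    def merge(self, keep_uri, drop_uri):
        """Smush two resources: rewrite ids once, keep both URIs as unames."""
        keep, drop = self.node_for(keep_uri), self.node_for(drop_uri)
        if keep == drop:
            return

        def swap(n):
            return keep if n == drop else n

        def swap_obj(o):
            return ("res", swap(o[1])) if o[0] == "res" else o

        self.triples = {(swap(s), swap(p), swap_obj(o))
                        for s, p, o in self.triples}
        for uri in self.unames.pop(drop):
            self.id_of[uri] = keep
            self.unames[keep].add(uri)


# Hypothetical URIs for the same person in two sources.
A = "http://example.org/a#fred"
B = "http://example.com/b#me"
store = UnameStore()
store.add(A, "http://xmlns.com/foaf/0.1/nick", "Fred")
store.add(B, "http://xmlns.com/foaf/0.1/name", "Fred Example")
store.merge(A, B)
```

After the merge there are still only two triples, both on one
internal id, and that id carries both URIs in the uname table.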


 > also, I am in the "religious group" of URI resolvers, so I like to 
 > parse URLs to check what server they are hosted on and to query more 
 > information about the resource by contacting the server (e.g. over 
 > URIQA). From this view, I do not like the identification of resources by 
 > IFP.

True, and I agree that QAable URIs seem the most adequate solution to
this problem. Unfortunately (from this standpoint) there is already a
large number of resources that don't have URIs, and this number is
likely to grow massively I think. I need a way to work with this data.

Why grow massively? Because in a decentralized world it's easier to
reference resources using IFPs rather than agreeing on URIs.

E.g. I know a person that I call 'fred' and who has an email address
fred@example.com. Unfortunately that person doesn't have a foaf file,
or a URI to identify himself.

I can add to my data:
..
   <foaf:knows foaf:nick="Fred" foaf:mbox="fred@example.com"/>
..

If fred (or someone else) ever publishes a foaf file, my data will
automatically link with his via the foaf:mbox IFP.
This decoupling simply isn't possible using URIs - either fred has
to come up with a URI by the time I need it, and then subsequently use
it in his foaf file, or I make up a URI, and he has to know about it
and use it later when he authors his foaf file.
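
The linking step can be sketched as a union-find pass over subjects
that share a value for an IFP. This is a minimal sketch under
assumptions: the function name smush_by_ifp, the tuple layout, the
"_:b1" node label, fred's URI and the mailto: form of the mailbox are
all made up for illustration.

```python
def smush_by_ifp(triples, ifps):
    """Group subjects sharing a value for any inverse functional property.

    triples: iterable of (subject, predicate, object)
    ifps: set of predicates treated as inverse functional
    Returns a dict mapping each subject to a canonical representative.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (predicate, value) -> first subject seen with it
    for s, p, o in triples:
        find(s)  # register every subject, even ones with no IFP
        if p in ifps:
            key = (p, o)
            if key in first_seen:
                union(first_seen[key], s)
            else:
                first_seen[key] = s
    return {s: find(s) for s in parent}


# My data: an anonymous person known only by nick and mailbox.
my_data = [
    ("_:b1", "foaf:nick", "Fred"),
    ("_:b1", "foaf:mbox", "mailto:fred@example.com"),
]
# Fred's later foaf file, published under a URI I never knew about.
freds_data = [
    ("http://example.com/fred#me", "foaf:mbox", "mailto:fred@example.com"),
    ("http://example.com/fred#me", "foaf:name", "Fred Example"),
]
canon = smush_by_ifp(my_data + freds_data, ifps={"foaf:mbox"})
```

Both subjects end up with the same representative without either
side having agreed on a URI beforehand.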


 > I like to write URLs on things and give them to people, so that they can 
 > get curious, and get more information by entering the url somewhere.
 > 

You still can with the :uname approach - the URI is not lost, merely
relegated to an IFP.

Cheers,

Phil

 > 
 > And it came to pass at that time, 06.08.2004 14:01, that Phil Dawes wrote:
 > 
 > >Hi RDF Interest,
 > >
 > >Have been thinking a lot recently about techniques for smushing IFP
 > >based data (inc. foaf, doap etc..), and Sandro's uname paper[1] got me
 > >thinking about the optimisation possibilities of an extra layer of
 > >indirection.
 > >
 > >I'm putting together a prototype store (derived from the design of
 > >Steve Harris' excellent 3store) that doesn't expose URIs to the user
 > >except through explicit queries containing (?foo :uname ?uri). This
 > >disconnection between resource and URI offers some reasonably compact
 > >strategies for IFP and owl:sameAs smushing (since the resource -> URI
 > >can be 1:N without duplicating triples). It also provides optimisation
 > >possibilities for cases where the client isn't interested in URIs, but
 > >just the structure of the data and its literals. (I've found this to
 > >be the case when obtaining RDF information for e.g. displaying in a
 > >UI).
 > >
 > >Instead of BNodes, I'm using generated internal ids with a limited
 > >lifespan (remain constant between smushing passes). This is mainly
 > >because you can't use bnodes for properties, but also because it
 > >enables a client to efficiently submit multiple queries using the
 > >short-lived internal URIs for speed.
 > >
 > >The downside to this approach is that I can't think of a way to
 > >efficiently undo a smushing pass, which you'd want to do if e.g. you
 > >unasserted an IFP.
 > >
 > >Has anybody else explored these possibilities? Anything I ought to
 > >consider?
 > >
 > >Cheers,
 > >
 > >Phil
 > >
 > >[1] http://www.w3.org/2001/12/uname/
 > >
Received on Monday, 9 August 2004 19:15:49 UTC