
RE: ungetable http URIs

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Mon, 8 Dec 2003 15:23:56 -0000
Message-ID: <E864E95CB35C1C46B72FEA0626A2E808206335@0-mail-br1.hpl.hp.com>
To: SIMILE public list <www-rdf-dspace@w3.org>
Hi Stefano

> > I think there are two reasons why getable / ungetable has been
> <snip>
> 
> Ok. Question: is that so bad? I mean, URIs might be designed to be 
> gettable, but not there yet... or contain stuff that Haystack could 
> parse, but not understand. What is the default behaviour of Haystack 
> when it encounters a 404? what is Haystack expecting? or what 
> would be 
> better to serve from those resource dereference?

I think the Haystack team would be the best people to answer this
question, especially as I may have described Haystack's behavior
incorrectly, so perhaps they could give some background?

I would characterise the Haystack behavior as unusual: other RDF
applications also encounter RDF containing URLs that do not resolve, but
they simply do not try to retrieve them. However, that's not to say
those applications have got it right - it is arguable that Haystack has
got it right and the others have got it wrong. 

The difference here is Haystack has adopted a particular processing
model. One thing I've noted before is that there is no standard
processing model for the Semantic Web, so I worry this could make some
of the interoperability and seamless discovery described in some of the
SW scenarios hard to achieve. By processing model, I mean what you do
once you've received a piece of RDF, e.g.:

- Do you retrieve the schema from the namespace?

- Do you retrieve resources from URLs used in the RDF? 

- If so, how do you treat the resources - as additional subgraphs of RDF
or as resources, such as HTML pages or JPEGs, that are described by the
RDF? Do you distinguish by MIME type, etc.?

- Do you query some kind of web service to determine additional
information about a URL? Processing models like this may be necessary to
allow discovery of metadata about resources where the metadata has not
been created by the resource owner. 
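To make these decision points concrete, they could be sketched as a
simple dispatch function - a hypothetical illustration of one possible
processing model, not Haystack's actual logic (all names here are mine):

```python
from urllib.parse import urldefrag, urlparse

# Hypothetical sketch of a processing model: given a URI found in RDF,
# decide what (if anything) to fetch and how to treat the response.
# None of these names come from Haystack; they are illustrative only.

RDF_TYPES = {"application/rdf+xml", "text/rdf"}

def plan_for_uri(uri, content_type=None):
    """Return a label describing how a processor might handle `uri`."""
    scheme = urlparse(uri).scheme
    if scheme not in ("http", "https"):
        return "no-fetch"            # e.g. a URN: nothing to dereference
    base, frag = urldefrag(uri)      # clients fetch the document, not the fragment
    if content_type is None:
        return "fetch:" + base       # not retrieved yet: plan a GET
    if content_type in RDF_TYPES:
        return "merge-subgraph"      # treat the response as additional RDF
    return "described-resource"      # an HTML page, JPEG, etc. described by the RDF

print(plan_for_uri("urn:x-example:instance1"))
print(plan_for_uri("http://web.mit.edu/simile/metadata/metadatasubset#instance1"))
print(plan_for_uri("http://example.org/doc", "application/rdf+xml"))
```

The point of the sketch is only that each of the questions above becomes
an explicit branch, so two applications interoperate only if they make
the same choices.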

> > Assuming we stick with URLs then there are two ways to overcome this
> > problem a) put something behind the URL or b) make the URL to a URN,
> > so Haystack doesn't try to retrieve anything so the revised data
> > tries to take approach a).
> 
> I'm sorry, I can parse the above but I can't get around the logic 
> behind it. Can you elaborate more on the alternatives you envision?

I'll try to separate the how from the why. First, the how:

a) we have metadata URLs like 
http://web.mit.edu/simile/metadata/metadatasubset#instance1 and
http://web.mit.edu/simile/metadata/metadatasubset#instance2
so to avoid the problems of not being able to retrieve these resources
we place a piece of HTML at
http://web.mit.edu/simile/metadata/metadatasubset

b) we change metadata URLs like
http://web.mit.edu/simile/metadata/metadatasubset#instance1
to
urn:x-mit-hp-w3c-simile/metadatas/metadatasubset:instance1
so software knows there is no further information available. 

As for the why, the advantage of approach a) over b) is we can put some
data at 
http://web.mit.edu/simile/metadata/metadatasubset#instance1 
in the future.

However the advantage of b) over a) is we can regard a) as a cheat, as
the data at 
http://web.mit.edu/simile/metadata/metadatasubset
is human readable rather than machine readable. At least with b) we have
unambiguously indicated there is no information available at that URL.
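The difference between the two options can be illustrated mechanically:
an HTTP client strips the fragment before the GET, so both hash URIs in
a) resolve to the same base document, while the urn: scheme in b)
signals up front that nothing is retrievable. A small sketch (the helper
name is mine):

```python
from urllib.parse import urldefrag, urlparse

def retrieval_target(uri):
    """Return the URL an HTTP client would actually fetch, or None for URNs."""
    if urlparse(uri).scheme not in ("http", "https"):
        return None  # urn: etc. - nothing to retrieve, as in option b)
    return urldefrag(uri)[0]  # the fragment is stripped before the GET

# Both hash URIs in option a) resolve to the same document:
print(retrieval_target("http://web.mit.edu/simile/metadata/metadatasubset#instance1"))
print(retrieval_target("http://web.mit.edu/simile/metadata/metadatasubset#instance2"))
# The URN form in option b) signals there is nothing to fetch:
print(retrieval_target("urn:x-mit-hp-w3c-simile/metadatas/metadatasubset:instance1"))
```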

> RE: spreading data over multiple namespaces
> 
> I'm not sure I follow here, either. 
> if you are looking up the 
> concept, 
> you might just want the RDFSchema for that concept, maybe 
> with all the 
> RDF references that that concept builds upon or references. Doesn't 
> have to be the entire infoset.

Well I re-read my argument, and you are right, it doesn't really stand
up, because as you note it might be sufficient to retrieve a schema
subset. 

From my previous use of RDF, I've built up a prejudice that namespaces
are a source of complexity, so I was trying to argue that we should
follow Occam's Razor here and reduce the number of namespaces where
possible. However, as you note, really this is about the role people
assign to separators like hashes, and when you look at it from that
point of view it becomes very arbitrary.  

> whether "/" or "#" or "?" 
> is better than the other, well, it's highly debatable and this is 
> probably not the right place either.

agreed :)
 
> Don't get me wrong, I'm a strong advocate of semi-structured 
> repository 
> and you know that, but still I think that RDF is a perfect candidate 
> for relational technology.... where general XML is definitely not.

I'm not quite sure what you mean here by "RDF is a perfect candidate for
relational technology" - can you give more details?

> Can you explain the rationale behind this? RDF is XML, XML 
> is a tree, 
> so you need a tree-oriented database? is that the syllogism in place?

I think I would strongly avoid using the word syllogism when discussing
RDF at the moment, after the Shirky article :) 

But yes, the argument I was trying to advance was:

- libraries deal with heterogeneous, semi-structured data
- they have a history of using hierarchical databases
- although hierarchical databases were popular in the 60s and 70s, they
were largely superseded by relational databases, partly due to fashion. 
- recently there has been renewed interest in hierarchical databases as
they can be used to store XML. 
- Therefore some projects in the library community are using XML
databases, as they are modern hierarchical databases, but also because
the library community is increasingly working with XML.

For some relevant links, see
OpenIsis: Open source database for libraries. 
http://openisis.org/Doc/
UCAI GORT project. 
http://gort.ucsd.edu/ucai/
Harvard TED: Templated Database. 
http://hul.harvard.edu/ois/systems/ted/index.html

Of course, the relationship between XML and RDF is another story. 

> What do you mean with "persistent RDF approach"?

We use an RDF model to store all the data structures of the application,
but instead of storing that model in main memory we persist it in a
relational database. 
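As a rough sketch of the idea - a minimal statement table, not Jena's
actual schema - persisting a model relationally might look like:

```python
import sqlite3

# Minimal sketch of persisting an RDF model in a relational database:
# one denormalised statement table of (subject, predicate, object) rows.
# This illustrates the general approach only; Jena's real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")

triples = [
    ("http://web.mit.edu/simile/metadata/metadatasubset#instance1",
     "http://purl.org/dc/elements/1.1/title", "Example record"),
    ("http://web.mit.edu/simile/metadata/metadatasubset#instance1",
     "http://purl.org/dc/elements/1.1/creator", "Unknown"),
]
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", triples)

# Querying the model becomes an ordinary SQL query over the statement table:
rows = conn.execute(
    "SELECT predicate, object FROM statements WHERE subject = ?",
    ("http://web.mit.edu/simile/metadata/metadatasubset#instance1",)
).fetchall()
for predicate, obj in rows:
    print(predicate, "->", obj)
```

A single generic statement table like this is also why such a database
looks so unlike a conventionally normalised relational design.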
 
> > For example Jena uses JDBC to persist RDF models using 
> databases like
> > MySQL or Postgres, then Joseki makes those databases 
> queryable via the
> > web.
> 
> That's what I was thinking. I think it makes perfect sense to use 
> relational technology for RDF, given its nature.

You might want to take a look at how Jena stores RDF models in databases
- see 
http://jena.sourceforge.net/DB/
as the resulting databases look quite different from a conventional
relational design, even though they are stored in one. 

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/


Received on Monday, 8 December 2003 10:29:25 EST
