
Re: ungetable http URIs

From: Stefano Mazzocchi <stefano@apache.org>
Date: Thu, 4 Dec 2003 11:20:37 -0800
Message-Id: <F1429293-268E-11D8-AB98-000393D2CB02@apache.org>
To: SIMILE public list <www-rdf-dspace@w3.org>

On 1 Dec 2003, at 10:28, Butler, Mark wrote:

>> Joseki is not an infrastructure for placing meaningful
>> documents at URLs - you just need Apache for that.
>
> Yes, understood.
> However, the problem with just using Apache is that at the moment we
> have approximately 34 files, all around 20 megabytes in size. So if
> we are to expose each individual URL via Apache, then we potentially
> have a lot of work to create the files that correspond to each URL,
> even if we use an automated process, as just dealing with collections
> this big is unwieldy. I'd have a strong preference against doing this
> for the demo, because it seems to me it's just make-work - there's
> nothing at those URLs that's not in the serialized data, right?

Please excuse my ignorance on these topics (I'm trying to get up to 
speed, but I still have a good way to go), but it seems to me that 
between URL gettability and flat, huge XML files served straight from 
disk by Apache HTTPd, there is a *long* range of options in between, 
depending on the granularity at which you want to access that data.

Let me understand this first: what kind of data would you want to 
access? What would you want to *get* from those URLs? What kind of 
granularity are you envisioning?

I'm asking because it would be possible to "fragment" the various bulky 
XML files into a sort of big tree of concepts... each one dynamically 
generating, say, an RDF-schema fragment upon request at a particular 
granular URL... getting deeper into the URL path would restrict the 
information returned.

This means that if you want, say, the entire RDF schema, you access


or if you want just a concept you can do


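(The concrete URL examples did not survive the archive; the following 
is only a minimal sketch of the hierarchical idea, with a made-up 
concept tree and hypothetical paths.)

```python
# Sketch: serve finer-grained fragments as the URL path gets deeper.
# SCHEMA and the paths below are purely illustrative, not a real vocabulary.

SCHEMA = {
    "artifact": {
        "creator": {"comment": "the agent who made the artifact"},
        "date": {"comment": "when the artifact was made"},
    },
}

def fragment_for(path):
    """Walk the concept tree along the URL path segments.

    '/schema'                  -> the entire tree
    '/schema/artifact'         -> one concept
    '/schema/artifact/creator' -> one property of that concept
    """
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] != "schema":
        raise KeyError(path)
    node = SCHEMA
    for seg in segments[1:]:
        node = node[seg]  # each extra segment narrows the fragment
    return node

print(fragment_for("/schema/artifact/creator"))
```

The point is only that the server, not a client-side anchor, decides 
how much of the tree to return for a given URL.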
Note that I personally dislike the # notation that RDF/XML uses so 
much, exactly because anchors are not supposed to be resolved server 
side, which prevents this kind of dynamic operation on the server. 
But the # notation is ingrained into the very nature of RDF/XML, and 
this is, IMO, really sad because it will turn out to be a huge issue 
later on, especially when RDF schemas get bigger and bigger.
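To make the point concrete: a fragment identifier never reaches the 
server at all, so the server cannot vary its response per anchor. A 
quick check with Python's standard library (the URI here is 
hypothetical):

```python
from urllib.parse import urlsplit

# The fragment is parsed out on the client side; an HTTP request line
# carries only the path (and query), never the part after '#'.
parts = urlsplit("http://example.org/schema/vocab#Creator")
print(parts.path)      # '/schema/vocab'
print(parts.fragment)  # 'Creator' -- stays with the client
```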

Anyway, as for implementation, RDF is highly relational (and that is, 
IMO, the reason the RDF/XML syntax feels so ugly), so it would be 
straightforward to store it in a relational database (I remember that 
last year Eric Prud'hommeaux was working on defining those abstract 
...).

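As a rough sketch of what "RDF in a relational database" can look like 
(a naive triple table with made-up sample data, not any particular 
project's schema):

```python
import sqlite3

# A naive triple store: one row per (subject, predicate, object) statement.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
con.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("ex:item1", "dc:creator", "Alice"),
        ("ex:item1", "dc:date", "2003-12-04"),
        ("ex:item2", "dc:creator", "Bob"),
    ],
)

# 'Gettability' then becomes a query scoped by the requested resource,
# e.g. everything known about one subject:
rows = con.execute(
    "SELECT p, o FROM triple WHERE s = ? ORDER BY p", ("ex:item1",)
).fetchall()
print(rows)  # [('dc:creator', 'Alice'), ('dc:date', '2003-12-04')]
```

The database, rather than a pile of 20 MB files on disk, then answers 
each granular URL.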
Of course, to get the data in and out you would need a more complex 
application than a simple read/write from disk, but I think it might 
be a good thing and would solve the 'gettability' issue in the case of 
bulky files.

> However if we can just feed that data into something else (like
> Joseki) and that automatically generates the contents of the URL,
> then the make-work becomes less of an overhead, but I gather this
> will require changing the URIs to be query strings as you describe
> above?

I highly recommend against this approach. If you want URIs to be 
long-lasting, you can't tie them to the semantics of retrieval, or 
you'll be stuck with it forever.


is, IMHO, much more long-lasting than anything like


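(The two example URIs were lost in archiving; the contrast is between 
an opaque hierarchical name and a URI that bakes in the retrieval 
machinery. Hypothetical stand-ins:)

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for the two styles being contrasted.
clean = urlsplit("http://example.org/collection/item1")
query = urlsplit("http://example.org/servlet/joseki?query=fetch&id=item1")

# The second URI stays valid only as long as that exact retrieval
# interface exists; the first commits to nothing but a name.
print(clean.query)  # ''
print(query.query)  # 'query=fetch&id=item1'
```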
Concerns should be kept separate, even if this makes the job a bit 
harder. In my experience, keeping concerns separate *does* pay off 
later on: a steeper curve in the beginning, but a nicer plateau 
afterwards.

> So perhaps the question I should be asking is: for the purposes of
> the demo, is there any advantage in using URLs for instance data
> rather than URNs, if we assume that by making this decision now we
> are not committing to it long term?

That's an entirely different story.


Received on Friday, 5 December 2003 10:24:33 UTC
