Re: ungetable http URIs

From: Stefano Mazzocchi <stefano@apache.org>
Date: Thu, 4 Dec 2003 11:20:37 -0800
Message-Id: <F1429293-268E-11D8-AB98-000393D2CB02@apache.org>
To: SIMILE public list <www-rdf-dspace@w3.org>

On 1 Dec 2003, at 10:28, Butler, Mark wrote:

>> Joseki is not an infrastructure for placing meaningful documents at
>> URLs - you just need Apache for that.
>
> Yes, understood.
>
> However the problem with just using Apache is at the moment we have
> approximately 34 files, all around 20 megabytes in size. So if we are
> to expose each individual URL via Apache, then we potentially have a
> lot of work to create the files that correspond to each URL, even if
> we use an automated process, as just dealing with collections this big
> is unwieldy. I'd have a strong preference against doing this for the
> demo, because it seems to me it's just make work - there's nothing at
> those URLs that's not in the serialized data, right?

Please excuse my ignorance on the topic (I'm trying to get up to
speed, but I still have a good way to go), but it seems to me that
there is a *long* range of options between URL gettability and huge
flat XML files served straight from disk by Apache HTTPd, depending on
the granularity at which you want to access that data.

Let me understand this first: what kind of data would you want to
access? What would you want to *get* from those URLs? What kind of
granularity are you envisioning?

I'm asking because it would be possible to "fragment" the various bulky
XML files into a sort of big tree of concepts, with each node
dynamically generating, say, an RDF schema fragment when a particular
granular URL is requested. Going deeper into the URL path would
restrict the information returned.

This means that if you want, say, the entire RDF schema, you access

  http://whatever/category/schema/

or if you want just a concept you can do

  http://whatever/category/schema/concept
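
To make the idea concrete, here is a minimal Python sketch of resolving
a URL path against a tree of concepts. The tree contents, the TREE name
and the resolve() function are all hypothetical and only illustrate
the principle that a deeper path selects a smaller piece of data; a
real service would generate RDF from whatever store holds the data.

```python
# Hypothetical in-memory "tree of concepts"; all names are illustrative.
TREE = {
    "category": {
        "schema": {
            "concept": {"label": "Concept"},
            "other": {"label": "Other"},
        }
    }
}


def resolve(path, tree=TREE):
    """Return the subtree addressed by a URL path, or None if absent."""
    node = tree
    # Ignore empty segments so a trailing slash is accepted.
    for segment in [s for s in path.split("/") if s]:
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


whole_schema = resolve("/category/schema/")        # every concept
one_concept = resolve("/category/schema/concept")  # a single concept
```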

Note that I personally dislike the # notation that RDF/XML keeps using
so much, precisely because fragment identifiers are resolved by the
client and never sent to the server, which prevents this kind of
dynamic operation on the server side. But the # notation is ingrained
in the very nature of RDF/XML, and this is, IMO, really sad because it
will turn out to be a huge issue later on, especially when RDF schemas
get bigger and bigger.
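
The point about # can be illustrated with a small Python sketch. The
request_target() helper is hypothetical; it only mimics what an HTTP
client does when it builds the request line, namely keeping the path
and query and dropping the fragment before anything reaches the server.

```python
from urllib.parse import urlsplit


def request_target(uri):
    """Return the part of an http URI that a client sends to the server."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays on the client.
    return target


hash_style = request_target("http://whatever/category/schema#concept")
path_style = request_target("http://whatever/category/schema/concept")
```

With the # form the server only ever sees a request for the whole
schema, so it cannot return just the one concept; with the path form it
sees the full name of the concept.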

Anyway, as for implementation, RDF is highly relational (and that's,
IMO, the reason why the RDF/XML syntax feels so ugly), so it would be
straightforward to store it in a relational database (I remember that
last year Eric Prud'hommeaux was working on defining those abstract
mappings).

Of course, to get the data in and out you would need a more complex
application than one that simply reads and writes files on disk, but I
think it might be a good thing and would solve the 'gettability' issue
for bulky files.
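
As a rough sketch of what such an application could look like, the
Python below keeps statements in a single three-column table using the
standard sqlite3 module. This is the simplest layout I can think of and
is an assumption made for illustration, not the mapping mentioned
above; the table name, the helper functions and the sample statements
are all hypothetical.

```python
import sqlite3


def make_store():
    """Create an in-memory database with one table of statements."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    return db


def add(db, subject, predicate, obj):
    """Store a single subject/predicate/object statement."""
    db.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))


def describe(db, subject):
    """Return every (predicate, object) pair stored about one subject."""
    rows = db.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? ORDER BY predicate",
        (subject,),
    )
    return rows.fetchall()


db = make_store()
concept = "http://whatever/category/schema/concept"
add(db, concept, "label", "Concept")
add(db, concept, "comment", "An example concept")
add(db, "http://whatever/category/schema/other", "label", "Other")
about_concept = describe(db, concept)
```

A request for one granular URL then becomes a small indexed query
instead of a parse of a 20 megabyte file.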

> However if we can just feed that data into something else (like
> Joseki) and that automatically generates the contents of the URL,
> then the make work becomes less of an overhead, but I gather this
> will require changing the URIs to be query strings as you describe
> above?

I highly recommend against this approach. If you want URIs to be long
lasting, you can't tie them to the semantics of retrieval, or you'll
be stuck with those semantics forever.

  http://whatever/category/schema/concept

is, IMHO, much more long-lasting than anything like

  http://whatever/lookup-service?get="schema/concept"

Concerns should be kept separate, even if this makes the job harder.
In my experience, keeping concerns separate *does* pay off later on:
the curve is steeper in the beginning, but the plateau is nicer later.
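
One way to keep the two concerns apart, sketched in Python under the
assumption that some lookup service sits behind the public URIs: the
published URI stays path-based, and a small translation step on the
server turns it into whatever query the current backend expects. The
to_backend_query() function and its backend parameter are hypothetical.

```python
from urllib.parse import urlencode


def to_backend_query(public_path, backend="/lookup-service"):
    """Translate a stable public path into the current backend's query."""
    key = public_path.strip("/")
    # Only this function knows how retrieval works today; replacing the
    # backend later means changing this line, not the published URIs.
    return backend + "?" + urlencode({"get": key})


internal = to_backend_query("/category/schema/concept")
```

If the lookup service is ever replaced, only the translation changes
and every URI already handed out keeps working.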

> So perhaps the question I should be asking is for the purposes of the
> demo, is there any advantage in using URLs for instance data rather
> than URNs, if we assume that by making this decision now we are not
> committing to it long term?

That's an entirely different story.

--
Stefano.


Received on Friday, 5 December 2003 10:24:33 EST