- From: Stefano Mazzocchi <stefano@apache.org>
- Date: Thu, 4 Dec 2003 11:20:37 -0800
- To: SIMILE public list <www-rdf-dspace@w3.org>
- Message-Id: <F1429293-268E-11D8-AB98-000393D2CB02@apache.org>
On 1 Dec 2003, at 10:28, Butler, Mark wrote:

>> Joseki is not an infrastructure for placing meaningful documents at
>> URLs - you just need Apache for that.
>
> Yes, understood.
>
> However the problem with just using Apache is that at the moment we
> have approximately 34 files, all around 20 megabytes in size. So if
> we are to expose each individual URL via Apache, then we potentially
> have a lot of work to create the files that correspond to each URL,
> even if we use an automated process, as just dealing with collections
> this big is unwieldy. I'd have a strong preference against doing this
> for the demo, because it seems to me it's just make-work - there's
> nothing at those URLs that's not in the serialized data, right?

Please excuse my ignorance on these topics (I'm trying to get up to
speed, but I still have a good way to go), but it seems to me that
between URL gettability and huge, flat XML files served straight from
disk by Apache HTTPd there is a *long* range of options in between,
depending on the granularity at which you want to access that data.

Let me understand this first: what kind of data would you want to
access? What would you want to *get* from those URLs? What kind of
granularity are you envisioning?

I'm asking because it would be possible to "fragment" the various
bulky XML files into a sort of big tree of concepts, each node
dynamically generating, say, an RDF-schema fragment upon request at a
particular granular URL; getting deeper into the URL path would
restrict the information returned. This means that if you want, say,
the entire RDF schema, you access

  http://whatever/category/schema/

or, if you want just one concept, you can do

  http://whatever/category/schema/concept

(a rough servlet sketch of such a dispatcher appears at the end of
this message).

Note that I personally dislike the # notation that RDF/XML uses so
much, exactly because fragment identifiers are never sent to the
server, which prevents this kind of dynamic operation on the server
side. But the # notation is ingrained in the very nature of RDF/XML,
and this is, IMO, really sad, because it will turn out to be a huge
issue later on, especially as RDF schemas get bigger and bigger.

Anyway, as for implementation, RDF is highly relational (and that is,
IMO, the reason the RDF/XML syntax feels so ugly), so it would be
straightforward to store it in a relational database (I remember that
last year Eric Prud'hommeaux was working on defining those abstract
mappings); a sketch of that is also at the end of this message. Of
course, to get the data in and out you would need a more complex
application than a simple read/write from disk, but I think it might
be a good thing, and it would solve the 'gettability' issue in the
case of bulky files.

> However if we can just feed that data into something else (like
> Joseki) and that automatically generates the contents of the URL,
> then the make-work becomes less of an overhead, but I gather this
> will require changing the URIs to be query strings as you describe
> above?

I highly recommend against this approach. If you want URIs to be long
lasting, you can't associate them with the semantics of retrieval, or
you'll be stuck with it forever.

  http://whatever/category/schema/concept

is, IMHO, much more long-lasting than anything like

  http://whatever/lookup-service?get="schema/concept"

Concerns should be kept separate, even if this makes the job a bit
harder. In my experience, keeping concerns separate *does* pay off
later on, resulting in a steeper curve in the beginning but a nicer
plateau later.
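To make the fragmentation idea concrete, here is a minimal sketch of
such a dispatcher as a plain Java servlet. It is illustrative only:
the in-memory map and its paths are made up, standing in for whatever
would really produce the fragments (the pre-split files, a toolkit
like Jena, or a database).

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class RdfFragmentServlet extends HttpServlet {

        // Hypothetical store mapping URL paths to RDF/XML fragments;
        // a real version would serialize just the subtree of the data
        // that the requested path identifies.
        private final Map fragments = new HashMap();

        public void init() {
            fragments.put("/category/schema/",
                "<rdf:RDF>... the whole schema ...</rdf:RDF>");
            fragments.put("/category/schema/concept",
                "<rdf:RDF>... just one concept ...</rdf:RDF>");
        }

        public void doGet(HttpServletRequest req, HttpServletResponse res)
                throws IOException {
            // Everything after the servlet mapping,
            // e.g. "/category/schema/concept".
            String path = req.getPathInfo();
            String rdf = (path == null) ? null : (String) fragments.get(path);
            if (rdf == null) {
                res.sendError(HttpServletResponse.SC_NOT_FOUND);
                return;
            }
            res.setContentType("application/rdf+xml");
            res.getWriter().write(rdf);
        }
    }

The point of the sketch is only that the URI space stays purely
hierarchical: nothing in the URLs reveals whether a servlet or a file
on disk is answering them, which is exactly what keeps them long
lasting.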
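And to illustrate the relational point, here is a minimal sketch,
assuming any JDBC database (HSQLDB is picked arbitrarily for an
in-memory example, and the table and column names are made up), of
how RDF statements collapse into a three-column relation and how the
path-style URLs above become simple queries against it:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TripleStoreSketch {

        public static void main(String[] args) throws Exception {
            // Any JDBC driver/database would do here.
            Class.forName("org.hsqldb.jdbcDriver");
            Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:mem:triples", "sa", "");

            // One row per RDF statement: subject, predicate, object.
            Statement st = con.createStatement();
            st.execute("CREATE TABLE triple ("
                     + " subject   VARCHAR(255),"
                     + " predicate VARCHAR(255),"
                     + " obj       VARCHAR(255))");

            // Loading the bulky files becomes a stream of INSERTs...
            PreparedStatement ins = con.prepareStatement(
                "INSERT INTO triple VALUES (?, ?, ?)");
            ins.setString(1, "http://whatever/category/schema/concept");
            ins.setString(2, "http://www.w3.org/2000/01/rdf-schema#label");
            ins.setString(3, "concept");
            ins.executeUpdate();

            // ...and serving a granular URL becomes a query for every
            // statement about that resource.
            PreparedStatement sel = con.prepareStatement(
                "SELECT predicate, obj FROM triple WHERE subject = ?");
            sel.setString(1, "http://whatever/category/schema/concept");
            ResultSet rs = sel.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getString(2));
            }
            con.close();
        }
    }

This is far cruder than the abstract mappings mentioned above, but it
shows why the relational shape of RDF turns the 'gettability' of bulky
files into an application problem rather than a file-management one.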
> So perhaps the question I should be asking is: for the purposes of
> the demo, is there any advantage in using URLs for instance data
> rather than URNs, if we assume that by making this decision now we
> are not committing to it long term?

That's an entirely different story.

--
Stefano.