Re: Announcement: "Cool URIs for the Semantic Web" - W3C SWEO IG Note

I speculate that a significant number of the vocabularies that use
parts instead of hashes still package the whole vocabulary in one
file and rely on URL rewriting to deliver that whole file rather than
just the part.  For instance,

http://purl.org/dc/dcmitype/Collection

302 redirects to

http://dublincore.org/2008/01/14/dctype.rdf#Collection

retrieving the whole vocabulary

> http://purl.org/dc/dcmitype/Collection
>
> GET /dc/dcmitype/Collection HTTP/1.1
> Host: purl.org
> ...
> Referer: http://dublincore.org/documents/dcmi-terms/
>
> HTTP/1.x 302 Found
> ...
> Location: http://dublincore.org/2008/01/14/dctype.rdf#Collection
> ...
>
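
As a rough illustration of why following that redirect pulls in the
whole vocabulary: the fragment in the Location header stays on the
client, so the request that follows names only the document.  A minimal
Python sketch using the standard library's urldefrag (the variable
names are mine, purely illustrative):

```python
from urllib.parse import urldefrag

# Location header value copied from the trace above.
location = "http://dublincore.org/2008/01/14/dctype.rdf#Collection"

# Split off the fragment; only document_url is sent in the next request.
document_url, fragment = urldefrag(location)

print(document_url)  # http://dublincore.org/2008/01/14/dctype.rdf
print(fragment)      # Collection
```

The server never sees "Collection", so it has no way to return just
that part from this URL.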

Redirecting to the whole document makes hashes more attractive: with
parts, the client returns to the server and gets a 30x for every
class/predicate, even though it already received all the parts via
the hash URI in the redirect (which, by the way, Tabulator complains
shouldn't be a hash URI).  I think that if you want to use parts and
303s, a good recommendation would be that dereferencing returns just
the part and nothing else; otherwise the waste is obvious, since the
client queries the server and pulls the same whole vocabulary over
and over.  In this particular case, I speculate DCMI thought it was
getting around that issue by using purl.org, which issues a 302
instead of a 303, thinking the 302 response would be cached, but that
doesn't appear to happen in my browser.  If purl.org were returning a
303, my point would be more obvious.
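
To put rough numbers on this, here is a small Python sketch that counts
requests for a naive client.  It is a model under stated assumptions,
not a measurement: count_requests is a hypothetical helper, the
redirect table stands in for purl.org, redirects are assumed never to
be cached, and only the Collection target comes from the trace above
(the Dataset and Image targets are assumed to follow the same pattern).

```python
from urllib.parse import urldefrag

DOC = "http://dublincore.org/2008/01/14/dctype.rdf"
TERMS = ["Collection", "Dataset", "Image"]

# Stand-in for purl.org: slash ("part") URI -> redirect target.
# Only the Collection entry is taken from the trace; the rest are assumed.
redirects = {f"http://purl.org/dc/dcmitype/{t}": f"{DOC}#{t}" for t in TERMS}

slash_uris = list(redirects)
hash_uris = [f"{DOC}#{t}" for t in TERMS]


def count_requests(term_uris, redirects, cache_documents=True):
    """Count the HTTP requests needed to dereference every URI in term_uris.

    Redirect responses are never cached in this model; fetched documents
    are cached only when cache_documents is True.
    """
    fetched = set()
    requests = 0
    for uri in term_uris:
        url, _fragment = urldefrag(uri)  # fragments are not sent to the server
        if url in redirects:
            requests += 1                # request answered with a 30x
            url, _fragment = urldefrag(redirects[url])
        if not cache_documents or url not in fetched:
            requests += 1                # request that returns the whole file
            fetched.add(url)
    return requests


print(count_requests(slash_uris, redirects))                         # 4
print(count_requests(hash_uris, redirects))                          # 1
print(count_requests(slash_uris, redirects, cache_documents=False))  # 6
```

With three terms, parts cost three redirects plus one document fetch,
or six requests if the document is not cached either, while hashes cost
a single fetch.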

-Mark

On Apr 10, 2008, at 4:59 AM, Richard Cyganiak wrote:
>
> You are in the fortunate position that your vocabulary is so  
> important that developers will simply pre-load it to work around  
> the inherent slowness of the 303s. It's no accident that Tabulator  
> and Disco come with FOAF pre-loaded. (Same story with DC.)
>
> It's all about latency, and each additional lookup has a negative  
> impact. Sure, there are technical means to work around that  
> (incremental rendering, HTTP pipelining etc), but let's remember  
> that current RDF browsers are cobbled together by people in their  
> free time using shoe string and duct tape, and let's not make their  
> job more difficult by adding additional slow-downs for no good  
> reason. Seriously, none of the advantages of slash URIs over hash  
> URIs apply in the case of publishing vocabularies.
>
> On 9 Apr 2008, at 09:16, Dan Brickley wrote:
>> My apologies for not reviewing the document more carefully. It  
>> seems to be good stuff, but I missed this claim. And (as  
>> responsible party for FOAF ns) think this overstates the problem.   
>> Overstates it to a considerable degree, even.
>>
>
>> Clients can cache the 303 redirects, and the resulting URL's  
>> content can also be cached. For a small ontology of 5 or 6 terms,  
>> this involves 5 or 6 HTTP redirects plus the main fetch. All  
>> cachable. For modest sized ontologies like FOAF, with ~60 terms,  
>> it may be a slight nuisance, ... but let's keep it in perspective:  
>> loading a single Flickr page probably involves more HTTP traffic.  
>> And for massive ontologies, like the various wordnet  
>> representations, breaking them up into parts has its own merits:  
>> why download a description of 50000 classes just because you've  
>> encountered one.
>>
>> If someone has specific software engineering problems with a Web
>> client for FOAF data that is suffering "to a considerable degree",  
>> please post your code and performance stats and let's have a look  
>> at fixing it. Maybe http://en.wikipedia.org/wiki/HTTP_pipelining  
>> is something we can get wired into a few more SemWeb crawling  
>> environments; for instance data as much as for schemas.

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology

Received on Thursday, 10 April 2008 15:42:59 UTC