- From: Mark Diggory <mdiggory@MIT.EDU>
- Date: Thu, 10 Apr 2008 08:28:27 -0700
- To: www-tag@w3.org, SWIG <semantic-web@w3.org>
I speculate that a significant number of those vocabularies that use parts instead of hashes still package the vocabulary as a whole in one file and rely on URL rewriting to deliver that whole rather than just the part. For instance, http://purl.org/dc/dcmitype/Collection 302-redirects to http://dublincore.org/2008/01/14/dctype.rdf#Collection, retrieving the whole vocabulary:

> http://purl.org/dc/dcmitype/Collection
>
> GET /dc/dcmitype/Collection HTTP/1.1
> Host: purl.org
> ...
> Referer: http://dublincore.org/documents/dcmi-terms/
>
> HTTP/1.x 302 Found
> ...
> Location: http://dublincore.org/2008/01/14/dctype.rdf#Collection
> ...
>

Which makes hashes more attractive, because with parts the client is returning to the server and getting a 30x for every Class/predicate when it already got all the parts via the hash URL in the redirect (which, by the way, Tabulator complains shouldn't be a hash URI).

I think if you want to use parts and 303s, it would be a good recommendation that dereferencing should return just the part and nothing else. Otherwise it's obvious that there is waste in querying the server and pulling the same whole vocabulary over and over.

In this particular case, I speculate DCMI thought it was getting around that issue by using purl.org, which issues a 302 instead of a 303, thinking it would be cached, but that doesn't appear to happen in my browser. If purl.org were returning a 303, the point I make would be more obvious.

-Mark

On Apr 10, 2008, at 4:59 AM, Richard Cyganiak wrote:
>
> You are in the fortunate position that your vocabulary is so
> important that developers will simply pre-load it to work around
> the inherent slowness of the 303s. It's no accident that Tabulator
> and Disco come with FOAF pre-loaded. (Same story with DC.)
>
> It's all about latency, and each additional lookup has a negative
> impact. Sure, there are technical means to work around that
> (incremental rendering, HTTP pipelining etc), but let's remember
> that current RDF browsers are cobbled together by people in their
> free time using shoe string and duct tape, and let's not make their
> job more difficult by adding additional slow-downs for no good
> reason. Seriously, none of the advantages of slash URIs over hash
> URIs apply in the case of publishing vocabularies.
>
> On 9 Apr 2008, at 09:16, Dan Brickley wrote:
>> My apologies for not reviewing the document more carefully. It
>> seems to be good stuff, but I missed this claim. And (as
>> responsible party for FOAF ns) think this overstates the problem.
>> Overstates it to a considerable degree, even.
>>
>> Clients can cache the 303 redirects, and the resulting URL's
>> content can also be cached. For a small ontology of 5 or 6 terms,
>> this involves 5 or 6 HTTP redirects plus the main fetch. All
>> cachable. For modest sized ontologies like FOAF, with ~60 terms,
>> it may be a slight nuisance, ... but let's keep it in perspective:
>> loading a single Flickr page probably involves more HTTP traffic.
>> And for massive ontologies, like the various wordnet
>> representations, breaking them up into parts has its own merits:
>> why download a description of 50000 classes just because you've
>> encountered one.
>>
>> If someone has specific software engineering problems with a Web
>> client for FOAF data that is suffering "to a considerable degree",
>> please post your code and performance stats and let's have a look
>> at fixing it. Maybe http://en.wikipedia.org/wiki/HTTP_pipelining
>> is something we can get wired into a few more SemWeb crawling
>> environments; for instance data as much as for schemas.

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
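
To put rough numbers on the request counts discussed in this thread, here is a minimal Python sketch. It is an illustration only, not taken from any tool mentioned above: the function name `count_requests`, the `cache_documents` flag, and the `http://example.org/vocab` namespace are hypothetical, and the model assumes a client that never caches redirects but may cache fetched documents. It makes no network calls.

```python
from urllib.parse import urldefrag


def count_requests(term_uris, redirects, cache_documents=True):
    """Count simulated HTTP requests needed to dereference every term URI.

    redirects: maps a slash-style term URI to its redirect target
               (hypothetical data supplied by the caller).
    cache_documents: if True, a document already fetched is not fetched again.
    Redirect responses are assumed never to be cached.
    """
    fetched = set()
    requests = 0
    for uri in term_uris:
        # The fragment is stripped client-side and never sent to the server.
        doc, _fragment = urldefrag(uri)
        if doc in redirects:
            requests += 1  # one round trip just to receive the 30x
            doc, _fragment = urldefrag(redirects[doc])
        if not cache_documents or doc not in fetched:
            requests += 1  # fetch the vocabulary document itself
            fetched.add(doc)
    return requests


terms = ["Collection", "Dataset", "Event"]

# Slash-style terms, each redirected to the same whole-vocabulary file.
slash_uris = ["http://purl.org/dc/dcmitype/" + t for t in terms]
slash_redirects = {
    u: "http://dublincore.org/2008/01/14/dctype.rdf#" + t
    for u, t in zip(slash_uris, terms)
}

# Hash-style terms in a hypothetical namespace; no redirects involved.
hash_uris = ["http://example.org/vocab#" + t for t in terms]

slash_cached = count_requests(slash_uris, slash_redirects)
slash_uncached = count_requests(slash_uris, slash_redirects, cache_documents=False)
hash_cached = count_requests(hash_uris, {})
```

Under these assumptions the slash case with document caching costs one redirect per term plus the main fetch, which matches Dan Brickley's count; without document caching the whole vocabulary is pulled once per term, which is the waste Mark describes; the hash case needs a single fetch.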
Received on Thursday, 10 April 2008 15:42:26 UTC