Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Pat McBennett on 2022-10-06 (semantic-web@w3.org from October 2022)

From: Pat McBennett <patm@inrupt.com>
Date: Fri, 7 Oct 2022 00:49:39 +0100
To: Martynas Jusevičius <martynas@atomgraph.com>
Cc: semantic-web@w3.org
Message-ID: <CABgQ8mLasA80YWixJcCWLFFm8pPdjYuksJ_JKBqTSMjnHPOF8g@mail.gmail.com>
Hi Martynas,

Thanks for the feedback!

But I think any vocabulary can just as easily get that same caching
benefit with slash-based vocab namespace IRIs too, *without* requiring
an initial HTTP request for *each* term - i.e., by simply returning the
entire vocab when the namespace IRI itself is looked up.

I think QUDT is a really nice, simple example that demonstrates exactly
this today. It has a slash namespace IRI, and if I only ever request info
on individual vocab terms (e.g., try clicking now on
`https://qudt.org/schema/qudt/CurrencyUnit`), then yes, I'd encounter the
'HTTP request per lookup' you describe (but I'd be getting precisely what
I asked for each time!).

But I can just as easily avoid that scenario today by simply requesting
the vocab's namespace IRI instead - e.g., try it right now by clicking on
`https://qudt.org/schema/qudt`. You get back the entire vocab, containing
all the vocab terms, in a single HTTP response, which can be cached and
keyed on that one namespace IRI (exactly as you would if they'd used a
hash instead). (I'm not familiar with Jena's OntDocumentManager, but I'm
sure its caching code could easily be extended to take advantage of
servers that choose to serve up slash-based vocabularies the way QUDT
shows is entirely feasible today.)
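
For illustration, here's a minimal Python sketch of that kind of cache.
It is not Jena's OntDocumentManager; it assumes a hypothetical config
(`slash_vocab_urls`) telling the client which URL returns the whole
vocab for a given slash namespace, and `fake_fetch` is a stand-in for a
real HTTP GET plus RDF parse. The cache is keyed on whatever document
URL was actually fetched, so hash IRIs and QUDT-style slash IRIs both
end up with one cache entry per vocab.

```python
from urllib.parse import urldefrag


class VocabDocumentCache:
    """Caches whole vocabulary documents, keyed on the document URL fetched."""

    def __init__(self, fetch, slash_vocab_urls):
        # fetch: callable taking a document URL and returning {term IRI: info}.
        # slash_vocab_urls: hypothetical config mapping a slash namespace IRI
        # to the URL that returns the entire vocab when dereferenced.
        self._fetch = fetch
        self._slash_vocab_urls = slash_vocab_urls
        self._docs = {}
        self.fetch_count = 0

    def document_url(self, term_iri):
        base, fragment = urldefrag(term_iri)
        if fragment:
            # Hash IRI: strip the fragment; every term shares this document URL.
            return base
        for namespace, vocab_url in self._slash_vocab_urls.items():
            if term_iri.startswith(namespace):
                # Slash IRI in a known vocab: fetch the whole vocab instead.
                return vocab_url
        # Unknown slash IRI: dereference the term on its own.
        return term_iri

    def lookup(self, term_iri):
        url = self.document_url(term_iri)
        if url not in self._docs:
            self.fetch_count += 1  # only a cache miss triggers a fetch
            self._docs[url] = self._fetch(url)
        return self._docs[url].get(term_iri)


def fake_fetch(url):
    # Stand-in for an HTTP GET plus RDF parse; the "info" strings are placeholders.
    if url == "https://qudt.org/schema/qudt":
        return {
            "https://qudt.org/schema/qudt/CurrencyUnit": "CurrencyUnit info",
            "https://qudt.org/schema/qudt/Unit": "Unit info",
        }
    return {}


cache = VocabDocumentCache(
    fake_fetch, {"https://qudt.org/schema/qudt/": "https://qudt.org/schema/qudt"}
)
cache.lookup("https://qudt.org/schema/qudt/CurrencyUnit")
cache.lookup("https://qudt.org/schema/qudt/Unit")
print(cache.fetch_count)  # 1: both terms were served from one cached vocab document
```

A term IRI under an unknown slash namespace still falls back to a
per-term request, which is the 'HTTP request per lookup' case above.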

So doesn't that demonstrate my whole point - i.e., that with slashes I can
get the best of both worlds (precise term-specific HTTP responses if I
want them, *and* the entire vocab in a single HTTP response if I want that
too)? Using a hash completely locks me out, forever, of ever getting those
lovely clean term-specific responses.

And that's why I posit that slashes are simply 'more correct': *only*
slashes can ever allow a server to always know exactly, and
unambiguously, what a requesting client is really looking for (a
term-specific response or an entire-vocab response), and they do so
without losing any of the benefits of using a hash. (By the way, I
totally appreciate that servers choosing to work the way the QUDT servers
do today might require a bit more server-side work. But my whole point is
to ask this community which option they consider "more technically
correct today and forever", not "which option is easier for servers or
vocab creators/hosters/editors/publishers today, in the absence of any
tooling support". In other words, I think QUDT-server-like behaviour can
easily be provided by tooling, which I'd personally be very happy to help
contribute :) !)
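
To make the "servers always know" point concrete: per RFC 3986
(section 3.5), a fragment is handled only by the client and is stripped
before the request is made, so the server never sees it. The small
Python sketch below (`request_target` is a hypothetical helper, and
`dpv#Purpose` is just a second example term) shows that two hash IRIs
in the same vocab produce an identical HTTP request, while two slash
IRIs produce distinct ones.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(iri):
    """Return the (host, path) an HTTP client would request to dereference iri.

    Per RFC 3986 (section 3.5), the fragment is processed only by the client
    and is removed before the request is made, so the server never sees it.
    """
    url, _fragment = urldefrag(iri)
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target


hash_terms = [
    "https://w3id.org/dpv#JointDataControllers",
    "https://w3id.org/dpv#Purpose",
]
slash_terms = [
    "https://qudt.org/schema/qudt/CurrencyUnit",
    "https://qudt.org/schema/qudt/Unit",
]

# Both hash terms collapse to the same request, so the server can't tell them apart.
print({request_target(t) for t in hash_terms})
# Each slash term produces its own request, so the server knows which term was asked for.
print({request_target(t) for t in slash_terms})
```

With slashes, the namespace IRI itself (e.g. `https://qudt.org/schema/qudt`)
is yet another distinct request target, which is what lets a server
offer the whole-vocab response as well.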

Cheers,

Pat.

*Pat McBennett*, Technical Architect

Contact  | patm@inrupt.com

Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
<https://github.com/pmcb55>

Explore  | www.inrupt.com




On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius <martynas@atomgraph.com>
wrote:

> Hi Pat,
>
> For one thing, hash URIs are easier to cache because there is only one
> document URL. After the initial HTTP request the whole document can be
> cached with its URL as the key. All following term lookups (whose URIs
> start with that URL) will hit the cached document.
> Slash URIs will require an initial HTTP request for *each* term and will
> result in a cache entry per term.
>
> This is based on my experience with Jena's OntDocumentManager.
>
> Martynas
> atomgraph.com
>
>
> On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
>
>> So (I think!) I know all the pros and cons of using either a trailing
>> slash or a trailing hash for vocab namespace IRIs. Basically it boils down
>> to hashes meaning you'll always get info on all the terms in a vocabulary,
>> even if you only want info for one specific term, whereas using a slash
>> means I can always get just the info for any specific, individual term I
>> request.
>>
>> Note: using slashes provides the ability to get the best of both worlds -
>> i.e., small responses when explicitly asking for info on just one term, but
>> if you want info for all the terms in one HTTP response, then just serve up
>> that complete vocab response when the base namespace IRI itself is
>> dereferenced.
>>
>> Here's a nice simple illustration of the basic difference:
>> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on
>> 'https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean,
>> concise, and precise set of info on just the one term you asked for -
>> lovely!
>>
>> - Hash: DPV's 'JointDataControllers' (i.e., click on
>> 'https://w3id.org/dpv#JointDataControllers') and you get bombarded with a
>> huge document, with a daunting Table of Contents on the left, and info on
>> hundreds of other terms that I didn't ask for, and so had no interest in
>> whatsoever (don't get me wrong - this is fantastically detailed and
>> potentially very useful information, but it's simply not what I asked for!).
>>
>> So based on the greater flexibility and future-proofing of using slash
>> (i.e., it offers the best of both worlds, whereas hash is forever limited),
>> I've become firmly of the opinion that slashes are just 'better' than
>> hashes, and in fact are simply 'more correct' (i.e., all IRIs should be
>> uniquely dereferenceable).
>>
>> I also think the distinction is critically important when creating
>> vocabularies intended for widespread and long-lasting use (such as the DPV
>> vocab above). For throw-away or pet projects, sure, it doesn't really
>> matter (yet even then, I still think slashes are the 'more correct' option).
>>
>> I know that the convention from the W3C has tended to be to use hashes,
>> but I think in hindsight that was a mistake, and that the advice from the
>> Semantic Web community as a whole should now be to adopt slashes
>> consistently for all new vocabularies. (And it's not like using slash has
>> no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist,
>> SOSA, SSN (and even the venerable FOAF!) all use slash).
>>
>> I'd love to hear this group's thoughts. (For reference, I did ask the
>> gist community if they recorded their discussions around their decision (in
>> 2019) to formally switch gist from hash to slash (here
>> <https://github.com/semanticarts/gist/issues/725>), but it seems they
>> weren't recorded, and I've also raised the issue with the DPV group
>> directly too (here <https://github.com/w3c/dpv/issues/53>)).
>>
>> Cheers,
>>
>> Pat.
>>
>> This e-mail, and any attachments thereto, is intended only for use by the
>> addressee(s) named herein and may contain legally privileged, confidential
>> and/or proprietary information. If you are not the intended recipient of
>> this e-mail (or the person responsible for delivering this document to the
>> intended recipient), please do not disseminate, distribute, print or copy
>> this e-mail, or any attachment thereto. If you have received this e-mail in
>> error, please respond to the individual sending the message, and
>> permanently delete the email.
>
>

Received on Thursday, 6 October 2022 23:50:04 UTC