Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Martynas Jusevičius on 2022-10-07 (semantic-web@w3.org from October 2022)

From: Martynas Jusevičius <martynas@atomgraph.com>
Date: Fri, 7 Oct 2022 10:04:19 +0200
To: Pierre-Antoine Champin <pierre-antoine@w3.org>
Cc: Pat McBennett <patm@inrupt.com>, semantic-web@w3.org
Message-ID: <CAE35Vmz9AwgSXmKPMK7r+i91qB7dPCi-eQ+k1r9k4u+jL_CqUQ@mail.gmail.com>
On Fri, Oct 7, 2022 at 9:12 AM Pierre-Antoine Champin <pierre-antoine@w3.org>
wrote:

> On 07/10/2022 01:49, Pat McBennett wrote:
>
> Hi Martynas,
>
> Thanks for the feedback!
>
> But I think any vocabulary can just as easily support that same caching
> benefit with slash-based vocab namespace IRIs too, *without* having
> to require an initial HTTP request for *each* term - i.e., by simply
> returning the entire vocab on namespace IRI lookups.
>
> In the general case, when you encounter an IRI of the form
> http://ex.co/x/Y, you can not assume that http://ex.co/x/ will contain
> the definition of http://ex.co/x/Y together with other related terms. For
> this you need,
>

The above is what I wanted to reply with.

> a) the server to provide an affordance in the description of
> http://ex.co/x/Y pointing to http://ex.co/x/
> b) the client to understand and follow that affordance (e.g. by using
> rdfs:isDefinedBy)
> c) the description at http://ex.co/x/ to include some information about
> any term (e.g. http://ex.co/x/Z) it contains stating "there is nothing
> more to know about this term" (e.g. by using rdfs:isDefinedBy again)
> d) the client to understand that statement and refrain from fetching
> http://ex.co/x/Z later on
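
To make steps a) to d) concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the WEB dictionary stands in for real HTTP responses, the Client class and its method names are hypothetical, and reading rdfs:isDefinedBy as "there is nothing more to know about this term" is a convention this sketch assumes, not something RDFS itself mandates.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

NS = "http://ex.co/x/"
Y, Z = NS + "Y", NS + "Z"

# Hypothetical server content: URL -> set of (subject, predicate, object) triples.
WEB = {
    # (a) each term description points back to the vocabulary document
    Y: {(Y, RDFS_LABEL, "Y"), (Y, RDFS_IS_DEFINED_BY, NS)},
    Z: {(Z, RDFS_LABEL, "Z"), (Z, RDFS_IS_DEFINED_BY, NS)},
    # (c) the vocabulary document states which terms it defines
    NS: {
        (Y, RDFS_LABEL, "Y"), (Y, RDFS_IS_DEFINED_BY, NS),
        (Z, RDFS_LABEL, "Z"), (Z, RDFS_IS_DEFINED_BY, NS),
    },
}


class Client:
    def __init__(self, web):
        self.web = web
        self.fetched = []      # log of simulated HTTP GETs
        self.triples = set()   # everything learned so far
        self.complete = set()  # terms assumed to be fully described

    def _get(self, url):
        self.fetched.append(url)
        return self.web[url]

    def describe(self, term):
        # (d) do not fetch a term that is already considered complete
        if term not in self.complete:
            description = self._get(term)
            self.triples |= description
            for s, p, o in description:
                # (b) understand and follow the rdfs:isDefinedBy affordance
                if s == term and p == RDFS_IS_DEFINED_BY:
                    vocab = self._get(o)
                    self.triples |= vocab
                    # assumed convention: terms the vocab claims to define
                    # need no further lookups
                    self.complete |= {
                        s2 for s2, p2, o2 in vocab
                        if p2 == RDFS_IS_DEFINED_BY and o2 == o
                    }
            self.complete.add(term)
        return {t for t in self.triples if t[0] == term}


client = Client(WEB)
client.describe(Y)
client.describe(Z)
print(client.fetched)  # ['http://ex.co/x/Y', 'http://ex.co/x/']
```

Note that even with a fully cooperating server and client, the first lookup costs two requests (the term, then the vocabulary), and the saving only appears from the second term onwards.
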
>
> So you don't get "the best of both worlds" as automatically as you suggest.
>
>
> I think QUDT is a really nice, simple example that very easily
> demonstrates exactly this today. It has a slash namespace IRI, and if I
> only ever request info on individual single vocab terms (e.g., try clicking
> now on `https://qudt.org/schema/qudt/CurrencyUnit`) then yes, I'd
> encounter that 'HTTP request per lookup' you suggest (but I'd be getting
> precisely what I asked for each time!).
>
> Terms of a vocabulary/ontology rarely make sense in isolation. So
> arguably, serving the entire vocabulary provides you with enough context to
> understand/use the term appropriately.
>
>
> But I can just as easily avoid that scenario today too by simply
> requesting the vocab's namespace IRI instead - e.g., try it right now by
> just clicking on `https://qudt.org/schema/qudt`
> <https://qudt.org/schema/qudt>. See - you get back the entire vocab
> containing all the vocab terms in a single HTTP response, which can be
> cached and keyed on that one namespace IRI (exactly as you would if they'd
> used a hash instead).
>
> And then you get "bombarded with a huge document", to quote one of your
> arguments against hash-IRIs. Seems to me that you get the worst of both
> worlds here: I had to perform two HTTP queries (one on CurrencyUnit, to
> get the link to the whole vocab, and one on the vocab) instead of one (with
> hash IRIs), and I still end up with a huge ontology. (yes, playing devil's
> advocate here a little)
>
> (I'm not familiar with Jena's OntDocumentManager, but I'm sure its
> caching code could easily be extended to take advantage of servers that
> choose to serve up slash-based vocabularies as QUDT demonstrates is so
> feasible today.)
> So doesn't that demonstrate my whole point - i.e., that with slashes I can
> get the best of both worlds
>
> I don't think so. They are different trade-offs between providing targeted
> content vs. reducing the number of HTTP queries, and between working with
> dumb clients and/or dumb servers vs. requiring more coordination between
> them (e.g. providing and following rdfs:isDefinedBy links).
>
> (i.e., precise term-specific HTTP responses if I want them, *and* the
> entire vocab in a single HTTP response if I want that too)? Using a hash
> completely locks me out, forever, of being able to achieve those lovely
> clean term-specific responses.
> And that's why I posit that slashes are simply 'more correct' (i.e., since
> *only* slashes can ever allow servers to always know exactly,
> unambiguously, what a requesting client is really looking for
>
> I don't buy that. The server can never know exactly nor unambiguously what
> the intent of the client is, nor should it (separation of concerns
> <https://en.wikipedia.org/wiki/Separation_of_concerns>).
>
> (i.e., a term-specific response, or an entire vocab response)), and it
> does so without losing any of the benefits of using a hash. (I do, by the
> way, totally appreciate that servers choosing to work as the QUDT servers
> do today might require a bit more server-side work. But my whole point is
> to ask this community which option they consider "more technically correct
> today and forever", and not "which option is easier for servers or vocab
> creators/hosters/editors/publishers today in the absence of any tooling
> support".
>
> Can't help but cite the priority of constituencies, as recalled in
> https://www.w3.org/TR/design-principles/#priority-of-constituencies
>
> "User needs come before the needs of web page authors, which come before
> the needs of user agent implementors, which come before the needs of
> specification writers, which come before theoretical purity."
>
> Don't get me wrong, I get the point of thinking beyond the limitation of
> current tools. That's a valuable exercise. But practicality does also
> matter.
>
> Also, in a distributed setting such as the web, you can not assume that
> all other parties will always do the right thing™.
>
>   my 2€
>
>   pa
>
> In other words, I think that QUDT-server-like behaviour can be provided
> easily by tooling, which I'd personally be very happy to work on
> contributing :) !).
> Cheers,
> Pat.
>
> *Pat McBennett*, Technical Architect
>
> Contact  | patm@inrupt.com
>
> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
> <https://github.com/pmcb55>
>
> Explore  | www.inrupt.com
>
>
>
>
> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius <martynas@atomgraph.com>
> wrote:
>
>> Hi Pat,
>>
>> For one thing, hash URIs are easier to cache because there is only one
>> document URL. After the initial HTTP request the whole document can be
>> cached with its URL as the key. All following term lookups (whose URIs
>> start with that URL) will hit the cached document.
>> Slash URIs will require an initial HTTP request for *each* term and will
>> result in a cache entry per term.
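
A minimal Python sketch of the caching difference described above. This is not Jena's OntDocumentManager; CountingCache is a hypothetical stand-in, the terms A, B and C are made up, and the "HTTP request" is simulated by incrementing a counter. The only real behaviour relied on is that the fragment is stripped client-side before a request is made.

```python
from urllib.parse import urldefrag


class CountingCache:
    """Cache documents keyed by document URL and count simulated requests."""

    def __init__(self):
        self.documents = {}
        self.requests = 0

    def lookup(self, term_iri):
        # The fragment is never sent to the server, so the cache key is
        # the IRI with its fragment removed.
        document_url, _fragment = urldefrag(term_iri)
        if document_url not in self.documents:
            self.requests += 1  # stands in for a real HTTP GET
            self.documents[document_url] = f"<contents of {document_url}>"
        return self.documents[document_url]


hash_cache = CountingCache()
for term in ("A", "B", "C"):
    hash_cache.lookup(f"http://ex.co/x#{term}")

slash_cache = CountingCache()
for term in ("A", "B", "C"):
    slash_cache.lookup(f"http://ex.co/x/{term}")

print(hash_cache.requests, len(hash_cache.documents))    # 1 1
print(slash_cache.requests, len(slash_cache.documents))  # 3 3
```

With hash IRIs the client can compute the document URL on its own; with slash IRIs it has no generic way to know that http://ex.co/x/ describes http://ex.co/x/A, so each term becomes its own request and cache entry unless server and client coordinate further.
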
>>
>> This is based on my experience with Jena's OntDocumentManager.
>>
>> Martynas
>> atomgraph.com
>>
>>
>> On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
>>
>>> So (I think!) I know all the pros and cons of using either a trailing
>>> slash or a trailing hash for vocab namespace IRIs. Basically it boils down
>>> to hashes meaning you'll always get info on all the terms in a vocabulary,
>>> even if you only want info for one specific term, whereas using a slash
>>> means I can always get just the info for any specific, individual term I
>>> request.
>>>
>>> Note: using slashes provides the ability to get the best of both worlds
>>> - i.e., small responses when explicitly asking for info on just one term,
>>> but if you want info for all the terms in one HTTP response, then just
>>> serve up that complete vocab response when the base namespace IRI itself is
>>> dereferenced.
>>>
>>> Here's a nice simple illustration of the basic difference:
>>> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on '
>>> https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean,
>>> concise, and precise set of info on just the one term you asked for -
>>> lovely!
>>>
>>> - Hash: DPV's 'JointDataControllers' (i.e., click on '
>>> https://w3id.org/dpv#JointDataControllers') and you get bombarded with
>>> a huge document, with a daunting Table of Contents on the left, and info on
>>> hundreds of other terms that I didn't ask for, and so had no interest in
>>> whatsoever (don't get me wrong - this is fantastically detailed and
>>> potentially very useful information, but it's simply not what I asked for!).
>>>
>>> So based on the greater flexibility and future-proofing of using slash
>>> (i.e., it offers the best of both worlds, whereas hash is forever limited),
>>> I've become firmly of the opinion that slashes are just 'better' than
>>> hashes, and in fact are simply 'more correct' (i.e., all IRIs should be
>>> uniquely dereferenceable).
>>>
>>> I also think the distinction is critically important when creating
>>> vocabularies intended for widespread and long-lasting use (such as the DPV
>>> vocab above). For throw-away or pet projects, sure, it doesn't really
>>> matter (yet even then, I still think slashes are the 'more correct' option).
>>>
>>> I know that the convention from the W3C has tended to be to use hashes,
>>> but I think in hindsight that was a mistake, and that the advice from the
>>> Semantic Web community as a whole should now be to adopt slashes
>>> consistently for all new vocabularies. (And it's not like using slash has
>>> no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist,
>>> SOSA, SSN, (even the venerable FOAF!) all use slash).
>>>
>>> I'd love to hear this group's thoughts. (For reference, I did ask the
>>> gist community if they recorded their discussions around their decision (in
>>> 2019) to formally switch gist from hash to slash (here
>>> <https://github.com/semanticarts/gist/issues/725>), but it seems they
>>> weren't recorded, and I've also raised the issue with the DPV group
>>> directly too (here <https://github.com/w3c/dpv/issues/53>)).
>>>
>>> Cheers,
>>>
>>> Pat.
>>>
>>>
>>> This e-mail, and any attachments thereto, is intended only for use by
>>> the addressee(s) named herein and may contain legally privileged,
>>> confidential and/or proprietary information. If you are not the intended
>>> recipient of this e-mail (or the person responsible for delivering this
>>> document to the intended recipient), please do not disseminate, distribute,
>>> print or copy this e-mail, or any attachment thereto. If you have received
>>> this e-mail in error, please respond to the individual sending the message,
>>> and permanently delete the email.
>>
>>
Received on Friday, 7 October 2022 08:04:44 UTC