Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Martynas Jusevičius on 2022-10-07 (semantic-web@w3.org from October 2022)

From: Martynas Jusevičius <martynas@atomgraph.com>
Date: Fri, 7 Oct 2022 22:37:32 +0200
To: Hugh Glaser <hugh@glasers.org>
Cc: Pat McBennett <patm@inrupt.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, semantic-web@w3.org
Message-ID: <CAE35VmwnJc4DW1C8BokmXywk=6xMPLudxCJkPG=k19Z0ccAD9g@mail.gmail.com>
Hi Hugh,

On Fri, 7 Oct 2022 at 19.59, Hugh Glaser <hugh@glasers.org> wrote:

> Does anyone have existing use cases where the ontology needs to be
> retrieved automatically, since Pierre mentions “user needs”?
> So we could look at things like how much of vocabularies/ontologies are
> fetched, how big they are, patterns of frequency of fetching, etc.
>
> In my case, all my Linked Data apps know the vocabulary/ontology from the
> start, and indeed they are pretty much used as the equivalent of schemas
> during design.
> I don’t know of many apps where even the data is acquired from remote
> access by global Follow Your Nose, never mind the vocabulary/ontology,
> although I don’t have a big sample, because I don’t know of many real world
> apps that have been deployed.
>

I happen to have published a screencast of such an app just a few days ago
:)
https://youtu.be/Dl0xIUNd4F0

Let me know what you think.


> So most of the considerations below are not of practical interest; the
> issue is more about using the best URI format than about how to resolve
> them. That is usually slash rather than hash URIs, which I think is the
> way Linked Data leans.
>
> And of course if you are doing Semantic Web that isn’t Linked Data, the
> URIs within the definition document don’t need to resolve, so you can use
> either.
>
> Best
> Hugh
>
> > On 7 Oct 2022, at 08:07, Pierre-Antoine Champin <pierre-antoine@w3.org>
> wrote:
> >
> > On 07/10/2022 01:49, Pat McBennett wrote:
> >
> >> Hi Martynas,
> >>
> >> Thanks for the feedback!
> >>
> >> But I think any vocabulary can just as easily support that same caching
> benefit with slash-based vocab namespace IRIs too, *without* having to
> require an initial HTTP request for *each* term - i.e., by simply returning
> the entire vocab on namespace IRI lookups.
> > In the general case, when you encounter an IRI of the form
> http://ex.co/x/Y, you cannot assume that http://ex.co/x/ will contain
> the definition of http://ex.co/x/Y together with other related terms. For
> this you need:
> > a) the server to provide an affordance in the description of
> http://ex.co/x/Y pointing to http://ex.co/x/
> > b) the client to understand and follow that affordance (e.g. by using
> rdfs:isDefinedBy)
> > c) the description at http://ex.co/x/ to include some information about
> any term (e.g. http://ex.co/x/Z) it contains stating "there is nothing
> more to know about this term" (e.g. by using rdfs:isDefinedBy again)
> > d) the client to understand that statement and refrain from fetching
> http://ex.co/x/Z later on
> >
> > So you don't get "the best of both worlds" as automatically as you
> suggest.
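
Steps (a) to (d) above can be sketched roughly as follows. This is only an illustration under stated assumptions: the triples are plain Python tuples rather than a real RDF library, the two descriptions are made-up stand-ins for what the server might return, and the helper names `defined_by` and `needs_fetch` are hypothetical. Reading rdfs:isDefinedBy as "there is nothing more to know about this term" is the convention suggested in the message, not something RDFS itself mandates.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

# (a) Hypothetical description served at http://ex.co/x/Y,
#     with an affordance pointing to the vocabulary document.
TERM_DESCRIPTION = [
    ("http://ex.co/x/Y", RDFS_IS_DEFINED_BY, "http://ex.co/x/"),
]

# (c) Hypothetical description served at http://ex.co/x/,
#     which also covers the sibling term Z.
VOCAB_DESCRIPTION = [
    ("http://ex.co/x/Y", RDFS_IS_DEFINED_BY, "http://ex.co/x/"),
    ("http://ex.co/x/Z", RDFS_IS_DEFINED_BY, "http://ex.co/x/"),
]


def defined_by(term, triples):
    """(b) Return the document a term claims to be defined by, if any."""
    for subject, predicate, obj in triples:
        if subject == term and predicate == RDFS_IS_DEFINED_BY:
            return obj
    return None


def needs_fetch(term, known_triples, fetched_docs):
    """(d) Skip the request when the defining document was already fetched."""
    doc = defined_by(term, known_triples)
    return doc is None or doc not in fetched_docs


# The client first dereferences Y, then follows the affordance.
known = list(TERM_DESCRIPTION)
fetched = {"http://ex.co/x/Y"}
vocab = defined_by("http://ex.co/x/Y", known)
fetched.add(vocab)
known += VOCAB_DESCRIPTION

z_needs_fetch = needs_fetch("http://ex.co/x/Z", known, fetched)
w_needs_fetch = needs_fetch("http://ex.co/x/W", known, fetched)
print(z_needs_fetch, w_needs_fetch)
```

In this sketch Z is covered by the vocabulary document that was already fetched, while the unknown term W still requires a request of its own. Both the server (a, c) and the client (b, d) have to cooperate for that saving to materialise.
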
> >
> >>
> >> I think QUDT is a really nice, simple example that very easily
> demonstrates exactly this today. It has a slash namespace IRI, and if I
> only ever request info on individual single vocab terms (e.g., try clicking
> now on `https://qudt.org/schema/qudt/CurrencyUnit`
> <https://qudt.org/schema/qudt/CurrencyUnit>) then yes, I'd encounter that
> 'HTTP request per lookup' you suggest (but I'd be getting precisely what I
> asked for each time!).
> > Terms of a vocabulary/ontology rarely make sense in isolation. So
> arguably, serving the entire vocabulary provides you with enough context to
> understand/use the term appropriately.
> >>
> >> But I can just as easily avoid that scenario today too by simply
> requesting the vocab's namespace IRI instead - e.g., try it right now by
> just clicking on `https://qudt.org/schema/qudt`
> <https://qudt.org/schema/qudt>. See - you get back the entire vocab
> containing all the vocab terms in a single HTTP response, which can be
> cached and keyed on that one namespace IRI (exactly as you would if they'd
> used a hash instead).
> > And then you get "bombarded with a huge document", to quote one of your
> arguments against hash-IRIs. Seems to me that you get the worst of both
> worlds here: I had to perform two HTTP queries (one on CurrencyUnit, to
> get the link to the whole vocab, and one on the vocab) instead of one (with
> hash IRIs), and I still end up with a huge ontology. (yes, playing devil's
> advocate here a little)
> >>
> >>
> >> (I'm not familiar with Jena's
> >> OntDocumentManager, but I'm sure its caching code could easily be
> extended to take advantage of servers that choose to serve up slash-based
> vocabularies as QUDT demonstrates is so feasible today.)
> >>
> >> So doesn't that demonstrate my whole point - i.e., that with slashes I
> can get the best of both worlds
> > I don't think so. They are different trade-offs between providing
> targeted content vs. reducing the number of HTTP queries, and between
> working with dumb clients and/or dumb servers vs. requiring more
> coordination between them  (e.g. providing and following rdfs:isDefinedBy
> links).
> >
> >>  (i.e., precise term-specific HTTP responses if I want them, *and* the
> entire vocab in a single HTTP response if I want that too)? Using a hash
> completely locks me out, forever, of being able to achieve those lovely
> clean term-specific responses.
> >>
> >> And that's why I posit that slashes are simply 'more correct' (i.e.,
> since *only* slashes can ever allow servers to always know exactly,
> unambiguously, what a requesting client is really looking for
> > I don't buy that. The server can never know exactly nor unambiguously
> what the intent of the client is, nor should it (separation of concerns).
> >>  (i.e., a term-specific response, or an entire vocab response)), and it
> does so without losing any of the benefits of using a hash.
> >>
> >> (I do, by the way, totally appreciate that servers choosing to work as
> the QUDT servers do today might require a bit more server-side work. But my
> whole point is to ask this community which option they consider "more
> >> technically correct today and forever", and not "which option is easier
> for servers or vocab creators/hosters/editors/publishers today in the
> absence of any tooling support".
> > Can't help but cite the priority of constituencies, as recalled in
> https://www.w3.org/TR/design-principles/#priority-of-constituencies
> >
> > "User needs come before the needs of web page authors, which come before
> the needs of user agent implementors, which come before the needs of
> specification writers, which come before theoretical purity."
> >
> > Don't get me wrong, I get the point of thinking beyond the limitation of
> current tools. That's a valuable exercise. But practicality does also
> matter.
> >
> > Also, in a distributed setting such as the web, you cannot assume that
> all other parties will always do the right thing™.
> >
> >   my 2€
> >
> >   pa
> >
> >> In other words, I think that QUDT-server-like behaviour can be provided
> easily by tooling, which I'd personally be very happy to work on
> contributing :) !).
> >>
> >> Cheers,
> >>
> >> Pat.
> >>
> >> Pat McBennett, Technical Architect
> >>
> >> Contact  | patm@inrupt.com
> >>
> >> Connect | WebID, GitHub
> >>
> >> Explore  | www.inrupt.com
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius <
> martynas@atomgraph.com> wrote:
> >> Hi Pat,
> >>
> >> For one thing, hash URIs are easier to cache because there is only one
> document URL. After the initial HTTP request the whole document can be
> cached with its URL as the key. All following term lookups (whose URIs
> start with that URL) will hit the cached document.
> >> Slash URIs will require an initial HTTP request for *each* term and
> will result in a cache entry per term.
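
The caching difference described above can be illustrated with a minimal sketch. It assumes a toy cache keyed on the document URL, that is, the IRI with its fragment stripped. `DocumentCache`, `fake_fetch`, and the `http://ex.co/...` IRIs are hypothetical, and the sketch does not reproduce how Jena's OntDocumentManager is actually implemented.

```python
from urllib.parse import urldefrag


class DocumentCache:
    """Toy cache keyed on the document URL (the IRI minus its fragment)."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical callable: document URL -> content
        self._docs = {}
        self.requests = 0     # number of simulated HTTP requests

    def lookup(self, term_iri):
        # The fragment is never sent to the server, so every hash IRI in
        # one namespace maps to the same document URL and cache key.
        doc_url, _fragment = urldefrag(term_iri)
        if doc_url not in self._docs:
            self.requests += 1
            self._docs[doc_url] = self._fetch(doc_url)
        return self._docs[doc_url]


def fake_fetch(url):
    # Stand-in for a real HTTP GET.
    return f"<contents of {url}>"


hash_cache = DocumentCache(fake_fetch)
for term in ("http://ex.co/x#A", "http://ex.co/x#B", "http://ex.co/x#C"):
    hash_cache.lookup(term)

slash_cache = DocumentCache(fake_fetch)
for term in ("http://ex.co/x/A", "http://ex.co/x/B", "http://ex.co/x/C"):
    slash_cache.lookup(term)

print(hash_cache.requests, slash_cache.requests)  # prints "1 3"
```

With three terms per namespace, the hash variant costs one simulated request and one cache entry, while the slash variant costs one of each per term.
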
> >>
> >> This is based on my experience with Jena's OntDocumentManager.
> >>
> >> Martynas
> >> atomgraph.com
> >>
> >>
> >> On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
> >> So (I think!) I know all the pros and cons of using either a trailing
> slash or a trailing hash for vocab namespace IRIs. Basically it boils down
> to hashes meaning you'll always get info on all the terms in a vocabulary,
> even if you only want info for one specific term, whereas using a slash
> means I can always get just the info for any specific, individual term I
> request.
> >>
> >> Note: using slashes provides the ability to get the best of both worlds
> - i.e., small responses when explicitly asking for info on just one term,
> but if you want info for all the terms in one HTTP response, then just
> serve up that complete vocab response when the base namespace IRI itself is
> dereferenced.
> >>
> >> Here's a nice simple illustration of the basic difference:
> >> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on '
> https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean,
> concise, and precise set of info on just the one term you asked for -
> lovely!
> >>
> >> - Hash: DPV's 'JointDataControllers' (i.e., click on '
> https://w3id.org/dpv#JointDataControllers') and you get bombarded with a
> huge document, with a daunting Table of Contents on the left, and info on
> hundreds of other terms that I didn't ask for, and so had no interest in
> whatsoever (don't get me wrong - this is fantastically detailed and
> potentially very useful information, but it's simply not what I asked for!).
> >>
> >> So based on the greater flexibility and future-proofing of using slash
> (i.e., it offers the best of both worlds, whereas hash is forever limited),
> I've become firmly of the opinion that slashes are just 'better' than
> hashes, and in fact are simply 'more correct' (i.e., all IRIs should be
> uniquely dereferenceable).
> >>
> >> I also think the distinction is critically important when creating
> vocabularies intended for widespread and long-lasting use (such as the DPV
> vocab above). For throw-away or pet projects, sure, it doesn't really
> matter (yet even then, I still think slashes are the 'more correct' option).
> >>
> >> I know that the convention from the W3C has tended to be to use hashes,
> but I think in hindsight that was a mistake, and that the advice from the
> Semantic Web community as a whole should now be to adopt slashes
> consistently for all new vocabularies. (And it's not like using slash has
> no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist,
> SOSA, SSN, (even the venerable FOAF!) all use slash).
> >>
> >> I'd love to hear this group's thoughts. (For reference, I did ask the
> gist community if they recorded their discussions around their decision (in
> 2019) to formally switch gist from hash to slash (here), but it seems they
> weren't recorded, and I've also raised the issue with the DPV group
> directly too (here)).
> >>
> >> Cheers,
> >>
> >> Pat.
> >>
> >> Pat McBennett, Technical Architect
> >>
> >> Contact  | patm@inrupt.com
> >>
> >> Connect | WebID, GitHub
> >>
> >> Explore  | www.inrupt.com
> >>
> >>
> >>
> >>
> >> This e-mail, and any attachments thereto, is intended only for use by
> the addressee(s) named herein and may contain legally privileged,
> confidential and/or proprietary information. If you are not the intended
> recipient of this e-mail (or the person responsible for delivering this
> document to the intended recipient), please do not disseminate, distribute,
> print or copy this e-mail, or any attachment thereto. If you have received
> this e-mail in error, please respond to the individual sending the message,
> and permanently delete the email.
> >>
>
>
>
Received on Friday, 7 October 2022 20:37:57 UTC