Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Pierre-Antoine Champin on 2022-10-07 (semantic-web@w3.org from October 2022)

From: Pierre-Antoine Champin <pierre-antoine@w3.org>
Date: Fri, 7 Oct 2022 10:24:06 +0200
To: Pat McBennett <patm@inrupt.com>
Cc: semantic-web@w3.org
Message-ID: <d9285470-7589-5e33-01c3-249c79d659a8@w3.org>
oops, slight typo in my previous email

On 07/10/2022 09:07, Pierre-Antoine Champin wrote:
>
> On 07/10/2022 01:49, Pat McBennett wrote:
>
>> Hi Martynas,
>>
>> Thanks for the feedback!
>>
>> But I think any vocabulary can just as easily support that same 
>> caching benefit with slash-based vocab namespace IRIs too, *without* 
>> having to require an initial HTTP request for *each* term - i.e., by 
>> simply returning the entire vocab on namespace IRI lookups.
> In the general case, when you encounter an IRI of the form 
> http://ex.co/x/Y, you cannot assume that http://ex.co/x/ will contain 
> the definition of http://ex.co/x/Y together with other related terms. 
> For this you need:
>
> a) the server to provide an affordance in the description of 
> http://ex.co/x/Y pointing to http://ex.co/x/
>
> b) the client to understand and follow that affordance (e.g. by using 
> rdfs:isDefinedBy)
>
The parenthetical "(e.g. by using rdfs:isDefinedBy)" was supposed to be in 
a), not in b).
Sorry about that.
>
> c) the description at http://ex.co/x/ to include some information 
> about any term (e.g. http://ex.co/x/Z) it contains stating "there is 
> nothing more to know about this term" (e.g. by using rdfs:isDefinedBy 
> again)
> d) the client to understand that statement and refrain from fetching 
> http://ex.co/x/Z later on
>
> So you don't get "the best of both worlds" as automatically as you suggest.
>
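To make steps a) to d) concrete, here is a minimal sketch of such a client. It is an illustration only, not code from this thread: the `VocabClient` class, the `fake_fetch` stand-in for HTTP, and the in-memory triples are all hypothetical, and treating rdfs:isDefinedBy as "there is nothing more to know about this term" is the convention described above, not something RDFS itself mandates.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
NS = "http://ex.co/x/"  # example namespace taken from the discussion above

# Hypothetical vocabulary: each term points back to the namespace document.
VOCAB = {
    (NS + "Y", RDFS_IS_DEFINED_BY, NS),
    (NS + "Z", RDFS_IS_DEFINED_BY, NS),
}


def fake_fetch(url):
    """Stand-in for an HTTP GET: the namespace IRI returns the whole
    vocabulary, a term IRI returns only the triples about that term."""
    if url == NS:
        return set(VOCAB)
    return {t for t in VOCAB if t[0] == url}


class VocabClient:
    def __init__(self, fetch):
        self.fetch = fetch   # callable: url -> set of (s, p, o) triples
        self.triples = set()
        self.fetched = []    # URLs requested so far, in order

    def _load(self, url):
        self.fetched.append(url)
        self.triples.update(self.fetch(url))

    def _defined_by(self, term):
        return {o for (s, p, o) in self.triples
                if s == term and p == RDFS_IS_DEFINED_BY}

    def lookup(self, term):
        # (c)/(d): a document we already hold claims to define this term,
        # so refrain from fetching the term itself.
        if any(doc in self.fetched for doc in self._defined_by(term)):
            return
        self._load(term)
        # (a)/(b): follow the affordance provided in the term's description.
        for doc in self._defined_by(term):
            if doc not in self.fetched:
                self._load(doc)


client = VocabClient(fake_fetch)
client.lookup(NS + "Y")
client.lookup(NS + "Z")
print(client.fetched)  # ['http://ex.co/x/Y', 'http://ex.co/x/']
```

In this sketch the first lookup costs two requests (the term, then the vocabulary it links to) and the second costs none, but only because both the server and the client cooperate on the rdfs:isDefinedBy convention.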
>>
>> I think QUDT is a really nice, simple example that very easily 
>> demonstrates exactly this today. It has a slash namespace IRI, and if 
>> I only ever request info on individual single vocab terms (e.g., try 
>> clicking now on `https://qudt.org/schema/qudt/CurrencyUnit`) then 
>> yes, I'd encounter that 'HTTP request per lookup' you suggest (but 
>> I'd be getting precisely what I asked for each time!).
> Terms of a vocabulary/ontology rarely make sense in isolation. So 
> arguably, serving the entire vocabulary provides you with enough 
> context to understand/use the term appropriately.
>>
>> But I can just as easily avoid that scenario today too by simply 
>> requesting the vocab's namespace IRI instead - e.g., try it right now 
>> by just clicking on `https://qudt.org/schema/qudt` 
>> <https://qudt.org/schema/qudt>. See - you get back the entire vocab 
>> containing all the vocab terms in a single HTTP response, which can 
>> be cached and keyed on that one namespace IRI (exactly as you would 
>> if they'd used a hash instead).
> And then you get "bombarded with a huge document", to quote one of 
> your arguments against hash-IRIs. Seems to me that you get the worst 
> of both worlds here: I had to perform two HTTP queries (one on 
> CurrencyUnit, to get the link to the whole vocab, and one on the 
> vocab) instead of one (with hash IRIs), and I still end up with a huge 
> ontology. (yes, playing devil's advocate here a little)
>> (I'm not familiar with Jena's OntDocumentManager, but I'm sure its 
>> caching code could easily be extended to take advantage of servers 
>> that choose to serve up slash-based vocabularies as QUDT 
>> demonstrates is so feasible today.)
>> So doesn't that demonstrate my whole point - i.e., that with slashes 
>> I can get the best of both worlds
>
> I don't think so. They are different trade-offs between providing 
> targeted content vs. reducing the number of HTTP queries, and between 
> working with dumb clients and/or dumb servers vs. requiring more 
> coordination between them  (e.g. providing and following 
> rdfs:isDefinedBy links).
>
>> (i.e., precise term-specific HTTP responses if I want them, *and* the 
>> entire vocab in a single HTTP response if I want that too)? Using a 
>> hash completely locks me out, forever, of being able to achieve those 
>> lovely clean term-specific responses.
>> And that's why I posit that slashes are simply 'more correct' (i.e., 
>> since *only* slashes can ever allow servers to always know exactly, 
>> unambiguously, what a requesting client is really looking for
> I don't buy that. The server can never know exactly nor unambiguously 
> what the intent of the client is, nor should it (separation of 
> concerns <https://en.wikipedia.org/wiki/Separation_of_concerns>).
>> (i.e., a term-specific response, or an entire vocab response)), and 
>> it does so without losing any of the benefits of using a hash. (I do, 
>> by the way, totally appreciate that servers choosing to work as the 
>> QUDT servers do today might require a bit more server-side work. But 
>> my whole point is to ask this community which option they consider 
>> "more technically correct today and forever", and not "which option 
>> is easier for servers or vocab creators/hosters/editors/publishers 
>> today in the absence of any tooling support".
>
> Can't help but cite the priority of constituencies, as recalled in 
> https://www.w3.org/TR/design-principles/#priority-of-constituencies
>
> "User needs come before the needs of web page authors, which come 
> before the needs of user agent implementors, which come before the 
> needs of specification writers, which come before theoretical purity."
>
> Don't get me wrong, I get the point of thinking beyond the limitations 
> of current tools. That's a valuable exercise. But practicality does 
> also matter.
>
> Also, in a distributed setting such as the web, you cannot assume 
> that all other parties will always do the right thing™.
>
>   my 2€
>
>   pa
>
>> In other words, I think that QUDT-server-like behaviour can be 
>> provided easily by tooling, which I'd personally be very happy to 
>> work on contributing :) !).
>> Cheers,
>> Pat.
>>
>> *Pat McBennett*, Technical Architect
>>
>> Contact  | patm@inrupt.com
>>
>> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub 
>> <https://github.com/pmcb55>
>>
>> Explore  | www.inrupt.com <http://www.inrupt.com/>
>>
>>
>>
>>
>> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius 
>> <martynas@atomgraph.com> wrote:
>>
>>     Hi Pat,
>>
>>     For one thing, hash URIs are easier to cache because there is
>>     only one document URL. After the initial HTTP request the whole
>>     document can be cached with its URL as the key. All following
>>     term lookups (whose URIs start with that URL) will hit the cached
>>     document.
>>     Slash URIs will require an initial HTTP request for *each* term
>>     and will result in a cache entry per term.
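A small sketch of the caching difference described above, assuming a client that keys its cache on the URL it actually requests. HTTP clients drop the fragment before sending a request, so every hash IRI in a vocabulary maps to one cache key. The hash-style IRIs under http://ex.co/x# and the helper names `cache_key` and `count_requests` are hypothetical, chosen for illustration; this is not how Jena's OntDocumentManager is implemented.

```python
from urllib.parse import urldefrag


def cache_key(term_iri):
    """Return the URL a client would request for a term IRI.
    The fragment (if any) is never sent to the server."""
    url, _fragment = urldefrag(term_iri)
    return url


def count_requests(term_iris):
    """Count HTTP requests needed when responses are cached by request URL."""
    cache = set()
    requests = 0
    for iri in term_iris:
        key = cache_key(iri)
        if key not in cache:
            requests += 1      # cache miss: one request to the server
            cache.add(key)
    return requests


hash_terms = ["http://ex.co/x#Y", "http://ex.co/x#Z", "http://ex.co/x#Y"]
slash_terms = ["http://ex.co/x/Y", "http://ex.co/x/Z", "http://ex.co/x/Y"]

print(count_requests(hash_terms))   # 1
print(count_requests(slash_terms))  # 2
```

With hash IRIs the single document is fetched once and reused for every term; with slash IRIs a naive client makes one request per distinct term.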
>>
>>     This is based on my experience with Jena's OntDocumentManager.
>>
>>     Martynas
>>     atomgraph.com <http://atomgraph.com>
>>
>>
>>     On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
>>
>>         So (I think!) I know all the pros and cons of using either
>>         a trailing slash or a trailing hash for vocab namespace IRIs.
>>         Basically it boils down to hashes meaning you'll always get
>>         info on all the terms in a vocabulary, even if you only want
>>         info for one specific term, whereas using a slash means I can
>>         always get just the info for any specific, individual term I
>>         request.
>>
>>         Note: using slashes provides the ability to get the best of
>>         both worlds - i.e., small responses when explicitly asking
>>         for info on just one term, but if you want info for all the
>>         terms in one HTTP response, then just serve up that complete
>>         vocab response when the base namespace IRI itself is
>>         dereferenced.
>>
>>         Here's a nice simple illustration of the basic difference:
>>         - Slash: QUDT's 'CurrencyUnit' term (i.e., click on
>>         'https://qudt.org/schema/qudt/CurrencyUnit') and you get a
>>         nice clean, concise, and precise set of info on just the one
>>         term you asked for - lovely!
>>
>>         - Hash: DPV's 'JointDataControllers' (i.e., click on
>>         'https://w3id.org/dpv#JointDataControllers') and you get
>>         bombarded with a huge document, with a daunting Table of
>>         Contents on the left, and info on hundreds of other terms
>>         that I didn't ask for, and so had no interest in whatsoever
>>         (don't get me wrong - this is fantastically detailed and
>>         potentially very useful information, but it's simply not what
>>         I asked for!).
>>
>>         So based on the greater flexibility and future-proofing of
>>         using slash (i.e., it offers the best of both worlds, whereas
>>         hash is forever limited), I've become firmly of the opinion
>>         that slashes are just 'better' than hashes, and in fact are
>>         simply 'more correct' (i.e., all IRIs should be uniquely
>>         dereferenceable).
>>
>>         I also think the distinction is critically important when
>>         creating vocabularies intended for widespread and
>>         long-lasting use (such as the DPV vocab above). For
>>         throw-away or pet projects, sure, it doesn't really matter
>>         (yet even then, I still think slashes are the 'more correct'
>>         option).
>>
>>         I know that the convention from the W3C has tended to be to
>>         use hashes, but I think in hindsight that was a mistake, and
>>         that the advice from the Semantic Web community as a whole
>>         should now be to adopt slashes consistently for all new
>>         vocabularies. (And it's not like using slash has no precedent
>>         - major 'authoritative' vocabs like QUDT, Schema.org, gist,
>>         SOSA, SSN, (even the venerable FOAF!) all use slash).
>>
>>         I'd love to hear this group's thoughts. (For reference, I did
>>         ask the gist community if they recorded their discussions
>>         around their decision (in 2019) to formally switch gist from
>>         hash to slash (here
>>         <https://github.com/semanticarts/gist/issues/725>), but it
>>         seems they weren't recorded, and I've also raised the issue
>>         with the DPV group directly too (here
>>         <https://github.com/w3c/dpv/issues/53>)).
>>
>>         Cheers,
>>
>>         Pat.
>>
>>         *Pat McBennett*, Technical Architect
>>
>>         Contact  | patm@inrupt.com
>>
>>         Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>,
>>         GitHub <https://github.com/pmcb55>
>>
>>         Explore  | www.inrupt.com <http://www.inrupt.com/>
>>
>>
>>
>>         This e-mail, and any attachments thereto, is intended only
>>         for use by the addressee(s) named herein and may contain
>>         legally privileged, confidential and/or proprietary
>>         information. If you are not the intended recipient of this
>>         e-mail (or the person responsible for delivering this
>>         document to the intended recipient), please do not
>>         disseminate, distribute, print or copy this e-mail, or any
>>         attachment thereto. If you have received this e-mail in
>>         error, please respond to the individual sending the message,
>>         and permanently delete the email.
>>
>>
Received on Friday, 7 October 2022 08:24:11 UTC