Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Hugh Glaser on 2022-10-07 (semantic-web@w3.org from October 2022)

From: Hugh Glaser <hugh@glasers.org>
Date: Fri, 7 Oct 2022 22:01:27 +0100
To: Martynas Jusevičius <martynas@atomgraph.com>
Cc: Pat McBennett <patm@inrupt.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, semantic-web@w3.org
Message-Id: <176C434B-6237-4BCF-8B02-2A703AD1BCD1@glasers.org>
Thanks Martynas.

Enjoyable to watch.

So it looks to me like the methodology captured by your system is to do what I said seems to be common: identify an ontology that you want for an application, and import it at that time.
So you don’t care if it is slash or hash.

As something of an aside perhaps, the “end-users” I was thinking of are not the knowledge engineers & creators, who are, I think, the people your system is aimed at.
Presumably the output of your system results in a knowledge base.
And this is then used by an app or apps to do something that the end-users I am thinking of will find useful?
And of course during that use, the URIs in the ontology do not need to be resolved?

Best
Hugh


> On 7 Oct 2022, at 21:37, Martynas Jusevičius <martynas@atomgraph.com> wrote:
> 
> 
>> Hi Hugh,
>> 
>> On Fri, 7 Oct 2022 at 19.59, Hugh Glaser <hugh@glasers.org> wrote:
>> Does anyone have existing use cases where the ontology needs to be retrieved automatically, since Pierre mentions “user needs”.
>> So we could look at things like how much of vocabularies/ontologies are fetched, how big they are, patterns of frequency of fetching, etc.
>> 
>> In my case, all my Linked Data apps know the vocabulary/ontology from the start, and indeed they are pretty much used as the equivalent of schemas during design.
>> I don’t know of many apps where even the data is acquired from remote access by global Follow Your Nose, never mind the vocabulary/ontology, although I don’t have a big sample, because I don’t know of many real world apps that have been deployed.
> 
> I happen to have published a screencast of such an app just a few days ago :)
> https://youtu.be/Dl0xIUNd4F0
> 
> Let me know what you think.
> 
> So most of the considerations below are not of practical interest; the issue is more about using the best URI format, than how to resolve them.
>> Which is usually slash rather than hash URIs; I think that is the way Linked Data leans.
>> 
>> And of course if you are doing Semantic Web that isn’t Linked Data, the URIs within the definition document don’t need to resolve, so you can use either.
>> 
>> Best
>> Hugh
>> 
>> > On 7 Oct 2022, at 08:07, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>> > 
>> > On 07/10/2022 01:49, Pat McBennett wrote:
>> > 
>> >> Hi Martynas,
>> >> 
>> >> Thanks for the feedback!
>> >> 
>> >> But I think any vocabulary can just as easily support that same caching benefit with slash-based vocab namespace IRIs too, *without* having to require an initial HTTP request for *each* term - i.e., by simply returning the entire vocab on namespace IRI lookups.
>> > In the general case, when you encounter an IRI of the form http://ex.co/x/Y, you cannot assume that http://ex.co/x/ will contain the definition of http://ex.co/x/Y together with other related terms. For this you need:
>> > a) the server to provide an affordance in the description of http://ex.co/x/Y pointing to http://ex.co/x/
>> > b) the client to understand and follow that affordance (e.g. by using rdfs:isDefinedBy)
>> > c) the description at http://ex.co/x/ to include some information about any term (e.g. http://ex.co/x/Z) it contains, stating "there is nothing more to know about this term" (e.g. by using rdfs:isDefinedBy again)
>> > d) the client to understand that statement and refrain from fetching http://ex.co/x/Z later on
>> > 
>> > So you don't get "the best of both worlds" as automatically as you suggest.
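
Steps a) to d) above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory "web" (the WEB dictionary) in place of real HTTP, and a hypothetical Client class; neither comes from any existing library. It only shows the coordination that both sides need, not how any particular tool behaves.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
NS = "http://ex.co/x/"

# Hypothetical stand-in for the web: IRI -> triples served at that IRI.
WEB = {
    # a) the term description points at the namespace document
    NS + "Y": [(NS + "Y", RDFS_IS_DEFINED_BY, NS)],
    # c) the namespace document states which terms it fully defines
    NS: [
        (NS + "Y", RDFS_IS_DEFINED_BY, NS),
        (NS + "Z", RDFS_IS_DEFINED_BY, NS),
    ],
}


class Client:
    def __init__(self, web):
        self.web = web
        self.fetched = []     # log of simulated HTTP requests
        self.defined_in = {}  # term IRI -> document known to define it

    def _fetch(self, iri):
        self.fetched.append(iri)
        return self.web[iri]

    def _record(self, doc_iri, triples):
        # Only trust "defined by" statements that name the fetched document.
        for s, p, o in triples:
            if p == RDFS_IS_DEFINED_BY and o == doc_iri:
                self.defined_in[s] = doc_iri

    def lookup(self, term):
        # d) refrain from fetching a term already covered by a known document
        if term in self.defined_in:
            return self.defined_in[term]
        for s, p, o in self._fetch(term):
            # b) understand and follow the rdfs:isDefinedBy affordance
            if s == term and p == RDFS_IS_DEFINED_BY:
                if o != term:
                    self._record(o, self._fetch(o))
                return o
        return term


client = Client(WEB)
client.lookup(NS + "Y")  # two requests: the term, then the namespace document
client.lookup(NS + "Z")  # no request: already known to be defined in NS
print(client.fetched)
```

In this sketch the first lookup costs two simulated requests and the second costs none, but only because both the server data and the client implement the rdfs:isDefinedBy convention.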
>> > 
>> >> 
>> >> I think QUDT is a really nice, simple example that very easily demonstrates exactly this today. It has a slash namespace IRI, and if I only ever request info on individual single vocab terms (e.g., try clicking now on `https://qudt.org/schema/qudt/CurrencyUnit`) then yes, I'd encounter that 'HTTP request per lookup' you suggest (but I'd be getting precisely what I asked for each time!).
>> > Terms of a vocabulary/ontology rarely make sense in isolation. So arguably, serving the entire vocabulary provides you with enough context to understand/use the term appropriately.
>> >> 
>> >> But I can just as easily avoid that scenario today too by simply requesting the vocab's namespace IRI instead - e.g., try it right now by just clicking on `https://qudt.org/schema/qudt`. See - you get back the entire vocab containing all the vocab terms in a single HTTP response, which can be cached and keyed on that one namespace IRI (exactly as you would if they'd used a hash instead).
>> > And then you get "bombarded with a huge document", to quote one of your arguments against hash-IRIs. Seems to me that you get the worst of both worlds here: I had to perform two HTTP queries (one on CurrencyUnit, to get the link to the whole vocab, and one on the vocab) instead of one (with hash IRIs), and I still end up with a huge ontology. (yes, playing devil's advocate here a little)
>> >> 
>> >> 
>> >> (I'm not familiar with Jena's OntDocumentManager, but I'm sure its caching code could easily be extended to take advantage of servers that choose to serve up slash-based vocabularies, as QUDT demonstrates is so feasible today.)
>> >> 
>> >> So doesn't that demonstrate my whole point - i.e., that with slashes I can get the best of both worlds
>> > I don't think so. They are different trade-offs between providing targeted content vs. reducing the number of HTTP queries, and between working with dumb clients and/or dumb servers vs. requiring more coordination between them (e.g. providing and following rdfs:isDefinedBy links).
>> > 
>> >>  (i.e., precise term-specific HTTP responses if I want them, *and* the entire vocab in a single HTTP response if I want that too)? Using a hash completely locks me out, forever, of being able to achieve those lovely clean term-specific responses.
>> >> 
>> >> And that's why I posit that slashes are simply 'more correct' (i.e., since *only* slashes can ever allow servers to always know exactly, unambiguously, what a requesting client is really looking for
>> > I don't buy that. The server can never know exactly nor unambiguously what the intent of the client is, nor should it (separation of concerns).
>> >>  (i.e., a term-specific response, or an entire vocab response)), and it does so without losing any of the benefits of using a hash.
>> >> 
>> >> (I do, by the way, totally appreciate that servers choosing to work as the QUDT servers do today might require a bit more server-side work. But my whole point is to ask this community which option they consider "more technically correct today and forever", and not "which option is easier for servers or vocab creators/hosters/editors/publishers today in the absence of any tooling support".
>> > Can't help but cite the priority of constituencies, as recalled in https://www.w3.org/TR/design-principles/#priority-of-constituencies
>> > 
>> > "User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity."
>> > 
>> > Don't get me wrong, I get the point of thinking beyond the limitation of current tools. That's a valuable exercise. But practicality does also matter.
>> > 
>> > Also, in a distributed setting such as the web, you cannot assume that all other parties will always do the right thing™.
>> > 
>> >   my 2€
>> > 
>> >   pa
>> > 
>> >> In other words, I think that QUDT-server-like behaviour can be provided easily by tooling, which I'd personally be very happy to work on contributing :) !).
>> >> 
>> >> Cheers,
>> >> 
>> >> Pat.
>> >> 
>> >> Pat McBennett, Technical Architect
>> >> 
>> >> Contact  | patm@inrupt.com
>> >> 
>> >> Connect | WebID, GitHub
>> >> 
>> >> Explore  | www.inrupt.com
>> >> 
>> >> 
>> >> 
>> >> 
>> >> 
>> >> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius <martynas@atomgraph.com> wrote:
>> >> Hi Pat,
>> >> 
>> >> For one thing, hash URIs are easier to cache because there is only one document URL. After the initial HTTP request the whole document can be cached with its URL as the key. All following term lookups (whose URIs start with that URL) will hit the cached document.
>> >> Slash URIs will require an initial HTTP request for *each* term and will result in a cache entry per term.
>> >> 
>> >> This is based on my experience with Jena's OntDocumentManager.
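
The caching difference described above can be shown with a minimal sketch. It assumes a naive client that caches by the URL it actually requests (the IRI with its fragment stripped) and does not follow rdfs:isDefinedBy links. The example.org IRIs and both function names are hypothetical; this is not Jena's OntDocumentManager code.

```python
from urllib.parse import urldefrag


def cache_key(term_iri: str) -> str:
    """Return the URL a client requests: the IRI without its fragment."""
    return urldefrag(term_iri).url


def count_requests(term_iris):
    """Count simulated HTTP requests when responses are cached by URL."""
    cache = set()
    requests = 0
    for iri in term_iris:
        key = cache_key(iri)
        if key not in cache:
            requests += 1  # cache miss: one request, one new cache entry
            cache.add(key)
    return requests


# Hypothetical vocabularies with three terms each.
hash_terms = [
    "http://example.org/vocab#A",
    "http://example.org/vocab#B",
    "http://example.org/vocab#C",
]
slash_terms = [
    "http://example.org/vocab/A",
    "http://example.org/vocab/B",
    "http://example.org/vocab/C",
]

print(count_requests(hash_terms))   # 1
print(count_requests(slash_terms))  # 3
```

All hash IRIs collapse to one cache key because the fragment is never sent to the server, whereas each slash IRI is its own URL and so its own cache entry.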
>> >> 
>> >> Martynas
>> >> atomgraph.com
>> >> 
>> >> 
>> >> On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
>> >> So (I think!) I know all the pros and cons of using either a trailing slash or a trailing hash for vocab namespace IRIs. Basically it boils down to hashes meaning you'll always get info on all the terms in a vocabulary, even if you only want info for one specific term, whereas using a slash means I can always get just the info for any specific, individual term I request.
>> >> 
>> >> Note: using slashes provides the ability to get the best of both worlds - i.e., small responses when explicitly asking for info on just one term, but if you want info for all the terms in one HTTP response, then just serve up that complete vocab response when the base namespace IRI itself is dereferenced.
>> >> 
>> >> Here's a nice simple illustration of the basic difference:
>> >> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on 'https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean, concise, and precise set of info on just the one term you asked for - lovely!
>> >> 
>> >> - Hash: DPV's 'JointDataControllers' (i.e., click on 'https://w3id.org/dpv#JointDataControllers') and you get bombarded with a huge document, with a daunting Table of Contents on the left, and info on hundreds of other terms that I didn't ask for, and so had no interest in whatsoever (don't get me wrong - this is fantastically detailed and potentially very useful information, but it's simply not what I asked for!).
>> >> 
>> >> So based on the greater flexibility and future-proofing of using slash (i.e., it offers the best of both worlds, whereas hash is forever limited), I've become firmly of the opinion that slashes are just 'better' than hashes, and in fact are simply 'more correct' (i.e., all IRIs should be uniquely dereferenceable).
>> >> 
>> >> I also think the distinction is critically important when creating vocabularies intended for widespread and long-lasting use (such as the DPV vocab above). For throw-away or pet projects, sure, it doesn't really matter (yet even then, I still think slashes are the 'more correct' option).
>> >> 
>> >> I know that the convention from the W3C has tended to be to use hashes, but I think in hindsight that was a mistake, and that the advice from the Semantic Web community as a whole should now be to adopt slashes consistently for all new vocabularies. (And it's not like using slash has no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist, SOSA, SSN, (even the venerable FOAF!) all use slash).
>> >> 
>> >> I'd love to hear this group's thoughts. (For reference, I did ask the gist community if they recorded their discussions around their decision (in 2019) to formally switch gist from hash to slash (here), but it seems they weren't recorded, and I've also raised the issue with the DPV group directly too (here)).
>> >> 
>> >> Cheers,
>> >> 
>> >> Pat.
>> >> 
>> >> This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged, confidential and/or proprietary information. If you are not the intended recipient of this e-mail (or the person responsible for delivering this document to the intended recipient), please do not disseminate, distribute, print or copy this e-mail, or any attachment thereto. If you have received this e-mail in error, please respond to the individual sending the message, and permanently delete the email.
>> 
>> 
>
Received on Friday, 7 October 2022 21:01:47 UTC