- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Fri, 7 Oct 2022 10:04:19 +0200
- To: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Cc: Pat McBennett <patm@inrupt.com>, semantic-web@w3.org
- Message-ID: <CAE35Vmz9AwgSXmKPMK7r+i91qB7dPCi-eQ+k1r9k4u+jL_CqUQ@mail.gmail.com>
On Fri, Oct 7, 2022 at 9:12 AM Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
> On 07/10/2022 01:49, Pat McBennett wrote:
>
> Hi Martynas,
>
> Thanks for the feedback!
>
> But I think any vocabulary can just as easily support that same caching
> benefit with slash-based vocab namespace IRIs too, *without* having
> to require an initial HTTP request for *each* term - i.e., by simply
> returning the entire vocab on namespace IRI lookups.
>
> In the general case, when you encounter an IRI of the form
> http://ex.co/x/Y, you can not assume that http://ex.co/x/ will contain
> the definition of http://ex.co/x/Y together with other related terms. For
> this you need,
>

The above is what I wanted to reply with.

> a) the server to provide an affordance in the description of
> http://ex.co/x/Y pointing to http://ex.co/x/
> b) the client to understand and follow that affordance (e.g. by using
> rdfs:isDefinedBy)
> c) the description at http://ex.co/x/ to include some information about
> any term (e.g. http://ex.co/x/Z) it contains stating "there is nothing
> more to know about this term" (e.g. by using rdfs:isDefinedBy again)
> d) the client to understand that statement and refrain from fetching
> http://ex.co/x/Z later on
>
> So you don't get "the best of both worlds" as automatically as you suggest.
>
>
> I think QUDT is a really nice, simple example that very easily
> demonstrates exactly this today. It has a slash namespace IRI, and if I
> only ever request info on individual single vocab terms (e.g., try clicking
> now on `https://qudt.org/schema/qudt/CurrencyUnit`) then yes, I'd
> encounter that 'HTTP request per lookup' you suggest (but I'd be getting
> precisely what I asked for each time!).
>
> Terms of a vocabulary/ontology rarely make sense in isolation. So
> arguably, serving the entire vocabulary provides you with enough context to
> understand/use the term appropriately.
>
>
> But I can just as easily avoid that scenario today too by simply
> requesting the vocab's namespace IRI instead - e.g., try it right now by
> just clicking on `https://qudt.org/schema/qudt`
> <https://qudt.org/schema/qudt>. See - you get back the entire vocab
> containing all the vocab terms in a single HTTP response, which can be
> cached and keyed on that one namespace IRI (exactly as you would if they'd
> used a hash instead).
>
> And then you get "bombarded with a huge document", to quote one of your
> arguments against hash-IRIs. Seems to me that you get the worst of both
> worlds here: I had to perform two HTTP queries (one on CurrencyUnit, to
> get the link to the whole vocab, and one on the vocab) instead of one (with
> hash IRIs), and I still end up with a huge ontology. (yes, playing devil's
> advocate here a little)
>
> (I'm not familiar with Jena's OntDocumentManager, but I'm sure its
> caching code could easily be extended to take advantage of servers that
> choose to serve up slash-based vocabularies as QUDT demonstrates is so
> feasible today.)
> So doesn't that demonstrate my whole point - i.e., that with slashes I can
> get the best of both worlds
>
> I don't think so. They are different trade-offs between providing targeted
> content vs. reducing the number of HTTP queries, and between working with
> dumb clients and/or dumb servers vs. requiring more coordination between
> them (e.g. providing and following rdfs:isDefinedBy links).
>
> (i.e., precise term-specific HTTP responses if I want them, *and* the
> entire vocab in a single HTTP response if I want that too)? Using a hash
> completely locks me out, forever, of being able to achieve those lovely
> clean term-specific responses.
> And that's why I posit that slashes are simply 'more correct' (i.e., since
> *only* slashes can ever allow servers to always know exactly,
> unambiguously, what a requesting client is really looking for
>
> I don't buy that.
> The server can never know exactly nor unambiguously what
> the intent of the client is, nor should it (separation of concerns
> <https://en.wikipedia.org/wiki/Separation_of_concerns>).
>
> (i.e., a term-specific response, or an entire vocab response)), and it
> does so without losing any of the benefits of using a hash. (I do, by the
> way, totally appreciate that servers choosing to work as the QUDT servers
> do today might require a bit more server-side work. But my whole point is
> to ask this community which option they consider "more technically correct
> today and forever", and not "which option is easier for servers or vocab
> creators/hosters/editors/publishers today in the absence of any tooling
> support".
>
> Can't help but cite the priority of constituencies, as recalled in
> https://www.w3.org/TR/design-principles/#priority-of-constituencies
>
> "User needs come before the needs of web page authors, which come before
> the needs of user agent implementors, which come before the needs of
> specification writers, which come before theoretical purity."
>
> Don't get me wrong, I get the point of thinking beyond the limitation of
> current tools. That's a valuable exercise. But practicality does also
> matter.
>
> Also, in a distributed setting such as the web, you can not assume that
> all other parties will always do the right thing™.
>
> my 2€
>
> pa
>
> In other words, I think that QUDT-server-like behaviour can be provided
> easily by tooling, which I'd personally be very happy to work on
> contributing :) !).
> Cheers,
> Pat.
>
> *Pat McBennett*, Technical Architect
>
> Contact | patm@inrupt.com
>
> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
> <https://github.com/pmcb55>
>
> Explore | www.inrupt.com
>
>
> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius <martynas@atomgraph.com>
> wrote:
>
>> Hi Pat,
>>
>> For one thing, hash URIs are easier to cache because there is only one
>> document URL.
>> After the initial HTTP request the whole document can be
>> cached with its URL as the key. All following term lookups (whose URIs
>> start with that URL) will hit the cached document.
>> Slash URIs will require an initial HTTP request for *each* term and will
>> result in a cache entry per term.
>>
>> This is based on my experience with Jena's OntDocumentManager.
>>
>> Martynas
>> atomgraph.com
>>
>>
>> On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
>>
>>> So (I think!) I know all the pros and cons of using either a trailing
>>> slash or a trailing hash for vocab namespace IRIs. Basically it boils down
>>> to hashes meaning you'll always get info on all the terms in a vocabulary,
>>> even if you only want info for one specific term, whereas using a slash
>>> means I can always get just the info for any specific, individual term I
>>> request.
>>>
>>> Note: using slashes provides the ability to get the best of both worlds
>>> - i.e., small responses when explicitly asking for info on just one term,
>>> but if you want info for all the terms in one HTTP response, then just
>>> serve up that complete vocab response when the base namespace IRI itself is
>>> dereferenced.
>>>
>>> Here's a nice simple illustration of the basic difference:
>>> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on
>>> 'https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean,
>>> concise, and precise set of info on just the one term you asked for -
>>> lovely!
>>>
>>> - Hash: DPV's 'JointDataControllers' (i.e., click on
>>> 'https://w3id.org/dpv#JointDataControllers') and you get bombarded with
>>> a huge document, with a daunting Table of Contents on the left, and info on
>>> hundreds of other terms that I didn't ask for, and so had no interest in
>>> whatsoever (don't get me wrong - this is fantastically detailed and
>>> potentially very useful information, but it's simply not what I asked for!).
>>>
>>> So based on the greater flexibility and future-proofing of using slash
>>> (i.e., it offers the best of both worlds, whereas hash is forever limited),
>>> I've become firmly of the opinion that slashes are just 'better' than
>>> hashes, and in fact are simply 'more correct' (i.e., all IRIs should be
>>> uniquely dereferenceable).
>>>
>>> I also think the distinction is critically important when creating
>>> vocabularies intended for widespread and long-lasting use (such as the DPV
>>> vocab above). For throw-away or pet projects, sure, it doesn't really
>>> matter (yet even then, I still think slashes are the 'more correct' option).
>>>
>>> I know that the convention from the W3C has tended to be to use hashes,
>>> but I think in hindsight that was a mistake, and that the advice from the
>>> Semantic Web community as a whole should now be to adopt slashes
>>> consistently for all new vocabularies. (And it's not like using slash has
>>> no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist,
>>> SOSA, SSN, (even the venerable FOAF!) all use slash).
>>>
>>> I'd love to hear this group's thoughts. (For reference, I did ask the
>>> gist community if they recorded their discussions around their decision (in
>>> 2019) to formally switch gist from hash to slash (here
>>> <https://github.com/semanticarts/gist/issues/725>), but it seems they
>>> weren't recorded, and I've also raised the issue with the DPV group
>>> directly too (here <https://github.com/w3c/dpv/issues/53>)).
>>>
>>> Cheers,
>>>
>>> Pat.
>>>
>>> *Pat McBennett*, Technical Architect
>>>
>>> Contact | patm@inrupt.com
>>>
>>> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
>>> <https://github.com/pmcb55>
>>>
>>> Explore | www.inrupt.com
>>>
>>>
>>> This e-mail, and any attachments thereto, is intended only for use by
>>> the addressee(s) named herein and may contain legally privileged,
>>> confidential and/or proprietary information.
>>> If you are not the intended
>>> recipient of this e-mail (or the person responsible for delivering this
>>> document to the intended recipient), please do not disseminate, distribute,
>>> print or copy this e-mail, or any attachment thereto. If you have received
>>> this e-mail in error, please respond to the individual sending the message,
>>> and permanently delete the email.
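
As an editorial aside, the caching difference described in the quoted message of Thu, Oct 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Jena's OntDocumentManager: `DocumentCache`, `cache_key`, and `fake_fetch` are hypothetical names, no network request is made, and the `SomeOtherTerm` IRIs are placeholders rather than real vocabulary terms.

```python
from urllib.parse import urldefrag


def cache_key(term_iri: str) -> str:
    """Return the document URL a client would request for a term IRI.

    The fragment (the part after '#') is not sent to the server,
    so it is dropped from the cache key.
    """
    return urldefrag(term_iri).url


class DocumentCache:
    """Hypothetical document cache keyed on the fragment-less URL."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable taking a URL, returning a document
        self._docs = {}       # document URL -> cached document
        self.requests = 0     # number of fetches actually performed

    def lookup(self, term_iri: str):
        key = cache_key(term_iri)
        if key not in self._docs:
            self.requests += 1
            self._docs[key] = self._fetch(key)
        return self._docs[key]


def fake_fetch(url: str) -> str:
    # Stand-in for an HTTP GET; assumption made to keep the sketch offline.
    return f"<document at {url}>"


# Hash namespace: both terms resolve to the same document URL.
hash_cache = DocumentCache(fake_fetch)
for term in (
    "https://w3id.org/dpv#JointDataControllers",
    "https://w3id.org/dpv#SomeOtherTerm",  # placeholder term
):
    hash_cache.lookup(term)

# Slash namespace: each term is its own document URL.
slash_cache = DocumentCache(fake_fetch)
for term in (
    "https://qudt.org/schema/qudt/CurrencyUnit",
    "https://qudt.org/schema/qudt/SomeOtherTerm",  # placeholder term
):
    slash_cache.lookup(term)

print(hash_cache.requests)   # 1
print(slash_cache.requests)  # 2
```

The sketch models only a client that does nothing beyond dereferencing each IRI. A client that follows rdfs:isDefinedBy links, as in steps a) to d) quoted above, could map slash terms to a shared document as well, at the cost of the extra client/server coordination discussed in the thread.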
Received on Friday, 7 October 2022 08:04:44 UTC