- From: Pierre-Antoine Champin <pierre-antoine@w3.org>
- Date: Fri, 7 Oct 2022 10:24:06 +0200
- To: Pat McBennett <patm@inrupt.com>
- Cc: semantic-web@w3.org
- Message-ID: <d9285470-7589-5e33-01c3-249c79d659a8@w3.org>
oops, slight typo in my previous email

On 07/10/2022 09:07, Pierre-Antoine Champin wrote:
>
> On 07/10/2022 01:49, Pat McBennett wrote:
>
>> Hi Martynas,
>>
>> Thanks for the feedback!
>>
>> But I think any vocabulary can just as easily support that same
>> caching benefit with slash-based vocab namespace IRIs too, *without*
>> having to require an initial HTTP request for *each* term - i.e., by
>> simply returning the entire vocab on namespace IRI lookups.
> In the general case, when you encounter an IRI of the form
> http://ex.co/x/Y, you cannot assume that http://ex.co/x/ will contain
> the definition of http://ex.co/x/Y together with other related terms.
> For this you need,
>
> a) the server to provide an affordance in the description of
> http://ex.co/x/Y pointing to http://ex.co/x/
> b) the client to understand and follow that affordance (e.g. by using
> rdfs:isDefinedBy)
>
the parenthesis "(e.g. by using rdfs:isDefinedBy)" was supposed to be in
a), not in b)... sorry about that
>
> c) the description at http://ex.co/x/ to include some information
> about any term (e.g. http://ex.co/x/Z) it contains stating "there is
> nothing more to know about this term" (e.g. by using rdfs:isDefinedBy
> again)
> d) the client to understand that statement and refrain from fetching
> http://ex.co/x/Z later on
>
> So you don't get "the best of both worlds" as automatically as you
> suggest.
>
>>
>> I think QUDT is a really nice, simple example that very easily
>> demonstrates exactly this today. It has a slash namespace IRI, and if
>> I only ever request info on individual single vocab terms (e.g., try
>> clicking now on `https://qudt.org/schema/qudt/CurrencyUnit`) then
>> yes, I'd encounter that 'HTTP request per lookup' you suggest (but
>> I'd be getting precisely what I asked for each time!).
> Terms of a vocabulary/ontology rarely make sense in isolation. So
> arguably, serving the entire vocabulary provides you with enough
> context to understand/use the term appropriately.
>>
>> But I can just as easily avoid that scenario today too by simply
>> requesting the vocab's namespace IRI instead - e.g., try it right now
>> by just clicking on `https://qudt.org/schema/qudt`. See - you get
>> back the entire vocab containing all the vocab terms in a single HTTP
>> response, which can be cached and keyed on that one namespace IRI
>> (exactly as you would if they'd used a hash instead).
> And then you get "bombarded with a huge document", to quote one of
> your arguments against hash-IRIs. Seems to me that you get the worst
> of both worlds here: I had to perform two HTTP queries (one on
> CurrencyUnit, to get the link to the whole vocab, and one on the
> vocab) instead of one (with hash IRIs), and I still end up with a huge
> ontology. (yes, playing devil's advocate here a little)
>> (I'm not familiar with Jena's OntDocumentManager, but I'm sure its
>> caching code could easily be extended to take advantage of servers
>> that choose to serve up slash-based vocabularies as QUDT
>> demonstrates is so feasible today.)
>> So doesn't that demonstrate my whole point - i.e., that with slashes
>> I can get the best of both worlds
>
> I don't think so. They are different trade-offs between providing
> targeted content vs. reducing the number of HTTP queries, and between
> working with dumb clients and/or dumb servers vs. requiring more
> coordination between them (e.g. providing and following
> rdfs:isDefinedBy links).
>
>> (i.e., precise term-specific HTTP responses if I want them, *and* the
>> entire vocab in a single HTTP response if I want that too)? Using a
>> hash completely locks me out, forever, of being able to achieve those
>> lovely clean term-specific responses.
>> And that's why I posit that slashes are simply 'more correct' (i.e.,
>> since *only* slashes can ever allow servers to always know exactly,
>> unambiguously, what a requesting client is really looking for
> I don't buy that. The server can never know exactly nor unambiguously
> what the intent of the client is, nor should it (separation of
> concerns <https://en.wikipedia.org/wiki/Separation_of_concerns>).
>> (i.e., a term-specific response, or an entire vocab response)), and
>> it does so without losing any of the benefits of using a hash. (I do,
>> by the way, totally appreciate that servers choosing to work as the
>> QUDT servers do today might require a bit more server-side work. But
>> my whole point is to ask this community which option they consider
>> "more technically correct today and forever", and not "which option
>> is easier for servers or vocab creators/hosters/editors/publishers
>> today in the absence of any tooling support".
>
> Can't help but cite the priority of constituencies, as recalled in
> https://www.w3.org/TR/design-principles/#priority-of-constituencies
>
> "User needs come before the needs of web page authors, which come
> before the needs of user agent implementors, which come before the
> needs of specification writers, which come before theoretical purity."
>
> Don't get me wrong, I get the point of thinking beyond the limitations
> of current tools. That's a valuable exercise. But practicality does
> also matter.
>
> Also, in a distributed setting such as the web, you cannot assume
> that all other parties will always do the right thing™.
>
> my 2€
>
> pa
>
>> In other words, I think that QUDT-server-like behaviour can be
>> provided easily by tooling, which I'd personally be very happy to
>> work on contributing :) !).
>> Cheers,
>> Pat.
>>
>> *Pat McBennett*, Technical Architect
>>
>> Contact | patm@inrupt.com
>>
>> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
>> <https://github.com/pmcb55>
>>
>> Explore | www.inrupt.com <http://www.inrupt.com/>
>>
>>
>> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius
>> <martynas@atomgraph.com> wrote:
>>
>>     Hi Pat,
>>
>>     For one thing, hash URIs are easier to cache because there is
>>     only one document URL. After the initial HTTP request the whole
>>     document can be cached with its URL as the key. All following
>>     term lookups (whose URIs start with that URL) will hit the cached
>>     document.
>>     Slash URIs will require an initial HTTP request for *each* term
>>     and will result in a cache entry per term.
>>
>>     This is based on my experience with Jena's OntDocumentManager.
>>
>>     Martynas
>>     atomgraph.com <http://atomgraph.com>
>>
>>
>>     On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
>>
>>         So (I think!) I know all the pros and cons of using either
>>         a trailing slash or a trailing hash for vocab namespace IRIs.
>>         Basically it boils down to hashes meaning you'll always get
>>         info on all the terms in a vocabulary, even if you only want
>>         info for one specific term, whereas using a slash means I can
>>         always get just the info for any specific, individual term I
>>         request.
>>
>>         Note: using slashes provides the ability to get the best of
>>         both worlds - i.e., small responses when explicitly asking
>>         for info on just one term, but if you want info for all the
>>         terms in one HTTP response, then just serve up that complete
>>         vocab response when the base namespace IRI itself is
>>         dereferenced.
>>
>>         Here's a nice simple illustration of the basic difference:
>>         - Slash: QUDT's 'CurrencyUnit' term (i.e., click on
>>         'https://qudt.org/schema/qudt/CurrencyUnit') and you get a
>>         nice clean, concise, and precise set of info on just the one
>>         term you asked for - lovely!
>>
>>         - Hash: DPV's 'JointDataControllers' (i.e., click on
>>         'https://w3id.org/dpv#JointDataControllers') and you get
>>         bombarded with a huge document, with a daunting Table of
>>         Contents on the left, and info on hundreds of other terms
>>         that I didn't ask for, and so had no interest in whatsoever
>>         (don't get me wrong - this is fantastically detailed and
>>         potentially very useful information, but it's simply not what
>>         I asked for!).
>>
>>         So based on the greater flexibility and future-proofing of
>>         using slash (i.e., it offers the best of both worlds, whereas
>>         hash is forever limited), I've become firmly of the opinion
>>         that slashes are just 'better' than hashes, and in fact are
>>         simply 'more correct' (i.e., all IRIs should be uniquely
>>         dereferenceable).
>>
>>         I also think the distinction is critically important when
>>         creating vocabularies intended for widespread and
>>         long-lasting use (such as the DPV vocab above). For
>>         throw-away or pet projects, sure, it doesn't really matter
>>         (yet even then, I still think slashes are the 'more correct'
>>         option).
>>
>>         I know that the convention from the W3C has tended to be to
>>         use hashes, but I think in hindsight that was a mistake, and
>>         that the advice from the Semantic Web community as a whole
>>         should now be to adopt slashes consistently for all new
>>         vocabularies. (And it's not like using slash has no precedent
>>         - major 'authoritative' vocabs like QUDT, Schema.org, gist,
>>         SOSA, SSN, (even the venerable FOAF!) all use slash).
>>
>>         I'd love to hear this group's thoughts. (For reference, I did
>>         ask the gist community if they recorded their discussions
>>         around their decision (in 2019) to formally switch gist from
>>         hash to slash (here
>>         <https://github.com/semanticarts/gist/issues/725>), but it
>>         seems they weren't recorded, and I've also raised the issue
>>         with the DPV group directly too (here
>>         <https://github.com/w3c/dpv/issues/53>)).
>>
>>         Cheers,
>>
>>         Pat.
>>
>>         *Pat McBennett*, Technical Architect
>>
>>         Contact | patm@inrupt.com
>>
>>         Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>,
>>         GitHub <https://github.com/pmcb55>
>>
>>         Explore | www.inrupt.com <http://www.inrupt.com/>
>>
>>
>>         This e-mail, and any attachments thereto, is intended only
>>         for use by the addressee(s) named herein and may contain
>>         legally privileged, confidential and/or proprietary
>>         information. If you are not the intended recipient of this
>>         e-mail (or the person responsible for delivering this
>>         document to the intended recipient), please do not
>>         disseminate, distribute, print or copy this e-mail, or any
>>         attachment thereto. If you have received this e-mail in
>>         error, please respond to the individual sending the message,
>>         and permanently delete the email.
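
The request counts argued over in this thread can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a description of how Jena's OntDocumentManager or the QUDT servers behave: no HTTP is performed, the hash IRIs (http://ex.co/x#Y etc.) are hypothetical counterparts of the slash examples used above, and the `namespace_terms` mapping is a made-up stand-in for the rdfs:isDefinedBy coordination described in steps a) to d).

```python
from urllib.parse import urldefrag


def requests_needed(term_iris, namespace_terms=None):
    """Count simulated HTTP requests needed to look up each term IRI.

    namespace_terms is a hypothetical mapping from a namespace document
    URL to the set of term IRIs it fully describes. It stands in for
    rdfs:isDefinedBy links that the server provides and the client
    follows (an assumption of this sketch, not a real API).
    """
    fetched = set()   # document URLs already requested (the cache keys)
    covered = set()   # terms known to need no further lookup
    requests = 0
    for iri in term_iris:
        # A client drops the fragment before requesting, so all hash
        # terms of one namespace share a single document URL.
        url, _fragment = urldefrag(iri)
        if iri in covered or url in fetched:
            continue  # cache hit, no request
        fetched.add(url)
        requests += 1
        for ns_url, terms in (namespace_terms or {}).items():
            # Follow the link to the namespace document once, then
            # treat every term it describes as already known.
            if iri in terms and ns_url not in fetched:
                fetched.add(ns_url)
                requests += 1
                covered |= terms
    return requests


hash_terms = ["http://ex.co/x#Y", "http://ex.co/x#Z", "http://ex.co/x#W"]
slash_terms = ["http://ex.co/x/Y", "http://ex.co/x/Z", "http://ex.co/x/W"]

hash_count = requests_needed(hash_terms)
naive_slash_count = requests_needed(slash_terms)
linked_slash_count = requests_needed(
    slash_terms, {"http://ex.co/x/": set(slash_terms)}
)

print(hash_count)          # 1: one document, keyed on the namespace URL
print(naive_slash_count)   # 3: one request and one cache entry per term
print(linked_slash_count)  # 2: first term, then the namespace document
```

With coordinated client and server, the slash case settles at two requests regardless of how many terms follow, which matches the "two HTTP queries instead of one" observation above; without that coordination it grows with the number of terms.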
Attachments
- application/pgp-keys attachment: OpenPGP public key
Received on Friday, 7 October 2022 08:24:11 UTC