- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Fri, 7 Oct 2022 22:37:32 +0200
- To: Hugh Glaser <hugh@glasers.org>
- Cc: Pat McBennett <patm@inrupt.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, semantic-web@w3.org
- Message-ID: <CAE35VmwnJc4DW1C8BokmXywk=6xMPLudxCJkPG=k19Z0ccAD9g@mail.gmail.com>
Hi Hugh,

On Fri, 7 Oct 2022 at 19.59, Hugh Glaser <hugh@glasers.org> wrote:

> Does anyone have existing use cases where the ontology needs to be retrieved automatically, since Pierre mentions “user needs”.
> So we could look at things like how much of vocabularies/ontologies are fetched, how big they are, patterns of frequency of fetching, etc.
>
> In my case, all my Linked Data apps know the vocabulary/ontology from the start, and indeed they are pretty much used as the equivalent of schemas during design.
> I don’t know of many apps where even the data is acquired from remote access by global Follow Your Nose, never mind the vocabulary/ontology, although I don’t have a big sample, because I don’t know of many real world apps that have been deployed.

I happen to have published a screencast of such an app just a few days ago :)
https://youtu.be/Dl0xIUNd4F0
Let me know what you think.

> So most of the considerations below are not of practical interest; the issue is more about using the best URI format, than how to resolve them.
> Which is usually slash rather than hash URIs, I think is the way Linked Data leans.
>
> And of course if you are doing Semantic Web that isn’t Linked Data, the URIs within the definition document don’t need to resolve, so you can use either.
>
> Best
> Hugh
>
> > On 7 Oct 2022, at 08:07, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
> >
> > On 07/10/2022 01:49, Pat McBennett wrote:
> >
> >> Hi Martynas,
> >>
> >> Thanks for the feedback!
> >>
> >> But I think any vocabulary can just as easily support that same caching benefit with slash-based vocab namespace IRIs too, *without* having to require an initial HTTP request for *each* term - i.e., by simply returning the entire vocab on namespace IRI lookups.
> >
> > In the general case, when you encounter an IRI of the form http://ex.co/x/Y, you cannot assume that http://ex.co/x/ will contain the definition of http://ex.co/x/Y together with other related terms. For this you need,
> >
> > a) the server to provide an affordance in the description of http://ex.co/x/Y pointing to http://ex.co/x/
> >
> > b) the client to understand and follow that affordance (e.g. by using rdfs:isDefinedBy)
> >
> > c) the description at http://ex.co/x/ to include some information about any term (e.g. http://ex.co/x/Z) it contains stating "there is nothing more to know about this term" (e.g. by using rdfs:isDefinedBy again)
> >
> > d) the client to understand that statement and refrain from fetching http://ex.co/x/Z later on
> >
> > So you don't get "the best of both worlds" as automatically as you suggest.
> >
> >> I think QUDT is a really nice, simple example that very easily demonstrates exactly this today. It has a slash namespace IRI, and if I only ever request info on individual single vocab terms (e.g., try clicking now on `https://qudt.org/schema/qudt/CurrencyUnit`) then yes, I'd encounter that 'HTTP request per lookup' you suggest (but I'd be getting precisely what I asked for each time!).
> >
> > Terms of a vocabulary/ontology rarely make sense in isolation. So arguably, serving the entire vocabulary provides you with enough context to understand/use the term appropriately.
> >
> >> But I can just as easily avoid that scenario today too by simply requesting the vocab's namespace IRI instead - e.g., try it right now by just clicking on `https://qudt.org/schema/qudt`. See - you get back the entire vocab containing all the vocab terms in a single HTTP response, which can be cached and keyed on that one namespace IRI (exactly as you would if they'd used a hash instead).
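To make steps (a) to (d) quoted above more concrete, here is a minimal Python sketch of a client that follows rdfs:isDefinedBy and then avoids re-fetching terms the vocabulary document already covers. Everything in it is an assumption for illustration: the in-memory `SERVER` dictionary stands in for real HTTP responses, the `Client` class and its method names are hypothetical, and treating rdfs:isDefinedBy as "nothing more to know about this term" is just the convention suggested in the quoted text, not something the RDFS specification mandates.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

# Hypothetical server: maps a dereferenced URL to (subject, predicate, object) triples.
SERVER = {
    "http://ex.co/x/Y": [
        # Step (a): the term description points at the vocabulary document.
        ("http://ex.co/x/Y", RDFS_IS_DEFINED_BY, "http://ex.co/x/"),
    ],
    "http://ex.co/x/": [
        # Step (c): the vocabulary document declares which terms it fully defines.
        ("http://ex.co/x/Y", RDFS_IS_DEFINED_BY, "http://ex.co/x/"),
        ("http://ex.co/x/Z", RDFS_IS_DEFINED_BY, "http://ex.co/x/"),
    ],
}


class Client:
    def __init__(self, server):
        self.server = server
        self.requests = []    # every URL actually dereferenced, in order
        self.defined_in = {}  # term IRI -> document known to define it

    def _get(self, url):
        self.requests.append(url)
        return self.server[url]

    def describe(self, term):
        """Return the URL of the document that describes `term`."""
        # Step (d): no request if a fetched vocabulary already covers the term.
        if term in self.defined_in:
            return self.defined_in[term]
        for s, p, o in self._get(term):
            # Step (b): follow the affordance to the defining document.
            if s == term and p == RDFS_IS_DEFINED_BY and o not in self.requests:
                for s2, p2, o2 in self._get(o):
                    if p2 == RDFS_IS_DEFINED_BY and o2 == o:
                        self.defined_in[s2] = o
        # Fall back to the term's own URL if no defining document was found.
        return self.defined_in.get(term, term)


client = Client(SERVER)
client.describe("http://ex.co/x/Y")  # two requests: the term, then the vocabulary
client.describe("http://ex.co/x/Z")  # no request: already covered by the vocabulary
print(client.requests)
```

In this sketch the first lookup costs two requests and later lookups cost none, but only because both the server data and the client logic cooperate on the rdfs:isDefinedBy convention.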
> >
> > And then you get "bombarded with a huge document", to quote one of your arguments against hash-IRIs. Seems to me that you get the worst of both worlds here: I had to perform two HTTP queries (one on CurrencyUnit, to get the link to the whole vocab, and one on the vocab) instead of one (with hash IRIs), and I still end up with a huge ontology. (yes, playing devil's advocate here a little)
> >
> >> (I'm not familiar with Jena's OntDocumentManager, but I'm sure its caching code could easily be extended to take advantage of servers that choose to serve up slash-based vocabularies as QUDT demonstrates is so feasible today.)
> >>
> >> So doesn't that demonstrate my whole point - i.e., that with slashes I can get the best of both worlds
> >
> > I don't think so. They are different trade-offs between providing targeted content vs. reducing the number of HTTP queries, and between working with dumb clients and/or dumb servers vs. requiring more coordination between them (e.g. providing and following rdfs:isDefinedBy links).
> >
> >> (i.e., precise term-specific HTTP responses if I want them, *and* the entire vocab in a single HTTP response if I want that too)? Using a hash completely locks me out, forever, of being able to achieve those lovely clean term-specific responses.
> >>
> >> And that's why I posit that slashes are simply 'more correct' (i.e., since *only* slashes can ever allow servers to always know exactly, unambiguously, what a requesting client is really looking for
> >
> > I don't buy that. The server can never know exactly nor unambiguously what the intent of the client is, nor should it (separation of concerns).
> >
> >> (i.e., a term-specific response, or an entire vocab response)), and it does so without losing any of the benefits of using a hash.
> >>
> >> (I do, by the way, totally appreciate that servers choosing to work as the QUDT servers do today might require a bit more server-side work.
> >> But my whole point is to ask this community which option they consider "more technically correct today and forever", and not "which option is easier for servers or vocab creators/hosters/editors/publishers today in the absence of any tooling support".
> >
> > Can't help but cite the priority of constituencies, as stated in https://www.w3.org/TR/design-principles/#priority-of-constituencies
> >
> > "User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity."
> >
> > Don't get me wrong, I get the point of thinking beyond the limitation of current tools. That's a valuable exercise. But practicality does also matter.
> >
> > Also, in a distributed setting such as the web, you cannot assume that all other parties will always do the right thing™.
> >
> > my 2€
> >
> > pa
> >
> >> In other words, I think that QUDT-server-like behaviour can be provided easily by tooling, which I'd personally be very happy to work on contributing :) !).
> >>
> >> Cheers,
> >>
> >> Pat.
> >>
> >> Pat McBennett, Technical Architect
> >> Contact | patm@inrupt.com
> >> Connect | WebID, GitHub
> >> Explore | www.inrupt.com
> >>
> >> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius <martynas@atomgraph.com> wrote:
> >> Hi Pat,
> >>
> >> For one thing, hash URIs are easier to cache because there is only one document URL. After the initial HTTP request the whole document can be cached with its URL as the key. All following term lookups (whose URIs start with that URL) will hit the cached document.
> >> Slash URIs will require an initial HTTP request for *each* term and will result in a cache entry per term.
> >>
> >> This is based on my experience with Jena's OntDocumentManager.
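As a rough illustration of the caching argument quoted above, the following Python sketch keys a cache on the document URL, i.e. the term IRI with its fragment removed. It is a toy model under stated assumptions, not Jena's OntDocumentManager: the `DocumentCache` class, the `fake_fetch` function, and the `https://example.org/...` IRIs are all hypothetical, and no real HTTP request is made.

```python
from urllib.parse import urldefrag


class DocumentCache:
    """Toy cache keyed on the document URL (the term IRI minus any fragment)."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable standing in for an HTTP GET
        self._docs = {}      # document URL -> cached document
        self.requests = 0    # number of simulated HTTP requests

    def lookup(self, term_iri):
        # A fragment is never sent to the server, so strip it to get the cache key.
        doc_url, _fragment = urldefrag(term_iri)
        if doc_url not in self._docs:
            self.requests += 1
            self._docs[doc_url] = self._fetch(doc_url)
        return self._docs[doc_url]


def fake_fetch(url):
    # Hypothetical stand-in for dereferencing the URL.
    return f"<document at {url}>"


# Hash namespace: every term shares one document URL.
hash_cache = DocumentCache(fake_fetch)
for name in ("A", "B", "C"):
    hash_cache.lookup(f"https://example.org/vocab#{name}")

# Slash namespace: every term is its own document URL.
slash_cache = DocumentCache(fake_fetch)
for name in ("A", "B", "C"):
    slash_cache.lookup(f"https://example.org/vocab/{name}")

print(hash_cache.requests, slash_cache.requests)
```

With a naive client like this one, the three hash terms share a single request and cache entry, while the three slash terms each cost one. A client that fetched the namespace document first and knew it covered all terms could avoid that, which is the extra coordination discussed elsewhere in the thread.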
> >>
> >> Martynas
> >> atomgraph.com
> >>
> >> On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
> >> So (I think!) I know all the pros and cons of using either a trailing slash or a trailing hash for vocab namespace IRIs. Basically it boils down to hashes meaning you'll always get info on all the terms in a vocabulary, even if you only want info for one specific term, whereas using a slash means I can always get just the info for any specific, individual term I request.
> >>
> >> Note: using slashes provides the ability to get the best of both worlds - i.e., small responses when explicitly asking for info on just one term, but if you want info for all the terms in one HTTP response, then just serve up that complete vocab response when the base namespace IRI itself is dereferenced.
> >>
> >> Here's a nice simple illustration of the basic difference:
> >> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on 'https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean, concise, and precise set of info on just the one term you asked for - lovely!
> >>
> >> - Hash: DPV's 'JointDataControllers' (i.e., click on 'https://w3id.org/dpv#JointDataControllers') and you get bombarded with a huge document, with a daunting Table of Contents on the left, and info on hundreds of other terms that I didn't ask for, and so had no interest in whatsoever (don't get me wrong - this is fantastically detailed and potentially very useful information, but it's simply not what I asked for!).
> >>
> >> So based on the greater flexibility and future-proofing of using slash (i.e., it offers the best of both worlds, whereas hash is forever limited), I've become firmly of the opinion that slashes are just 'better' than hashes, and in fact are simply 'more correct' (i.e., all IRIs should be uniquely dereferenceable).
> >>
> >> I also think the distinction is critically important when creating vocabularies intended for widespread and long-lasting use (such as the DPV vocab above). For throw-away or pet projects, sure, it doesn't really matter (yet even then, I still think slashes are the 'more correct' option).
> >>
> >> I know that the convention from the W3C has tended to be to use hashes, but I think in hindsight that was a mistake, and that the advice from the Semantic Web community as a whole should now be to adopt slashes consistently for all new vocabularies. (And it's not like using slash has no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist, SOSA, SSN, (even the venerable FOAF!) all use slash).
> >>
> >> I'd love to hear this group's thoughts. (For reference, I did ask the gist community if they recorded their discussions around their decision (in 2019) to formally switch gist from hash to slash (here), but it seems they weren't recorded, and I've also raised the issue with the DPV group directly too (here)).
> >>
> >> Cheers,
> >>
> >> Pat.
> >>
> >> Pat McBennett, Technical Architect
> >> Contact | patm@inrupt.com
> >> Connect | WebID, GitHub
> >> Explore | www.inrupt.com
> >>
> >> This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged, confidential and/or proprietary information. If you are not the intended recipient of this e-mail (or the person responsible for delivering this document to the intended recipient), please do not disseminate, distribute, print or copy this e-mail, or any attachment thereto. If you have received this e-mail in error, please respond to the individual sending the message, and permanently delete the email.
Received on Friday, 7 October 2022 20:37:57 UTC