- From: Pat McBennett <patm@inrupt.com>
- Date: Wed, 12 Oct 2022 12:15:12 +0100
- To: Hugh Glaser <hugh@glasers.org>
- Cc: Pierre-Antoine Champin <pierre-antoine@w3.org>, semantic-web@w3.org
- Message-ID: <CABgQ8mJgRmHsoJqstxhKPtvD7q7bz444rMJm4p3isZ8t-RbEaA@mail.gmail.com>
Hiya Hugh,

Ok, so let me tease this out with some example Turtle below...

On Wed, Oct 12, 2022 at 9:44 AM Hugh Glaser <hugh@glasers.org> wrote:

> Hi Pat.
> Looks like we still aren’t there yet.
>
> > On 12 Oct 2022, at 01:53, Pat McBennett <patm@inrupt.com> wrote:
> >
> > Hiya Hugh,
> >
> >> On Tue, Oct 11, 2022 at 1:22 PM Hugh Glaser <hugh@glasers.org> wrote:
> >> Hi Pat,
> >>
> >> (I’ve tried sorting out the quotation levels a bit)
> >>
> >> [PMcB] Thanks!
> >>
> >> I like your proposal.
> >> However, I think that arguing that slash is no less efficient than hash in terms of network is just wrong.
> >>
> > [PMcB] Well, just to be clear, I never said it was *no* less efficient :) ! What I was trying to say was that in the case of simply dereferencing a vocab's namespace IRI, *in that case*, it's no less efficient - i.e., in both cases, slash and hash, you'd expect to get back exactly the same full-vocab-metadata response in a single HTTP request. So if you don't want to pay any inefficiency cost, then, if possible, just dereference the vocab's namespace IRI up-front to get everything you need in one single HTTP request, and just cache it for all further term lookups. That'll give you exactly the same efficiency as using hash namespace IRI - but only if you know the namespace IRI beforehand, and can dereference it up-front.
>
> To be clear - we are counting HTTP requests here.

[PMcB] Ok.

> A look-up of one term in slash or hash mode is one request, I think.

[PMcB] Yep, absolutely (unless (and this applies equally to *both* slash and hash) we can get the info from a local cache though, right?).

> The situation I am looking at is the lookup of more than one term.

[PMcB] Ok great, no problem. In general, I always suggest 'Show me the Turtle' - so let's say our client wishes to look up vocab terms `A` and `B`. So our vocab can be defined as either:

1. Using hash:

    <https://ex.com/vocab#> a owl:Ontology ;
        rdfs:comment "I'm the example vocab - with hash namespace IRI" .

    <https://ex.com/vocab#A> a rdf:Property ;
        rdfs:comment "I'm term A in the hash vocab!" ;
        rdfs:isDefinedBy <https://ex.com/vocab#> .

    <https://ex.com/vocab#B> a rdf:Property ;
        rdfs:comment "I'm term B in the hash vocab!" ;
        rdfs:isDefinedBy <https://ex.com/vocab#> .

...or...

2. Using slash:

    <https://ex.com/vocab/> a owl:Ontology ;
        rdfs:comment "I'm the example vocab - with slash namespace IRI" .

    # 2a. Just the triples for term 'A'.
    <https://ex.com/vocab/A> a rdf:Property ;
        rdfs:comment "I'm term A in the slash vocab!" ;
        rdfs:isDefinedBy <https://ex.com/vocab/> .

    # 2b. Just the triples for term 'B'.
    <https://ex.com/vocab/B> a rdf:Property ;
        rdfs:comment "I'm term B in the slash vocab!" ;
        rdfs:isDefinedBy <https://ex.com/vocab/> .

At this point though I'll point out two *separate* Best Practices (BPs) that I'd recommend (both of which I'd apply regardless of any slash vs hash discussion, but that are kinda a foundation for this slash vs hash debate). (Yeah, I should have realized the need for these Best Practices *before* I kicked off this slash discussion - but heck, I'm still just learning here myself!):

BP-1. That all vocab terms provide an 'rdfs:isDefinedBy' triple (or some other Best Practice-recommended predicate - I don't really mind what the predicate is, just that there's some link back from each vocab term to the overall vocabulary that defines it, and that whatever the chosen predicate is, it's recommended as a Best Practice).

BP-2. That dereferencing the vocab namespace IRI always returns *all-vocab* metadata (i.e., dereferencing 'https://ex.com/vocab#' returns *all* the RDF in 1. above, and dereferencing 'https://ex.com/vocab/' returns *all* the RDF in 2. above).

And I'll repeat again, just for good measure, both of these Best Practices I'd apply *regardless* of any slash vs hash discussion.
So I'd suggest any discussion on them should really be in separate threads :) !

> For hash URIs, there is one request for the first term in a vocab, and then no further requests are required, because the target document has been fetched and cached.

[PMcB] Sure thing, I totally agree - and so yeah, in this case the client receives back all the RDF in 1. above.

But remember, very importantly here, you are saying that the client has *cached* the server response. So you're assuming that the client is clever enough (sophisticated enough, has the 'smarts') to implement and manage that cache, and it needs enough smarts to look up that cache before subsequent vocab term lookups, right? That cache management is an extra burden on the client too, right? (Now don't get me wrong here - I think caching server responses here is an extremely sensible and common thing to do - I'm just pointing out that it does require extra 'smarts' in the client, that's all!).

But if our client is super-naive, or doesn't want, or can't, implement any caching mechanism, then such a client needs to just blindly make an HTTP request for each and every vocab term lookup, right? I know this would be 'very silly' of the client, but it just highlights a simple 'can't-or-won't cache' client use case. And in this potential use case, using a hash would actually be a lot *less* efficient than using a slash - since with hash each HTTP request is returning a much bigger payload (i.e., all the RDF in 1. above), whereas slash would just return each term's info alone (i.e., the smaller payloads of either just the triples in 2a. or 2b. above!).

But anyway, yeah, let's assume clients can cache - I'm really just making the point that using hash still requires 'smarts' on the client if that client wants/needs efficient term lookup. (But it also (rather nicely I think) illustrates again that you can never really know all the clients of your shared vocabs (or how 'smart' they might be or not be) - you just can't!)
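To put rough numbers on the 'can't-or-won't cache' case, here is a minimal Python sketch. It is only an illustration under stated assumptions: the in-memory `vocab_docs` 'server' and the `triples_transferred` helper are hypothetical names, nothing here touches the network, and counting triples is just a stand-in for payload size. The triples mirror the example Turtle (2 for the ontology, 3 per term).

```python
from urllib.parse import urldefrag


def vocab_docs(ns):
    """Build the documents a hypothetical server would return for namespace `ns`."""
    ontology = [
        (ns, "rdf:type", "owl:Ontology"),
        (ns, "rdfs:comment", "example vocab"),
    ]
    terms = {
        name: [
            (ns + name, "rdf:type", "rdf:Property"),
            (ns + name, "rdfs:comment", f"term {name}"),
            (ns + name, "rdfs:isDefinedBy", ns),
        ]
        for name in ("A", "B")
    }
    everything = ontology + terms["A"] + terms["B"]
    if ns.endswith("#"):
        # Hash: one document holds the whole vocab, because the fragment
        # is stripped by the client and never reaches the server.
        return {urldefrag(ns).url: everything}
    # Slash: one small document per term, plus the whole vocab at the
    # namespace IRI (assumption: the server follows BP-2).
    docs = {ns + name: triples for name, triples in terms.items()}
    docs[ns] = everything
    return docs


def triples_transferred(docs, term_iris):
    """Total triples fetched by a client that never caches anything."""
    return sum(len(docs[urldefrag(iri).url]) for iri in term_iris)


hash_docs = vocab_docs("https://ex.com/vocab#")
slash_docs = vocab_docs("https://ex.com/vocab/")

hash_cost = triples_transferred(
    hash_docs, ["https://ex.com/vocab#A", "https://ex.com/vocab#B"]
)
slash_cost = triples_transferred(
    slash_docs, ["https://ex.com/vocab/A", "https://ex.com/vocab/B"]
)
print(hash_cost, slash_cost)  # prints "16 6"
```

Both clients make two requests, but the no-cache hash client pulls the full 8-triple document each time, while the no-cache slash client pulls 3 triples per term. The gap grows with the size of the vocab.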
But sure, let's just assume that some form of client-side caching of server responses is fine, and our clients aren't so naive as to just blindly fire off HTTP requests unnecessarily.

> For slash URIs, every lookup is a new request - it has to be, because each one is a different document.

[PMcB] Nope, absolutely not (it seems this might be our disconnect :) ). Your statement is only correct *if your client is still behaving naively* (just in a slightly different naive way than above!) :)

When the client gets back the server response from the first vocab term lookup of 'https://ex.com/vocab/A' (i.e., it gets back *just* the triples in 2a. above), it uses its 'smarts' to determine that that response does not contain all the vocab metadata (e.g., the response of 2a. does not contain any triples of the form '<> a owl:Ontology .'). So it now knows (based on Best Practice BP-1 above) that it should be able to expect a triple of this form in its response:

    <https://ex.com/vocab/A> rdfs:isDefinedBy ?vocabNamespaceIRI .

The client's smarts now (based on Best Practice BP-2 above) can expect that dereferencing this '?vocabNamespaceIRI' IRI will return all the vocab term metadata - i.e., it can expect to get back *all* the RDF in 2. above (i.e., including the triples in 2a. and 2b.). And of course, now the client just needs to cache *that* entire 2. response, and its cache now has all the vocab metadata it needs to look up any subsequent terms from this entire vocab, i.e., no need for any HTTP request to look up 'https://ex.com/vocab/B', as it's in the cache already.

So yeah (if following BP-1 and BP-2 above!), there is one, and only one, extra HTTP request for slash-based vocabs, regardless of how many terms might be in that vocab.
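The lookup flow just described can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not a real Linked Data client: `SERVER` is a hypothetical in-memory stand-in for a slash-based vocab server that follows BP-1 and BP-2, `CachingClient` and its methods are made-up names, and triples are plain tuples rather than parsed RDF.

```python
RDF_TYPE = "rdf:type"
OWL_ONTOLOGY = "owl:Ontology"
IS_DEFINED_BY = "rdfs:isDefinedBy"
NS = "https://ex.com/vocab/"


def term_triples(name):
    """Triples for one term, including the BP-1 link back to the vocab."""
    iri = NS + name
    return [(iri, RDF_TYPE, "rdf:Property"), (iri, IS_DEFINED_BY, NS)]


# Hypothetical server: term IRIs return only their own triples (2a., 2b.),
# while the namespace IRI returns the whole vocab (BP-2).
SERVER = {
    NS + "A": term_triples("A"),
    NS + "B": term_triples("B"),
    NS: [(NS, RDF_TYPE, OWL_ONTOLOGY)] + term_triples("A") + term_triples("B"),
}


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}    # subject IRI -> list of triples about that subject
        self.requests = 0  # number of simulated HTTP requests made so far

    def _fetch(self, iri):
        """Simulate one HTTP request and cache every triple in the response."""
        self.requests += 1
        triples = self.server[iri]
        for triple in triples:
            known = self.cache.setdefault(triple[0], [])
            if triple not in known:
                known.append(triple)
        return triples

    def preload(self, namespace_iri):
        """Dereference the namespace IRI up-front, when it is already known."""
        self._fetch(namespace_iri)

    def lookup(self, term_iri):
        """Return the triples for a term, using the cache when possible."""
        if term_iri in self.cache:
            return self.cache[term_iri]
        triples = self._fetch(term_iri)
        has_ontology = any(
            p == RDF_TYPE and o == OWL_ONTOLOGY for _, p, o in triples
        )
        if not has_ontology:
            # Partial response: follow rdfs:isDefinedBy once to get the rest.
            for s, p, o in triples:
                if s == term_iri and p == IS_DEFINED_BY:
                    self._fetch(o)
                    break
        return self.cache[term_iri]


client = CachingClient(SERVER)
client.lookup(NS + "A")  # two requests: the term, then the whole vocab
client.lookup(NS + "B")  # served from the cache
print(client.requests)   # prints 2

informed = CachingClient(SERVER)
informed.preload(NS)     # namespace IRI known up-front
informed.lookup(NS + "A")
informed.lookup(NS + "B")
print(informed.requests)  # prints 1
```

In this sketch the request count stays at two however many terms are looked up afterwards, and drops to one when the namespace IRI is known beforehand.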
And remember, this one extra HTTP request is *only required* if we didn't know the vocab's namespace IRI in the first place ('cos if we did know that IRI up-front, we'd just dereference that as our first, and only, HTTP request and populate our cache with that response - so no need for any subsequent HTTP requests when looking up individual vocab terms).

> So in some use modes, slash could be hugely more costly than hash.

[PMcB] Nope, not unless your client is too naive to be able to follow a single Best Practice-recommended predicate. And if it's that naive, then it's probably too naive to implement any form of caching at all - in which case hash would be even less efficient than slashes (as each IRI lookup is returning a much bigger payload than a slash-based vocab :) !).

> And I can’t see any way that hash is ever more efficient in request numbers than slash, but it can be in terms of network traffic, for big and/or sparsely-used vocabs.

[PMcB] (I assume you meant "I can’t see any way that *slash* is ever more efficient in request numbers than *hash*" - if so, then yeah, I completely agree.) But it's only the worst case scenario with slashes to have one extra HTTP request per vocab, and yet the payoff is greater choice and flexibility for all users (known and unknown) into the future. And in my view, that future-proofing and greater flexibility is very well worth the (only potentially) extra cost.

Cheers,

Pat.

> That’s what I meant
> Hugh
>
> > I accept indeed that it will be *less efficient* in the case of looking up a single vocab term's IRI from a slash-based vocab, since yes, you need to first dereference that single term IRI, then parse out (hopefully) a `rdfs:isDefinedBy` triple, and then you have to dereference the RDF Object value of that triple to get all the metadata for all the vocab terms.
> > So yes indeed, in that specific case, using slash is 'less efficient' (i.e., it requires a bit more client-side processing and knowledge of the `rdfs:isDefinedBy` predicate, and it's one extra HTTP request). But it should only be one extra HTTP request per vocab (when you store/save/cache the server responses), regardless of the number of terms in each vocab - so not unreasonable I think, and only needed when you don't already know a vocab's namespace IRI up-front.
> >
> >> But it is a price that may well be worth paying in general.
> >> After all, I still think that systems don’t resolve vocab much once they go live.
> >>
> > [PMcB] Yeah, I indeed think it is a price well worth paying (even if *just* people (in general) can have a single, simple piece of *guidance* to follow, if they so choose). In other words, I think it's vastly better (especially for newbies) than saying (in paraphrasing Sarven's position (sorry Sarven, I'll reply more thoroughly to your thoughts separately :) )) - i.e., "Well, you need to decide for yourself between slash and hash for your new vocab, by weighing up: your specific use case; reflecting on empirical evidence, e.g., what characteristics do the majority of the vocabs share?; and helping the URI owners when considering persistence policies". To be honest, I feel that kind of guidance is precisely what results in newbies running screaming to the hills... :)
> >
> > And yes, I totally agree too that (from my experience anyway) systems don’t resolve vocabs much at all (including when they go live).
> > But regardless of whether they do or not, I think adopting slashes (as mere guidance) helps pave the way for Linked Data clients to *be able to* more easily and efficiently choose for themselves to resolve entire-vocab metadata and/or individual-vocab-term metadata at runtime more and more in the future (e.g., to drive user interfaces from vocab metadata, to help drive dynamic queries via link traversals, etc.). Whereas just sticking with the current empirical evidence of vocabs in the wild today (i.e., hashes) can only result in limiting future choices for vocab users.
> >
> > Cheers,
> >
> > Pat.
Received on Wednesday, 12 October 2022 11:15:38 UTC