Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Pat McBennett on 2022-10-12 (semantic-web@w3.org from October 2022)

From: Pat McBennett <patm@inrupt.com>
Date: Wed, 12 Oct 2022 12:15:12 +0100
To: Hugh Glaser <hugh@glasers.org>
Cc: Pierre-Antoine Champin <pierre-antoine@w3.org>, semantic-web@w3.org
Message-ID: <CABgQ8mJgRmHsoJqstxhKPtvD7q7bz444rMJm4p3isZ8t-RbEaA@mail.gmail.com>
Hiya Hugh,

Ok, so let me tease this out with some example Turtle below...

On Wed, Oct 12, 2022 at 9:44 AM Hugh Glaser <hugh@glasers.org> wrote:

> Hi Pat.
> Looks like we still aren’t there yet.
>
> > On 12 Oct 2022, at 01:53, Pat McBennett <patm@inrupt.com> wrote:
> >
> > Hiya Hugh,
> >
> >> On Tue, Oct 11, 2022 at 1:22 PM Hugh Glaser <hugh@glasers.org> wrote:
> >> Hi Pat,
> >>
> >> (I’ve tried sorting out the quotation levels a bit)
> >>
> >> [PMcB] Thanks!
> >>
> >>
> >> I like your proposal.
> >> However, I think that arguing that slash is no less efficient than hash
> in terms of network is just wrong.
> >>
> > [PMcB] Well, just to be clear, I never said it was *no* less efficient
> :) ! What I was trying to say was that in the case of simply dereferencing
> a vocab's namespace IRI, *in that case*, it's no less efficient - i.e., in
> both cases, slash and hash, you'd expect to get back exactly the same
> full-vocab-metadata response in a single HTTP request. So if you don't want
> to pay any inefficiency cost, then, if possible, just dereference the
> vocab's namespace IRI up-front to get everything you need in one single
> HTTP request, and just cache it for all further term lookups. That'll give
> you exactly the same efficiency as using hash namespace IRI - but only if
> you know the namespace IRI beforehand, and can dereference it up-front.

> To be clear - we are counting HTTP requests here.
>

[PMcB] Ok.


> A look-up of one term in slash or hash mode is one request, I think.
>

[PMcB] Yep, absolutely (unless (and this applies equally to *both* slash
and hash) we can get the info from a local cache though, right?).


> The situation I am looking at is the lookup of more than one term.
>

[PMcB] Ok great, no problem.

In general, I always suggest 'Show me the Turtle' - so let's say our client
wishes to look up vocab terms `A` and `B`. So our vocab can be defined as
either:

1. Using hash:
  <https://ex.com/vocab#> a owl:Ontology ;
    rdfs:comment "I'm the example vocab - with hash namespace IRI" .

  <https://ex.com/vocab#A> a rdf:Property ;
    rdfs:comment "I'm term A in the hash vocab!" ;
    rdfs:isDefinedBy <https://ex.com/vocab#> .

  <https://ex.com/vocab#B> a rdf:Property ;
    rdfs:comment "I'm term B in the hash vocab!" ;
    rdfs:isDefinedBy <https://ex.com/vocab#> .

...or...

2. Using slash:
  <https://ex.com/vocab/> a owl:Ontology ;
    rdfs:comment "I'm the example vocab - with slash namespace IRI" .

  # 2a. Just the triples for term 'A'.
  <https://ex.com/vocab/A> a rdf:Property ;
    rdfs:comment "I'm term A in the slash vocab!" ;
    rdfs:isDefinedBy <https://ex.com/vocab/> .

  # 2b. Just the triples for term 'B'.
  <https://ex.com/vocab/B> a rdf:Property ;
    rdfs:comment "I'm term B in the slash vocab!" ;
    rdfs:isDefinedBy <https://ex.com/vocab/> .

At this point though I'll point out two *separate* Best Practices (BPs)
that I'd recommend (both of which I'd apply regardless of any slash vs hash
discussion, but that are kinda a foundation for this slash vs hash debate):
(Yeah, I should have realized the need for these Best Practices *before* I
kicked off this slash discussion - but heck, I'm still just learning here
myself!):

BP-1. That all vocab terms provide an 'rdfs:isDefinedBy' triple (or some
other Best Practice-recommended predicate - I don't really mind what the
predicate is, just that there's some link back from each vocab term to the
overall vocabulary that defines it, and that whatever the chosen predicate
is, it's recommended as a Best Practice).

BP-2. That dereferencing the vocab namespace IRI always returns *all-vocab*
metadata (i.e., dereferencing 'https://ex.com/vocab#' returns *all* the RDF
in 1. above, and dereferencing 'https://ex.com/vocab/' returns *all* the
RDF in 2. above).

And I'll repeat again, just for good measure, both of these Best Practices
I'd apply *regardless* of any slash vs hash discussion. So I'd suggest any
discussion on them should really be in separate threads :) !


> For hash URIs, there is one request for the first term in a vocab, and then
> no further requests are required, because the target document has been
> fetched and cached.
>

[PMcB] Sure thing, I totally agree - and so yeah, in this case the client
receives back all the RDF in 1. above.
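
To illustrate why a hash-based lookup fetches the whole document: an HTTP client strips the fragment before sending the request, so both term IRIs resolve to the same document. Here is a minimal sketch, assuming a simple in-memory cache; the names `document_iri`, `lookup`, `cache` and `requests_made` are hypothetical, and the HTTP GET is only simulated.

```python
from urllib.parse import urldefrag

def document_iri(term_iri: str) -> str:
    """Return the IRI that would actually be requested (fragment removed)."""
    return urldefrag(term_iri).url

cache = {}          # document IRI -> response body (hypothetical in-memory cache)
requests_made = []  # record of simulated HTTP requests

def lookup(term_iri: str) -> str:
    doc = document_iri(term_iri)
    if doc not in cache:
        requests_made.append(doc)  # stands in for a real HTTP GET
        cache[doc] = "all the RDF in 1. above"
    return cache[doc]

lookup("https://ex.com/vocab#A")
lookup("https://ex.com/vocab#B")
print(requests_made)  # ['https://ex.com/vocab']
```

The second lookup is served from the cache because both terms share one document IRI.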

But remember, very importantly here, you are saying that the client has
*cached* the server response. So you're assuming that the client is clever
enough (sophisticated enough, has the 'smarts') to implement and
manage that cache, and it needs enough smarts to look up that cache before
subsequent vocab term lookups, right? That cache management is an extra
burden on the client too, right? (Now don't get me wrong here - I think
caching server responses here is an extremely sensible and common thing to
do - I'm just pointing out that it does require extra 'smarts' in the
client, that's all!).

But if our client is super-naive, or doesn't want, or can't, implement any
caching mechanism, then such a client needs to just blindly make a HTTP
request for each and every vocab term lookup, right? I know this would be
'very silly' of the client, but it just highlights a simple 'can't-or-won't
cache' client use case. And in this potential use case, using a hash would
actually be a lot *less* efficient than using a slash - since with hash
each HTTP request is returning a much bigger payload (i.e., all the RDF in
1. above), whereas slash would just return each term's info alone (i.e.,
the smaller payloads of either just the triples in 2a. or 2b. above!).
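
The 'can't-or-won't cache' case can be sketched as a rough payload comparison. This is only an illustration, assuming the server returns the whole vocab for a hash namespace and just the term's own triples for a slash namespace; the helper names (`term_block`, `full_vocab`, `naive_bytes`) and the shortened comments in the Turtle are hypothetical.

```python
def term_block(ns: str, name: str) -> str:
    """Turtle for a single term, roughly like 2a. or 2b. above."""
    return (f"<{ns}{name}> a rdf:Property ;\n"
            f'  rdfs:comment "Term {name}" ;\n'
            f"  rdfs:isDefinedBy <{ns}> .\n\n")

def full_vocab(ns: str, names: list) -> str:
    """Turtle for the whole vocab: the ontology triple plus every term."""
    return f"<{ns}> a owl:Ontology .\n\n" + "".join(term_block(ns, n) for n in names)

def naive_bytes(ns: str, names: list, lookups: list) -> int:
    """Bytes received by a client that makes one request per lookup, no cache."""
    hash_style = ns.endswith("#")
    total = 0
    for name in lookups:
        # Hash: every request returns the whole document.
        # Slash: every request returns only that term's triples.
        body = full_vocab(ns, names) if hash_style else term_block(ns, name)
        total += len(body.encode("utf-8"))
    return total

TERMS = ["A", "B"]
hash_total = naive_bytes("https://ex.com/vocab#", TERMS, ["A", "B"])
slash_total = naive_bytes("https://ex.com/vocab/", TERMS, ["A", "B"])
print(hash_total, slash_total)
```

Both clients make the same number of requests here; only the bytes per response differ, and the gap grows with the number of terms in the vocab.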

But anyway, yeah, let's assume clients can cache - I'm really just making
the point that using hash still requires 'smarts' on the client if that
client wants/needs efficient term lookup. (But it also (rather nicely I
think) illustrates again that you can never really know all the clients of
your shared vocabs (or how 'smart' they might be or not be) - you just
can't!)

But sure, let's just assume that some form of client-side caching of server
responses is fine, and our clients aren't so naive as to just blindly fire
off HTTP requests unnecessarily.


> For slash URIs, every lookup is a new request - it has to be, because each
> one is a different document.
>

[PMcB] Nope, absolutely not (it seems this might be our disconnect :) ).

Your statement is only correct *if your client is still behaving naively*
(just in a slightly different naive way than above!) :)

When the client gets back the server response from the first vocab term
lookup of 'https://ex.com/vocab/A' (i.e., it gets back *just* the triples
in 2a. above), it uses its 'smarts' to determine that that response does
not contain all the vocab metadata (e.g., the response of 2a. does not
contain any triples of the form '<> a owl:Ontology .').

So it now knows (based on Best Practice BP-1 above) that it should be able
to expect a triple of this form in its response:

  <https://ex.com/vocab/A> rdfs:isDefinedBy ?vocabNamespaceIRI .

The client's smarts now (based on Best Practice BP-2 above) can expect that
dereferencing this '?vocabNamespaceIRI' IRI will return all the vocab term
metadata - i.e., it can expect to get back *all* the RDF in 2. above (i.e.,
including the triples in 2a. and 2b.).

And of course, now the client just needs to cache *that* entire 2.
response, and its cache now has all the vocab metadata it needs to look up
any subsequent terms from this entire vocab, i.e., no need for any HTTP
request to look up 'https://ex.com/vocab/B', as it's in the cache already.
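
The steps above can be sketched as a small simulation. It assumes BP-1 and BP-2 hold; the server is just a hypothetical dictionary (`SERVER`) mapping IRIs to triples, and the `Client` class and its method names are made up for illustration, not taken from any real library.

```python
NS = "https://ex.com/vocab/"
RDF_TYPE = "rdf:type"
IS_DEFINED_BY = "rdfs:isDefinedBy"

TERM_A = [(NS + "A", RDF_TYPE, "rdf:Property"), (NS + "A", IS_DEFINED_BY, NS)]
TERM_B = [(NS + "B", RDF_TYPE, "rdf:Property"), (NS + "B", IS_DEFINED_BY, NS)]
ONTOLOGY = [(NS, RDF_TYPE, "owl:Ontology")]

# Hypothetical server: term IRIs return their own triples, and the
# namespace IRI returns everything (BP-2).
SERVER = {NS + "A": TERM_A, NS + "B": TERM_B, NS: ONTOLOGY + TERM_A + TERM_B}

class Client:
    def __init__(self):
        self.requests = []  # simulated HTTP requests, in order
        self.cache = {}     # subject IRI -> list of triples

    def _get(self, iri):
        self.requests.append(iri)  # stands in for a real HTTP GET
        return SERVER[iri]

    def _store(self, triples):
        for triple in triples:
            known = self.cache.setdefault(triple[0], [])
            if triple not in known:
                known.append(triple)

    def lookup(self, term):
        if term in self.cache:
            return self.cache[term]
        triples = self._get(term)
        self._store(triples)
        # No owl:Ontology triple means this is not the full vocab, so
        # follow rdfs:isDefinedBy (BP-1) and cache that response too.
        if not any(p == RDF_TYPE and o == "owl:Ontology" for _, p, o in triples):
            for s, p, o in triples:
                if s == term and p == IS_DEFINED_BY:
                    self._store(self._get(o))
                    break
        return self.cache.get(term, [])

client = Client()
client.lookup(NS + "A")
client.lookup(NS + "B")
print(client.requests)  # ['https://ex.com/vocab/A', 'https://ex.com/vocab/']
```

The lookup of term B triggers no request at all, since the namespace response already populated the cache.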

So yeah (if following BP-1 and BP-2 above!), there is one, and only one,
extra HTTP request for slash-based vocabs, regardless of how many terms
might be in that vocab. And remember, this one extra HTTP request is *only
required* if we didn't know the vocab's namespace IRI in the first place
('cos if we did know that IRI up-front, we'd just dereference that as our
first, and only, HTTP request and populate our cache with that response -
so no need for any subsequent HTTP requests when looking up individual
vocab terms).
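
For the case where the namespace IRI is known up-front, a minimal sketch (again with a hypothetical dictionary standing in for the server, and made-up names `prefetch`, `lookup` and `http_requests`) shows the single-request behaviour, assuming BP-2 holds:

```python
NS = "https://ex.com/vocab/"

# Hypothetical BP-2 response: the namespace IRI returns triples for all terms.
SERVER = {
    NS: [
        (NS, "rdf:type", "owl:Ontology"),
        (NS + "A", "rdfs:isDefinedBy", NS),
        (NS + "B", "rdfs:isDefinedBy", NS),
    ]
}

http_requests = []  # simulated HTTP requests
cache = {}          # subject IRI -> list of triples

def prefetch(namespace_iri):
    """Dereference the namespace IRI once and cache every triple by subject."""
    http_requests.append(namespace_iri)  # stands in for a real HTTP GET
    for s, p, o in SERVER[namespace_iri]:
        cache.setdefault(s, []).append((s, p, o))

def lookup(term_iri):
    """Answer term lookups from the cache only."""
    return cache[term_iri]

prefetch(NS)
lookup(NS + "A")
lookup(NS + "B")
print(len(http_requests))  # 1
```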


> So in some use modes, slash could be hugely more costly than hash.
>

[PMcB] Nope, not unless your client is too naive to be able to follow a
single Best Practice-recommended predicate. And if it's that naive, then
it's probably too naive to implement any form of caching at all - in which
case hash would be even less efficient than slashes (as each IRI lookup is
returning a much bigger payload than a slash-based vocab :) !).


> And I can’t see any way that hash is ever more efficient in request
> numbers than slash, but it can be in terms of network traffic, for big
> and/or sparsely-used vocabs.
>

[PMcB] (I assume you meant "I can’t see any way that *slash* is ever more
efficient in request numbers than *hash*" - if so, then yeah, I completely
agree.) But the worst case scenario with slashes is only one extra HTTP
request per vocab, and yet the payoff is greater choice and flexibility for
all users (known and unknown) into the future.

And in my view, that future-proofing and greater flexibility is very well
worth the (only potentially) extra cost.

Cheers,
Pat.


>
> That’s what I meant
> Hugh
> >
> > I accept indeed that it will be *less efficient* in the case of looking
> up a single vocab term's IRI from a slash-based vocab, since yes, you need
> to first dereference that single term IRI, then parse out (hopefully) a
> `rdfs:isDefinedBy` triple, and then you have to dereference the RDF Object
> value of that triple to get all the metadata for all the vocab terms. So
> yes indeed, in that specific case, using slash is 'less efficient' (i.e.,
> it requires a bit more client-side processing and knowledge of the
> `rdfs:isDefinedBy` predicate, and it's one extra HTTP request). But it
> should only be one extra HTTP request per vocab (when you store/save/cache
> the server responses), regardless of the number of terms in each vocab - so
> not unreasonable I think, and only needed when you don't already know a
> vocab's namespace IRI up-front.
> >
> >> But it is a price that may well be worth paying in general.
> >> After all, I still think that systems don’t resolve vocab much once
> they go live.
> >>
> > [PMcB] Yeah, I indeed think it is a price well worth paying (even if
> *just* people (in general) can have a single, simple piece of *guidance* to
> follow, if they so choose). In other words, I think it's vastly better
> (especially for newbies) than saying (in paraphrasing Sarven's position
> (sorry Sarven, I'll reply more thoroughly to your thoughts separately :) ))
> - i.e., "Well, you need to decide for yourself between slash and hash for
> your new vocab, by weighing up: your specific use case; reflecting on
> empirical evidence, e.g., what characteristics do the majority of the
> vocabs share?; and helping the URI owners when considering persistence
> policies". To be honest, I feel that kind of guidance is precisely what
> results in newbies running screaming to the hills... :)
> >
> > And yes, I totally agree too that (from my experience anyway) systems
> don’t resolve vocabs much at all (including when they go live). But
> regardless of whether they do or not, I think adopting slashes (as mere
> guidance) helps pave the way for Linked Data clients to *be able to* more
> easily and efficiently choose for themselves to resolve entire-vocab
> metadata and/or individual-vocab-term metadata at runtime more and more in
> the future (e.g., to drive user interfaces from vocab metadata, to help
> drive dynamic queries via link traversals, etc.). Whereas just sticking
> with the current empirical evidence of vocabs in the wild today (i.e.,
> hashes) can only result in limiting future choices for vocab users.
> >
> > Cheers,
> >
> > Pat.
> >
> >
>
>

Received on Wednesday, 12 October 2022 11:15:38 UTC