Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Pat McBennett on 2022-10-11 (semantic-web@w3.org from October 2022)

From: Pat McBennett <patm@inrupt.com>
Date: Tue, 11 Oct 2022 12:54:34 +0100
To: Hugh Glaser <hugh@glasers.org>
Cc: Pierre-Antoine Champin <pierre-antoine@w3.org>, semantic-web@w3.org
Message-ID: <CABgQ8m+mH8LBRFNdNiwLmW4x1osWk31Se2axZu1+Kac2dRyN9Q@mail.gmail.com>
Hiya Hugh,

Thanks so much for engaging!

On Tue, Oct 11, 2022 at 10:23 AM Hugh Glaser <hugh@glasers.org> wrote:

> Hi,
>
> > On 11 Oct 2022, at 09:49, Pat McBennett <patm@inrupt.com> wrote:
> >
> > Hiya Pierre-Antoine,
> >
> > I'm going to try and reply in-line this time - hopefully GMail won't
> garble the formatting this time (and I've prefixed my responses with [PMcB]
> too):
> >
> > On Mon, Oct 10, 2022 at 11:21 AM Pierre-Antoine Champin <
> pierre-antoine@w3.org> wrote:
> > Dear Pat,
> >
> > I just wanted to make sure we were on the same page regarding the "best
> of both worlds" situation, but clearly we are.
> >
> > To answer your question about my points c) and d) below:
> > when the client retrieves something from http://ex.co/x/, it contains
> some triples about http://ex.co/x/Z. But when the client wants to know
> exactly what http://ex.co/x/Z is, how does it determine that it does not
> need to retrieve http://ex.co/x/Z, because it already retrieved
> everything there is to know about http://ex.co/x/Z when it retrieved
> http://ex.co/x/ ?
> >
> > [PMcB] - I'd say it simply queries (i.e., locally, in-memory) the
> response it got from the server when it de-referenced http://ex.co/x/
> (i.e., the "large" representation), to see if that response already
> contains triples for http://ex.co/x/Z. In other words, I'd expect the
> Best Practice guidance to state that all vocab term metadata for all vocab
> terms be returned in that "large" representation, and not just a subset of
> term metadata, or only the `rdfs:isDefinedBy` triples for vocab terms.
> >
> > And yes, to perform such a query does need a client-side library (like
> RDF4J or Jena for Java, or RDF-JS for JavaScript, or rdflib for Python,
> etc.) - but given we're talking about RDF here in the first place, I don't
> see that as a huge ask. (Caveat: I do know and recognize of course that the
> current mainstream RDF libraries are very low-level and therefore
> 'complex', which is why we at Inrupt (and others, such as Ghent University)
> are actively trying to produce open-source, higher-level, easier-to-use
> SDKs to make doing such things much, much easier, especially for devs not
> familiar with RDF at all).
> > But without getting into that whole client-side library debate(!), I
> think the fundamental, inevitable answer to your perfectly valid question
> of "how does it determine...?" has to be "it
> checks/asks/looks-inside/queries/looks-up the response from the server",
> and therefore a minimal level of 'client understanding' of server responses
> will always be necessary.
> I like slash URIs, but sorry, I can’t like this.
> The basis of Linked Data is that you find the triples of authority for an
> entity by resolving the URI, and munching what you get back.
>

[PMcB] Yeah, but that's exactly my whole argument for using slashes :) ! In
my view, the IRI of a single vocab term is literally the authority for
*just* the triples associated with that one vocab term. I think it's
debatable what a vocab's namespace IRI is the authority over (which is why
I suggest that it be just a Best Practice guidance that it be considered
the authority over the entire vocab's metadata, including *all* the
metadata for *all* the terms defined by that vocab too).

This is also why I say using slashes is 'more correct', as it allows for a
clear distinction over what an IRI is an authority over. Hashes conflate
that by basically saying the authority over a single vocab term's metadata
is actually the entire vocab, instead of the individual term itself. And
presumably the vocab namespace IRI is *also* an identifier for that same
entire vocab. So this inability of hashes to clearly distinguish these
'authorities' seems to me to be much 'less correct'.


> Simply looking in your triplestore to see if you have triples about that
> entity is not enough.
>

[PMcB] Why not? One of the biggest lightbulb moments I've ever had with RDF
was when I first grokked the elegance and beauty of being able to store all
my T-Box data (i.e., schema metadata) alongside all the A-Box (instance)
data in a single triplestore repository.

But Ok, let's say we don't load all the T-Box data from all the vocabs we
reference (as that's perfectly reasonable too!). Well then, yeah, you're
*always* going to need to 'do more work' to discover the metadata for vocab
terms. But I think we'd all agree that you can *never* just assume that a
lookup of a single vocab term's IRI will always give you all of the
defining vocab's info too (because obviously none of the slash-based vocabs
will do that: Schema.org doesn't today, and neither do QUDT, gist, etc.).
So in other words, I think you'll always have to 'do more work', and I
think that necessarily means dereferencing any IRI you've got, and then
understanding and processing the response you get back from the
vocab-hosting server.

If that IRI was a hash IRI, then you still have to parse that response to
extract the term's info, and you also have to parse that response to
determine if it also contains any other vocab term metadata too ('cos you
certainly can't (or shouldn't!) just assume that it does). And at this
point, for hash IRIs, you now know that you can cache that entire server
response, and from now on do cache lookups for more terms from that vocab -
great.

But the only extra work for a slash-based IRI would be that you have to
look for an `rdfs:isDefinedBy` triple, dereference that, and cache that
server response for all further vocab term lookups - done. But of course,
you only have to do that extra lookup *if* you know you want to retrieve
(and cache, presumably) all the info for all the other vocab-defined terms
too.
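
As a rough sketch of the lookup flow described above (plain Python, no RDF
library): triples are modelled as (subject, predicate, object) tuples of IRI
strings, and `fetch` is a hypothetical stand-in for "dereference this IRI
and parse the response into triples". The function name and the `cache`
dict are illustrative assumptions too, not any particular library's API.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"


def load_vocab_for_term(term_iri, fetch, cache):
    """Return the triples of the document that defines term_iri's vocab.

    fetch(iri) is assumed to dereference an IRI and return a set of
    (s, p, o) tuples; cache maps document IRIs to fetched triples.
    """
    # A fragment is never sent to the server, so for a hash IRI the
    # document actually retrieved is the part before '#'.
    doc_iri = term_iri.partition("#")[0]
    if doc_iri not in cache:
        cache[doc_iri] = fetch(doc_iri)
    triples = cache[doc_iri]

    # Extra step: follow rdfs:isDefinedBy on the term itself, if present.
    # For a hash IRI this usually points back at the document we already have.
    for s, p, o in triples:
        if s == term_iri and p == RDFS_IS_DEFINED_BY:
            vocab_doc = o.partition("#")[0]
            if vocab_doc not in cache:
                cache[vocab_doc] = fetch(vocab_doc)
            return cache[vocab_doc]

    # No rdfs:isDefinedBy found: all we can use is what we already fetched.
    return triples
```

With a slash IRI this costs at most one additional fetch, and only when the
caller actually asks for the whole vocab.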

So again, I think that *potential* extra work (i.e., only do it if you
really need it) for slash-based IRIs is well worth the great flexibility it
can afford users (and potential users) forever into the future.


> What happens if some other source has triples with that URI in?
> rdfs:isDefinedBy might mitigate this to some extent, but even then, why
> should I think that is any more authoritative than anything else.
>

[PMcB] I'm not sure I follow. Using `rdfs:isDefinedBy` is as authoritative
as it's possible to get, as it's metadata asserted on the individual vocab
term itself (by definition). But yeah, as I've said before too, I do think
providing an `rdfs:isDefinedBy` triple-per-vocab-term should just be a Best
Practice *regardless* of this entire slash vs hash discussion (and again,
it's just guidance, a recommendation - if you can't, or don't want to, or
can't afford the extra T-Box triples - then don't (but just know that
you'll be *potentially* hurting some users of your vocab)).


>
> Of course, if you give vocabs/ontologies special status, then you can do
> this sort of thing.
> But if you are just treating them as the RDF/Linked Data that they are,
> then you are in trouble saying this.
> My standard system with caching triplestore etc. would always want to know
> it had got the resolved URI at some time.
>

[PMcB] I'm not sure I follow this point either. I most certainly agree with
treating vocabs as the RDF/Linked Data that they are (as I said above, that
was a big lightbulb moment for me!), and I certainly don't think that
there's any need to treat vocabs in any way specially. That's why all vocab
terms *must*, by definition, have explicit RDF types stating 'what they
are' (i.e., they *must* use `rdfs:Class`, or `rdf:Property`, or
`owl:NamedIndividual`, or `owl:Class`, etc.). So when I dereference any IRI
at all, I should be able to determine if that response contains info on
just a single vocab term, or multiple/all vocab terms, or vocab metadata
(e.g., an `rdf:type owl:Ontology` triple), or if any term metadata contains
`rdfs:isDefinedBy` triples, etc.
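
To make that concrete, here is a minimal sketch of how a client might
inspect an already-parsed response. It assumes triples are plain
(subject, predicate, object) tuples of IRI strings; `summarize_response`
and the shape of its return value are hypothetical names chosen for
illustration, and the set of "term types" is just the handful mentioned
above, not an exhaustive list.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# The explicit types named in the text; real vocabs may use others as well.
TERM_TYPES = {
    RDFS + "Class",
    RDF + "Property",
    OWL + "Class",
    OWL + "NamedIndividual",
}


def summarize_response(triples):
    """Report which vocab terms, ontologies and isDefinedBy links a response holds."""
    terms = {s for s, p, o in triples if p == RDF + "type" and o in TERM_TYPES}
    ontologies = {
        s for s, p, o in triples if p == RDF + "type" and o == OWL + "Ontology"
    }
    defined_by = {s: o for s, p, o in triples if p == RDFS + "isDefinedBy"}
    return {"terms": terms, "ontologies": ontologies, "defined_by": defined_by}
```

A response with one entry in `terms` and no `ontologies` looks like a
single-term document; many terms plus an ontology looks like the "large"
representation.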

But I suspect I may be missing your point here and in the preceding point!


>
> This is why I think use cases are needed - slash is great for
> pre-loaded/engineered systems, but for the properly dynamic aims of the
> Semantic Web, it will incur extra fetching costs for the terms.
>

[PMcB] So yeah, but again, to the 'use-cases are needed' point - we can
*never* know all the potential use-cases up front. Even if you create a
vocab intended *only* for a narrowly defined set of use-cases, you still
can't know or predict how potential future users might *want* to use it.
(And again, this is only guidance - if you really, really want even
potential future users to always have rigid expectations from your vocab,
then sure, go ahead and use a hash, and just explain to them why - that'd
be perfectly fine with me).

But my main point is that the 'extra fetching' costs can be massively
alleviated if the vocab simply follows the Best Practice of providing
`rdfs:isDefinedBy` triples, and with just a little bit of extra smarts on
the client (but only to handle the cases where you don't already know the
vocab's namespace IRI - 'cos if you do, you'd just dereference that and
you're done).

So if you know you want the entire vocab info, and you already know the
vocab's namespace IRI, then just dereference that and you have literally
zero extra fetching costs (i.e., you get exactly the same response,
regardless of slash or hash).

But *if* all you have is an IRI, and that IRI happens to be a single term's
IRI, then after you dereference it and parse it, if it's a slash-based IRI,
then the only extra work you need to do is to look for an
`rdfs:isDefinedBy` triple and dereference that IRI - that's it.

For all the extra flexibility and consistency (and the 'more correct-ness',
in my view) that comes from using slashes, I think there's only, at worst,
a tiny extra cost, and something that our RDF libraries and tools can
easily handle for us anyway.

Cheers,

Pat.



>
> Best
> Hugh
> >
> > One way to achieve this would be to include, in the content of
> http://ex.co/x/, the triple
> >     <http://ex.co/x/Z> rdfs:isDefinedBy <http://ex.co/x/>.
> >
> > but again, that is a convention that both the server and the client have
> to share.
> >
> > [PMcB] - Yep, exactly! I already make doing that a strongly recommended
> Best Practice for all vocabs I produce or work with, and so I'd love to see
> that become a more universally shared convention. But yes, it would just be
> a Best Practice guidance, one that I'd hope would become more and more
> widespread over time. For sure, we can't enforce it, but we can point at
> good examples from major, highly successful vocabs out there today, like
> QUDT and gist and Schema.org and DPV and ...!
> >
>
>

Received on Tuesday, 11 October 2022 11:54:59 UTC