Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Pat McBennett on 2022-10-11 (semantic-web@w3.org from October 2022)

From: Pat McBennett <patm@inrupt.com>
Date: Tue, 11 Oct 2022 09:49:37 +0100
To: Pierre-Antoine Champin <pierre-antoine@w3.org>
Cc: semantic-web@w3.org
Message-ID: <CABgQ8mJCmqDzoOv1x14uhBk3VkDsANGmspchKCcgLqMcgreTpQ@mail.gmail.com>

Hiya Pierre-Antoine,

I'm going to try and reply in-line this time - hopefully GMail won't garble
the formatting (and I've prefixed my responses with [PMcB] too):

On Mon, Oct 10, 2022 at 11:21 AM Pierre-Antoine Champin <
pierre-antoine@w3.org> wrote:

> Dear Pat,
>
> I just wanted to make sure we were on the same page regarding the "best of
> both worlds" situation, but clearly we are.
>
> To answer your question about my points c) and d) below:
> when the client retrieves something from http://ex.co/x/, it contains
> some triples about http://ex.co/x/Z. But when the client wants to know
> exactly what http://ex.co/x/Z is, how does it determine that it does *not
> need* to retrieve http://ex.co/x/Z, because it already retrieved
> everything there is to know about http://ex.co/x/Z when it retrieved
> http://ex.co/x/ ?
>
[PMcB] - I'd say it simply queries (i.e., locally, in-memory) the response
it got from the server when it de-referenced http://ex.co/x/ (i.e., the
"large" representation), to see if that response already contains triples
for http://ex.co/x/Z. In other words, I'd expect the Best Practice guidance
to state that all vocab term metadata for all vocab terms be returned in
that "large" representation, and not just a subset of term metadata, or
only the `rdfs:isDefinedBy` triples for vocab terms.

And yes, to perform such a query does need a client-side library (like
RDF4J or Jena for Java, or RDF-JS for JavaScript, or rdflib for Python,
etc.) - but given we're talking about RDF here in the first place, I don't
see that as a huge ask. (Caveat: I do know and recognize of course that the
current mainstream RDF libraries are very low-level and therefore
'complex', which is why we at Inrupt (and others, such as Ghent University)
are actively trying to produce open-source, higher-level, easier-to-use
SDKs to make doing such things much, much easier, especially for devs not
familiar with RDF at all).
But without getting into that whole client-side library debate(!), I think
the fundamental, inevitable answer to your perfectly valid question of "how
does it determine...?" has to be "it
checks/asks/looks-inside/queries/looks-up the response from the server",
and therefore a minimal level of 'client understanding' of server responses
will always be necessary.
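
A minimal sketch of that local check, assuming the "large" representation
has already been fetched and parsed. Triples are modelled here as plain
Python tuples purely for illustration (a real client would use a library
such as rdflib); the `needs_fetch` helper and the sample data are
hypothetical and not taken from any existing SDK:

```python
# Triples are modelled as plain (subject, predicate, object) tuples of strings.
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

VOCAB = "http://ex.co/x/"
TERM_Z = "http://ex.co/x/Z"

# Hypothetical parsed content of the "large" representation fetched from VOCAB.
large_representation = {
    (TERM_Z, RDFS_LABEL, "Z"),
    (TERM_Z, RDFS_IS_DEFINED_BY, VOCAB),
}


def needs_fetch(term, fetched_from, triples):
    """Return True if the client should still dereference `term`.

    Assumed convention: a term is fully covered when the document we already
    hold describes it and states that it is defined by that same document.
    """
    described = any(subject == term for subject, _, _ in triples)
    defined_here = (term, RDFS_IS_DEFINED_BY, fetched_from) in triples
    return not (described and defined_here)


print(needs_fetch(TERM_Z, VOCAB, large_representation))
print(needs_fetch("http://ex.co/x/Q", VOCAB, large_representation))
```

The first call reports that no further request is needed for
http://ex.co/x/Z; the second shows that a term absent from the response
would still have to be dereferenced.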

> One way to achieve this would be to include, in the content of
> http://ex.co/x/, the triple
>     <http://ex.co/x/Z> rdfs:isDefinedBy <http://ex.co/x/>.
>
> but again, that is a convention that both the server and the client have
> to share.
>
[PMcB] - Yep, exactly! I already make doing that a strongly recommended
Best Practice for all vocabs I produce or work with, and so I'd love to see
that become a more universally shared convention. But yes, it would just be
a Best Practice guidance, one that I'd hope would become more and more
widespread over time. For sure, we can't enforce it, but we can point
at good examples from major, highly successful vocabs out there today, like
QUDT and gist and Schema.org and DPV and ...!


> Overall, I think we agree that slash-IRIs provide more flexibility.
> However,
>
> - that flexibility requires more work on the server side (to maintain
> consistency between the "focused" representations and the "large"
> representation), and
>
[PMcB] - Agreed - but only *if you _want_ to provide that extra flexibility
to your users* - i.e., only *then* do you need to do that extra work on
your server side. If you don't want, or can't, or couldn't be bothered to
provide that extra flexibility to users of your vocabs (or potential users,
as you can never know all the potential users of any shared vocab, see more
below!), then simply don't!

Instead, just provide the same "large" representation response to all
vocab-namespace and vocab-term lookups (by using a single URL Rewriting
rule on your server). You can do that, and continue using slashes in your
term IRIs to leave-the-door-open to you later providing users the
greater flexibility (e.g., if your vocab grows much larger, or becomes
super-popular, or applications need or want vocab term lookups to be more
efficient, etc.).
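
As a rough sketch of what that single rule amounts to (not the syntax of
any particular web server), the routing logic might look like the
following. The namespace path `/x/` and the file name `vocab.ttl` are
assumptions made up for this example:

```python
NAMESPACE_PATH = "/x/"        # assumed path of the vocab namespace IRI
LARGE_DOCUMENT = "vocab.ttl"  # hypothetical file holding the whole vocab


def resolve(request_path):
    """Map a request path to the file to serve, or None if outside the vocab."""
    if request_path.startswith(NAMESPACE_PATH):
        # The namespace IRI and every term IRI under it get the same answer:
        # the single "large" representation.
        return LARGE_DOCUMENT
    return None


print(resolve("/x/"))
print(resolve("/x/Z"))
```

Both lookups resolve to the same document, which is the behaviour a
hash-based vocab gives today. Offering "focused" per-term responses later
would only mean changing what `resolve` returns for term paths, with no
change to the term IRIs themselves.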

And finally, I think this extra server-side work is work that relatively
simple tooling can do automatically for you (e.g., see my recent feature
request on Widoco here: https://github.com/dgarijo/Widoco/issues/559 - it's
a feature I'd love to implement and contribute to Widoco myself, if I can
find the time).

> - it only pays off if the client is smart enough to understand the hints
> provided by the server (otherwise, the dumb client may only retrieve the
> "focused" representations -- or worse, retrieve all the "focused"
> representations AND the "large" representation).
>
[PMcB] - Well, as I said above, there is nothing 'to pay off' in the first
place if you use slashes and a simple URL Rewrite rule - i.e., with that,
you'll get exactly what you get today with hashes, yet you've left the door
open to providing your users with greater flexibility later (at which point
you *then* have to pay the extra server-side cost, but only if and whenever
you deem it worthwhile (or your vocab users demand it!)).

And it's not like I see metadata like `rdfs:isDefinedBy` as a mere hint
either - I think, for vocab terms, it's simply an example of 'good data
modelling' (regardless of any hash vs slash discussion). And so whether
clients are ever smart enough to understand and take advantage of that (or
any other term metadata) is kinda irrelevant here I think. At the very
least, our own internal vocab documentation *will* be able to take
advantage of it, and so I think it always pays for itself regardless.

> But all I'm looking for here is this community's opinion on whether we
> can offer a clear, *single*, *preference* for the creators of new RDF
> vocabularies going forward
>
> Again, I believe this is a matter of trade-off, so my personal opinion is:
> no, there is no clear single solution that is better than the other in all
> cases.
>
[PMcB] Hhhmmm... so it seems my basic argument isn't really resonating with
you then :) !

Maybe another point I'll try to make here is to your point of "there is no
clear single solution that is better than the other in *all cases*", also
stated by Christian Chiarcos in his suggestion that the "choice be driven
by *the intended use case*". But isn't the whole point of publishing any
vocabulary on the Web in the first place that literally anyone else on the
web can simply choose to reuse the terms in that vocab for whatever purpose
*they choose*, without ever having to request your permission...? In other
words, no vocab publisher can *ever* know all the uses to which their
vocabs might be put.

Even within the closed eco-system of a single Enterprise, I'd say you still
can't predict all intended use-cases for a shared vocabulary - because
no-one can ever know what future applications (e.g., via mergers or
acquisitions) may come along that might wish to reuse the terms from that
vocab.

So if we accept that as a fundamental principle of Linked Data, then isn't
it kinda self-evident that in an ideal world, as good netizens, we should
always strive to publish our vocabs to be as flexible as possible to
support any possible future users - and especially so if we can start off
doing that without any extra work (i.e., using URL Rewriting and always
just returning the single "large" vocab representation).

My argument is simply that using slashes doesn't *require* any extra
server-side work at all initially (i.e., using a URL Rewrite rule results
in the exact same 'amount of work' as using hashes today). But slashes
*also* allow (perhaps much later, maybe never) a means to offer both
"small" and "large" representations (with `rdfs:isDefinedBy` links between
them) *if* and *when* you might choose to begin offering that extra
flexibility. If you start out with slashes, nothing breaks for users (in
terms of IRI changes) when you choose to offer that greater flexibility,
whereas if you start out with hashes, and then later
want/need/desire/require that extra flexibility, you'll have to change all
your existing vocab term IRIs, thereby 'breaking' all the existing clients
- yikes!
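
The reason hashes can't be retrofitted is that the fragment never reaches
the server. A small sketch using Python's standard `urllib.parse` shows
what a client would put on the wire for each style of IRI; the
`request_target` helper is hypothetical and exists only for this
illustration:

```python
from urllib.parse import urldefrag, urlsplit


def request_target(iri):
    """Return the path a client puts on the wire when dereferencing `iri`."""
    # The fragment (everything after '#') is dropped before the request is sent.
    without_fragment, _fragment = urldefrag(iri)
    return urlsplit(without_fragment).path


hash_term = request_target("http://ex.co/x#Z")
hash_vocab = request_target("http://ex.co/x")
slash_term = request_target("http://ex.co/x/Z")
slash_vocab = request_target("http://ex.co/x/")

print(hash_term, hash_vocab)    # the server sees the same path for both
print(slash_term, slash_vocab)  # the server can tell these two apart
```

With hashes the term lookup and the vocab lookup are indistinguishable on
the server, so it can only ever return the whole vocab; with slashes the
two requests differ, which is what leaves the door open.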

So since slashes are literally the only means to offer *the capability* of
providing the best-of-both-worlds here from the very beginning of vocab
creation (even if both options are not provided 'automatically'), that, for
me, does make slashes the clear single solution that *is* better across all
possible potential future use-cases.


> > Newbies don't want to have to 'decide for themselves' if they can help
> it when learning new technology - and so they'll just continue the current
> practice of cutting-and-pasting what they see as most prevalent out there
> today [i.e. hash-IRIs]
>
> ... and I personally believe this is an OK approach for newbies. Whenever
> your ontology becomes too big for this approach (as Schema.org or QUDT),
> and/or you look for more flexibility, you need more expertise -- and then
> you need to make informed choices. :)
>
[PMcB] But it's that current wishy-washy guidance that I believe is doing a
disservice to the broader vision of Linked Data. It's saying to newbies
that they need to make a choice for themselves (never a good thing to ask
newbies to do), when all they want is clear, simple guidance along the
lines of: *prefer this very specific, single option*, and preferably one
they won't ever have to change later.

I feel that that current guidance is leading newbies toward starting-off
with the less flexible hash option (since hash IRIs make up the bulk of
vocabs today), and they'll see the enlightened vocabs (i.e., the ones using
slash) as ones that might *require* 'more expertise' somehow, which isn't
true - you just *optionally* need more expertise (i.e., more server-side
tooling) if and when you choose to offer greater flexibility to the
potential myriad users of your published vocabs.

And finally, just to reiterate, this would simply be providing a clear
*preference* - if anyone wishes to go against that guidance, and use
hashes, then of course, they'll always be completely free to do that too,
but just hopefully with far more awareness of the consequences of them
doing that.

>   pa
> On 10/10/2022 11:08, Pat McBennett wrote:
>
> Hi Pierre-Antoine,
>
> Thanks so much for engaging in the discussion - I really, really
> appreciate it! (I'm re-sending this as I just noticed that the tabbing from
> my GMail response didn't come through formatted correctly - so I've
> prefixed quotes with initials of [PAC] for [Pierre-Antoine Champin])
>
> You bring up some great points, but in my view, and with a bit more
> clarification from my side, I think they all seem to actually be
> reinforcing the argument for slashes (just as a *preference* when creating
> new vocabs)!
>
> [PAC] "In the general case, when you encounter an IRI of the form
> http://ex.co/x/Y, you can not assume that http://ex.co/x/ will contain
> the definition of http://ex.co/x/Y together with other related terms."
>
> Yep, absolutely, I totally agree - this is the wild-wild-west of the World
> Wide Web after all, so yeah, regardless of slashes or hashes, we can never
> assume anything at all when dereferencing any IRI anywhere.
> And so yeah, nobody should ever make the assumption you point out here (it
> never even occurred to me that anyone would!). As you say below, that's why
> `rdfs:isDefinedBy` is so useful (and should be a general Best Practice for
> vocabs anyway, I'd say).
>
> [PAC] "For this you need,
>
> a) the server to provide an affordance in the description of
> http://ex.co/x/Y pointing to http://ex.co/x/  (e.g. by using
> rdfs:isDefinedBy)
>
> Yep - totally agree. Which is why I always highly recommend (as a
> separate, but related, general Best Practice) that all vocab terms should
> always provide an `rdfs:isDefinedBy` triple regardless. (Thankfully that
> term's local name (i.e., 'isDefinedBy') is very intuitively
> self-explanatory!)
>
> b) the client to understand and follow that affordance."
>
> Well, yeah, kinda, but only *if* that client *wants* to be able to take
> advantage of that extremely handy and helpful little affordance to
> follow-its-nose. And given the client needs to understand RDF (to some
> minimal degree at least) in the first place to be working with RDF vocabs
> at all, that doesn't seem like a problem to me, or any kind of an issue at
> all.
>
> In other words, all I'm stating is that if we *preferred* slashes, then any
> client wishing to understand what any individual IRI *is* can simply
> dereference that IRI (i.e., isn't that just the first principle of Linked
> Data really!). And they should always (in my opinion) be able to expect to
> get back *only* a representation of whatever that IRI represents. So *if*
> the IRI they dereferenced happened to be an individual vocab term, then
> (only with slash-based vocabs) they'd correctly get back data on just that
> one vocab term.
>
> (I'd consider it merely a Best Practice that they might *also* be able to
> expect a `rdfs:isDefinedBy` link to the overall vocab within which that
> single vocab term is defined - but they only need to understand and/or
> follow that link if they ever wanted to *also* discover information on the
> containing is-defined-by vocab.)
> So in other words, I only see goodness here, and a simple consistent
> expression of Linked Data first principles (i.e., all 'things of interest'
> should have uniquely *dereferencable* IRIs, and you can choose to
> follow-your-nose to 'more info' if you want to, and you understand the
> predicates leading the way).
>
> [PAC] "c) the description at http://ex.co/x/ to include some information
> about any term (e.g. http://ex.co/x/Z) it contains stating "there is
> nothing more to know about this term" (e.g. by using rdfs:isDefinedBy
> again)"
>
> I don't quite follow this point - perhaps you could elaborate a little, or
> provide some sample Turtle...? For example, I would expect the description
> at http://ex.co/x/ (assuming that to be a vocab namespace IRI) to indeed
> contain *all* of the information about the vocab itself (like its
> versioning info, preferred prefix, creation date, etc.), and *all* the
> information about *all* of the terms that that vocab contains/defines -
> i.e., exactly as QUDT do today when you click on their namespace IRI:
> https://qudt.org/schema/qudt/ (although they only seem to provide Turtle,
> and not the content-negotiable full HTML representation that I'd prefer to
> see from my browser). By contrast, DPV does provide a lovely 'complete
> vocab' representation that is content-negotiable (i.e., HTML or Turtle)
> when you dereference its namespace IRI today: http://www.w3.org/ns/dpv#
>
>
> [PAC] "d) the client to understand that statement and refrain from
> fetching http://ex.co/x/Z later on"
>
> I didn't follow the above point, so this one loses me too, but (I think)
> this comes down to clients needing to know (regardless of slash or hash)
> the basic difference between an `owl:Ontology` and an `rdfs:Class` or
> `rdf:Property` (i.e., between 'a full vocab' and 'a term in a vocab') in
> the responses they get from servers when they are dereferencing
> vocab-related IRIs anyway. I don't think the issues around caching are
> majorly affected by the slash/hash choice - but perhaps I'm missing your
> real point here...
>
>
> [PAC] "So you don't get "the best of both world" as automatically as you
> suggest."
>
> Oh yeah, absolutely - so we agree again. I should have been clearer
> perhaps - I certainly didn't mean to imply that getting
> the-best-of-both-worlds was in any way 'automatic' at all. As I said later
> in my post, getting both will require more work on the server side, or from
> tooling.
> All I'm trying to emphasize is that slashes provide *a means* to get
> the-best-of-both-worlds, whereas with hashes the best servers can ever
> offer to clients (regardless of the needs or wants of those clients) is to
> return information on all terms in the entire vocab (since, by HTTP design,
> the server will never receive the hash fragment in any HTTP request), and
> so you can never, ever offer any client *the option* to just retrieve a
> single vocab term's information and nothing else *if that's what the client
> wants/needs/prefers*.
>
> [PAC] "Terms of a vocabulary/ontology rarely make sense in isolation. So
> arguably, serving the entire vocabulary provides you with enough context to
> understand/use the term appropriately."
>
> Well, I wouldn't agree with that at all. I think QUDT's CurrencyUnit (
> https://qudt.org/schema/qudt/CurrencyUnit) is a great example of where it
> makes perfect sense (i.e., all I want to know is what QUDT *mean* by a
> 'CurrencyUnit'). And surely no-one would argue that Schema.org should
> switch from its current slash to use hash instead, because terms like
> Person (https://schema.org/Person) need context from the entire 2,500
> terms defined in Schema.org as a whole.
>
> But I do certainly agree with your point that individual terms should
> indeed provide enough context to understand/use the term - but I think all
> that context should be provided *in isolation* within the vocab's
> definition of that term itself, and should certainly not require
> downloading the entire vocabulary - i.e., examples of precisely that are
> `rdfs:isDefinedBy`, `rdfs:domain`, `schema:rangeIncludes`, `rdfs:seeAlso`,
> `skos:related`, `skos:narrower`, etc.
> Now, given that much of that 'term-specific context' would actually be
> IRIs, it should then be up to the client to decide if it now wishes to
> dereference each of those individual links with multiple HTTP requests, or
> if it actually wishes to now download the entire vocabulary in one HTTP
> request (again, only slashes offer all clients the choice and flexibility
> for them to decide between those options for themselves).
>
> [PAC] "And then you get "bombarded with a huge document"..."
>
> Yep, but again my point is that only with slash do clients get at least
> the option, or the ability, or the possibility, to *choose for themselves*
> whether they get bombarded with the entire vocab in one HTTP request or not.
>
>
> [PAC] "[PMcB] - So doesn't that demonstrate my whole point - i.e., that
> with slashes I can get the best of both worlds
>
> I don't think so. They are different trade-offs between providing targeted
> content vs. reducing the number of HTTP queries, and between working with
> dumb clients and/or dumb servers vs. requiring more coordination between
> them  (e.g. providing and following rdfs:isDefinedBy links)."
>
> Well to emphasize my point, with slashes I *can* get
> the-best-of-both-worlds, with hashes I *can't*.
> Yep, for sure there are tradeoffs, and indeed implementing the full set of
> options (with full conneg, and providing/generating individual
> term-specific static HTML pages, etc.) definitely requires more server-side
> work/tooling. But I'd argue that adopting slash still doesn't *require*
> that any of that work be done at all, for example, if all you have to work
> with are dumb servers - i.e. your dumb server can just continue as always,
> serving up the full vocab information for all requests using the exact same
> single static page it uses today with hash, by simply using a single URL
> rewrite rule to rewrite http://ex.co/x/Z to http://ex.co/x#Z. Sure, that
> breaks the client expectation somewhat
> (i.e., "I only asked for info on term Z, and you gave me info on all the
> vocab terms - but at least you provided the HTML anchor so that my browser
> jumped down automatically to the info for term Z!") - but that's a
> worst-case scenario (i.e., a scenario that may have been forced on you due
> to only having dumb servers and no server-side tooling), and yet it's still
> no worse than what you get today with hashes (i.e., it *is* what you get
> today with hashes).
>
> [PAC] "[PMcB] - And that's why I posit that slashes are simply 'more
> correct' (i.e., since *only* slashes can ever allow servers to always know
> exactly, unambiguously, what a requesting client is really looking for
>
> I don't buy that. The server can never know exactly nor unambiguously what
> the intent of the client is, nor should it (separation of concerns)."
>
> Sure, of course :) ! So let me re-phrase my point - only slashes provide
> the means for the server to *see* the *full/complete IRI* that a client may
> wish to de-reference. In other words, with hashes, by HTTP design, the
> client *must* strip off the hash fragment (if any) before putting the HTTP
> request on the wire - hence the server can't ever distinguish between a
> client asking for info on a single term (e.g., GET http://ex.co/x#Z) or a
> client wishing for info on the entire vocab (e.g., GET http://ex.co/x# or
> just GET http://ex.co/x).
>
>
> [PAC] "Can't help but cite the priority of constituencies recalled in
> https://www.w3.org/TR/design-principles/#priority-of-constituencies"
>
> Yep, exactly (we agree again!) - but for me, this is precisely an argument
> for slashes - i.e., hashes restrict what clients can possibly get back from
> a server (i.e., they'll always get the full vocab information back),
> whereas slashes at least provide *the potential* for servers to offer
> clients more flexibility and choice (i.e., info just on individual terms,
> *or* info on the full vocab).
> So surely giving clients *more* choice (with slashes), not less (with
> hashes), is putting their needs first, since we can't possibly ever know
> up-front, for any vocab, the 'needs' of all potential users (i.e., the
> entire user base of the Web) for vocabs we publish, right!?
>
> [PAC] "Also, in a distributed setting such as the web, you can not assume
> that all other parties will always do the right thing™."
>
> Again, I totally agree (who wouldn't!).
> But all I'm looking for here is this community's opinion on whether we can
> offer a clear, *single*, *preference* for the creators of new RDF
> vocabularies going forward. I think we can, and that *preference* should be
> using slashes (i.e., not a requirement, or a mandate, or something anyone
> can ever force people to do). I just think the current state of guidance in
> the Linked Data community is too loose and therefore off-putting for
> newbies - i.e., "You can do either, there are pros and cons, but it
> doesn't really matter much, so you can just decide for yourself". Newbies
> don't want to have to 'decide for themselves' if they can help it when
> learning new technology - and so they'll just continue the current practice
> of cutting-and-pasting what they see as most prevalent out there today
> (e.g., nearly all the W3C vocab examples today), which will most likely
> mean repeating the 'mistake' of using hashes, and thereby 'hurting' the
> longer-term options for client/user software that may wish to have the
> ability (at some future stage perhaps) to be able to choose for themselves
> between term-specific or full-vocab lookups.
>
> Thanks again Pierre-Antoine for pushing me to think this through even more
> thoroughly - I hope it's been somewhat useful for you (and others?) to
> ponder on too :)
>
> Pat.
>
> *Pat McBennett*, Technical Architect
>
> Contact  | patm@inrupt.com
>
> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
> <https://github.com/pmcb55>
>
> Explore  | www.inrupt.com
>
>
>
>
> On Sat, Oct 8, 2022 at 2:10 AM Pat McBennett <patm@inrupt.com> wrote:
>
>> Hi Pierre-Antoine,
>>
>> Thanks so much for engaging in the discussion - I really, really
>> appreciate it!
>>
>> You bring up some great points, but in my view, and with a bit more
>> clarification from my side, I think they all seem to actually be
>> reinforcing the argument for slashes (just as a *preference* when creating
>> new vocabs)!
>>
>> In the general case, when you encounter an IRI of the form
>> http://ex.co/x/Y, you can not assume that http://ex.co/x/ will contain
>> the definition of http://ex.co/x/Y together with other related terms.
>>
>> Yep, absolutely, I totally agree - this is the wild-wild-west of the
>> World Wide Web after all, so yeah, regardless of slashes or hashes, we can
>> never assume anything at all when dereferencing any IRI anywhere.
>> And so yeah, nobody should ever make the assumption you point out here
>> (it never even occurred to me that anyone would!). As you say below, that's
>> why `rdfs:isDefinedBy` is so useful (and should be a general Best Practice
>> for vocabs anyway, I'd say).
>>
>> For this you need,
>>
>> a) the server to provide an affordance in the description of
>> http://ex.co/x/Y pointing to http://ex.co/x/  (e.g. by using
>> rdfs:isDefinedBy)
>>
>> Yep - totally agree. Which is why I always highly recommend (as a
>> separate, but related, general Best Practice) that all vocab terms should
>> always provide an `rdfs:isDefinedBy` triple regardless. (Thankfully that
>> term's local name (i.e., 'isDefinedBy') is very intuitively
>> self-explanatory!)
>>
>> b) the client to understand and follow that affordance.
>>
>> Well, yeah, kinda, but only *if* that client *wants* to be able to take
>> advantage of that extremely handy and helpful little affordance to
>> follow-its-nose. And given the client needs to understand RDF (to some
>> minimal degree at least) in the first place to be working with RDF vocabs
>> at all, that doesn't seem like a problem to me, or any kind of an issue at
>> all.
>>
>> In other words, all I'm stating is that if we *prefered* slashes, then
>> any client wishing to understand what any individual IRI *is* can simply
>> deference that IRI (i.e., isn't that just the first principle of Linked
>> Data really!). And they should always (in my opinion) be able to expect to
>> get back *only* a representation of whatever that IRI represents. So *if*
>> the IRI they dereferenced happened to be an individual vocab term, then
>> (only with slash-based vocabs) they'd correctly get back data on just that
>> one vocab term.
>>
>> (I'd consider it merely a Best Practice that they might *also* be able to
>> expect a `rdfs:isDefinedBy` link to the overall vocab within which that
>> single vocab term is defined - but they only need to understand and/or
>> follow that link if they ever wanted to *also* discover information on the
>> containing is-defined-by vocab.)
>> So in other words, I only see goodness here, and a simple consistent
>> expression of Linked Data first principles (i.e., all 'things of interest'
>> should have uniquely *dereferencable* IRIs, and you can choose to
>> follow-your-nose to 'more info' if you want to, and you understand the
>> predicates leading the way).
>>
>> c) the description at http://ex.co/x/ to include some information about
>> any term (e.g. http://ex.co/x/Z) in contains stating "there is nothing
>> more to know about this term" (e.g. by using rdfs:isDefinedBy again)
>>
>> I don't quite follow this point - perhaps you could elaborate a little,
>> or provide some sample Turtle...? (For example, I would expect the
>> description at http://ex.co/x/ (assuming that to be a vocab namespace
>> IRI) to indeed contain *all* of the information about the vocab itself
>> (like it's versioning info, preferred prefix, creation date, etc.), and
>> *all* the information about *all* of the terms that that vocab
>> contains/defines - i.e., exactly as QUDT do today when you click on their
>> namespace IRI: https://qudt.org/schema/qudt/ (although they only seem to
>> provide Turtle, and not a content-negotiable full HTML representation that
>> I'd prefer to see them provide from my browser (e.g., DPV does provide a
>> lovely 'complete vocab' that is a content-negotiable (i.e., HTML or Turtle)
>> representation when you dereference it's namespace IRI today:
>> http://www.w3.org/ns/dpv#)).
>>
>>
>> d) the client to understand that statement and refrain from fetching
>> http://ex.co/x/Z later on
>>
>> I didn't follow the above point, so this one loses me too, but (I think)
>> this comes down to clients needing to know (regardless of slash or hash)
>> the basic difference between an `owl:Ontology` and an `rdf:Class` or
>> `rdfs:Property` (i.e., between 'a full vocab' and 'a term in a vocab') in
>> the responses they get from servers when they are dereferencing
>> vocab-related IRIs anyway. I don't think the issues around caching are
>> majorly affected by the slash/hash choice - but perhaps I'm missing your
>> real point here...
>>
>>
>> So you don't get "the best of both world" as automatically as you suggest.
>>
>> Oh yeah, absolutely - so we agree again. I should have been clearer
>> perhaps - I certainly didn't mean to imply that getting
>> the-best-of-both-worlds was in any way 'automatic' at all. As I said later
>> in my post, getting both will require more work on the server side, or from
>> tooling.
>> All I'm trying to emphasize is that slashes provide *a means* to get
>> the-best-of-both-worlds, whereas with hashes the best servers can ever
>> offer to clients (regardless of the needs or wants of those clients) is to
>> return information on all terms in the entire vocab (since, by HTTP design,
>> the server will never receive the hash fragment in any HTTP request), and
>> so you can never, ever offer any client *the option* to just retrieve a
>> single vocab term's information and nothing else *if that's what the client
>> wants/needs/prefers*.
>>
>> Terms of a vocabulary/ontology rarely make sense in isolation. So
>> arguably, serving the entire vocabulary provides you with enough context to
>> understand/use the term appropriately.
>>
>> Well, I wouldn't agree with that at all. I think QUDT's CurrencyUnit (
>> https://qudt.org/schema/qudt/CurrencyUnit) is a great example of where
>> it makes perfect sense (i.e., all I want to know is what QUDT *mean* by a
>> 'CurrencyUnit'). And surely no-one would argue that Schema.org should
>> switch from its current slash to use hash instead, because terms like
>> Person (https://schema.org/Person) need context from the entire 2,500
>> terms defined in Schema.org as a whole.
>>
>> But I do certainly agree with your point that individual terms should
>> indeed provide enough context to understand/use the term - but I think all
>> that context should be provided *in isolation* within the vocab's
>> definition of that term itself, and should certainly not require
>> downloading the entire vocabulary - i.e., examples of precisely that are
>> `rdfs:isDefinedBy`, `rdfs:domain`, `schema:rangeIncludes`, `rdfs:seeAlso`,
>> `skos:related`, `skos:narrower`, etc.
>> Now, given that much of that 'term-specific context' would actually be
>> IRIs, it should then be up to the client to decide if it now wishes to
>> dereference each of those individual links with multiple HTTP requests, or
>> if it actually wishes to now download the entire vocabulary in one HTTP
>> request (again, only slashes offer all clients the choice and flexibility
>> for them to decide between those options for themselves).
>>
>> And then you get "bombarded with a huge document"...
>>
>> Yep, but again my point is that only with slash do clients get at least
>> the option, or the ability, or the possibility, to *choose for themselves*
>> whether they get bombarded with the entire vocab in one HTTP request or not.
>>
>>
>> So doesn't that demonstrate my whole point - i..e, that with slashes I
>> can get the best of both worlds
>>
>> I don't think so. They are different trade-offs between providing
>> targeted content vs. reducing the number of HTTP queries, and between
>> working with dumb clients and/or dumb servers vs. requiring more
>> coordination between them (e.g. providing and following rdfs:isDefinedBy
>> links).
>>
>> Well to emphasize my point, with slashes I *can* get
>> the-best-of-both-worlds, with hashes I *can't*.
>> Yep, for sure there are tradeoffs, and indeed implementing the full set
>> of options (with full conneg, and providing/generating individual
>> term-specific static HTML pages, etc.) definitely requires more server-side
>> work/tooling. But I'd argue that adopting slash still doesn't *require*
>> that any of that work be done at all, for example, if all you have to work
>> with are dumb servers - i.e. your dumb server can just continue as always,
>> serving up the full vocab information for all requests using the exact same
>> single static page it uses today with hash, by simply using a single URL
>> rewrite rule to rewrite http://ex.co/x/Z to http://ex.co/x#Z.
>> Sure, that breaks the client expectation somewhat
>> (i.e., "I only asked for info on term Z, and you gave me info on all the
>> vocab terms - but at least you provided the HTML anchor so that my browser
>> jumped down automatically to the info for term Z!") - but that's a
>> worst-case scenario (i.e., a scenario that may have been forced on you due
>> to only having dumb servers and no server-side tooling), and yet it's still
>> no worse than what you get today with hashes (i.e., it *is* what you get
>> today with hashes).
>>
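As a rough illustration of the single rewrite rule described above, here is a minimal Python sketch. The namespace constants and the function name `rewrite_slash_to_hash` are assumptions made for this example only; a real deployment would more likely express the same mapping in web-server configuration.

```python
from typing import Optional

SLASH_NAMESPACE = "http://ex.co/x/"  # example namespace used in this thread
HASH_DOCUMENT = "http://ex.co/x"     # the single static page holding all terms


def rewrite_slash_to_hash(requested_iri: str) -> Optional[str]:
    """Map a slash-style term IRI onto the one static vocab page.

    Returns a redirect target, or None when the IRI is outside the
    namespace (the server would handle that request some other way).
    """
    if not requested_iri.startswith(SLASH_NAMESPACE):
        return None
    term = requested_iri[len(SLASH_NAMESPACE):]
    if not term:
        # The namespace IRI itself: serve the whole page, no anchor needed.
        return HASH_DOCUMENT
    # A term IRI: same page, with the term name as the HTML anchor.
    return f"{HASH_DOCUMENT}#{term}"
```

With this mapping a "dumb" server still returns the full vocab page for every term, which is the worst case described above, but the term IRIs themselves stay slash-based.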
>> And that's why I posit that slashes are simply 'more correct' (i.e.,
>> since *only* slashes can ever allow servers to always know exactly,
>> unambiguously, what a requesting client is really looking for
>>
>> I don't buy that. The server can never know exactly nor unambiguously what
>> the intent of the client is, nor should it (separation of concerns).
>>
>> Sure, of course :) ! So let me re-phrase my point - only slashes provide
>> the means for the server to *see* the *full/complete IRI* that a client may
>> wish to de-reference. In other words, with hashes, by HTTP design, the
>> client *must* strip off the hash fragment (if any) before putting the HTTP
>> request on the wire - hence the server can't ever distinguish between a
>> client asking for info on a single term (e.g., GET http://ex.co/x#Z) or a
>> client wishing for info on the entire vocab (e.g., GET http://ex.co/x# or
>> just GET http://ex.co/x).
>>
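The fragment-stripping behaviour described above can be seen with Python's standard `urllib.parse.urldefrag`. This is only a sketch: the helper name `request_target` is an assumption for this example, and it models just the "drop the fragment" step, not a full HTTP client.

```python
from urllib.parse import urldefrag


def request_target(iri: str) -> str:
    """Return the part of an IRI that a client actually sends to the server."""
    url, _fragment = urldefrag(iri)  # the fragment never goes on the wire
    return url


hash_iris = ["http://ex.co/x#Z", "http://ex.co/x#", "http://ex.co/x"]
# All three hash-style requests look identical from the server's side.
targets_seen_by_server = {request_target(iri) for iri in hash_iris}

# A slash-style term IRI reaches the server unchanged.
slash_target = request_target("http://ex.co/x/Z")
```

Here `targets_seen_by_server` collapses to a single URL, while `slash_target` still names the individual term.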
>>
>> Can't help but cite the priority of constituencies recalled in
>> https://www.w3.org/TR/design-principles/#priority-of-constituencies
>>
>> Yep, exactly (we agree again!) - but for me, this is precisely an
>> argument for slashes - i.e., hashes restrict what clients can possibly get
>> back from a server (i.e., they'll always get the full vocab information
>> back), whereas slashes at least provide *the potential* for servers to
>> offer clients more flexibility and choice (i.e., info just on individual
>> terms, *or* info on the full vocab).
>> So surely giving clients *more* choice (with slashes), not less (with
>> hashes), is putting their needs first (since we can't possibly ever know
>> up-front, for any vocab, the 'needs' of all potential users (i.e., the
>> entire user base of the Web) for vocabs we publish), right!?
>>
>> Also, in a distributed setting such as the web, you cannot assume that
>> all other parties will always do the right thing™.
>>
>> Again, I totally agree (who wouldn't!).
>> But all I'm looking for here is this community's opinion on whether we
>> can offer a clear, *single*, *preference* for the creators of new RDF
>> vocabularies going forward. I think we can, and that *preference* should be
>> using slashes (i.e., not a requirement, or a mandate, or something anyone
>> can ever force people to do). I just think the current state of guidance in
>> the Linked Data community is too loose and therefore off-putting for
>> newbies - i.e., "You can do either, there are pros and cons, but it
>> doesn't really matter much, so you can just decide for yourself". Newbies
>> don't want to have to 'decide for themselves' if they can help it when
>> learning new technology - and so they'll just continue the current practice
>> of cutting-and-pasting what they see as most prevalent out there today
>> (e.g., nearly all the W3C vocab examples today), which will most likely
>> mean repeating the 'mistake' of using hashes, and thereby 'hurting' the
>> longer-term options for client/user software that may wish to have the
>> ability (at some future stage perhaps) to be able to choose for themselves
>> between term-specific or full-vocab lookups.
>>
>> Thanks again Pierre-Antoine for pushing me to think this through even
>> more thoroughly - I hope it's been somewhat useful for you (and others?) to
>> ponder on too :)
>>
>> Pat.
>>
>>
>> *Pat McBennett*, Technical Architect
>>
>> Contact  | patm@inrupt.com
>>
>> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
>> <https://github.com/pmcb55>
>>
>> Explore  | www.inrupt.com
>>
>>
>>
>>
>>
>> On Fri, Oct 7, 2022 at 8:07 AM Pierre-Antoine Champin <
>> pierre-antoine@w3.org> wrote:
>>
>>> On 07/10/2022 01:49, Pat McBennett wrote:
>>>
>>> Hi Martynas,
>>>
>>> Thanks for the feedback!
>>>
>>> But I think any vocabulary can just as easily support that same caching
>>> benefit with slash-based vocab namespace IRIs too, *without* having
>>> to require an initial HTTP request for *each* term - i.e., by simply
>>> returning the entire vocab on namespace IRI lookups.
>>>
>>> In the general case, when you encounter an IRI of the form
>>> http://ex.co/x/Y, you cannot assume that http://ex.co/x/ will contain
>>> the definition of http://ex.co/x/Y together with other related terms.
>>> For this you need,
>>>
>>> a) the server to provide an affordance in the description of
>>> http://ex.co/x/Y pointing to http://ex.co/x/
>>> b) the client to understand and follow that affordance (e.g. by using
>>> rdfs:isDefinedBy)
>>> c) the description at http://ex.co/x/ to include some information about
>>> any term (e.g. http://ex.co/x/Z) it contains stating "there is nothing
>>> more to know about this term" (e.g. by using rdfs:isDefinedBy again)
>>> d) the client to understand that statement and refrain from fetching
>>> http://ex.co/x/Z later on
>>>
>>> So you don't get "the best of both worlds" as automatically as you
>>> suggest.
>>>
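Conditions c) and d) above amount to a client-side check before each dereference. The sketch below uses plain Python tuples as triples instead of an RDF library, and the function name `needs_fetch` is hypothetical. Reading `rdfs:isDefinedBy` as "nothing more to know" is the convention proposed in this thread, not something RDFS itself guarantees.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"


def needs_fetch(term, triples, fetched_docs):
    """Decide whether a client still has to dereference `term`.

    `triples` is an iterable of (subject, predicate, object) strings that
    the client already holds; `fetched_docs` is the set of document IRIs
    it has already retrieved.
    """
    for subject, predicate, obj in triples:
        if (subject == term
                and predicate == RDFS_IS_DEFINED_BY
                and obj in fetched_docs):
            # The defining document is already in hand: skip the request.
            return False
    return True


# State after the client has retrieved http://ex.co/x/ (assumed content).
fetched = {"http://ex.co/x/"}
known = [("http://ex.co/x/Z", RDFS_IS_DEFINED_BY, "http://ex.co/x/")]
```

Under these assumptions `needs_fetch("http://ex.co/x/Z", known, fetched)` is false, while a term with no such statement would still trigger a request.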
>>>
>>> I think QUDT is a really nice, simple example that very easily
>>> demonstrates exactly this today. It has a slash namespace IRI, and if I
>>> only ever request info on individual single vocab terms (e.g., try clicking
>>> now on `https://qudt.org/schema/qudt/CurrencyUnit`) then yes, I'd
>>> encounter that 'HTTP request per lookup' you suggest (but I'd be getting
>>> precisely what I asked for each time!).
>>>
>>> Terms of a vocabulary/ontology rarely make sense in isolation. So
>>> arguably, serving the entire vocabulary provides you with enough context to
>>> understand/use the term appropriately.
>>>
>>>
>>> But I can just as easily avoid that scenario today too by simply
>>> requesting the vocab's namespace IRI instead - e.g., try it right now by
>>> just clicking on `https://qudt.org/schema/qudt`
>>> <https://qudt.org/schema/qudt>. See - you get back the entire vocab
>>> containing all the vocab terms in a single HTTP response, which can be
>>> cached and keyed on that one namespace IRI (exactly as you would if they'd
>>> used a hash instead).
>>>
>>> And then you get "bombarded with a huge document", to quote one of your
>>> arguments against hash-IRIs. Seems to me that you get the worst of both
>>> worlds here: I had to perform two HTTP queries (one on CurrencyUnit, to
>>> get the link to the whole vocab, and one on the vocab) instead of one (with
>>> hash IRIs), and I still end up with a huge ontology. (yes, playing devil's
>>> advocate here a little)
>>>
>>> (I'm not familiar with Jena's OntDocumentManager, but I'm sure its
>>> caching code could easily be extended to take advantage of servers that
>>> choose to serve up slash-based vocabularies as QUDT demonstrates is so
>>> feasible today.)
>>> So doesn't that demonstrate my whole point - i.e., that with slashes I
>>> can get the best of both worlds
>>>
>>> I don't think so. They are different trade-offs between providing
>>> targeted content vs. reducing the number of HTTP queries, and between
>>> working with dumb clients and/or dumb servers vs. requiring more
>>> coordination between them (e.g. providing and following rdfs:isDefinedBy
>>> links).
>>>
>>> (i.e., precise term-specific HTTP responses if I want them, *and* the
>>> entire vocab in a single HTTP response if I want that too)? Using a hash
>>> completely locks me out, forever, of being able to achieve those lovely
>>> clean term-specific responses.
>>> And that's why I posit that slashes are simply 'more correct' (i.e.,
>>> since *only* slashes can ever allow servers to always know exactly,
>>> unambiguously, what a requesting client is really looking for
>>>
>>> I don't buy that. The server can never know exactly nor unambiguously
>>> what the intent of the client is, nor should it (separation of concerns
>>> <https://en.wikipedia.org/wiki/Separation_of_concerns>).
>>>
>>> (i.e., a term-specific response, or an entire vocab response)), and it
>>> does so without losing any of the benefits of using a hash. (I do, by the
>>> way, totally appreciate that servers choosing to work as the QUDT servers
>>> do today might require a bit more server-side work. But my whole point is
>>> to ask this community which option they consider "more technically correct
>>> today and forever", and not "which option is easier for servers or vocab
>>> creators/hosters/editors/publishers today in the absence of any tooling
>>> support".
>>>
>>> Can't help but cite the priority of constituencies recalled in
>>> https://www.w3.org/TR/design-principles/#priority-of-constituencies
>>>
>>> "User needs come before the needs of web page authors, which come before
>>> the needs of user agent implementors, which come before the needs of
>>> specification writers, which come before theoretical purity."
>>>
>>> Don't get me wrong, I get the point of thinking beyond the limitation of
>>> current tools. That's a valuable exercise. But practicality does also
>>> matter.
>>>
>>> Also, in a distributed setting such as the web, you cannot assume that
>>> all other parties will always do the right thing™.
>>>
>>>   my 2€
>>>
>>>   pa
>>>
>>> In other words, I think that QUDT-server-like behaviour can be provided
>>> easily by tooling, which I'd personally be very happy to work on
>>> contributing :) !).
>>> Cheers,
>>> Pat.
>>>
>>>
>>>
>>> On Thu, Oct 6, 2022 at 3:44 PM Martynas Jusevičius <
>>> martynas@atomgraph.com> wrote:
>>>
>>>> Hi Pat,
>>>>
>>>> For one thing, hash URIs are easier to cache because there is only one
>>>> document URL. After the initial HTTP request the whole document can be
>>>> cached with its URL as the key. All following term lookups (whose URIs
>>>> start with that URL) will hit the cached document.
>>>> Slash URIs will require an initial HTTP request for *each* term and
>>>> will result in a cache entry per term.
>>>>
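The caching difference described above can be sketched with a toy document cache keyed on the fragment-less URL. This is not how Jena's OntDocumentManager is implemented; the class name `DocumentCache` and the injected `fetch` callable are assumptions for illustration only.

```python
from urllib.parse import urldefrag


class DocumentCache:
    """Toy cache that keys documents on the URL without its fragment."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a URL, returning a document
        self._docs = {}
        self.requests = 0    # number of simulated HTTP requests made

    def lookup(self, term_iri):
        doc_url, _fragment = urldefrag(term_iri)
        if doc_url not in self._docs:
            self.requests += 1
            self._docs[doc_url] = self._fetch(doc_url)
        return self._docs[doc_url]


def fake_fetch(url):
    # Stand-in for a real HTTP GET, so the sketch runs offline.
    return f"document at {url}"


hash_cache = DocumentCache(fake_fetch)
for name in ("A", "B", "C"):
    hash_cache.lookup(f"http://ex.co/x#{name}")

slash_cache = DocumentCache(fake_fetch)
for name in ("A", "B", "C"):
    slash_cache.lookup(f"http://ex.co/x/{name}")
```

In this toy setup the three hash IRIs share one cache entry after a single request, whereas the three slash IRIs each cost a request and a cache entry, unless the client has some other way to learn that one document covers them all.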
>>>> This is based on my experience with Jena's OntDocumentManager.
>>>>
>>>> Martynas
>>>> atomgraph.com
>>>>
>>>>
>>>> On Thu, Oct 6, 2022 at 4:15 PM Pat McBennett <patm@inrupt.com> wrote:
>>>>
>>>>> So (I think!) I know all the pros and cons of using either
>>>>> a trailing slash or a trailing hash for vocab namespace IRIs. Basically it
>>>>> boils down to hashes meaning you'll always get info on all the terms in a
>>>>> vocabulary, even if you only want info for one specific term, whereas using
>>>>> a slash means I can always get just the info for any specific, individual
>>>>> term I request.
>>>>>
>>>>> Note: using slashes provides the ability to get the best of both
>>>>> worlds - i.e., small responses when explicitly asking for info on just one
>>>>> term, but if you want info for all the terms in one HTTP response, then
>>>>> just serve up that complete vocab response when the base namespace IRI
>>>>> itself is dereferenced.
>>>>>
>>>>> Here's a nice simple illustration of the basic difference:
>>>>> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on '
>>>>> https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean,
>>>>> concise, and precise set of info on just the one term you asked for -
>>>>> lovely!
>>>>>
>>>>> - Hash: DPV's 'JointDataControllers' (i.e., click on '
>>>>> https://w3id.org/dpv#JointDataControllers') and you get bombarded
>>>>> with a huge document, with a daunting Table of Contents on the left, and
>>>>> info on hundreds of other terms that I didn't ask for, and so had no
>>>>> interest in whatsoever (don't get me wrong - this is fantastically detailed
>>>>> and potentially very useful information, but it's simply not what I asked
>>>>> for!).
>>>>>
>>>>> So based on the greater flexibility and future-proofing of using slash
>>>>> (i.e., it offers the best of both worlds, whereas hash is forever limited),
>>>>> I've become firmly of the opinion that slashes are just 'better' than
>>>>> hashes, and in fact are simply 'more correct' (i.e., all IRIs should be
>>>>> uniquely dereferenceable).
>>>>>
>>>>> I also think the distinction is critically important when creating
>>>>> vocabularies intended for widespread and long-lasting use (such as the DPV
>>>>> vocab above). For throw-away or pet projects, sure, it doesn't really
>>>>> matter (yet even then, I still think slashes are the 'more correct' option).
>>>>>
>>>>> I know that the convention from the W3C has tended to be to use
>>>>> hashes, but I think in hindsight that was a mistake, and that the advice
>>>>> from the Semantic Web community as a whole should now be to adopt slashes
>>>>> consistently for all new vocabularies. (And it's not like using slash has
>>>>> no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist,
>>>>> SOSA, SSN, (even the venerable FOAF!) all use slash).
>>>>>
>>>>> I'd love to hear this group's thoughts. (For reference, I did ask the
>>>>> gist community if they recorded their discussions around their decision (in
>>>>> 2019) to formally switch gist from hash to slash (here
>>>>> <https://github.com/semanticarts/gist/issues/725>), but it seems they
>>>>> weren't recorded, and I've also raised the issue with the DPV group
>>>>> directly too (here <https://github.com/w3c/dpv/issues/53>)).
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Pat.
>>>>>
>>>>>
>>>>> This e-mail, and any attachments thereto, is intended only for use by
>>>>> the addressee(s) named herein and may contain legally privileged,
>>>>> confidential and/or proprietary information. If you are not the intended
>>>>> recipient of this e-mail (or the person responsible for delivering this
>>>>> document to the intended recipient), please do not disseminate, distribute,
>>>>> print or copy this e-mail, or any attachment thereto. If you have received
>>>>> this e-mail in error, please respond to the individual sending the message,
>>>>> and permanently delete the email.
>>>>
>>>>

Received on Tuesday, 11 October 2022 08:50:08 UTC