Re: (Lost in the noise perhaps - so asking again) - Is a trailing slash 'better' than a trailing hash for vocabs namespace IRIs? from Antoine Zimmermann on 2022-11-15 (semantic-web@w3.org from November 2022)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Tue, 15 Nov 2022 16:16:24 +0100
To: semantic-web@w3.org
Cc: patm@inrupt.com
Message-ID: <d3eb9f71-5adb-ec7a-d299-a47a6fc8958d@emse.fr>
Dear semwebbers,


Sorry to follow up on this conversation a little late but I'd like to 
add a few things, hopefully worth more than 2 cents.

Overall, +1 to recommending a slash-based namespace for vocabs as a 
best practice.

A few comments regarding:
  1. number of HTTP lookups
  2. simplicity
  3. "ontology terms don't mean much outside the context of the whole 
ontology"
  4. using hashes for other things
  5. httpRange-14
  6. modularisation

Regarding 1., real-life experiments would have to be run, because there 
are good reasons to think that, from the perspective of the network as a 
whole, slash IRIs are not an issue at all. In most cases, applications 
know what terms to look for in data, and already know the ontologies that 
correspond to those terms. Only very rarely would an application crawl 
the Web from data documents to term documents to ontology documents. 
This would be very inefficient in most cases, and telling the world to 
use slashes rather than hashes (or single-term documentation rather than 
full ontology documentation) would only marginally affect this 
inefficiency, if at all (IMHO). With hash-based namespaces, there is 
also a potential for inefficient use of the network, e.g. when caching 
is not possible for some reason and the whole ontology is fetched again 
for every term.
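
As a rough illustration of the upper bound being discussed (a sketch, 
not a measurement): HTTP clients strip the fragment before sending a 
request, so hash IRIs in one namespace collapse to one document, while 
slash IRIs map to one document per term. The example.org IRIs and the 
function name below are hypothetical.

```python
from urllib.parse import urldefrag

def documents_to_fetch(term_iris):
    """Return the distinct document URLs a client would request.

    The fragment (the part after '#') is never sent to the server,
    so it is removed before comparing URLs.
    """
    return {urldefrag(iri).url for iri in term_iris}

# Hypothetical vocabularies, for illustration only.
hash_terms = [
    "http://example.org/vocab#Person",
    "http://example.org/vocab#Organization",
    "http://example.org/vocab#name",
]
slash_terms = [
    "http://example.org/vocab/Person",
    "http://example.org/vocab/Organization",
    "http://example.org/vocab/name",
]

hash_docs = documents_to_fetch(hash_terms)
slash_docs = documents_to_fetch(slash_terms)
print(len(hash_docs), len(slash_docs))  # 1 3
```

The slash-based list costs three small requests against one large one, 
and only for a client that actually dereferences every term. An 
application that already knows the ontology sends none in either case.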

Regarding 2., yes, hash-based namespaces are simpler to set up and 
publish. But they are difficult to work with in the long term. In 
professional projects, there are tons of things that would be cumbersome 
if applied to simple personal tasks, such as setting up a version control 
system for every piece of code or document one is authoring, or applying 
a full-fledged collaborative development methodology to your hobby short 
novel writing. The burden of setting up a server with proper URI 
redirection is minuscule if you think of ontology development as a type 
of professional software project. It seems to me that justifying 
hash-based namespaces by their simplicity is aiming at the lowest 
possible quality requirement.

Regarding 3., why would this be a problem for ontologies and not for 
other kinds of linked data or knowledge graphs? In object-oriented 
software development, a single class does not mean anything in 
isolation, yet most often each class is defined in a separate file. If 
you access these files on the Web (say, via GitHub), you don't have the 
context outside the class, and your class cannot even function without 
the other classes it relates to. If this is a problem for ontologies, 
why wouldn't it be a problem for code too? But in reality, it is not a 
problem, because you can always download the whole package, the same as 
you can download the whole ontology from its ontology IRI. It should be 
rather easy to be directed to the whole ontology file when necessary, 
and yet allow one to simply get the documentation of a single term.

Regarding 4., with slash-based namespaces, hashes can be used for other 
useful things. E.g., there could be a fragment of a term specification 
that provides usage examples, like http://onto.org/Term#example (this is 
done in schema.org). There could be a section about history (e.g. when 
the term was added to the vocab, or version info about the term itself: 
http://onto.org/Term#history) or metadata (who created the term, related 
GitHub issues and discussions, etc.: http://onto.org/Term#meta).
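
A small sketch of why this works: the fragment IRIs above all resolve 
to the same term document, so they can address sections inside it. The 
section names are the ones from the examples above; the variable names 
are only illustrative.

```python
from urllib.parse import urldefrag

term = "http://onto.org/Term"
sections = ["example", "history", "meta"]

# One fragment IRI per section of the term's documentation.
section_iris = [f"{term}#{name}" for name in sections]

for iri in section_iris:
    document, fragment = urldefrag(iri)
    # Every section IRI is served by the same term document.
    print(f"{fragment:8} -> {document}")
```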

Regarding 5., if a term like http://myonto.org/Person denotes a class of 
people, then it certainly isn't an information resource. However, if GET 
http://myonto.org/Person responds with a 200 OK, then, by the 
httpRange-14 resolution, the IRI must denote an information resource. A 
solution is to redirect to another IRI, say http://myonto.org/doc/Person, 
but this means yet another HTTP lookup. Instead, one could use 
http://myonto.org/Person# as an identifier for the term, and 
http://myonto.org/Person as an identifier for the RDF document that 
defines the term. This gets the best of both worlds: a slash-based 
namespace with a hash IRI.
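
To make the lookup count concrete, here is a minimal simulation (no 
real network access) of the two setups. The two dictionaries stand in 
for server configurations and are assumptions for illustration; the 
URLs are the ones used above.

```python
from urllib.parse import urldefrag

# Simulated servers: URL -> (status code, Location header or None).
REDIRECT_SERVER = {
    "http://myonto.org/Person": (303, "http://myonto.org/doc/Person"),
    "http://myonto.org/doc/Person": (200, None),
}
HASH_SERVER = {
    "http://myonto.org/Person": (200, None),
}

def lookups(term_iri, server, max_hops=5):
    """Count the simulated HTTP requests needed to reach a 200 response."""
    url = urldefrag(term_iri).url  # the fragment is never sent to the server
    for count in range(1, max_hops + 1):
        status, location = server[url]
        if status == 200:
            return count
        url = location  # follow the redirect
    raise RuntimeError("too many redirects")

print(lookups("http://myonto.org/Person", REDIRECT_SERVER))  # 2
print(lookups("http://myonto.org/Person#", HASH_SERVER))     # 1
```

In the second case the term IRI (with the trailing '#') and the 
document IRI (without it) remain distinct identifiers, yet a single 
request returns the document.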

Regarding 6., with slash IRIs, the ontology can be modularised while 
preserving a single namespace. There can be modules 
http://myonto.org/module1 and http://myonto.org/module2 that each 
provide a distinct ontology using the same namespace http://myonto.org/ 
for all terms. Assuming a single slash-based namespace prefix ont:

#In ont:Term1 file:
ont:Term1 rdfs:isDefinedBy ont:module1 .

#In ont:Term2 file:
ont:Term2 rdfs:isDefinedBy ont:module2 .

It is also possible to redirect ont:Term1 to module1, and ont:Term2 to 
module2, if the ontology owner prefers to serve the whole module 
instead. Then there can be a global ontology document:

ont: a owl:Ontology;
  owl:imports ont:module1, ont:module2 .

This last option was an idea by my colleague Maxime Lefrançois who 
implemented it in the Smart Energy Aware Systems ontology: 
https://w3id.org/seas/
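
The redirect variant can be sketched as a simple lookup table. This is 
only an illustration under assumptions: the table mirrors the 
rdfs:isDefinedBy statements above, and the function names are 
hypothetical, not taken from the SEAS setup.

```python
NAMESPACE = "http://myonto.org/"

# Hypothetical assignment of terms to modules, mirroring the
# rdfs:isDefinedBy statements above.
DEFINED_BY = {
    "Term1": "module1",
    "Term2": "module2",
}

def redirect_target(term_iri):
    """Return the module document a request for term_iri is redirected to."""
    if not term_iri.startswith(NAMESPACE):
        raise ValueError(f"{term_iri} is not in {NAMESPACE}")
    local_name = term_iri[len(NAMESPACE):]
    return NAMESPACE + DEFINED_BY[local_name]

def imported_modules():
    """Modules the global ontology document would list under owl:imports."""
    return sorted({NAMESPACE + module for module in DEFINED_BY.values()})

print(redirect_target("http://myonto.org/Term1"))  # http://myonto.org/module1
print(imported_modules())
```

All terms keep the single http://myonto.org/ namespace; only the table 
decides which module document answers for each term.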


Given the many advantages I see, with tiny drawbacks, I can't understand 
how not recommending slash-based namespaces for vocabs could be a 
tenable position.


Best,
--AZ

Le 06/10/2022 à 16:10, Pat McBennett a écrit :
> So (I think!) I know all the pros and cons of using either a trailing
> slash or a trailing hash for vocab namespace IRIs. Basically it boils down
> to hashes meaning you'll always get info on all the terms in a vocabulary,
> even if you only want info for one specific term, whereas using a slash
> means I can always get just the info for any specific, individual term I
> request.
> 
> Note: using slashes provides the ability to get the best of both worlds -
> i.e., small responses when explicitly asking for info on just one term, but
> if you want info for all the terms in one HTTP response, then just serve up
> that complete vocab response when the base namespace IRI itself is
> dereferenced.
> 
> Here's a nice simple illustration of the basic difference:
> - Slash: QUDT's 'CurrencyUnit' term (i.e., click on '
> https://qudt.org/schema/qudt/CurrencyUnit') and you get a nice clean,
> concise, and precise set of info on just the one term you asked for -
> lovely!
> 
> - Hash: DPV's 'JointDataControllers' (i.e., click on '
> https://w3id.org/dpv#JointDataControllers') and you get bombarded with a
> huge document, with a daunting Table of Contents on the left, and info on
> hundreds of other terms that I didn't ask for, and so had no interest in
> whatsoever (don't get me wrong - this is fantastically detailed and
> potentially very useful information, but it's simply not what I asked for!).
> 
> So based on the greater flexibility and future-proofing of using slash
> (i.e., it offers the best of both worlds, whereas hash is forever limited),
> I've become firmly of the opinion that slashes are just 'better' than
> hashes, and in fact are simply 'more correct' (i.e., all IRIs should be
> uniquely dereferenceable).
> 
> I also think the distinction is critically important when creating
> vocabularies intended for widespread and long-lasting use (such as the DPV
> vocab above). For throw-away or pet projects, sure, it doesn't really
> matter (yet even then, I still think slashes are the 'more correct' option).
> 
> I know that the convention from the W3C has tended to be to use hashes, but
> I think in hindsight that was a mistake, and that the advice from the
> Semantic Web community as a whole should now be to adopt slashes
> consistently for all new vocabularies. (And it's not like using slash has
> no precedent - major 'authoritative' vocabs like QUDT, Schema.org, gist,
> SOSA, SSN, (even the venerable FOAF!) all use slash).
> 
> I'd love to hear this group's thoughts. (For reference, I did ask the gist
> community if they recorded their discussions around their decision (in
> 2019) to formally switch gist from hash to slash (here
> <https://github.com/semanticarts/gist/issues/725>), but it seems they
> weren't recorded, and I've also raised the issue with the DPV group
> directly too (here <https://github.com/w3c/dpv/issues/53>)).
> 
> Cheers,
> 
> Pat.
> 
> *Pat McBennett*, Technical Architect
> 
> Contact  | patm@inrupt.com
> 
> Connect | WebID <http://pmcb55.inrupt.net/profile/card#me>, GitHub
> <https://github.com/pmcb55>
> 
> Explore  | www.inrupt.com
> 

-- 
Antoine Zimmermann
École des Mines de Saint-Étienne
158 cours Fauriel
CS 62362
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 49 97 02
http://www.emse.fr/~zimmermann/
Received on Tuesday, 15 November 2022 15:17:10 UTC