
Re: Schema for online dictionary and glossary?

From: Mo McRoberts <Mo.McRoberts@bbc.co.uk>
Date: Mon, 13 Apr 2015 16:31:02 +0000
To: Peter Krauss <ppkrauss@gmail.com>
CC: "chaals@yandex-team.ru" <chaals@yandex-team.ru>, Bernard Vatant <bernard.vatant@mondeca.com>, Dan Brickley <danbri@google.com>, "Anastasia Baryshnikova" <asia.baryshnikova@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>
Message-ID: <6AE70D0F-BE27-4AA3-923E-0B15BA3DFA2A@bbc.co.uk>

On  2015-Apr-13, at 16:57, Peter Krauss <ppkrauss@gmail.com> wrote:

> Ok, good points... I need some lines to try to explain why I not agree with all, and why some are also good arguments for URNs.
> The "curatorial authority" is perhaps the most important (!), but first we must avoid historic "URN preconceptions".

There's nothing historic about my statements - I used to think URNs were the neatest thing since the introduction of digital watches. It turns out I was wrong.

> We must see "microservices" like DOI (see its http://dx.doi.org fast resolver) as an URN, even DOI not promoting itself as an URN.
> The "URN-resolver microservice" architecture is also a valid solution for LANs, not only for "all Web".

The resolver architecture is only really valid in three situations:

1 - You can't encode the authority/dereference point into the URI because of historical/architectural restrictions (see, e.g., ISBNs) or because it's nonsensical (e.g., the concept of the year 2014 CE isn't maintained by anybody, although arguably one might consider it an identifier within the ISO-standardised calendar, so...)

2 - You can have multiple entirely equivalent authorities which operate independently

3 - You can't find a domain name, operated by anybody (including yourself), which is considered to have sufficient longevity to identify an authority.

Even (2) is shaky, because whoever is operating the resolver which maintains the list of those equivalent authorities is a de facto authority.

(3) is *really* shaky, because it basically assumes that human beings are incapable of managing even the most basic pieces of Internet-facing infrastructure with any degree of longevity, even with the option of outsourcing it to somebody who ought to be capable; it also works on the basis that you know enough not to trust yourself (or just follow the crowd, as often happens with DOI), but don't know enough to find somebody who you can trust - all of which is a bit weird for the sake of $10/yr. The biggest problem with (3) is that it pretty much says that there's no point doing any long-term planning around the Internet, and so a lot of us should probably give up and go home.

Barring the situations where (1) applies, and even taking into account (2) and (3), a URN offers minimal clear benefits over an HTTP URI: everything you can do with a URN, you can do with an HTTP URI, but the reverse is not the case. Even the fact that with a URN it is *required* to specify resolution mechanics in the consuming application can be replicated with HTTP URIs if one really must.
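
To make that last point concrete, here is a minimal Python sketch of a consuming application that applies resolution mechanics to both kinds of identifier. The resolver table, the override table and the function name are all hypothetical, invented for illustration; the only thing taken from this thread is the LexML resolver URL pattern. A URN cannot be dereferenced without a configured resolver, whereas an HTTP URI works as-is and only goes through the table if the consumer chooses to add an entry.

```python
# Hypothetical resolver table: URN namespace identifier -> resolver base URL.
# The "lex" entry follows the LexML URL pattern quoted in this thread.
URN_RESOLVERS = {
    "lex": "http://www.lexml.gov.br/urn/",
}

# Optional and normally empty: URI prefix -> replacement prefix.
# This is the "if one really must" case for HTTP URIs.
HTTP_OVERRIDES = {}


def dereference_target(identifier, urn_resolvers=URN_RESOLVERS,
                       http_overrides=HTTP_OVERRIDES):
    """Return the URL a consumer would actually fetch for an identifier."""
    scheme, _, rest = identifier.partition(":")
    scheme = scheme.lower()

    if scheme == "urn":
        # A URN is useless to the consumer unless a resolver is configured.
        namespace = rest.partition(":")[0].lower()
        if namespace not in urn_resolvers:
            raise LookupError(
                f"no resolver configured for URN namespace {namespace!r}")
        return urn_resolvers[namespace] + identifier

    if scheme in ("http", "https"):
        # An HTTP URI can be fetched directly; overrides are opt-in.
        for prefix, replacement in http_overrides.items():
            if identifier.startswith(prefix):
                return replacement + identifier[len(prefix):]
        return identifier

    raise ValueError(f"unsupported identifier scheme: {scheme!r}")
```

With the tables above, the lex: URN maps to the LexML resolver URL, an HTTP URI comes back unchanged, and a URN in an unconfigured namespace (such as the mydic example) fails until somebody supplies a resolver for it.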

> In recent years the URN concept came close to be extinguished (!)...  Nowadays exist some "revival movements". We must avoid some old work-hypothesis before to compare/analyse in the context of SchemaOrg applications.
> How (and or what?) you will use an SchemaOrg/dictionaryWord ?
> == Curatorial authority ==
> There are millions of law-documents in Brasil, but all have a good and transparent ID in
>   http://www.lexml.gov.br/

Similarly, in the UK we have stable identifiers within the http://parliament.uk/ namespace.

> LexML and others are using the URN LEX schema,
>   https://en.wikipedia.org/wiki/Lex_(URN)


> where the "curatorial authority" is an explicit element in the schema.

Yes - I understand that a URN can include an authority if it's defined to do so. That's the most architecturally problematic aspect, to be honest!

If you've defined a URN schema which includes an authority, and putting aside the historical baggage, which is considerable and varied... what is the benefit of a lex: URN including an authority over an HTTP URI which also includes an authority and can refer to the same law?

(Short answer: there is none.)

> == Using it with SchemaOrg ==
> A summary of what is showed in the issue#405...
> ... Imagine that we add a property urnResolverURL to schema_org_rdfa 
> to use in the context of "linking URN-like values".
> Examples in span tags with URN values,
>   <span property="urnResolverURL" 
>         value="http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340"
>    >Law Maria da Penha</span>
>   <span property="urnResolverURL" 
>         value="http://dx.MyDic.org/urn:mydic:en-GB:hello"
>    >hello</span>

All of this is straightforward - but it doesn't explain why those things would have URNs as their identifiers *in the first place*, it merely defines how to resolve those things which already do.

For example, as a transition mechanism, http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340 could, when resolved, include an RDFa triple which states that its subject is the same as urn:lex:br:federal:lei:2006-08-07;11340, and then we could all refer to http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340 and quietly ignore the fact that as an implementation detail it happens to have a URN encoded into its authority-defined path part.
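
A rough Python sketch of that transition statement follows. It assumes the URN is embedded after a "/urn/" path segment, as in the LexML URL above, and it assumes owl:sameAs as the predicate; neither the predicate nor the function names come from LexML or schema.org. The output is a plain N-Triples line rather than RDFa markup, purely to keep the example short.

```python
# Assumption: owl:sameAs is the predicate used to say "same thing".
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(resolver_url, marker="/urn/"):
    """Build (subject, predicate, object) linking an HTTP URI to the URN
    embedded in its path. The "/urn/" marker is an assumption based on the
    LexML URL pattern quoted in this thread."""
    _, found, urn = resolver_url.partition(marker)
    if not found or not urn.lower().startswith("urn:"):
        raise ValueError(f"no embedded URN found in {resolver_url!r}")
    return (resolver_url, OWL_SAME_AS, urn)


def to_ntriples(triple):
    """Serialise a triple of three IRIs as one N-Triples line."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


if __name__ == "__main__":
    url = ("http://www.lexml.gov.br/urn/"
           "urn:lex:br:federal:lei:2006-08-07;11340")
    print(to_ntriples(same_as_triple(url)))
```

The HTTP URI is the subject, so consumers can keep using it as the identifier and treat the URN as nothing more than an alias.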

Similarly, one could just refer to <http://mydic.org/en-gb/hello> - there are really no reasons that I can fathom for <urn:mydic:en-GB:hello> to be a preferable identifier, except for being a couple of characters shorter.

By doing this, no resolver URL is required, because things are identified with HTTP URIs, and the web still works like the web.

Meanwhile, stick to URNs for things which *can't* be resolved, because it makes no sense to do so outside of internal implementation details (for example, <urn:hash::sha256:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03> is a potentially very useful kind of URN which makes little sense to be expressed through some other scheme in its abstract state; I might have an internal resolver which maps hashes to files in a content store which I don't expose to anybody, but nobody, including implementations of schema.org, should need to see that).
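
As a sketch of what such an internal resolver might look like, the Python below derives a URN of the same shape from a SHA-256 digest and keeps the mapping in a private in-memory store. The ContentStore class and its method names are hypothetical; a real store would presumably map to files on disk, which is not shown here.

```python
import hashlib

# Prefix copied from the example URN above.
HASH_URN_PREFIX = "urn:hash::sha256:"


def hash_urn(content: bytes) -> str:
    """Name a piece of content by its SHA-256 digest."""
    return HASH_URN_PREFIX + hashlib.sha256(content).hexdigest()


class ContentStore:
    """Hypothetical internal resolver: hash URN -> stored bytes.

    Nothing here is exposed to consumers; they only ever see the URN.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        """Store content and return the URN that identifies it."""
        urn = hash_urn(content)
        self._blobs[urn] = content
        return urn

    def resolve(self, urn: str) -> bytes:
        """Return the stored bytes, or raise KeyError if unknown."""
        return self._blobs[urn]


if __name__ == "__main__":
    store = ContentStore()
    urn = store.put(b"example content")
    assert store.resolve(urn) == b"example content"
    print(urn)
```

The identifier is derived entirely from the content, so there is no authority to encode and nothing for an outside party to dereference.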


Mo McRoberts - Chief Technical Architect - Archives & Digital Public Space,
Zone 2.12, BBC Scotland, 40 Pacific Quay, Glasgow G51 1DA.

Inside the BBC? My movements this week: http://neva.li/where-is-mo
Received on Monday, 13 April 2015 16:31:34 UTC
