Re: Schema for online dictionary and glossary?

On  2015-Apr-13, at 16:57, Peter Krauss <ppkrauss@gmail.com> wrote:

> Ok, good points... I need some lines to try to explain why I not agree with all, and why some are also good arguments for URNs.
> The "curatorial authority" is perhaps the most important (!), but first we must avoid historic "URN preconceptions".

There’s nothing historic about my statements - I used to think URNs were the neatest thing since the introduction of digital watches. It turns out I was wrong.

> We must see "microservices" like DOI (see its http://dx.doi.org fast resolver) as an URN, even DOI not promoting itself as an URN.
> The "URN-resolver microservice" architecture is also a valid solution for LANs, not only for "all Web".

The resolver architecture is only really valid in three situations:

1 - You can’t encode the authority/dereference point into the URI because of historical/architectural restrictions (see, e.g., ISBNs) or because it’s nonsensical (e.g., the concept of the year 2014 CE isn’t maintained by anybody, although arguably one might consider it an identifier within the ISO-standardised calendar, so...)

2 - You can have multiple entirely equivalent authorities which operate independently

3 - You can’t find a domain name, operated by anybody (including yourself), which is considered to have sufficient longevity to identify an authority.

Even (2) is shaky, because whoever is operating the resolver which maintains the list of those equivalent authorities is a de facto authority.

(3) is *really* shaky, because it basically assumes that human beings are incapable of managing even the most basic pieces of Internet-facing infrastructure with any degree of longevity, even with the option of outsourcing it to somebody who ought to be capable; it also works on the basis that you know enough not to trust yourself (or just follow the crowd, as often happens with DOI), but don’t know enough to find somebody who you can trust - all of which is a bit weird for the sake of $10/yr. The biggest problem with (3) is that it pretty much says that there’s no point doing any long-term planning around the Internet, and so a lot of us should probably give up and go home.

Barring the situations where (1) applies, and even taking into account (2) and (3), a URN offers minimal clear benefits over an HTTP URI: everything you can do with a URN, you can do with an HTTP URI, but the reverse is not the case. Even the fact that with a URN it is *required* to specify resolution mechanics in the consuming application can be replicated with HTTP URIs if one really must.

> In recent years the URN concept came close to be extinguished (!)...  Nowadays exist some "revival movements". We must avoid some old work-hypothesis before to compare/analyse in the context of SchemaOrg applications.
> 
> How (and or what?) you will use an SchemaOrg/dictionaryWord ?
> 
> == Curatorial authority ==
> 
> There are millions of law-documents in Brasil, but all have a good and transparent ID in
>   http://www.lexml.gov.br/

Similarly, in the UK we have stable identifiers within the http://parliament.uk/ namespace.

> LexML and others are using the URN LEX schema,
>   https://en.wikipedia.org/wiki/Lex_(URN)

Right.

> where the "curatorial authority" is an explicit element in the schema.

Yes - I understand that a URN can include an authority if it’s defined to do so. That’s the most architecturally problematic aspect, to be honest!

If you’ve defined a URN schema which includes an authority, and putting aside the historical baggage, which is considerable and varied… what is the benefit of a lex: URN including an authority over an HTTP URI which also includes an authority and can refer to the same law?

(Short answer: there is none)
> 
> == Using it with SchemaOrg ==
> 
> A summary of what is showed in the issue#405...
> 
> ... Imagine that we add a property urnResolverURL to schema_org_rdfa 
> to use in the context of "linking URN-like values".
> Examples in span tags with URN values,
> 
>   <span property="urnResolverURL" 
>         value="http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340"
>    >Law Maria da Penha</span>
> 
>   <span property="urnResolverURL" 
>         value="http://dx.MyDic.org/urn:mydic:en-GB:hello"
>    >hello</span>

All of this is straightforward - but it doesn’t explain why those things would have URNs as their identifiers *in the first place*, it merely defines how to resolve those things which already do.

For example, as a transition mechanism, http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340 could, when resolved, include an RDFa triple which states that its subject is the same as urn:lex:br:federal:lei:2006-08-07;11340, and then we could all refer to http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340 and quietly ignore the fact that as an implementation detail it happens to have a URN encoded into its authority-defined path part.
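
A minimal sketch of that transition step, in Python. It pulls the embedded URN out of the HTTP URI's path and prints the equivalence statement as an N-Triples line rather than as RDFa markup. The predicate owl:sameAs is an assumption (no predicate is named above), and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed predicate: the text above only says "is the same as".
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def embedded_urn(http_uri: str) -> Optional[str]:
    """Return the URN carried in the path of http_uri, or None if there is none."""
    path = urlsplit(http_uri).path
    start = path.find("urn:")
    return path[start:] if start != -1 else None


def same_as_triple(http_uri: str) -> str:
    """Build one N-Triples line stating that http_uri is the same thing as its URN."""
    urn = embedded_urn(http_uri)
    if urn is None:
        raise ValueError("no URN found in the path of " + http_uri)
    return "<{}> <{}> <{}> .".format(http_uri, OWL_SAME_AS, urn)


if __name__ == "__main__":
    print(same_as_triple(
        "http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340"
    ))
```

A consumer that sees this statement can keep using the HTTP URI as the identifier and treat the URN purely as an alias.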

Similarly, one could just refer to <http://mydic.org/en-gb/hello> - there are really no reasons that I can fathom for <urn:mydic:en-GB:hello> to be a preferable identifier, except for being a couple of characters shorter.

By doing this, no resolver URL is required, because things are identified with HTTP URIs, and the web works like the web still.
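
To make the dictionary case concrete, here is a rough sketch of mapping the URN form onto the HTTP form. The path layout (lower-cased language tag, then the word) is inferred only from the single <http://mydic.org/en-gb/hello> example above, and the function name is hypothetical.

```python
from urllib.parse import quote


def mydic_http_uri(urn: str) -> str:
    """Map urn:mydic:<lang>:<word> to http://mydic.org/<lang>/<word>.

    The layout is an assumption based on one example, not a published scheme.
    """
    # Unpacking raises ValueError if the URN has fewer than four parts.
    scheme, namespace, lang, word = urn.split(":", 3)
    if scheme.lower() != "urn" or namespace.lower() != "mydic":
        raise ValueError("not a mydic URN: " + urn)
    # Percent-encode the word so it is safe to place in a URI path segment.
    return "http://mydic.org/{}/{}".format(lang.lower(), quote(word, safe=""))


if __name__ == "__main__":
    print(mydic_http_uri("urn:mydic:en-GB:hello"))
```

The mapping is mechanical and needs no lookup service, which is the point: the HTTP URI carries the same information and can also be dereferenced directly.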

Meanwhile, stick to URNs for things which *can’t* be resolved, because it makes no sense to do so outside of internal implementation details. For example, <urn:hash::sha256:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03> is a potentially very useful kind of URN which, in its abstract state, it makes little sense to express through some other scheme. I might have an internal resolver which maps hashes to files in a content store which I don’t expose to anybody, but nobody, including implementations of schema.org, should need to see that.
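
A small sketch of that arrangement, assuming an in-memory dictionary as a stand-in for the private content store. The class and function names are hypothetical; only the urn:hash::sha256: prefix is taken from the example above.

```python
import hashlib

# Prefix copied from the example URN above.
URN_PREFIX = "urn:hash::sha256:"


def hash_urn(data: bytes) -> str:
    """Name a byte string by its SHA-256 digest; no authority is involved."""
    return URN_PREFIX + hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical internal resolver mapping hash URNs to stored bytes."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data and return the URN that identifies it."""
        urn = hash_urn(data)
        self._blobs[urn] = data
        return urn

    def resolve(self, urn: str) -> bytes:
        """Return the bytes for urn; raises KeyError if it is not held here."""
        return self._blobs[urn]


if __name__ == "__main__":
    store = ContentStore()
    urn = store.put(b"example content")
    print(urn)
    print(store.resolve(urn))
```

Anybody can compute the URN from the content, but only the holder of the store can turn it back into bytes, so the resolver stays an internal detail.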

M.

-- 
Mo McRoberts - Chief Technical Architect - Archives & Digital Public Space,
Zone 2.12, BBC Scotland, 40 Pacific Quay, Glasgow G51 1DA.

Inside the BBC? My movements this week: http://neva.li/where-is-mo

Received on Monday, 13 April 2015 16:31:34 UTC