
Re: Schema for online dictionary and glossary?

From: Peter Krauss <ppkrauss@gmail.com>
Date: Tue, 14 Apr 2015 12:06:36 -0300
Message-ID: <CAHEREttC1=x1gnkcu=1c9PySAnOLq8CBMbUi0vjhWfWY0DbG_g@mail.gmail.com>
To: Mo McRoberts <Mo.McRoberts@bbc.co.uk>
Cc: "chaals@yandex-team.ru" <chaals@yandex-team.ru>, Bernard Vatant <bernard.vatant@mondeca.com>, Dan Brickley <danbri@google.com>, Anastasia Baryshnikova <asia.baryshnikova@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>
2015-04-13 13:31 GMT-03:00 Mo McRoberts <Mo.McRoberts@bbc.co.uk>:

> On  2015-Apr-13, at 16:57, Peter Krauss <ppkrauss@gmail.com> wrote:
> > OK, good points... I need a few lines to explain why I do not agree
> > with all of them, and why some are also good arguments for URNs.
> > The "curatorial authority" is perhaps the most important (!), but first
> > we must avoid historic "URN preconceptions".
> There’s nothing historic about my statements - I used to think URNs were
> the neatest thing since the introduction of digital watches. It turns out I
> was wrong.

:-)  Yes, I too shared that admiration over the years...

> > We must see "microservices" like DOI (see its fast resolver at
> > http://dx.doi.org) as URNs, even though DOI does not promote itself as a URN.
> > The "URN-resolver microservice" architecture is also a valid solution
> > for LANs, not only for "the whole Web".
> The resolver architecture is only really valid in three situations:
> 1 - You can’t encode the authority/dereference point into the URI because
> of historical/architectural restrictions (see, e.g., ISBNs) or because it’s
> nonsensical (e.g., the concept of the year 2014 CE isn’t maintained by
> anybody, although arguably one might consider it an identifier within the
> ISO-standardised calendar, so...)
> 2 - You can have multiple entirely equivalent authorities which operate
> independently
> 3 - You can’t find a domain name, operated by anybody (including
> yourself), which is considered to have sufficient longevity to identify an
> authority.
> Even (2) is shaky, because whoever is operating the resolver which
> maintains the list of those equivalent authorities is a de facto authority.
> (3) is *really* shaky, because it basically assumes that human beings are
> incapable of managing even the most basic pieces of Internet-facing
> infrastructure with any degree of longevity, even with the option of
> outsourcing it to somebody who ought to be capable; it also works on the
> basis that you know enough not to trust yourself (or just follow the crowd,
> as often happens with DOI), but don’t know enough to find somebody who you
> can trust - all of which is a bit weird for the sake of $10/yr. The biggest
> problem with (3) is that it pretty much says that there’s no point doing
> any long-term planning around the Internet, and so a lot of us should
> probably give up and go home.
> Barring the situations where (1) applies, and even taking into account (2)
> and (3), a URN offers minimal clear benefits over an HTTP URI: everything
> you can do with a URN, you can do with an HTTP URI, but the reverse is not
> the case. Even the fact that with a URN it is *required* to specify
> resolution mechanics in the consuming application can be replicated with
> HTTP URIs if one really must.
Perhaps, to avoid "polluting" this list (public-vocabs@w3.org), we will
need a more private discussion about visions, architectures, policies, and
"community constraints" regarding URN adoption in a community.
I think there are points of agreement and, after discussion, we can report
back here with some consensual views.

> > In recent years the URN concept came close to extinction (!)...
> > Nowadays there are some "revival movements". We must set aside some old
> > working hypotheses before comparing/analysing it in the context of SchemaOrg
> > applications.
> >
> > How (and/or for what) will you use a SchemaOrg/dictionaryWord?
> >
> > == Curatorial authority ==
> >
> > There are millions of legal documents in Brazil, but all have a good and
> > transparent ID in
> >   http://www.lexml.gov.br/
> Similarly, in the UK we have stable identifiers within the
> http://parliament.uk/ namespace.

Hmm... yes, the correct "authority name" is (I am not sure) the Parliament,
but I think that the *analog* of the Brazilian LexML portal in the UK is
for Europe is http://europa.eu/eu-law/legislation/

> > LexML and others are using the URN LEX schema,
> >   https://en.wikipedia.org/wiki/Lex_(URN)
> Right.


> > where the "curatorial authority" is an explicit element in the schema.
> Yes - I understand that a URN can include an authority if it’s defined to
> do so. That’s the most architecturally problematic aspect, to be honest!

Hmm... "most architecturally problematic aspect"?
I do not see any problem in concrete uses such as www.lexml.gov.br... But we can
discuss the resolver architecture (first in private, so as not to pollute the
list, then back here with perhaps some consensual views and some good problems
to expose to the list)

> If you’ve defined a URN schema which includes an authority, and putting
> aside historical baggage, which are many and varied… what is the benefit of
> a lex: URN including an authority over an HTTP URI which also includes an
> authority and can refer to the same law?

About concrete use, see the DOI example: the "URN only" string is used in a
context where URNs are recognized. Some algorithms can transform the string
into a URL, as PubMed Central <http://www.ncbi.nlm.nih.gov/pmc/> does with the
DOIs of scientific articles.
When you do not have such a context, you can use a template or another
mechanism to expose the URNs as URLs, or do it in the original content: any
URL, original or not, can be expressed back as a URN string because the URL
prefix is a known standard (within the community's application context).
Example: http://dx.doi.org/10.1038/nphys1170 is an alternative
representation of *"urn:doi:10.1038/nphys1170"*
because "dx.doi.org" is a standard... All URNs need two "community
standards/consensus": the *URN Schema* and the *URN resolver* (a persistent
and stable URL).
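
The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "urn:doi:" is the informal prefix used in this message (not a claim about any registered namespace), http://dx.doi.org/ is taken as the agreed resolver prefix, and the function names are hypothetical.

```python
# Sketch only: both constants are assumptions taken from the example above.
RESOLVER_BASE = "http://dx.doi.org/"   # the community-agreed resolver URL prefix
URN_PREFIX = "urn:doi:"                # informal URN-like prefix used in this thread


def urn_to_url(urn: str) -> str:
    """Expose a URN-like DOI string as a resolvable URL."""
    if not urn.lower().startswith(URN_PREFIX):
        raise ValueError(f"not a {URN_PREFIX} string: {urn!r}")
    return RESOLVER_BASE + urn[len(URN_PREFIX):]


def url_to_urn(url: str) -> str:
    """Express a resolver URL back as the URN-like string."""
    if not url.startswith(RESOLVER_BASE):
        raise ValueError(f"not under {RESOLVER_BASE}: {url!r}")
    return URN_PREFIX + url[len(RESOLVER_BASE):]


if __name__ == "__main__":
    urn = "urn:doi:10.1038/nphys1170"
    url = urn_to_url(urn)
    print(url)
    print(url_to_urn(url) == urn)
```

The conversion is a plain prefix swap in both directions, which is why it only works when the schema and the resolver prefix are both agreed by the community.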

> (Short answer: there is none)
> >
> > == Using it with SchemaOrg ==
> >
> > A summary of what is shown in issue #405...
> >
> > ... Imagine that we add a property urnResolverURL to schema_org_rdfa
> > to use in the context of "linking URN-like values".
> > Examples in span tags with URN values,
> >
> >   <span property="urnResolverURL"
> >         value="http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340"
> >    >Law Maria da Penha</span>
> >
> >   <span property="urnResolverURL"
> >         value="http://dx.MyDic.org/urn:mydic:en-GB:hello"
> >    >hello</span>
> All of this is straightforward - but it doesn’t explain why those things
> would have URNs as their identifiers *in the first place*, it merely
> defines how to resolve those things which already do.
> For example, as a transition mechanism,
> http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340
> could, when resolved, include an RDFa triple which states that its subject
> is the same as urn:lex:br:federal:lei:2006-08-07;11340, and then we could
> all refer to
> http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340 and
> quietly ignore the fact that as an implementation detail it happens to have
> a URN encoded into its authority-defined path part.
> Similarly, one could just refer to <http://mydic.org/en-gb/hello> - there
> are really no reasons that I can fathom for <urn:mydic:en-GB:hello> to be a
> preferable identifier, except for being a couple of characters shorter.

Yes, but everything you say here, I said in a summary that you deleted:
"Of course, more sophisticated constructors are possible with Schema Org
(to declare only URN strings)"

By "more sophisticated constructors" we can understand other SchemaOrg
properties and classes, but that only pollutes the discussion... Well,
perhaps it is didactic, so here is the same example with "sophistications":

  <div vocab="http://schema.org/" typeof="URNs"
       property="urnResolverBaseURL" value="http://www.lexml.gov.br/urn/"
  >
    <span property="urn"
          value="urn:lex:br:federal:lei:2006-08-07;11340"
     >Federal Brazilian Law number 11340 - Maria da Penha</span>

    <span property="urn"
          value="urn:lex:it:stato:legge:2003-09-21;456"
     >Federal Italian Law number 456</span>
  </div>

 ... it is illustrative only, perhaps not the best ... there are other
alternatives for expressing this in SchemaOrg's RDFa.
(notice the authorities, and that LEX URNs are transparent
identifiers, not a kind of hash)
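
To make the idea concrete, here is a hedged Python sketch of what a consumer of such markup might do: join a resolver base URL with each URN value, and split a LEX URN into its visible parts. The function names are hypothetical, and the field names (jurisdiction, authority, measure, details) are my reading of the URN LEX layout described on the Wikipedia page cited earlier, not something confirmed in this thread.

```python
from urllib.parse import quote


def resolve_urn(base_url: str, urn: str) -> str:
    """Join a resolver base URL and a URN into one dereferenceable URL."""
    if not urn.lower().startswith("urn:"):
        raise ValueError(f"not a URN: {urn!r}")
    # Keep ':' and ';' unescaped so the URN stays readable inside the path.
    return base_url.rstrip("/") + "/" + quote(urn, safe=":;")


def split_lex_urn(urn: str) -> dict:
    """Split a LEX URN into labelled parts (field names are an assumption)."""
    parts = urn.split(":", 5)
    if len(parts) != 6 or parts[0].lower() != "urn" or parts[1].lower() != "lex":
        raise ValueError(f"not a urn:lex string: {urn!r}")
    keys = ("jurisdiction", "authority", "measure", "details")
    return dict(zip(keys, parts[2:]))


if __name__ == "__main__":
    base = "http://www.lexml.gov.br/urn/"
    for urn in ("urn:lex:br:federal:lei:2006-08-07;11340",
                "urn:lex:it:stato:legge:2003-09-21;456"):
        print(resolve_urn(base, urn))
        print(split_lex_urn(urn))
```

The loop simply mirrors the markup above, where both URNs sit under one base URL; in practice each jurisdiction would presumably have its own resolver.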

> By doing this, no resolver URL is required, because things are identified
> with HTTP URIs, and the web works like the web still.
See this "tutorial" to understand what differentiates the
"usability" of URLs and URNs on the Web:
   "Give Things Names, Not Just Addresses",

> Meanwhile, stick to URNs for things which *can’t* be resolved, because it
> makes no sense to do so outside of internal implementation details (for
> example,
> <urn:hash::sha256:5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03>
> is a potentially very useful kind of URN which makes little sense to be
> expressed through some other scheme in its abstract state; I might have an
> internal resolver which maps hashes to files in a content store which I
> don’t expose to anybody, but nobody, including implementations of
> schema.org, should need to see that).
Yes, that is another context, and perhaps another good application.
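
For illustration, the hash case quoted above might look like the following Python sketch. It assumes the "urn:hash::sha256:" prefix exactly as written in the quoted example, and it uses an in-memory dictionary as a stand-in for the private content store; the class and function names are hypothetical.

```python
import hashlib

# Prefix copied from the quoted example; treated here as a local convention.
HASH_URN_PREFIX = "urn:hash::sha256:"


def hash_urn(data: bytes) -> str:
    """Name a piece of content by the SHA-256 digest of its bytes."""
    return HASH_URN_PREFIX + hashlib.sha256(data).hexdigest()


class ContentStore:
    """Private resolver: maps hash URNs to content, never exposed publicly."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store the content and return the URN that names it."""
        urn = hash_urn(data)
        self._blobs[urn] = data
        return urn

    def resolve(self, urn: str) -> bytes:
        """Internal lookup only; raises KeyError for unknown URNs."""
        return self._blobs[urn]


if __name__ == "__main__":
    store = ContentStore()
    urn = store.put(b"hello")
    print(urn)
    print(store.resolve(urn))
```

Here the name is derived from the content itself, so no authority or public resolver is involved; only the holder of the store can map the URN back to bytes.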


> M.
> --
> Mo McRoberts - Chief Technical Architect - Archives & Digital Public Space,
> Zone 2.12, BBC Scotland, 40 Pacific Quay, Glasgow G51 1DA.
> Inside the BBC? My movements this week: http://neva.li/where-is-mo
Received on Tuesday, 14 April 2015 15:07:09 UTC
