Re: Schema for online dictionary and glossary?

From: Peter Krauss <ppkrauss@gmail.com>
Date: Mon, 13 Apr 2015 12:57:57 -0300
Message-ID: <CAHEREttz7Lk_ip1gw0cRR1jFAZnjC0kqCTExSH_D5Eb9kfNh9g@mail.gmail.com>
To: Mo McRoberts <Mo.McRoberts@bbc.co.uk>
Cc: "chaals@yandex-team.ru" <chaals@yandex-team.ru>, Bernard Vatant <bernard.vatant@mondeca.com>, Dan Brickley <danbri@google.com>, Anastasia Baryshnikova <asia.baryshnikova@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>
OK, good points... I need a few lines to explain why I do not agree with
all of them, and why some are also good arguments for URNs.
"Curatorial authority" is perhaps the most important point (!), but first we
must set aside historical preconceptions about URNs.

We should regard "microservices" like DOI (see its fast resolver at
http://dx.doi.org) as URNs, even though DOI does not promote itself as a URN.
The "URN-resolver microservice" architecture is also a valid solution for
LANs, not only for the whole Web.
In recent years the URN concept came close to extinction (!)...
Nowadays there are some "revival movements". We should drop some old
working hypotheses before comparing and analysing URNs in the context of
SchemaOrg applications.

How (and for what?) will you use a SchemaOrg/dictionaryWord?

*== Curatorial authority ==*

There are millions of legal documents in Brazil, but all of them have a good,
transparent ID at
  http://www.lexml.gov.br/
LexML and others use the URN LEX schema,
  https://en.wikipedia.org/wiki/Lex_(URN)

where the "curatorial authority" is an explicit element of the schema.
PS: do not confuse the "lex" of LexML (which is about law) with the "lex" of
"lexical services".

*== URN contextualization example ==*
Moving the URN LEX (law) example to the dictionary context: suppose
another (local) URN schema.

Suppose, as I showed before, a "My Dictionary" schema, something like
*    "urn:mydic:" <lang> ":" <word>*
and a default response at the "URN MyDic Resolver" URL (like the LexML or DOI
resolver),
which returns a standard catalogue entry page, or redirects to the
respective Wiktionary link.
So, to refer to the (international) English word "mum" we write
the URN

*    urn:mydic:en:mum*

and if we define a URL that is the "urn:mydic resolver", that URL will
return, e.g.,
    https://en.wiktionary.org/wiki/mum
(in the old RFC 2169 jargon this is the "N2L" resolver method, "URN to URL")

Another example,

    *urn:mydic:pt:mamãe*  will retrieve
https://pt.wiktionary.org/wiki/mam%C3%A3e

What about authority? ":en" or ":pt" with no suffix means the default
authority (Wiktionary in this example).
To specify a non-default authority, add a suffix with the authority
name,

*    urn:mydic:en;merriam-webster:hello*
       will retrieve  http://www.merriam-webster.com/dictionary/hello

*    urn:mydic:pt;dicionarioinformal:mamãe  *
       will retrieve  http://www.dicionarioinformal.com.br/mam%C3%A3e/
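
To make the idea concrete, here is a minimal sketch of what such a "URN to URL" resolver could do internally. It is only an illustration under assumptions: the function name resolve_mydic, the AUTHORITIES table, and the "wiktionary" key are hypothetical, and the URL templates are simply copied from the examples above. A real resolver would answer over HTTP with a redirect; this sketch only computes the target URL.

```python
from urllib.parse import quote

# Hypothetical authority table; the URL templates are taken from the
# examples in this message, not from any published specification.
AUTHORITIES = {
    "wiktionary": "https://{lang}.wiktionary.org/wiki/{word}",
    "merriam-webster": "http://www.merriam-webster.com/dictionary/{word}",
    "dicionarioinformal": "http://www.dicionarioinformal.com.br/{word}/",
}
DEFAULT_AUTHORITY = "wiktionary"  # assumption: no suffix means Wiktionary


def resolve_mydic(urn: str) -> str:
    """Map urn:mydic:<lang>[;<authority>]:<word> to a dictionary URL."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn" or parts[1].lower() != "mydic":
        raise ValueError("not a urn:mydic URN: %r" % urn)
    lang_part, word = parts[2], parts[3]
    # An optional ";authority" suffix follows the language tag.
    lang, _, authority = lang_part.partition(";")
    if not lang or not word:
        raise ValueError("missing language or word: %r" % urn)
    template = AUTHORITIES.get(authority or DEFAULT_AUTHORITY)
    if template is None:
        raise ValueError("unknown authority: %r" % authority)
    # Assumption: only the primary language subtag selects the host,
    # so a tag with a region (such as en-GB) falls back to "en".
    primary = lang.split("-")[0].lower()
    # Percent-encode the word as UTF-8 so non-ASCII letters are URL-safe.
    return template.format(lang=primary, word=quote(word, safe=""))


if __name__ == "__main__":
    for example in (
        "urn:mydic:en:mum",
        "urn:mydic:pt:mamãe",
        "urn:mydic:en;merriam-webster:hello",
        "urn:mydic:pt;dicionarioinformal:mamãe",
    ):
        print(example, "->", resolve_mydic(example))
```

Keeping the authority table as plain data means a new dictionary can be added without touching the parsing logic, which is the point of treating the resolver as a small independent service.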

What about "US vs GB"? It is not an authority problem; it is, as you showed,
a question of (ISO standard) language choice.

*    urn:mydic:en-GB:hello*
*    urn:mydic:en-US:hello*

*== Using it with SchemaOrg ==*

A summary of what is shown in *issue #405*...

... Imagine that we add a property urnResolverURL to schema_org_rdfa,
for use in the context of "linking URN-like values".
Examples in span tags with URN values:

  <span property="urnResolverURL"
        value="http://www.lexml.gov.br/urn/urn:lex:br:federal:lei:2006-08-07;11340"
   >Law Maria da Penha</span>

  <span property="urnResolverURL"
        value="http://dx.MyDic.org/urn:mydic:en-GB:hello"
   >hello</span>
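
As a rough sketch of how a publishing tool might emit such spans, the helper below joins a resolver base URL and a URN and escapes the result for HTML. Everything here is an assumption for illustration: urn_span is a hypothetical helper, and urnResolverURL with a value attribute is only the proposal from issue 405, not an existing schema.org property.

```python
from html import escape


def urn_span(resolver_base: str, urn: str, label: str) -> str:
    """Build a span whose value is <resolver_base>/<urn>.

    The attribute names follow the proposal in this message; they are
    not part of schema.org.
    """
    url = resolver_base.rstrip("/") + "/" + urn
    return '<span property="urnResolverURL" value="{}">{}</span>'.format(
        escape(url, quote=True),  # safe inside a double-quoted attribute
        escape(label),            # safe as element text
    )


if __name__ == "__main__":
    print(urn_span("http://www.lexml.gov.br/urn",
                   "urn:lex:br:federal:lei:2006-08-07;11340",
                   "Law Maria da Penha"))
    print(urn_span("http://dx.MyDic.org", "urn:mydic:en-GB:hello", "hello"))
```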

Of course, more sophisticated constructs are possible with Schema Org (for
example, declaring only the URN strings).


PS: issue 405 is at
https://github.com/schemaorg/schemaorg/issues/405#issuecomment-88501504


2015-04-13 11:36 GMT-03:00 Mo McRoberts <Mo.McRoberts@bbc.co.uk>:

> I can count on the fingers of one hand the number of situations where URNs
> really are the best choice of identifier for a thing (although, for the
> record, we should of course all treat URIs as opaque wherever it’s
> feasible to do so).
>
> Dictionary words are a decidedly borderline case.
>
> URNs might work for words used in a language, with no specific definition
> ascribed (e.g., urn:lex:en-gb:hello), or even the collection of ‘all words
> spoken by humans’ — which at least takes advantage of one of the nicer
> properties of URNs, that being that often nobody actually needs to define
> what the members of the set are, because they can be conjured through use.
>
> However, as soon as you get into specific dictionaries or thesauri or even
> simple word-lists, then you have an arbitrary curatorial authority, and so
> there is pretty much zero benefit (and tangible downsides) to using a URN
> over an HTTP URI. The latter has the advantage of being dereferenceable
> with no hoop-jumping, but doesn’t have to be (so long as you don’t serve
> something *else* at those URIs, the sky doesn’t fall in), so it amounts to
> an optional future-proofing mechanism via a protocol everyone knows how to
> talk.
>
> M.
>
> On  2015-Apr-13, at 15:10, Peter Krauss <ppkrauss@gmail.com> wrote:
>
> > Sorry for my English, I meant to say
> >    "in my opinion... I think that choosing URNs is possible, but it is
> > a semantic/technological subtlety ..."
> > ... bearing in mind the context of this kind of choice.
> >
> > Please show here some examples of the "lot of ways..." that you are
> citing.
> >
> >
> > 2015-04-13 10:59 GMT-03:00 <chaals@yandex-team.ru>:
> > 13.04.2015, 12:05, "Peter Krauss" <ppkrauss@gmail.com>:
> >> It is a semantic/technological subtlety... I think this kind of demand
> is better to fix by URNs,
> >
> > to be blunt, I don't think the URN approach is really relevant to
> schema.org. It breaks our approach, in a whole lot of ways...
> >
> > cheers
> >
> >> examples (suppose a "mydic" for "my dictionary URN schema"),
> >>
> >>   urn:mydic:en:mum
> >>   urn:mydic:pt-br:mamãe
> >>   urn:myterm:en:environment
> >>   urn:myterm:pt-br:meio.ambiente
> >>
> >> the key to work with URNs and SchemaOrg is the URN-Resolver, see
> discussion at
> >>
> >>
> https://github.com/schemaorg/schemaorg/issues/405#issuecomment-88501504
> >>
> >>
> >> Peter
> >>
> >>
> >>
> >> 2015-04-13 5:21 GMT-03:00 Bernard Vatant <bernard.vatant@mondeca.com>:
> >> Anastasia
> >>
> >> Indeed schema.org currently lacks terms to describe linguistic /
> knowledge organization resources such as dictionaries. Maybe a future
> extension ...
> >>
> >> Meanwhile, you might wish to explore
> >> http://lov.okfn.org/dataset/lov/vocabs?tag=Vocabularies
> >> which provides some vocabularies designed to describe language and
> linguistic resources
> >>
> >> The first on the list, GOLD (http://purl.org/linguistics/gold)
> certainly provides expressivity beyond your needs.
> >> Lexvo.org ontology (http://lexvo.org/ontology) is simpler and used in
> lexvo.org terminological data base.
> >>
> >> Hope that helps.
> >>
> >>
> >> 2015-04-13 10:01 GMT+02:00 Dan Brickley <danbri@google.com>:
> >>
> >>
> >> On Mon, 13 Apr 2015 08:57 Anastasia Baryshnikova <
> asia.baryshnikova@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I've investigated the issue but I don't seem to be able to find a
> solution. I have an online dictionary where every term is linked to
> abbreviations, definitions and translations in other languages. How do I
> annotate them with microdata?
> >> The closest thing I can think of is make every term a CreativeWork,
> with inLanguage property. But how do i link 2 terms that are translations
> of each other?
> >>
> >> Thanks a lot in advance!
> >>
> >>
> >> If you mean schema.org, there is not a lot of vocab for this kind of
> thing yet. But you might read around Wordnet in RDF e.g. starting at
> http://wordnet-rdf.princeton.edu/
> >>
> >> Dan
> >>
> >>
> >> Anastasia Baryshnikova
> >> Crossdictionary.com
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Bernard Vatant
> >> Vocabularies & Data Engineering
> >> Tel :  + 33 (0)9 71 48 84 59
> >> Skype : bernard.vatant
> >> http://google.com/+BernardVatant
> >> --------------------------------------------------------
> >>
> >> Mondeca
> >> 35 boulevard de Strasbourg 75010 Paris
> >> www.mondeca.com
> >> Follow us on Twitter : @mondecanews
> >> ----------------------------------------------------------
> >>
> >>
> >
> >
> > --
> > Charles McCathie Nevile - web standards - CTO Office, Yandex
> > chaals@yandex-team.ru - - - Find more at http://yandex.com
> >
> >
>
>
> --
> Mo McRoberts - Chief Technical Architect - Archives & Digital Public Space,
> Zone 2.12, BBC Scotland, 40 Pacific Quay, Glasgow G51 1DA.
>
> Inside the BBC? My movements this week: http://neva.li/where-is-mo
>
Received on Monday, 13 April 2015 15:58:26 UTC
