Re: human readable labels for schema.org terms (and translated labels)

From: Thomas Francart <thomas.francart@sparna.fr>
Date: Wed, 15 Feb 2017 18:48:28 +0100
Message-ID: <CAPugn7URS=-1Su7Y_D_WiXemnPFkK0Kh5LAOELL_EF+7+ET1xw@mail.gmail.com>
To: Dan Brickley <danbri@google.com>
Cc: Charles McCathie Nevile <chaals@yandex-team.ru>, "schema.org Mailing List" <public-schemaorg@w3.org>

I support the idea of having human-readable labels in schema.org, at least
on the classes. I have attached a file to the GitHub issue with the
human-readable labels generated from the CamelCase identifiers.
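
For illustration, here is a minimal sketch of how such labels could be
derived, following the rule Dan describes below (insert a space wherever
a capital letter follows a lowercase letter or a multi-letter acronym).
The function name camel_to_label is hypothetical, and this is not
necessarily how the attached file was produced:

```python
import re

# A capital letter preceded by a lowercase letter: "creativeWork" -> "creative Work"
_LOWER_THEN_UPPER = re.compile(r"(?<=[a-z])(?=[A-Z])")
# A capitalized word preceded by a multi-letter acronym: "URLTemplate" -> "URL Template"
_ACRONYM_THEN_WORD = re.compile(r"(?<=[A-Z]{2})(?=[A-Z][a-z])")


def camel_to_label(term_id: str) -> str:
    """Return a space-separated label for a CamelCase term identifier.

    Hypothetical helper: it only inserts spaces, and leaves case and
    acronyms untouched, so cryptic terms still need a manual pass.
    """
    label = _LOWER_THEN_UPPER.sub(" ", term_id)
    label = _ACRONYM_THEN_WORD.sub(" ", label)
    return label


if __name__ == "__main__":
    for term in ["CreativeWork", "action", "isAccessibleForFree", "APIReference"]:
        print(term, "->", camel_to_label(term))
```

Identifiers without an internal capital, such as "action", come out
unchanged, so the distinction between "action" and "Action" is preserved.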

On a related topic, I would be interested in this for producing renderings
and data visualisations of the schema.org classes in SKOS Play:
http://labs.sparna.fr/skos-play. If you want to try it out:

   1. Go to http://labs.sparna.fr/skos-play/upload
   2. In the field "On the web, in a file or a SPARQL endpoint", enter the
   URL of the schema.org RDFS raw file :
   3. Check the box "Transform an OWL ontology to SKOS"
   4. Click on Next, and generate one of the available reports, such as
   "Alphabetical index, expanded", or a data visualisation such as
   "Tree visualisation"

Having human-readable labels would help, specifically for the "Permuted
index" views, or to generate translation tables. These kinds of reports can
help in reviewing new versions of schema.org.



2017-02-15 18:16 GMT+01:00 Dan Brickley <danbri@google.com>:

> Chaals et al.,
> I was looking at what we have for human readable term labels in
> schema.org - currently they just mirror the last part of the URL
> exactly - i.e. see
> https://gist.github.com/danbri/f1add8f30b0c7444702e601970a78841#file-_labels-txt
> for a list, but these are basically just the last part of their
> schema.org URLs, e.g. "action", "CreativeWork" etc.
> I remember we've discussed (also) having more sentency strings, e.g.
> to make translation easier. Do you have any thoughts on design of
> those? e.g. acronym-handling, case handling with a view to translating
> into non-English languages and scripts. Looking at
> https://en.wikipedia.org/wiki/Cyrillic_script it seems case can be
> represented there, but obviously there are other languages to
> consider. Currently we have exactly one term ID for each human
> readable label, but this requires "action" to be distinct from
> "Action".
> We could probably make a first pass at creating more human oriented
> labels by adding a space character into the name string whenever a
> capital letter is preceded by a lowercase letter or multi-letter
> capitalized acronym. We could then make a pass through and consider
> expanding anything cryptic. Many properties include acronyms, which
> are problematic both for intelligibility, and regarding
> capitalization.
> A related idea would be to partition the first sentence or two from
> the description/definition of each term, since these are sometimes
> very large blocks of hypertext. We could aim to make sure that these
> "core definition" strings include appropriate expansion or referencing
> of any acronyms or confusing terms in the actual term. We could then
> use these short definitions as a translation target, and treat them
> more conservatively than the larger definition text.
> Thoughts?
> Dan
> p.s. filed an issue https://github.com/schemaorg/schemaorg/issues/1523
> though I suspect there is another issue already which I failed to
> find...


