human readable labels for schema.org terms (and translated labels)

Chaals et al.,

I was looking at what we have for human readable term labels in
schema.org. Currently they just mirror the last part of each term's
schema.org URL exactly, e.g. "action", "CreativeWork" etc. See
https://gist.github.com/danbri/f1add8f30b0c7444702e601970a78841#file-_labels-txt
for a list.

I remember we've discussed (also) having more sentence-like strings,
e.g. to make translation easier. Do you have any thoughts on the
design of those? For example, acronym handling, and case handling
with a view to translating into non-English languages and scripts.
Looking at https://en.wikipedia.org/wiki/Cyrillic_script it seems case
can be represented there, but obviously there are other languages to
consider. Currently we have exactly one term ID for each human
readable label, but this requires "action" to be distinct from
"Action", i.e. two labels that differ only by case.

We could probably make a first pass at creating more human oriented
labels by adding a space character into the name string whenever a
capital letter is preceded by a lowercase letter or multi-letter
capitalized acronym. We could then make a pass through and consider
expanding anything cryptic. Many properties include acronyms, which
are problematic both for intelligibility, and regarding
capitalization.
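
To make the first-pass idea concrete, here is a rough sketch in Python.
The function name draft_label is made up for illustration and is not
part of the schema.org tooling. It assumes ASCII-only term names, and
it deliberately leaves case and acronym expansion alone, since those
would be handled in the later manual pass.

```python
import re

# Two kinds of boundary, following the rule described above:
#  1. a capital letter preceded by a lowercase letter   ("sameAs" -> "same As")
#  2. a capital letter (starting a word) preceded by a
#     multi-letter capitalized acronym                  ("APIReference" -> "API Reference")
# Assumption: term names are ASCII; non-ASCII letters are not split.
_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z]{2})(?=[A-Z][a-z])")


def draft_label(term_id: str) -> str:
    """Insert spaces at word boundaries; hypothetical helper for a first draft only."""
    return _BOUNDARY.sub(" ", term_id)


if __name__ == "__main__":
    for term in ["action", "CreativeWork", "APIReference", "vatID"]:
        print(term, "->", draft_label(term))
```

This gives "Creative Work", "API Reference" and "vat ID", and leaves
"action" unchanged. Acronyms such as "ID" still come out cryptic, so
the second pass over the results would still be needed.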

A related idea would be to partition the first sentence or two from
the description/definition of each term, since these are sometimes
very large blocks of hypertext. We could aim to make sure that these
"core definition" strings include appropriate expansion or referencing
of any acronyms or confusing terms in the actual term. We could then
use these short definitions as a translation target, and treat them
more conservatively than the larger definition text.
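
As a very naive sketch of that partitioning, again in Python: the
helper name core_definition and the sample description text are both
invented for illustration, and the sentence splitting is a crude
heuristic (it would wrongly split after "e.g." when followed by a
capital letter), so the output would need review by hand.

```python
import re

# Assumption: descriptions are short HTML fragments; tags are simply dropped.
_TAG = re.compile(r"<[^>]+>")
# Crude sentence boundary: ., ! or ? followed by whitespace and a capital letter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def core_definition(description: str, max_sentences: int = 1) -> str:
    """Return the first sentence(s) of a description as plain text (hypothetical helper)."""
    text = _TAG.sub("", description)
    text = " ".join(text.split())  # collapse runs of whitespace and newlines
    sentences = _SENTENCE_END.split(text)
    return " ".join(sentences[:max_sentences])


if __name__ == "__main__":
    # Made-up description text, not copied from schema.org.
    sample = 'An example type.\nSee also <a href="/Thing">Thing</a> for more.'
    print(core_definition(sample))
```

The remaining text after the cut would stay in the longer definition,
so nothing is lost; only the short string would go to translators.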

Thoughts?

Dan

p.s. filed an issue https://github.com/schemaorg/schemaorg/issues/1523
though I suspect there is another issue already which I failed to
find...

Received on Wednesday, 15 February 2017 17:17:33 UTC