Re: [DPUB-ANNOTATION-UC] 2.1.3 general observation about language identification [I18N-ISSUE-458]

Thanks again for the comments.

I agree that language metadata is important and that the set of use cases
does not specifically include any metadata about the body or target
resources.  That was somewhat intentional, so as to avoid trying to list
out all of the possible descriptive features for resources, such as
creator, creation time, language, file format, license or other rights
statements, intended audience and so forth.  These are listed for the
annotation itself, as the primary resource of interest.

In the Web Annotation data model, language is explicitly included, along
with the format and general class of the resource [1].  In the upcoming WD
(next week [2]), we also add creator and creation time.  For the value of
the language property, we refer to RFC 5646.  Is that sufficient to cover
the requirements?
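
As a rough illustration only (not taken from the spec text), a body resource
carrying language, format and class might look like the sketch below.  The
key names "type", "format", "language" and "value", the sample values, and
the helper looks_like_language_tag are all assumptions made for this
example, and the check is a simplified shape test rather than the full
RFC 5646 grammar.

```python
import json
import re

# Simplified shape check: a 2-3 letter primary language subtag followed by
# optional hyphen-separated alphanumeric subtags of 1-8 characters.
# This is NOT the full RFC 5646 grammar, only a rough sanity check.
SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")


def looks_like_language_tag(tag: str) -> bool:
    """Return True if the whole string has the rough shape of a language tag."""
    return SIMPLE_TAG.fullmatch(tag) is not None


# Hypothetical body resource; key names and values are illustrative assumptions.
body = {
    "type": "Text",
    "format": "text/plain",
    "language": "fr-CA",
    "value": "Un commentaire",
}

if looks_like_language_tag(body["language"]):
    print(json.dumps(body, indent=2))
```

A consumer could use the language value to drive font selection or
spell-checking for the body text; directionality is not covered by this
sketch.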

Many thanks,



On Sat, Oct 10, 2015 at 1:19 PM, Phillips, Addison wrote:

> [1] 2.1.3 general observation about language identification
> Description:
>     2.1.3 general observation about language identification
>     Tags and annotations generally use natural language tokens (such as
> words). While Unicode allows text to be stored, passed, and processed
> without regard for the specific language, it is the case that strings can
> benefit from language metadata for character shaping, spell-checking, font
> selection and more. In addition, directionality information is usually
> desired.

Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

Received on Sunday, 11 October 2015 09:24:11 UTC