Re: [web-annotation] exactly 0 or 1 language(s) from r12a via GitHub on 2016-05-17 (public-annotation@w3.org from May 2016)

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Tue, 17 May 2016 14:18:29 +0000
To: public-annotation@w3.org
Message-ID: <issue_comment.created-219731858-1463494707-sysbot+gh@w3.org>
Having exchanged brief emails with Ivan, i think we may need to take a
step back and come at this from a wider perspective. Ivan told me:

> I believe the language tag in our case is only used as a 'metadata',
> in your terminology, to indicate the language of the target or the
> body resource. How an annotation agent uses this information is beyond
> the specification; in many cases it actually cannot do anything with
> the resource (eg, the target), so this term is really only
> informational.

So the i18n WG's initial assumptions were incorrect.  Let me try to 
outline why i'm concerned.

If an application is going to use the language value provided to 
perform an operation on the text, it often needs to know what language
 the text is **actually** in.  For example, such an operation might be
 running a spellchecker, pronouncing the text in a voice browser, 
applying hyphenation, case conversion, line breaking and other 
language-sensitive actions, applying fonts, etc.  In these cases it's 
problematic if you have a list of languages as the value of your 
`language` property, because to process the text correctly, you 
actually need to know whether it's Japanese or whether it's French, 
for example, that you're dealing with.  This is the 'text-processing' 
application we mention above. In HTML this is the function of the 
`lang` attribute, which can only have one language as its value, 
because it is indicating the **actual** language of the text.
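To make the single-value point concrete, here is a minimal Python sketch of one such text-processing step, choosing a spellchecker dictionary. The `pick_spellchecker` function and the `DICTIONARIES` table are hypothetical illustrations, not part of any annotation specification; the point is only that the operation needs the one language the text is actually in, so a list of several languages leaves it nothing to act on.

```python
from typing import Optional, Sequence, Union

# Hypothetical spellchecker dictionaries, keyed by primary language subtag.
DICTIONARIES = {"en": "en_US.dic", "fr": "fr_FR.dic"}


def pick_spellchecker(language: Union[str, Sequence[str]]) -> Optional[str]:
    """Choose a spellchecker dictionary for text in the given language.

    Like HTML's lang attribute, this needs the one language the text is
    actually in; a list such as ["ja", "fr"] is ambiguous, so it is rejected.
    """
    tags = [language] if isinstance(language, str) else list(language)
    if len(tags) != 1:
        raise ValueError(f"need exactly one language to process text, got {tags!r}")
    primary = tags[0].split("-")[0].lower()  # "fr-CA" -> "fr"
    return DICTIONARIES.get(primary)  # None if no dictionary is available


print(pick_spellchecker("fr-CA"))  # fr_FR.dic
```

Calling `pick_spellchecker(["ja", "fr"])` raises `ValueError`: the function cannot guess whether to treat the text as Japanese or as French.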

The i18n WG tends to refer to another type of language annotation as 
'metadata'.  This typically indicates the intended linguistic audience
 of the resource as a whole, and it's possible to imagine that this 
could, for a multilingual resource, involve a property value that is a
 list of languages.

It may be that the 'language' property when referring to a *target* is
 of the metadata kind (since it's informative, the target is not being
 operated on, and the target ought to have its own text-processing 
language declarations), whereas it may be more useful to see the 
language of the *body* as of the text-processing kind, since that kind
 of information can be used to indicate to a voice browser how to 
pronounce the annotation, or to a graphical browser how to break lines
 of text when displaying the annotation, etc.(?)
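The tentative split above could be expressed roughly as follows. This is a minimal Python sketch under loudly stated assumptions: the annotation is a simplified, hypothetical dict in which `body` and `target` are objects with an optional `language` that is a string or a list (real Web Annotation bodies and targets may also be plain IRIs, which this ignores), and `check_languages` is a made-up helper encoding the suggestion in this comment, not anything the specification says.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[str]:
    """Normalize an absent, single, or list-valued `language` to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def check_languages(annotation: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical check: target language is metadata, body language drives text processing."""
    target_langs = as_list(annotation.get("target", {}).get("language"))
    body_langs = as_list(annotation.get("body", {}).get("language"))
    if len(body_langs) > 1:
        raise ValueError(f"body needs 0 or 1 language for text processing, got {body_langs!r}")
    return {
        # Informational only: a list describing the target's audience is acceptable.
        "target_audience": target_langs,
        # Used for display, speech, line breaking; None means not declared.
        "body_processing": body_langs[0] if body_langs else None,
    }


anno = {
    "body": {"value": "Très bien", "language": "fr"},
    "target": {"source": "http://example.org/page", "language": ["en", "ja"]},
}
print(check_languages(anno))
```

Here the target may list several languages because nothing operates on it, while a body declared with two languages is rejected, since a voice browser or line breaker would not know which rules to apply.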

In order to know how to specify the values of the `language` property,
then, it's useful to understand, at least to some extent, how the
application is likely to use the information about the language, which
is why we originally raised the question in this issue.

Hopefully that clarifies our frame of reference, although it doesn't 
yet provide a clear way forward.

(There will, of course, be an additional question with regard to
text-processing language declarations, in that a content author may
need to indicate that parts of the annotation are in different
languages, though i'm not clear how much of an issue it will be for
annotations if that level of detail is not provided. It may not be a
common use case, or one that causes major difficulties if missing.
However, it is usually important to have at least a default idea of
which language to assume when processing the text of the annotation,
so that the text can be handled properly when it is displayed or used.)
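One way to combine a default with occasional per-part declarations is inheritance, much as an HTML element inherits `lang` from its ancestors. The minimal Python sketch below assumes a hypothetical representation of an annotation body as a list of `(text, language-or-None)` segments; `resolve_segment_languages` is an illustrative name, not an existing API. It uses `und`, the BCP 47 tag for an undetermined language, when no default is known.

```python
from typing import List, Optional, Tuple

# Hypothetical body representation: text segments, each with an optional language tag.
Segment = Tuple[str, Optional[str]]


def resolve_segment_languages(
    segments: List[Segment], default: Optional[str]
) -> List[Tuple[str, str]]:
    """Give every segment a language to process it with.

    A segment's own tag wins; otherwise it inherits the annotation-wide
    default. If there is no default either, fall back to 'und'.
    """
    fallback = default or "und"
    return [(text, lang or fallback) for text, lang in segments]


body = [("This quote ", None), ("« c'est la vie »", "fr"), (" sums it up.", None)]
print(resolve_segment_languages(body, "en"))
```

Even with no per-segment declarations at all, the default alone gives every segment a usable language, which is the minimum the paragraph above argues for.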

-- 
GitHub Notification of comment by r12a
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/213#issuecomment-219731858
 using your GitHub account
Received on Tuesday, 17 May 2016 14:18:31 UTC