- From: r12a via GitHub <sysbot+gh@w3.org>
- Date: Tue, 17 May 2016 14:18:29 +0000
- To: public-annotation@w3.org
Having exchanged brief emails with Ivan, I think we may need to take a step back and come at this from a wider perspective. Ivan told me:

> I believe the language tag in our case is only used as a 'metadata', in your terminology, to indicate the language of the target or the body resource. How an annotation agent uses this information is beyond the specification; in many cases it actually cannot do anything with the resource (eg, the target), so this term is really only informational.

So the i18n WG's initial assumptions were incorrect. Let me try to outline why I'm concerned.

If an application is going to use the language value provided to perform an operation on the text, it often needs to know what language the text is **actually** in. Such an operation might be running a spellchecker, pronouncing the text in a voice browser, applying hyphenation, case conversion, line breaking or other language-sensitive actions, applying fonts, etc. In these cases a list of languages as the value of your `language` property is problematic, because to process the text correctly you need to know whether you are dealing with, for example, Japanese or French. This is the 'text-processing' application we mention above. In HTML this is the function of the `lang` attribute, which can only have one language as its value, because it indicates the **actual** language of the text.

The i18n WG tends to refer to another type of language annotation as 'metadata'. This typically indicates the intended linguistic audience of the resource as a whole, and it's possible to imagine that, for a multilingual resource, this could involve a property value that is a list of languages.
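To make the distinction concrete, here is a minimal Python sketch, assuming a JSON-style annotation in which `language` may hold either a single BCP 47 tag or a list of tags. The helper `processing_language` is hypothetical and not part of any specification. It returns a tag only when exactly one language is declared, because a list says who the resource is for (metadata) but not which language a given piece of text is in, which is what a spellchecker or hyphenator needs.

```python
from typing import Optional, Sequence, Union

# Hypothetical shape of a `language` value: one BCP 47 tag, a list of tags, or absent.
LanguageValue = Union[str, Sequence[str], None]


def processing_language(value: LanguageValue) -> Optional[str]:
    """Return one tag usable for text processing, or None if missing or ambiguous."""
    if value is None:
        return None
    # Check str first: a str is itself a Sequence and would otherwise be split.
    if isinstance(value, str):
        return value
    tags = list(value)
    if len(tags) == 1:
        return tags[0]
    # Several (or zero) languages: fine as audience metadata, but it cannot
    # tell a spellchecker, hyphenator or voice browser how to treat the text.
    return None


# Illustrative values only.
print(processing_language("fr"))          # fr
print(processing_language(["ja", "fr"]))  # None
```

A text-processing consumer would then have to fall back to some other default, or to finer-grained declarations inside the content, whenever the helper returns `None`.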
It may be that the `language` property, when referring to a *target*, is of the metadata kind (since it's informative, the target is not being operated on, and the target ought to have its own text-processing language declarations), whereas it may be more useful to treat the language of the *body* as the text-processing kind, since that kind of information can tell a voice browser how to pronounce the annotation, or a graphical browser how to break lines of text when displaying the annotation, etc.(?)

In order to know how to specify the values of the `language` property, then, it's useful to understand, at least to some extent, how the application is likely to use the language information. That is why we originally raised the question in this issue. Hopefully that clarifies our frame of reference, although it doesn't yet provide a clear way forward.

(There will, of course, be an additional question with regard to text-processing language declarations, in that a content author may need to indicate that parts of the annotation are in different languages, though I'm not clear how much of an issue it will be for annotations if that level of detail is not provided. It may not be a common use case, or one that causes major difficulties if missing(?) However, it is usually important to have at least a default idea of which language to assume when processing the text of the annotation, so that the text can be handled correctly when it comes to display or use.)

--
GitHub Notification of comment by r12a

Please view or discuss this issue at https://github.com/w3c/web-annotation/issues/213#issuecomment-219731858 using your GitHub account
Received on Tuesday, 17 May 2016 14:18:31 UTC