[web-annotation] Processing language for multilingual resources

gsergiu has just created a new issue for 
https://github.com/w3c/web-annotation:

== Processing language for multilingual resources ==
The conclusion of ticket 
https://github.com/w3c/web-annotation/issues/337#issuecomment-238557004
 is that there is an M-to-N relationship between the dc:language of 
multilingual resources and the text processors that might process the 
annotation body and/or target.

I therefore propose the following definition of the processing 
language property:

"This property represents the relationship between the language of the
 resources (Body or Target) and the text processors or classes of text
 processors that may process the resources for rendering, indexing or 
any NLP processing."

1.      Consequently, I propose that the verbose representation of 
this property consist of <language, processor_class, processor_id> 
tuples. 
A controlled vocabulary is recommended for processor classes, for 
example: textual_representation, audio_representation, 
visual_representation (i.e. image), text_indexing, nlp_processing.
Example:
```
"processingLanguage": [
  {"language": ["en", "fr", "ro"], "processor_class": "textual_representation"},
  {"language": "en", "processor_class": "text_indexing",
   "processor_id": "<snowball_indexer_uri>"},
  {"language": "ro", "processor_class": "audio_representation",
   "processor_id": "<TTS_RO_uri>"}
]
```
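
To illustrate how a client might consume such tuples, here is a minimal 
Python sketch. It is not part of the proposal: the names ProcessingEntry, 
parse_entry and languages_for are hypothetical, and it assumes that 
language may be either a single tag or a list of tags and that 
processor_id is optional, as the example above suggests.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ProcessingEntry:
    """One <language, processor_class, processor_id> tuple (hypothetical type)."""
    language: Tuple[str, ...]
    processor_class: str
    processor_id: Optional[str] = None  # assumed optional


def parse_entry(raw: Dict) -> ProcessingEntry:
    """Normalize a raw entry; a single language tag becomes a one-element tuple."""
    lang = raw["language"]
    if isinstance(lang, str):
        lang = [lang]
    return ProcessingEntry(tuple(lang), raw["processor_class"], raw.get("processor_id"))


def languages_for(entries: List[ProcessingEntry], processor_class: str) -> List[str]:
    """Collect, in order and without duplicates, the languages for one processor class."""
    result: List[str] = []
    for entry in entries:
        if entry.processor_class == processor_class:
            for tag in entry.language:
                if tag not in result:
                    result.append(tag)
    return result


# The entries from the example above; the URIs are placeholders.
raw_entries = [
    {"language": ["en", "fr", "ro"], "processor_class": "textual_representation"},
    {"language": "en", "processor_class": "text_indexing",
     "processor_id": "<snowball_indexer_uri>"},
    {"language": "ro", "processor_class": "audio_representation",
     "processor_id": "<TTS_RO_uri>"},
]
entries = [parse_entry(raw) for raw in raw_entries]
```

With these entries, languages_for(entries, "text_indexing") yields only 
"en", while a class that has no entry yields an empty list.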
 
2.      The minified representation could remain compliant with the 
current specification, meaning that all text processors (of all 
types) should use the same processing language.
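
As a rough illustration of point 2, the minified form could be expanded 
into the verbose one by repeating the single language for every 
processor class. This Python sketch assumes that the minified value is a 
single language tag string and that "all types" means the classes listed 
in point 1; expand_processing_language and PROCESSOR_CLASSES are 
hypothetical names.

```python
from typing import Dict, List, Union

# Assumed vocabulary, taken from the processor classes listed in point 1.
PROCESSOR_CLASSES = (
    "textual_representation",
    "audio_representation",
    "visual_representation",
    "text_indexing",
    "nlp_processing",
)


def expand_processing_language(value: Union[str, List[Dict]]) -> List[Dict]:
    """Return the verbose form of a processingLanguage value.

    A plain string (minified form) is expanded to one entry per processor
    class, all sharing that language. A list is assumed to be verbose
    already and is returned as a shallow copy.
    """
    if isinstance(value, str):
        return [{"language": value, "processor_class": pc} for pc in PROCESSOR_CLASSES]
    return list(value)


expanded = expand_processing_language("en")
```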

3.      There are 2 open questions:

a.      Should this property be named “processing”?
b.      Should this information be embedded within the annotations 
(model) or handled in the protocol (as a separate HTTP request)?


Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/341 using your GitHub 
account

Received on Tuesday, 9 August 2016 14:08:04 UTC