Re: [web-annotation] The textDirection and processingLanguage properties are not needed

I also had several complaints about processingLanguage, and I still 
have the feeling that it is not documented well enough in the standard;
concrete use cases are needed to understand the meaning of these 
properties. 
In past discussions, two types of scenarios came up:
1. One was related to the "correct" representation of texts with 
multiple languages (e.g. European, Arabic, Chinese, Hebrew...). 
In addition to this scenario, there was the concern of audio 
2. The other is the search scenario, where the NLP components need to 
know which algorithms to use, as these are language-specific. 

Again, my feedback on the two scenario types:
1. I doubt that processingLanguage and textDirection are able to 
guarantee an (absolutely) correct representation of the text, simply 
because that requires exact identification of the text parts written 
in different languages. 
2. For the indexing/search scenario, processingLanguage might be 
sufficient. Still, I'm not convinced that it should have a single 
value! That is OK for texts that are written more than 90% in one 
language, but absolutely not OK for texts with a near 50-50 
distribution!
Furthermore, it is not enough to have a definition of 
processingLanguage, which is in any case a little vague, given that it 
is intended to serve two purposes at this stage (a dangerous approach).
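
To make the single-value concern concrete, here is a minimal sketch. It 
assumes the text has already been split into segments with a known 
language tag each (which is exactly the identification step mentioned in 
point 1). The function names and the use of character counts as the 
measure are my own assumptions for illustration; nothing here comes from 
the specification.

```python
from collections import Counter


def language_shares(segments):
    """Return the share of characters per language tag.

    `segments` is a list of (language_tag, text) pairs. This input format
    is a hypothetical one chosen for illustration.
    """
    counts = Counter()
    for lang, text in segments:
        counts[lang] += len(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


def single_processing_language(segments, threshold=0.9):
    """Return the dominant language if its share exceeds `threshold`.

    Returns None when no language dominates, i.e. when a single
    processingLanguage value would misrepresent the text.
    """
    shares = language_shares(segments)
    if not shares:
        return None
    lang, share = max(shares.items(), key=lambda item: item[1])
    return lang if share > threshold else None
```

With a text that is 95% English, the sketch returns "en". With a 50-50 
English/German text it returns None, which is the case where a single 
value is not OK.
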
Who should set this property?
- This is for sure a property that will not be set by end users 
(they are more likely to set the language property).
- Is the client application the part of the system in charge of 
setting this value when the annotation is created? Probably only in 
some exotic scenarios, as I don't expect NLP to be applied before 
the submit button is pressed. 
- Is the server in charge of setting processingLanguage? Well... 
actually the server is the one that needs this value as input, in 
order to know how to tokenize, normalize and stem the text. Should an 
automatic language detection algorithm be used? Should the server 
simply advertise the languages of the processing algorithms that were 
applied? If yes, why should the server be constrained to use only 
one processingLanguage?
- I think this is mainly a kind of client-server negotiation mechanism 
(the client should know which processing languages are supported by 
the server and choose one or more of them). I think this is the first 
use case to be addressed in order to provide a clear definition and 
meaning of the field.
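
The negotiation idea in the last bullet could look roughly like the 
sketch below. It is only an illustration under my own assumptions: the 
function name is hypothetical, the server's list of supported languages 
is assumed to be available to the client somehow, and tags are compared 
by simple case-insensitive equality rather than full language-tag 
matching as described in RFC 4647.

```python
def negotiate_processing_languages(client_preferred, server_supported):
    """Pick the processing languages both sides can agree on.

    `client_preferred` is an ordered list of language tags the client
    would like the text to be processed in; `server_supported` is the
    collection of tags the server advertises. Both are hypothetical
    inputs. The result keeps the client's preference order.
    """
    supported = {tag.lower() for tag in server_supported}
    chosen = []
    for tag in client_preferred:
        normalized = tag.lower()
        if normalized in supported and normalized not in chosen:
            chosen.append(normalized)
    return chosen
```

For a client preferring ["de", "EN", "fr"] and a server supporting 
["en", "de", "it"], the result is ["de", "en"]: more than one value, 
which is why I question the single-value constraint. An empty result 
would mean the server cannot process the text in any requested language.
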

GitHub Notification of comment by gsergiu

Received on Wednesday, 3 August 2016 09:54:49 UTC