- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 12 Jul 2016 11:57:42 +0200
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Michael Smith <mike@w3.org>
- Cc: public-i18n-its-ig@w3.org
- Message-Id: <E6B2D815-A574-48EE-B570-F8D7220E6351@w3.org>
Thanks for the positive feedback and the good point about listing the supported languages, Martin. I am putting Mike directly into the loop; maybe he knows which languages are supported. I browsed the underlying library

https://github.com/shuyo/language-detection

but did not find a list of languages. See also

https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md

and this presentation

https://github.com/shuyo/language-detection

The GitHub project home page says that 53 languages are supported with 99% precision.

Best,

Felix

> On 12.07.2016 at 09:00, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
>
> Hello Felix,
>
> This is good news. However, for language detection, it's important to know what languages the detector supports. Language detection is well known to be rather easy (on documents above a certain length) for a given set of languages. However, it's impossible to detect a language that the detector doesn't know. So a list of (currently) supported languages, and maybe a suggestion of how to contribute additional ones, would be very helpful.
>
> Regards, Martin.
>
> On 2016/07/12 15:18, Felix Sasaki wrote:
>> Hi all,
>>
>> thanks to Mike Smith there is now a language detection feature in the W3C validator. See
>>
>> https://validator.w3.org/nu/?doc=https%3A%2F%2Fw3.org&out=json
>> https://validator.w3.org/nu/?doc=https%3A%2F%2Fw3.org&out=xml
>>
>> for examples.
>>
>> Explanation from Mike:
>> In the JSON output you should see that the JSON object has a “language” key at the top level, and in the XML you should see that the root “messages” element has a “language” child element.
>> The “language” value is a BCP 47 language tag. If the “language” is absent in the JSON/XML output, that indicates the language could not be determined with enough confidence.
>>
>> Example in curl:
>> curl -X POST -H "Content-Type: text/html; charset=utf-8" -d 'HTML document here' "https://validator.w3.org/nu/?out=json"
>>
>> Output in JSON:
>>
>> {
>>   "messages": [ ... ],
>>   "language": "en"
>> }
>>
>> This has great potential to automate language processing workflows on the web.
>>
>> - Felix
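
For anyone who wants to script this, here is a minimal Python sketch of the same call as the curl example, plus the "absent means not confident enough" rule from Mike's explanation. The function names build_request and detected_language are my own, purely illustrative; only the endpoint URL, the Content-Type header, and the top-level "language" key come from the message above. Nothing is sent over the network unless you pass the request to urllib.request.urlopen yourself.

```python
import json
import urllib.request
from typing import Optional

# Endpoint taken from the curl example above.
VALIDATOR_URL = "https://validator.w3.org/nu/?out=json"


def build_request(html: str) -> urllib.request.Request:
    """Build the same POST as the curl example (hypothetical helper; sends nothing)."""
    return urllib.request.Request(
        VALIDATOR_URL,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def detected_language(response_body: str) -> Optional[str]:
    """Return the top-level "language" value (a BCP 47 tag).

    Returns None when the key is absent, which per the explanation above
    means the language could not be determined with enough confidence.
    """
    return json.loads(response_body).get("language")


if __name__ == "__main__":
    # Sample bodies shaped like the JSON output shown above.
    print(detected_language('{"messages": [], "language": "en"}'))
    print(detected_language('{"messages": []}'))
```

To try it against the live service, something like urllib.request.urlopen(build_request(html)).read().decode("utf-8") should give a body that detected_language can parse; I have not checked whether the service expects any additional headers.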
Received on Tuesday, 12 July 2016 09:58:10 UTC