- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 12 Jul 2016 08:18:08 +0200
- To: public-i18n-its-ig@w3.org
- Message-Id: <38BF99D8-2559-4C19-93B9-78BB2D5B400E@w3.org>
Hi all,

thanks to Mike Smith there is now a language detection feature in the W3C validator. For examples, see

https://validator.w3.org/nu/?doc=https%3A%2F%2Fw3.org&out=json
https://validator.w3.org/nu/?doc=https%3A%2F%2Fw3.org&out=xml

Explanation from Mike:

In the JSON output you should see that the JSON object has a “language” key at the top level, and in the XML you should see that the root “messages” element has a “language” child element. The “language” value is a BCP 47 language tag. If “language” is absent from the JSON/XML output, that indicates the language could not be determined with enough confidence.

Example in curl:

    curl -X POST -H "Content-Type: text/html; charset=utf-8" -d 'HTML document here' "https://validator.w3.org/nu/?out=json"

Output in JSON:

    { "messages": [ ... ], "language": "en" }

This has great potential for automating language processing workflows on the web.

- Felix
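
For anyone scripting against this, here is a minimal sketch of reading the “language” key from a JSON response body. It assumes only what is described above: a top-level “language” key holding a BCP 47 tag, absent when the language could not be determined. The function name detected_language and the two sample bodies are made up for illustration; the samples are not real validator output, and the sketch does not contact the validator.

```python
import json
from typing import Optional


def detected_language(response_body: str) -> Optional[str]:
    """Return the BCP 47 tag from a validator JSON response, or None if absent.

    Hypothetical helper: it assumes only the top-level "language" key
    described in the message above.
    """
    result = json.loads(response_body)
    # A missing "language" key means the language could not be
    # determined with enough confidence.
    return result.get("language")


# Illustrative bodies shaped like the quoted output (not real validator output).
with_language = '{"messages": [], "language": "en"}'
without_language = '{"messages": []}'

print(detected_language(with_language))
print(detected_language(without_language))
```

Returning None for a missing key keeps "not detected" distinct from any actual language tag, so a workflow can decide separately how to handle documents with no detected language.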
Received on Tuesday, 12 July 2016 06:18:24 UTC