- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 12 Jul 2016 08:18:08 +0200
- To: public-i18n-its-ig@w3.org
- Message-Id: <38BF99D8-2559-4C19-93B9-78BB2D5B400E@w3.org>
Hi all,
thanks to Mike Smith, there is now a language detection feature in the W3C validator. See
https://validator.w3.org/nu/?doc=https%3A%2F%2Fw3.org&out=json
https://validator.w3.org/nu/?doc=https%3A%2F%2Fw3.org&out=xml
for examples.
Explanation from Mike:
In the JSON output you should see that the JSON object has a “language” key at the top level, and in the XML you should see that the root “messages” element has a “language” child element.
The “language” value is a BCP 47 language tag. If “language” is absent from the JSON/XML output, that indicates the language could not be determined with enough confidence.
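For the XML variant, a minimal Python sketch of reading that child element could look like the following. The function name detected_language_xml is hypothetical, and the lookup deliberately ignores the XML namespace, since the exact namespace URI of the validator output is not stated in this mail.

```python
import xml.etree.ElementTree as ET


def detected_language_xml(xml_text):
    """Return the text of the root's "language" child, or None if absent.

    Hypothetical helper: it matches on the local element name only, so it
    makes no assumption about the namespace used by the validator output.
    """
    root = ET.fromstring(xml_text)
    for child in root:
        # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name == "language":
            text = (child.text or "").strip()
            return text or None
    # No "language" child: the language was not detected confidently enough.
    return None
```

A return value of None corresponds to the "absent" case described above.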
Example in curl:
curl -X POST -H "Content-Type: text/html; charset=utf-8" -d 'HTML document here' "https://validator.w3.org/nu/?out=json"
Output in JSON:
{
  "messages": [ ... ],
  "language": "en"
}
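The same call can be scripted. The following is only a sketch using the Python standard library: build_request and detected_language are hypothetical helper names, and the endpoint and Content-Type header are taken from the curl example above. The request is built but not sent here.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this mail.
VALIDATOR_URL = "https://validator.w3.org/nu/?out=json"


def build_request(html):
    """Build a POST request equivalent to the curl example (hypothetical helper)."""
    return urllib.request.Request(
        VALIDATOR_URL,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def detected_language(response_text):
    """Return the top-level "language" value, or None when the key is absent."""
    return json.loads(response_text).get("language")


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_request(html_source)) as resp:
#     print(detected_language(resp.read().decode("utf-8")))
```

Treating a missing key as None keeps the "not confident enough" case distinct from any real BCP 47 tag.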
This has great potential for automating language processing workflows on the web.
- Felix
Received on Tuesday, 12 July 2016 06:18:24 UTC