- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 18 Mar 2008 07:30:11 +0900
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: Karl Dubost <karl@w3.org>, Olivier Théreaux <ot@w3.org>, "public-qa-dev@w3.org list" <public-qa-dev@w3.org>
Henri Sivonen wrote:
> On Mar 17, 2008, at 07:21, Karl Dubost wrote:
>
>> In Henri's Sivonen Thesis, he said in [partially implemented][3] it
>> in HTML 5 Conformance checker (java).
>
> I have improved the implementation since then. Validator.nu should now
> contain a language tag validator that supports the features that are
> actually used by the actual registry. (I have a vague recollection of
> not implementing some bit of the RFC that was never used by the actual
> registry.)

Cool! I added a bogus language tag in xml:lang at
http://www.w3.org/People/fsasaki/
and validated with
http://validator.nu/?doc=http%3A%2F%2Fwww.w3.org%2FPeople%2Ffsasaki%2F&schema=http%3A%2F%2Fs.validator.nu%2Fxhtml10%2Fxhtml-strict.rnc+http%3A%2F%2Fs.validator.nu%2Fxhtml10%2Fxhtml.sch+http%3A%2F%2Fc.validator.nu%2Fall-html4%2F&parser=xmldtd&laxtype=yes

I got an error message saying

# Error: Bad value bla-xmlangggggg-test for attribute xml:lang on XHTML element html: Subtags must next exceed 8 characters in length.
From line 2, column 1; to line 2, column 75
ict.dtd">↩<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bla-xmlangggggg-test">↩<head
Syntax of language tag: An RFC 4646 language tag consists of hyphen-separated ASCII-alphanumeric subtags. There is a primary tag identifying a natural language by its shortest ISO 639 language code (e.g. en for English) and zero or more additional subtags adding precision. The most common additional subtag type is a region subtag which most commonly is a two-letter ISO 3166 country code (e.g. GB for the United Kingdom). IANA maintains a registry of permissible subtags.

I think this should be "Subtags must *not* exceed 8 characters in length."

I added another language tag which is well-formed, but not valid, and got the following message:

# Error: Bad value en-1yz for attribute xml:lang on XHTML element body: Found reserved language extension subtag.
From line 10, column 1; to line 10, column 24
↩</head>↩↩<body xml:lang="en-1yz">↩ <p><
Syntax of language tag: An RFC 4646 language tag consists of hyphen-separated ASCII-alphanumeric subtags. There is a primary tag identifying a natural language by its shortest ISO 639 language code (e.g. en for English) and zero or more additional subtags adding precision. The most common additional subtag type is a region subtag which most commonly is a two-letter ISO 3166 country code (e.g. GB for the United Kingdom). IANA maintains a registry of permissible subtags.

It looks as if values from the language subtag registry are actually checked, though again the error message is a bit confusing: "Found reserved language extension subtag." Maybe "Found language subtag which is not registered"?

I think it is great to see this application of RFC 4646, and you should make the LTRU WG (IETF) aware of this.

Felix
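
For illustration only, the shape rule quoted in the first error message (hyphen-separated ASCII-alphanumeric subtags, none longer than 8 characters) can be sketched in Python as below. This is not Validator.nu's code and not the full RFC 4646 grammar; the function name `check_subtag_shape` and the message wording are hypothetical, and no registry lookup is attempted.

```python
import string

# Assumption: only the coarse shape rule from the error message is checked,
# not the full RFC 4646 grammar and not the IANA subtag registry.
ASCII_ALNUM = set(string.ascii_letters + string.digits)
MAX_SUBTAG_LENGTH = 8


def check_subtag_shape(tag):
    """Return a list of problems found in `tag`; an empty list means none."""
    problems = []
    for subtag in tag.split("-"):
        if not subtag:
            problems.append("empty subtag")
        elif not set(subtag) <= ASCII_ALNUM:
            problems.append(f"subtag {subtag!r} is not ASCII-alphanumeric")
        elif len(subtag) > MAX_SUBTAG_LENGTH:
            problems.append(
                f"subtag {subtag!r} must not exceed "
                f"{MAX_SUBTAG_LENGTH} characters in length"
            )
    return problems


if __name__ == "__main__":
    for candidate in ("bla-xmlangggggg-test", "en-1yz", "en-GB"):
        print(candidate, check_subtag_shape(candidate))
```

With this sketch, `bla-xmlangggggg-test` is rejected for its over-long middle subtag, while `en-1yz` passes the coarse check. Catching the second case needs a separate step, such as a lookup against the registry.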
Received on Monday, 17 March 2008 22:31:08 UTC