Re: Testing RFC 4646 values in markup languages

Hi Karl,

Karl Dubost wrote:
> Hi olivier, felix,
>
>
> I have created two test files for future use.
>
> * html 4.01 - http://www.w3.org/2008/03/test-bogus-lang.html
> * xhtml 1.1 - http://www.w3.org/2008/03/test-bogus-xml-lang.xhtml
>
> The files contain
>     lang="tartempion"
>     xml:lang="tartempion"

Cool :)

>
> It's not only lang attributes. In HTML 4.01,
>
>     hreflang on A, LINK
>     lang     on All elements but APPLET, BASE, BASEFONT, BR, FRAME, 
> FRAMESET, IFRAME, PARAM, SCRIPT
>
>
> In XML, it is a bit tricky, it seems. By XML spec

Why is it tricky?

>
>
>
> In the future, we might want to catch the values which are not into 
> [RFC 4646][1] or syntax errors.
>
> The syntax really became complex in between RFC 3066 and RFC 4646. A 
> lot of tests are needed to really check RFC constraints.


The number of tests and constraints depends on what you want to 
check. RFC 4646 defines two classes of conformance. "Well-formed" is 
basically a check against a (complicated, but still a) regular 
expression, the ABNF defined in sec. 2.1 of the RFC. "Valid" means that 
you also validate the values of the subtags (a language tag consists of 
a sequence of subtags, separated by hyphens "-") against the language 
subtag registry.
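
To make the "valid" part a bit more concrete, here is a rough Python 
sketch of the registry lookup. It is only a partial check: it splits the 
tag at the hyphens and looks up the first (primary language) subtag. The 
names KNOWN_LANGUAGE_SUBTAGS, split_subtags and 
primary_language_is_registered are made up for this example, and the set 
is a tiny hand-copied excerpt, not the real registry.

```python
# Hand-copied excerpt for illustration only; the real IANA language
# subtag registry is far larger and also lists scripts, regions, variants.
KNOWN_LANGUAGE_SUBTAGS = {"de", "en", "fr", "zh"}


def split_subtags(tag):
    """Split a language tag into its hyphen-separated subtags (lowercased)."""
    return tag.lower().split("-")


def primary_language_is_registered(tag):
    """Partial validity check: look up only the first subtag."""
    return split_subtags(tag)[0] in KNOWN_LANGUAGE_SUBTAGS


print(split_subtags("zh-Hant-TW"))                   # ['zh', 'hant', 'tw']
print(primary_language_is_registered("en-US"))       # True
print(primary_language_is_registered("tartempion"))  # False
```

A real validity check would have to work out the role of every subtag 
(script, region, variant, ...) and look each one up, which is where the 
extra effort comes from.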

IMO a well-formedness checker would be sufficient, and it is the easier 
one to implement.
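
As an illustration of how small such a checker can be, here is a rough 
Python sketch that transcribes the ABNF from sec. 2.1 of RFC 4646 into a 
regular expression. This is my reading of the grammar, not a tested 
reference implementation; the function name is_well_formed and the 
choice of Python are just assumptions for the example.

```python
import re

# Productions from the RFC 4646 ABNF (sec. 2.1), written as regex fragments.
LANGUAGE = r"(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4}|[A-Za-z]{5,8})"
SCRIPT = r"(?:-[A-Za-z]{4})?"
REGION = r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"
VARIANTS = r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"
# A singleton is any letter or digit except "x"/"X" (reserved for private use).
EXTENSIONS = r"(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*"
PRIVATEUSE = r"[xX](?:-[A-Za-z0-9]{1,8})+"
GRANDFATHERED = r"[A-Za-z]{1,3}(?:-[A-Za-z0-9]{2,8}){1,2}"

LANGTAG = (LANGUAGE + SCRIPT + REGION + VARIANTS + EXTENSIONS
           + "(?:-" + PRIVATEUSE + ")?")

LANGUAGE_TAG = re.compile(
    "(?:" + LANGTAG + "|" + PRIVATEUSE + "|" + GRANDFATHERED + ")"
)


def is_well_formed(tag):
    """Return True if the whole string matches the Language-Tag production."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


print(is_well_formed("en-US"))       # True
print(is_well_formed("de-CH-1996"))  # True
print(is_well_formed("x-private"))   # True
print(is_well_formed("tartempion"))  # False
```

Note that "tartempion" from Karl's test files is already rejected at 
this level: it has ten letters, and a primary language subtag may have 
at most eight.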


> Felix pointed me to Language Tags, which has [a test suite][2].
>
> I wonder if we could make a call on the Q&A blog for implementing such 
> a module giving what we need as input and output. Then we could plug 
> it in the validator.

Sounds good! Which programming languages would be useful for you? Perl?

By the way, a successor to RFC 4646 is currently on its way. It will 
have a slightly modified ABNF that gets rid of the "extlang" production. 
I can give more details if necessary.

Felix

Received on Monday, 17 March 2008 08:53:13 UTC