Testing RFC 4646 values in markup languages

Hi Olivier, Felix,


I have created two test files for future use.

* HTML 4.01 - http://www.w3.org/2008/03/test-bogus-lang.html
* XHTML 1.1 - http://www.w3.org/2008/03/test-bogus-xml-lang.xhtml

The files contain a bogus language value:
	lang="tartempion"
	xml:lang="tartempion"

The [lang][5] attribute is not the only one that takes a language value. In HTML 4.01:

	hreflang on A, LINK
	lang     on all elements but APPLET, BASE, BASEFONT, BR, FRAME,
	         FRAMESET, IFRAME, PARAM, SCRIPT
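
As a rough sketch, the two attribute lists above could be encoded as a small lookup that tells a checker which attributes to inspect on a given element. The name `langAttributesFor` is hypothetical, and the function does not verify that its argument is a real HTML 4.01 element.

```haskell
import Data.Char (toUpper)

-- Elements on which HTML 4.01 does not define the lang attribute
-- (taken from the list above).
noLangElements :: [String]
noLangElements =
  ["APPLET", "BASE", "BASEFONT", "BR", "FRAME", "FRAMESET", "IFRAME", "PARAM", "SCRIPT"]

-- Elements on which HTML 4.01 defines hreflang.
hreflangElements :: [String]
hreflangElements = ["A", "LINK"]

-- Hypothetical helper: the language-carrying attributes to check on an element.
-- Element names are compared case-insensitively, as in HTML 4.01.
langAttributesFor :: String -> [String]
langAttributesFor element =
  [ "lang"     | name `notElem` noLangElements ] ++
  [ "hreflang" | name `elem` hreflangElements ]
  where name = map toUpper element

main :: IO ()
main = mapM_ report ["a", "br", "p"]
  where report e = putStrLn (e ++ ": " ++ show (langAttributesFor e))
```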


In XML, it seems to be a bit trickier. By the XML spec



In the future, we might want to catch values that contain syntax errors
or are not valid according to [RFC 4646][1].

The syntax became considerably more complex between RFC 3066 and RFC 4646.
Many tests are needed to check the RFC's constraints thoroughly. Felix
pointed me to Language Tags, which has [a test suite][2].
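
To give an idea of what even a syntax-only check involves, here is a minimal Haskell sketch covering a simplified subset of the RFC 4646 `langtag` grammar: language, optional script, optional region, variants, and a trailing private-use sequence. It deliberately ignores extlang subtags, extension subtags, grandfathered tags, and tags that consist only of private use, so it rejects some well-formed tags. It also does not consult the subtag registry, so it says nothing about validity. All function names are made up for the example.

```haskell
import Data.Char (isAlpha, isAscii, isDigit, toLower)

-- Split a tag on '-' into its subtags (empty subtags are kept so they can be rejected).
splitOnHyphen :: String -> [String]
splitOnHyphen s = case break (== '-') s of
  (h, [])       -> [h]
  (h, _ : rest) -> h : splitOnHyphen rest

isAsciiAlpha, isAsciiAlnum :: Char -> Bool
isAsciiAlpha c = isAscii c && isAlpha c
isAsciiAlnum c = isAscii c && (isAlpha c || isDigit c)

-- True when the length of s lies between lo and hi inclusive.
lenIn :: Int -> Int -> String -> Bool
lenIn lo hi s = let n = length s in n >= lo && n <= hi

isLanguage, isScript, isRegion, isVariant, isPrivate :: String -> Bool
isLanguage s = lenIn 2 8 s && all isAsciiAlpha s            -- 2*8ALPHA, extlang ignored
isScript s   = length s == 4 && all isAsciiAlpha s          -- 4ALPHA
isRegion s   = (length s == 2 && all isAsciiAlpha s)        -- 2ALPHA
            || (length s == 3 && all isDigit s)             -- or 3DIGIT
isVariant s  = (lenIn 5 8 s && all isAsciiAlnum s)          -- 5*8alphanum
            || (length s == 4 && isDigit (head s) && all isAsciiAlnum s)
isPrivate s  = lenIn 1 8 s && all isAsciiAlnum s            -- 1*8alphanum

-- Drop the first subtag when it satisfies p (an optional component).
skipIf :: (String -> Bool) -> [String] -> [String]
skipIf p (x : xs) | p x = xs
skipIf _ xs = xs

-- Simplified well-formedness check (see the limitations listed above).
isWellFormedSimple :: String -> Bool
isWellFormedSimple tag = case splitOnHyphen tag of
  (lang : rest) | isLanguage lang ->
    privateOrEnd (dropWhile isVariant (skipIf isRegion (skipIf isScript rest)))
  _ -> False
  where
    privateOrEnd []       = True
    privateOrEnd (x : xs) = map toLower x == "x" && not (null xs) && all isPrivate xs

main :: IO ()
main = mapM_ report ["en-US", "de-CH-1996", "sl-rozaj", "en-x-foo", "tartempion", "en-"]
  where report t = putStrLn (t ++ ": " ++ show (isWellFormedSimple t))
```

Note that "tartempion" is rejected at the syntax level already: a primary language subtag may have at most 8 letters, and this one has 10.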

I wonder if we could put out a call on the Q&A blog for someone to
implement such a module, stating what we need as input and output. Then
we could plug it into the validator.



# JAVA

In his thesis, Henri Sivonen says he has [partially implemented][3] it in
the HTML 5 conformance checker (Java).

I also found [a page][6] by an I18N member who has
[coded something in Java][7].


# PERL

It could also be implemented in Perl as a module for LogValidator.

# HASKELL

I found a bit of [Haskell code for checking RFC 4646][4] by Stephane
Bortzmeyer:

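-- Excerpt: the Grammar module, tagsFromFile, brokenTagsFile and wfTagsFile
-- are used here but defined elsewhere in the original code.
import qualified Test.HUnit as HUnit

-- Test case that passes when the tag is accepted by the grammar.
shouldBeWellFormed tag =
    HUnit.TestCase (HUnit.assertBool (tag ++ " should be well-formed")
                         (Grammar.testTag tag))

-- Test case that passes when the tag is rejected by the grammar.
shouldBeBroken tag =
    HUnit.TestCase (HUnit.assertBool (tag ++ " should *not* be well-formed")
                         (not (Grammar.testTag tag)))

-- Read both tag lists and run one test per tag.
main = do
        brokenTags <- tagsFromFile brokenTagsFile
        wfTags <- tagsFromFile wfTagsFile
        let tests = HUnit.TestList (map shouldBeBroken brokenTags ++
                                    map shouldBeWellFormed wfTags)
        HUnit.runTestTT tests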








[1]: http://www.ietf.org/rfc/rfc4646.txt
[2]: http://www.langtag.net/test-suites.html
[3]: http://hsivonen.iki.fi/thesis/html5-conformance-checker.xhtml#lang
[4]: http://www.bortzmeyer.org/test-logiciel.html
[5]: http://www.w3.org/TR/html4/struct/dirlang.html#adef-lang
[6]: http://www.dpawson.co.uk/java/rfc4646.html
[7]: http://www.dpawson.co.uk/java/rfc46461.02.zip



--
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool

Received on Monday, 17 March 2008 05:22:30 UTC