- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Mon, 21 Nov 2016 15:11:17 +0200
- To: www-validator@w3.org
21.11.2016, 3:19, JC Ahangama wrote:

> That page is written in Romanized Singhala, and rendered in the native
> script using an Orthographic Smartfont.

The page http://ahangama.com/election/whatnext-s.htm appears to be written in Sinhala, using the Sinhala alphabet (script), code Sinh, but using a technique based on “font trickery”: a special 8-bit font, containing ASCII in the lower range (0..0x7F) and Sinhala letters (perhaps in the same order as in the Sinhala block in Unicode) in the upper range. This trickery is entirely based on the assumption that browsers will use that special font. These days, the assumption can be satisfied more often than in the old days, as you can use @font-face to embed it, reaching near (but not quite) 100% coverage.

> I feel the meta data, lang='si-Latn' conveys the correct information.

It does not, and neither does the element <meta charset="utf-8">. First, there is no defined Latin (Roman) writing system for Sinhala, I’m afraid, so lang="si-Latn" is misleading. Second, the data is not in fact UTF-8 encoded. Interpreted as UTF-8 data, its <body> content starts with

“[2016-11-14] (akuru loku karanna bravsara kavuLuvee ðakuNu agin allaagena mehi æþi”

I don’t think any official or unofficial writing system for Sinhala uses Icelandic letters “ð” and “þ”.

> However, I have not registered this notation. Please help me to do it
> properly.

I don’t think that’s the solution. The options as I see them are:

1) Keep doing what you have done and ignore the warning. It is, after all, just a warning message from an experimental checker, caused by experimental language-guessing, which is known to guess wrong rather often (though here the reason is that the content, interpreted according to the metadata of the document, is not in any human language, and the guesser just makes a wild guess).

2) Switch to using UTF-8 encoded Sinhala characters (and use just lang="si"). This is nontrivial, as most of the page content needs to be recoded.
If you think you need to embed a font for them, try to find a Unicode font that contains them (properly assigned to the correct code points).

Yucca
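
The point about “ð” and “þ” can be checked mechanically. The following is only a minimal sketch in Python, not anything the checker itself does: the function name script_summary is hypothetical, and the only outside fact it relies on is that the Unicode Sinhala block spans U+0D80..U+0DFF. It counts how many characters of a string fall in that block and looks up the Unicode names of the non-ASCII characters in the fragment quoted above.

```python
import unicodedata

# The Unicode Sinhala block covers U+0D80..U+0DFF.
SINHALA_BLOCK = range(0x0D80, 0x0E00)


def script_summary(text):
    """Count characters inside the Sinhala block and letters outside it.

    Hypothetical helper for illustration only.
    """
    sinhala = sum(1 for ch in text if ord(ch) in SINHALA_BLOCK)
    other_letters = sum(
        1 for ch in text if ch.isalpha() and ord(ch) not in SINHALA_BLOCK
    )
    return {"sinhala": sinhala, "other_letters": other_letters}


# Fragment of the page's <body> text as quoted in the message above.
sample = "kavuLuvee ðakuNu agin allaagena mehi æþi"
summary = script_summary(sample)

# Unicode names of the non-ASCII characters found in the fragment.
non_ascii = sorted({ch for ch in sample if ord(ch) > 0x7F})
names = [unicodedata.name(ch) for ch in non_ascii]

if __name__ == "__main__":
    print(summary)
    for ch, name in zip(non_ascii, names):
        print(f"U+{ord(ch):04X} {name}")
```

For the quoted fragment, no character falls in the Sinhala block, and the non-ASCII characters are all Latin letters, so lang="si" content recoded as in option 2 would look quite different to any such check.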
Received on Monday, 21 November 2016 13:11:49 UTC