- From: John Cowan <cowan@ccil.org>
- Date: Mon, 13 Nov 2006 18:29:37 -0500
- To: CE Whitehead <cewcathar@hotmail.com>
- Cc: www-international@w3.org
CE Whitehead scripsit:

> Hi, I am troubled by tags like frc, fro, and frm because I am wondering
> what happens when a person using a search engine asks for pages
> in French? Will the frc, fro, frm pages turn up too?

In practice, search engines tend to ignore language tagging in favor of
statistical analysis of text, since language tags are so often missing
or incorrect.

> It's quite possible that a person interested in French will be
> interested in moyen Francais/Middle French (frc) and in Old French
> (fro) if the search is for someone studying French.

There are various contexts where multiple language tags can be specified,
indicating that you will accept content in any of these, usually with
a priority order.

> The trouble with you all is you assume that people are just searching
> for pages in their first language and that they have only one real
> primary language they can accept pages in; clearly this cannot be the
> case for fro (Old French) and frm (Moyen Francais).

Not at all.

> It's also conceivable that a person might want documents that are
> written in either a Creole of French and Standard French.

The same feature (the language-tag search list) can be used in that case
as well.

> One could of course list all of these in the meta content tags; for
> example for my "Moyen francais" document I could list: lang=en, fr, frm

Indeed.

> but some applications used to put up pages at some web hosts embed
> one's document into the body of a page they create; that's the case
> with teacher web (http://teacherweb.com), as I pointed out once before.

That's known to be a problem, yes.
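As a rough illustration of the language-tag search list mentioned above, here is a minimal Python sketch of matching in the style of RFC 4647 basic filtering. The function name basic_filter and the sample page tags are assumptions made up for this example; they do not describe how any real search engine behaves.

```python
def basic_filter(priority_list, tags):
    """Return the tags matched by any range in priority_list, in priority order.

    A range matches a tag if the two are equal (ignoring case) or if the
    tag starts with the range followed by a hyphen. "*" matches everything.
    """
    matched = []
    for lang_range in priority_list:
        r = lang_range.lower()
        for tag in tags:
            t = tag.lower()
            if (r == "*" or t == r or t.startswith(r + "-")) and tag not in matched:
                matched.append(tag)
    return matched


# Hypothetical set of page language tags.
pages = ["fr", "frm", "fro", "frc", "en-US", "fr-CA"]

print(basic_filter(["fr", "frm", "fro"], pages))
# prints ['fr', 'fr-CA', 'frm', 'fro']
```

Under this kind of matching, a bare fr range does not pick up frm or fro, because the comparison is on whole subtags rather than on string prefixes; a reader who wants Middle or Old French has to list those tags explicitly.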
> Also, as I noted, some of the 17th Century new world documents were
> in Middle French although you all have set the dates as 1400-1600
> (those dates can vary a bit; you'd be surprised also at the amount of
> variation you can get in any given language at any given time before
> literacy was so widespread)

We didn't pick the dates, the ISO 639-2 Registration Authority (the
Library of Congress) did. You can go to
http://www.loc.gov/standards/iso639-2/php/iso639-2chform.php and request
a change.

> I note that for Arabic (which has as far as I know and I am no expert)
> the following main subdivisions in its dialects, [...] you just have
> to use the country codes--at least this is all I saw?

As of ISO 639-2 and RFC 4646, yes. In 639-3 and 4646bis, about 30
different Arabic language tags will be available:

    aao    Algerian Saharan Arabic
    abh    Tajiki Arabic
    abv    Baharna Arabic
    acm    Mesopotamian Arabic
    acq    Ta'izzi-Adeni Arabic
    acw    Hijazi Arabic
    acx    Omani Arabic
    acy    Cypriot Arabic
    adf    Dhofari Arabic
    aeb    Tunisian Arabic
    aec    Saidi Arabic
    afb    Gulf Arabic
    ajp    South Levantine Arabic
    apc    North Levantine Arabic
    apd    Sudanese Arabic
    arb    Standard Arabic
    arq    Algerian Arabic
    ars    Najdi Arabic
    ary    Moroccan Arabic
    arz    Egyptian Arabic
    auz    Uzbeki Arabic
    avl    Eastern Egyptian Bedawi Arabic
    ayh    Hadrami Arabic
    ayl    Libyan Arabic
    ayn    Sanaani Arabic
    ayp    North Mesopotamian Arabic
    bbz    Babalia Creole Arabic
    pga    Sudanese Creole Arabic
    shu    Chadian Arabic
    ssh    Shihhi Arabic

> Why not also have variants for dates, such as two digits plus the letter
> c, with the two digits indicating the century (01-20; I assume that the
> century would be redundant for the 21rst century variant of a language)?

Those forms are too short, and there are problems with generic tags,
as centuries are not the appropriate units for many languages.

-- 
Is a chair finely made tragic or comic? Is the          John Cowan
portrait of Mona Lisa good if I desire to see           cowan@ccil.org
it? Is the bust of Sir Philip Crampton lyrical,         http://ccil.org/~cowan
epical or dramatic? If a man hacking in fury
at a block of wood make there an image of a cow,
is that image a work of art? If not, why not?
                --Stephen Dedalus
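The "too short" objection to forms like 16c presumably refers to the variant subtag grammar in RFC 4646, which requires five to eight alphanumeric characters, or exactly four characters beginning with a digit. A minimal Python sketch of that length rule follows; is_wellformed_variant is a hypothetical helper name, and it checks syntax only, not whether a subtag is actually registered.

```python
import re

# RFC 4646 variant grammar: 5*8alphanum / (DIGIT 3alphanum)
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_wellformed_variant(subtag):
    """Return True if subtag has the shape of an RFC 4646 variant subtag."""
    return VARIANT_RE.fullmatch(subtag) is not None


# "16c" is the proposed century form; the others have registered-variant shapes.
for candidate in ["16c", "1606nict", "1901"]:
    print(candidate, is_wellformed_variant(candidate))
```

A three-character form such as 16c fails both alternatives of the grammar, so it could not be used as a variant without changing the syntax itself.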
Received on Monday, 13 November 2006 23:29:55 UTC