Re: Language Best Practises: Please review

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 03 Jul 2006 19:48:57 +0900
To: "Richard Ishida" <ishida@w3.org>, "GEO" <public-i18n-geo@w3.org>

Hello Richard,

Some comments:

Very thorough work!

The document should have a list of best practices right at the top.
(see e.g. the WebArch doc)

Changing the indent (front material up to TOC is indented ca. 2cm,
starting with 1 Intro, the indent changes to ca. 4cm) doesn't make
sense. Same with the additional indent for the references (which
would make sense if the indent were applied to the reference text
but not to the labels).

In 3.1:
"Metadata about the language of the intended audience is about the document as a whole."

The final clause sounds a bit unprofessional. Also, the repetition of the
'about' should be avoided. What about: "The language of the intended audience
is metadata on the document as a whole."

"Always declare the default text-processing language of the page, using attributes on the html tag, unless the intended audience speaks multiple languages."

The first comma is confusing. The comma seems to indicate that it's
just e.g. one way of doing things, but then this shouldn't be part
of the actual BP text.

further down:
"the intended audience is expected to read content in more than one language (eg. a multilingual blog, or a page aimed at more than one language community)":
This is confusing. For a page aimed at more than one
language community, isn't the reason this page contains more than one
language just that the audience is NOT expected to read content in more
than one language? I.e. for the Canadian example, English and French
are there not because everybody reads English and French, but just
because some people don't read both languages.

"it may make more sense to declare the default text-processing language
lower down in the document than in the html tag."
'lower down' is extremely colloquial, and does not make clear that
this is lower down in the document hierarchy, and not lower down
in the document flow. Also, 'html tag' is colloquial. There are two
html tags, a start tag and an end tag.
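
To make the hierarchy point concrete, here is a minimal Python sketch (not
taken from the document under review) that reports which language attributes
sit on the html start tag, as opposed to on elements further down the
document hierarchy. The names HtmlLangFinder and html_lang are invented for
this illustration; only the standard-library html.parser module is assumed.

```python
from html.parser import HTMLParser


class HtmlLangFinder(HTMLParser):
    """Collect lang / xml:lang from the first html start tag only."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.declared = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.declared = {
                name: value
                for name, value in attrs
                if name in ("lang", "xml:lang")
            }


def html_lang(markup):
    """Return the language attributes declared on the html start tag."""
    finder = HtmlLangFinder()
    finder.feed(markup)
    return finder.declared


page = '<html lang="fr" xml:lang="fr"><body><p lang="en">Hi</p></body></html>'
print(html_lang(page))
```

A declaration on body or on a div is deliberately ignored here, which is the
"further down the hierarchy" case the quoted sentence is trying to describe.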

"Best Practise 2: html declarations for multilingual docs"
Practise -> Practice; docs->documents; 'html declarations'->

BP1, 2, and 6 all deal with putting something on the HTML tag.
While I agree it's important because in the frequent monolingual case,
it's the only thing people have to do, it still somehow
feels like overkill.

"Best Practise 4: Should I use the lang or xml:lang attribute?"
Some BP titles directly give the BP. This is best; ideally, all
should be like that. BP2 just gives the topic, so it might be
improved. Having a BP as a question is really confusing.
BP7 again is a problem; it's a subclause, easily the worst
grammatical entity to go into a title.

Examples 15/16 use a ">"/"&gt;" that isn't needed for this example and
probably will confuse a few people.

BP8: the <code> text is too small; the style sheet has to be fixed.
This BP should also say that the HTTP header may be preferable because
on some servers (Apache in particular), language negotiation is done
by looking at the headers in an HTTP HEAD subrequest.
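
As a rough illustration of why the header matters: a response to a HEAD
request carries headers only, so a language declared inside the document
body is not visible to whatever inspects that response. The sketch below
parses a made-up HEAD response (RAW_HEAD_RESPONSE and content_languages are
hypothetical names, and the header values are invented) and extracts the
Content-Language value. It does not model how any particular server performs
negotiation.

```python
from email.parser import Parser

# Hypothetical HEAD response: a status line and headers, with no body.
RAW_HEAD_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Language: fr, en\r\n"
    "\r\n"
)


def content_languages(raw_response: str) -> list:
    """Return the language tags listed in the Content-Language header."""
    # Drop the status line; the rest is an RFC 822 style header block.
    _status_line, _, header_block = raw_response.partition("\r\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    value = headers.get("Content-Language", "")
    return [tag.strip() for tag in value.split(",") if tag.strip()]


print(content_languages(RAW_HEAD_RESPONSE))  # ['fr', 'en']
```

If the header is absent the function returns an empty list, which is all a
header-only consumer could know about the page's language.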

BP10: "Dividing parallel text at the highest possible level, can simplify..."
Comma seems unnecessary/counterproductive.

Best Practise 11: Use RFC3066bis or its successor: There is absolutely
no mention of BCP 47, although mentioning it may be very helpful here. BCP 47
is the number the IETF uses to denote "RFC3066(bis) or its successor".

Best Practise 12: Use short language codes: These are language tags,
not language codes (language codes are those things in ISO 639-x).
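
The tag/code distinction can be sketched in Python: a language tag is a
sequence of subtags, and the language code is only the first of them. The
function below is a deliberately simplified splitter (split_tag is a made-up
name). It assumes well-formed input and recognises only language, script and
region subtags by their shape; extended language subtags, variants,
extensions and private-use subtags are lumped under "other".

```python
def split_tag(tag: str) -> dict:
    """Split a language tag into language, script and region subtags."""
    subtags = tag.split("-")
    parts = {"language": subtags[0].lower()}
    for subtag in subtags[1:]:
        is_script = len(subtag) == 4 and subtag.isalpha()
        is_region = (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        )
        if is_script and "script" not in parts and "region" not in parts:
            parts["script"] = subtag.title()   # conventional casing: Hant
        elif is_region and "region" not in parts:
            parts["region"] = subtag.upper()   # conventional casing: TW
        else:
            parts.setdefault("other", []).append(subtag)
    return parts


print(split_tag("zh-Hant-TW"))
# {'language': 'zh', 'script': 'Hant', 'region': 'TW'}
```

A bare "fr" is still a complete tag; it just happens to consist of a single
language subtag.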

"Although RFC 3066bis introduces script tags, as RFC 3066bis co-author, Addison Phillips, writes, "For virtually any content that does not use a script tag today, it remains the best practise not to use one in the future"."
This is a nice quote, but doesn't look appropriate in the context of
this document.

"In the past, there was often some confusion about which ISO language code to choose, since there often 2-letter and 3-letter alternatives for the same language (and sometimes two 3-letter alternatives). This question is now moot because you should only use language tags specified in the IANALanguage Subtag Registry, and only one subtag exists per language in that registry (the shortest one)."
This implies that this question was open in RFC 3066. This is wrong.
RFC 3066 made it very clear that if there was a two-letter code,
that had to be used, and there were two-letter codes for all languages
that had two three-letter codes. Pointing to the subtag registry
(after adding a space, probably best by including IANA in the link)
is a good idea, but it shouldn't result in creating confusion where
there was none.
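
The RFC 3066 rule described above can be sketched as a lookup: if a
two-letter code exists for a language, use it, whichever three-letter
alternative you started from. The table below covers only a few languages
for illustration, and preferred_code is a hypothetical helper, not part of
any registry tooling.

```python
# A few ISO 639-2 three-letter codes mapped to their two-letter equivalents.
# French, German and Chinese each have two three-letter alternatives.
TWO_LETTER = {
    "fre": "fr", "fra": "fr",
    "ger": "de", "deu": "de",
    "chi": "zh", "zho": "zh",
    "eng": "en",
}


def preferred_code(code: str) -> str:
    """Return the two-letter code when one is known, else the code itself."""
    code = code.lower()
    return TWO_LETTER.get(code, code)


print(preferred_code("fre"), preferred_code("fra"))  # fr fr
print(preferred_code("haw"))  # haw (Hawaiian has no two-letter code)
```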

Best Practise 13: Use Hans and Hant codes
It would be better if this practice were worded more generally, e.g.
"Use script codes to distinguish language variants that differ by
script, rather than using a country where this variant is prevalent."
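
For the Chinese case, the more general wording amounts to something like the
following Python sketch. The mapping table is an illustrative assumption
about which region-based tags authors typically use as a stand-in for a
script; whether a given page really should be retagged is a judgement the
author has to make. REGION_TO_SCRIPT and script_based_tag are made-up names.

```python
# Region-based tags often used as a proxy for the script (assumed mapping).
REGION_TO_SCRIPT = {
    "zh-cn": "zh-Hans",  # Simplified Chinese
    "zh-sg": "zh-Hans",
    "zh-tw": "zh-Hant",  # Traditional Chinese
    "zh-hk": "zh-Hant",
}


def script_based_tag(tag: str) -> str:
    """Return a script-based tag for known region-based ones, else the tag."""
    return REGION_TO_SCRIPT.get(tag.lower(), tag)


print(script_based_tag("zh-TW"))  # zh-Hant
print(script_based_tag("fr-CA"))  # fr-CA (region is the real distinction)
```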

"[Ed. note: This best practise has also been rewritten to reflect changes in RFC 306bis.]"
RFC 306bis -> RFC 3066bis

BP14: Yet another way to title a BP: pros and cons. Adding just one
word (Consider) creates something that says what it means out of
something that doesn't sound like a best practice at all.

"Example 22 uses the actual attribute value, since these two-letter codes are typically recognizable by speakers of the language."
I strongly doubt this. Have you done some research? Did you ask
people on the street, or is there some data you are basing this on?

Regards,    Martin.

At 02:13 06/07/01, Richard Ishida wrote:
>I have once more gone through 
>today, and I feel happy about removing the term 'primary language' in 
>favour of 'language of the intended audience' (or similar).  I would like 
>to bring closure to this.
>Please look through the text (look for change marks) and tell me if you see 
>any problems.  I will take silence for assent and will assume that everyone 
>is happy if I don't hear otherwise by Wednesday a.m. (UK).
>I also made some editorial improvements, and a couple of significant 
>changes to the document:
>1. I moved some of the text that was buried in a best practise to what is 
>now section 4.2.  I think this constitutes a major improvement to the document.
>2. I turned these headings:
>How to declare the text-processing language -> Using attributes to declare 
>the text-processing language
>How to specify primary language metadata -> Using metadata to specify the 
>language of the intended audience
>I think this is much better from the user's point of view (esp. in the 
>techniques index) since we can't expect them to know the difference between 
>text-processing and primary language - nor to care much.  But they will be 
>interested in how they should use attributes.
>3. I edited, and in one case rewrote, some best practises in light of 
>changes in RFC 3066bis.
>4. I tweaked the "How to ..." headings, since these are no longer techniques.
>Please also comment on these changes, if you feel the need, in the same 
>Richard Ishida
>Internationalization Lead
>W3C (World Wide Web Consortium)

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Monday, 3 July 2006 12:38:58 UTC