RE: Language Best Practises: Please review

Hi Martin,

Thanks for your (very detailed) comments. I've been working through them...

> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp] 
> Sent: 03 July 2006 11:49
...
> The document should have a list of best practices right at the top.
> (see e.g. the WebArch doc)

I'll consider that.

> 
> Changing the indent (front material up to TOC is indented ca. 
> 2cm, starting with 1 Intro, the indent changes to ca. 4cm) 
> doesn't make sense. Same with the additional indent for the 
> references (which would make sense if the indent were applied 
> to the reference text but not to the labels).

Yes, that was just a quick fix.  I'll look at it again shortly.  I was
wondering whether we could get away from the full-width text, since that
makes documents harder to read, but I'm not sure how to do it.  I plan to
check out some other recent specs, like MWBP, to see if they have any good
ideas.


> 
> In 3.1:
> "Metadata about the language of the intended audience is 
> about the document as a whole."
> 
> The final clause sounds a bit unprofessional. Also, the 
> repetition of the 'about' should be avoided. What about: "The 
> language of the intended audience is metadata on the document 
> as a whole."

Actually, it is a deliberate echo, intended to draw a parallel between the
two subclauses. It doesn't sound unprofessional to me.


> 
> BP1:
> "Always declare the default text-processing language of the 
> page, using attributes on the html tag, unless the intended 
> audience speaks multiple languages."
> 
> The first comma is confusing. The comma seems to indicate 
> that it's just e.g. one way of doing things, but then this 
> shouldn't be part of the actual BP text.
> 

Yes. Fixed.



> further down:
> "the intended audience is expected to read content in more 
> than one language (eg. a multilingual blog, or a page aimed 
> at more than one language community)": This is confusing. For 
> a page aimed at more than one language community, isn't the 
> reason this page contains more than one language just that 
> the audience is NOT expected to read content in more than one 
> language? I.e. for the Canadian example, English and French 
> are there not because everybody reads English and French, but 
> just because some people don't read both languages.

I think you are constraining your thinking too much to just one of the
scenarios mentioned above. A multilingual blog in two languages is typically
aimed at an audience that speaks both languages, and the writer switches
between languages according to preference.  A page with parallel content is
a somewhat different scenario, although the audience of the document itself
is still a multilingual community.


> 
> "it may make more sense to declare the default 
> text-processing language lower down in the document than in 
> the html tag."
> 'lower down' is extremely colloquial, and does not make clear 
> that this is lower down in the document hierarchy, and not 
> lower down in the document flow. Also, 'html tag' is 
> colloquial. There are two html tags, a start tag and an end tag.

This isn't a specification, so I'm happy to be a little colloquial, as long
as the meaning is not impaired.  I suspect that people reading this will be
aware that putting language information in the end tag probably won't be a
good idea; we don't need to spell that out for them ;-)

In fact, having just heard a great deal of feedback from designers about
WCAG 2.0, and having reviewed it myself, I'm keen to avoid sounding too
'speccy', and find myself wondering whether I should try to deformalise
some more of the text. (But I won't.)

> 
> "Best Practise 2: html declarations for multilingual docs"
> Practise -> Practice; docs->documents; 'html declarations'-> expand

Changed.

> 
> BP1, 2, and 6 all deal with putting something on the HTML tag.
> While I agree it's important because in the frequent 
> monolingual case, it's the only thing people have to do, it 
> still somehow feels like overkill.
> 
> "Best Practise 4: Should I use the lang or xml:lang attribute?"
> Some BP titles directly give the BP. This is best; ideally, 
> all should be like that. BP2 just gives the topic, so it 
> might be improved. Having a BP as a question is really confusing.
> BP7 again is a problem; it's a subclause, easily the worst 
> grammatical entity to go into a title.

Hmm. This is not easy.  See a separate mail to follow.

> 
> Example 15/16 use a ">"/">" that isn't needed for this 
> example and probably will confuse a few people.

I replaced it with an image, although I'm not sure that's much better: it
moves away from being a real example, and adds lots of markup to the
example... 

> 
> BP8: the <code> text is too small, the style sheet has to be 
> fixed 

That's a browser issue.  There is no special sizing applied to the <code>
text, and mine looks the same size as the normal text (and certainly wider).


> This BP should also say that the HTTP header may be 
> preferable because on some servers (Apache in particular), 
> language negotiation is done by looking at the headers in a 
> HTTP HEAD subrequest.

I don't understand why the HTTP header may be preferable in the case of
language-negotiated content.  The language negotiation process does seem to
have the side-effect of sending Content-Language information in the HTTP
header, but I don't see how that relates to advice to use HTTP headers for
declaring language.  The negotiation is based on information that comes from
the browser, not the document.

I did, however, add the following para:

"Sometimes a server has been set up to automatically serve a
language-specific version of a resource based on the user's browser settings
(content negotiation). In this case, your server is likely to send language
information in the Content-Language header."
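To illustrate the mechanism that paragraph describes, here is a minimal Python sketch: the server reads the browser's Accept-Language header, picks the best match among the language versions it holds, and labels the response with Content-Language. The function names (parse_accept_language, negotiate) are hypothetical, and the matching is a simplified prefix match, not the full RFC 4647 lookup algorithm or what Apache's mod_negotiation actually does.

```python
def parse_accept_language(header):
    """Return (language range, q) pairs from an Accept-Language value,
    highest q first; ranges with q=0 (explicitly refused) are dropped."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = part.split(";")
        lang_range = pieces[0].strip().lower()
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as a refusal
        if q > 0:
            ranges.append((lang_range, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def negotiate(accept_language, available, default):
    """Choose a language version (hypothetical helper) and build the
    response headers that would announce it."""
    for lang_range, _ in parse_accept_language(accept_language):
        for tag in available:
            lowered = tag.lower()
            # "*" matches anything; "en" also matches "en-GB" (prefix match)
            if (lang_range == "*" or lowered == lang_range
                    or lowered.startswith(lang_range + "-")):
                chosen = tag
                break
        else:
            continue
        break
    else:
        chosen = default
    # Vary tells caches that the response depends on Accept-Language
    return chosen, {"Content-Language": chosen, "Vary": "Accept-Language"}
```

For example, a browser sending `fr-CH, fr;q=0.9, en;q=0.8` to a server holding English and French versions would get the French version, labelled `Content-Language: fr`. The language information comes from the browser's preferences, not from anything in the document itself.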


> 
> BP10: "Dividing parallel text at the highest possible level, 
> can simplify..."
> Comma seems unnecessary/counterproductive.

Fixed.

> 
> Best Practise 11: Use RFC3066bis or its successor: There is 
> absolutely no mention of BCP 47, but this may be very helpful 
> in this case. BCP 47 is the number the IETF uses to denote 
> "RFC3066(bis) or its successor".
> 
> Best Practise 12: Use short language codes: These are 
> language tags, not language codes (language codes are those 
> things in ISO 639-x).

Fixed 

> 
> "Although RFC 3066bis introduces script tags, as RFC 3066bis 
> co-author, Addison Phillips, writes, "For virtually any 
> content that does not use a script tag today, it remains the 
> best practise not to use one in the future"."
> This is a nice quote, but doesn't look appropriate in the 
> context of this document.

I don't see why.


> 
> "In the past, there was often some confusion about which ISO 
> language code to choose, since there often 2-letter and 
> 3-letter alternatives for the same language (and sometimes 
> two 3-letter alternatives). This question is now moot because 
> you should only use language tags specified in the 
> IANALanguage Subtag Registry, and only one subtag exists per 
> language in that registry (the shortest one)."
> This implies that this question was open in RFC 3066. This is wrong.
> RFC 3066 made it very clear that if there was a two-letter 
> code, that had to be used, and there were two-letter codes 
> for all languages that had two three-letter codes. Pointing 
> to the subtag registry (after adding a space, probably best 
> by including IANA into the link) is a good idea, but it 
> shouldn't result in creating confusion where there was none.

Reworded.

> 
> Best Practise 13: Use Hans and Hant codes: It would be better 
> if this practice was worded more generally, e.g.
> "Use script codes to distinguish language variants that 
> differ by script, rather than using a country where this 
> variant is prevalent."

No. I definitely don't want to do this, since I don't think we need to
encourage people to use script codes in general.  On the other hand, general
use of zh-Hant and zh-Hans is a very good idea that needs to be widely known,
and it currently lacks visibility.

> 
> [Ed. note: This best practise has also been rewritten to 
> reflect changes in RFC 306bis.] RFC 306bis -> 3066bis
> 
> BP14: Yet another way to title a BP: pros and cons. Adding 
> just one word (Consider) creates something that says what it 
> means out of something that doesn't sound like a best practice at all.
> 
> Example 22 uses the actual attribute value, since these 
> two-letter codes are typically recognizable by speakers of 
> the language.
> I strongly doubt this. Have you done some research? Did you 
> ask people on the street, or is there some data you are 
> basing this on?

I think most people will, yes.  Have you evidence to the contrary? I have
softened the wording, nonetheless.

Received on Friday, 7 July 2006 11:36:34 UTC