RE: [Ltru] Re: For review: Tagging text with no language

From: CE Whitehead <cewcathar@hotmail.com>
Date: Tue, 17 Apr 2007 14:59:39 -0400
Message-ID: <BAY114-F3603DB583E6D879D524639B3510@phx.gbl>
To: petercon@microsoft.com, ltru@lists.ietf.org, www-international@w3.org

>From: Karen_Broome@spe.sony.com [mailto:Karen_Broome@spe.sony.com]
> > Let's look at a real use case. Would you say this page
> > is in "en" and "zxx" and that the sections of code
> > have no linguistic value even though they are clearly
> > intended to be read by humans and not machines? Or does
> > context matter?

Such as comments in programming code.

I personally think there may be a use for a variant of 
zxx that indicates a programming language (as distinct from, say, a math 
formula), and a further use for some additional identification of the human 
language(s) involved
(the basic code could clearly be in one language, English, but user-defined 
functions might use French or German abbreviations, and the 
comments would then actually be in French or German).

In the meantime,
one thing that might help for tagging code snippets in a page
is to have an outer paragraph or whatever tagged "en"
(or "fr" or "de" or what have you, the language closest to the 
language on which the code is based),
with maybe a short header or something in the appropriate language,

and just inside this a pre section, which is where the code is,
whose language you could define
as zxx.
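
To make the nesting concrete, here is a rough sketch in Python (standard library only) of how a consumer might work out the effective language of each piece of text when an outer block is tagged "en" and the inner pre is tagged "zxx". The markup in PAGE, the div wrapper (used because HTML does not allow pre inside p), and the names LangResolver and resolve are all made up for illustration; the stack handling assumes well-nested markup with no unclosed void elements such as a bare br.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: outer block tagged "en", inner pre tagged "zxx".
PAGE = """
<div lang="en">
  <h2>Example configuration</h2>
  <pre lang="zxx">&lt;item id="1"/&gt;</pre>
</div>
"""


class LangResolver(HTMLParser):
    """Collect (effective language, text) pairs.

    An element without its own lang attribute inherits the language of
    its nearest ancestor that has one.
    """

    def __init__(self):
        super().__init__()
        self.stack = [None]  # effective language of each open element
        self.chunks = []     # (language, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        # Inherit the parent's language when no lang attribute is given.
        self.stack.append(lang if lang is not None else self.stack[-1])

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((self.stack[-1], text))


def resolve(html):
    """Return the list of (language, text) pairs found in html."""
    parser = LangResolver()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    for lang, text in resolve(PAGE):
        print(lang, "|", text)
```

The header inherits "en" from the outer block, while the code inside the pre picks up "zxx" from its own attribute, which is the arrangement described above.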

Just my two cents' worth.

--C. E. Whitehead

>You're asking to have one tag that covers an entire document even though 
>that document is mixed. What if I insert a quotation in Spanish in this 
>mail? ¿Que vamos a hacer? It's no different. If you must use a single tag 
>for the whole thing, then clearly this is predominantly in English. I don't 
>know what you'd do if it were closer to 50-50.
> > Interpreting "no linguistic content" as "not a human
> > language, could be a programming language" could cause
> > some problems. There may be a use case for programming
> > languages to have their own tag if this is deemed
> > appropriate for the 639 standards or IANA registry, and
> > these languages are different than say, instrumental
> > music in the Library of Congress or a sound effects
> > track in a film (both zxx, I'd say).
>Your argument is akin to someone saying that someone may want to code audio 
>in Unicode. ISO 639 has defined a scope, human languages. Programming 
>languages, electrical schematics, dance notation, bridge-hand notation, 
>math formulas and engineering drawings are all graphic content that can be 
>interpreted by humans. Some of these can be represented in text, but that 
>does not change the fact that they are not a form of the kind of things 
>coded by ISO 639, human languages.
> > I think programming languages have specific
> > identification and parsing needs and as such need to
> > be treated differently.
>As I suggested earlier, the scope defined by ISO 639 does not force RFC 
>4646bis to be limited to the same scope -- in fact, it cannot be. 
>("Language tags" already code things other than linguistic variety, written 
>form in particular.) So, if you want to propose variant subtags to 
>differentiate programming code from music notation, then I don't see why 
>that couldn't be done.
>But it would be out of scope for ISO 639 to code such a distinction, and it 
>would be a non-conforming re-interpretation to say that zxx does not apply 
>to programming languages.
> > The code in the article above should be rendered in
> > Braille, for example, so it must be parsed. This makes
> > it different from non-linguistic content.
>You're confusing the language of content with the representation mode in 
>some communicative technology. English content in Braille is still English, 
>and so clearly different from zxx. That is not in any way comparable to 
>discussing code in a programming language.
> > How would you classify the page I cite?
>As I mentioned above, this question is no different than asking how to come 
>up with one tag for a page that contains content in both English and 
>Spanish. On a *practical* level, I would tag that article as en and ignore 
>the fact that it contains XML code snippets; but if someone was being 
>careful to tag elements within the document correctly, then the code 
>snippets should be tagged zxx. (That is, unless you want to register 
>variant subtags to differentiate between different kinds of non-linguistic 
>content.)

One might want to do so.
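
Until any such variants are registered, private-use subtags (the part after the "x" singleton, which needs no registration) might serve as a stopgap. The following is only a sketch: the subtags "code" and "music" are invented here and are not registered anywhere, and split_private_use and is_non_linguistic are hypothetical helper names.

```python
def split_private_use(tag):
    """Split a language tag into (public part, list of private-use subtags).

    Everything after the "x" singleton is treated as private use.
    """
    subtags = tag.lower().split("-")
    if "x" in subtags:
        i = subtags.index("x")
        return "-".join(subtags[:i]), subtags[i + 1:]
    return "-".join(subtags), []


def is_non_linguistic(tag):
    """True when the primary language subtag is zxx (no linguistic content)."""
    public, _ = split_private_use(tag)
    return public.split("-")[0] == "zxx"


if __name__ == "__main__":
    # "code" and "music" are made-up private-use subtags for illustration.
    for tag in ("en", "zxx", "zxx-x-code", "zxx-x-music"):
        print(tag, is_non_linguistic(tag), split_private_use(tag)[1])
```

A consumer that knows nothing about the private-use part still sees zxx as the primary subtag, so the snippet stays tagged as non-linguistic either way.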

Received on Tuesday, 17 April 2007 19:00:33 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 21 September 2016 22:37:28 UTC