
Re: For review: Tagging text with no language

From: CE Whitehead <cewcathar@hotmail.com>
Date: Fri, 13 Apr 2007 12:52:48 -0400
Message-ID: <BAY114-F18FE22AEE4CB978328E54AB35D0@phx.gbl>
To: nobody@xyzzy.claranet.de, www-international@w3.org

Hi, I personally wish that programming languages and their so-called absence
of language could get some kind of annotation, but maybe annotating
programming languages as anything other than zxx is not appropriate for this
particular article.

Some more notes on this are below.


--C. E. Whitehead

>Jon Hanna wrote:
> >> Tagging source code snippets as "zxx" would be barbaric.
> > Source code snippets often do contain no linguistic data.
> > However the following is in English:
> > alert("This is English");//This comment is English too.
>Maybe.  It also depends on what you're doing.  If your task
>is "translation" then...
>   alarm("Das is deutsch");//Dieser Kommentar ist auch deutsch

Unless you translate the script into German and create a compiler for it.
Which would be interesting;
since really it has to be admitted that words like

etc. are English words, even if this is all to be tagged as zxx.
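
To make the tagging question concrete, here is a minimal sketch of one
possible markup for Jon Hanna's snippet: the code as a whole is tagged zxx,
and the string literal and the comment are tagged en. The pre/span element
names and the tag_snippet helper are my own assumptions for illustration;
nothing in the article or in BCP 47 prescribes this layout.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:lang attribute (the reserved XML namespace).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tag_snippet() -> str:
    """Build a hypothetical <pre> element for the alert() example.

    The outer element is tagged zxx (no linguistic content); the string
    literal and the comment are wrapped in spans tagged en.
    """
    pre = ET.Element("pre", {XML_LANG: "zxx"})
    pre.text = "alert("

    string_part = ET.SubElement(pre, "span", {XML_LANG: "en"})
    string_part.text = '"This is English"'
    string_part.tail = ");"

    comment_part = ET.SubElement(pre, "span", {XML_LANG: "en"})
    comment_part.text = "//This comment is English too."

    return ET.tostring(pre, encoding="unicode")


print(tag_snippet())
```

Whether the keywords themselves (as opposed to the strings and comments)
deserve anything other than zxx is exactly the open question here; the
sketch only shows that the two kinds of content can be tagged separately.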

I do sort of think that the language used in programming languages could
ultimately be an internationalization issue, whether or not it is one
right now.

For example, you could have
alerte/attention (I'm not sure which)

and so forth, or whatever the French want for these words,

and you could put these in Greek or Swahili too;
would you still tag it all zxx?

When there is enough demand in other countries from people who want to
learn programming but aren't comfortable with English, who knows?

In the meantime, people who are not comfortable with English are out of luck
to some degree [if you want to learn programming, you must learn English
and then programming, or English through programming, according to your
personal taste; but this is getting a bit far from the internationalization
of content and touching on the internationalization of content creation].

Here are copies of what I found in the recent discussion on this in case 
anyone needs a quick review; there may be more in earlier discussion; I have 
not followed it all; sorry:

Frank Ellermann <nobody@xyzzy.claranet.de>:
>Tagging source code snippets as "zxx" would be barbaric.  But it's a
>case where "" is clearly better than "und".  Actually I think "" is
>always better than "und" unless I intend to flag something for later
>review.  In the context of Richard's article and XML documents, for
>other purposes it might be different.  The use of "und" in XHTML 1.0
>is IMO only a temporary kludge until the DTD is fixed.

Peter Constable <petercon@microsoft.com>:
>ISO 639 indicates that programming languages are out of scope. I interpret
>that to mean that no programming language or group of programming
>languages is positively represented – i.e. none of the entities
>represented (either individually or as part of a group) is a programming
>language. I interpret “zxx” to mean “the content so tagged is not any
>instance of the kind of entities encompassed by this coding standard”. That
>would entail that “zxx” could appropriately be applied to content that is
>in a programming language with as much appropriateness as applying it to a
>part number or random text or an empty file or telemetry from a space
>probe: you may or may not be able to interpret the content, but you
>certainly cannot interpret it in terms of any human language.

John Cowan  cowan@ccil.org   http://ccil.org/~cowan:
>BCP 47 explicitly excludes computer languages from its scope, as do the
>ISO 639 family of standards.  So "zxx" is the only available tag.

Stephen Deach <sdeach@adobe.com>:
>One can have a separate debate over whether "zxx" or "art" should be used
>for computer-programming languages, or whether computer-programming (as a
>group or individually) deserve their own tag(s); but that is not an
>"Internationalization" issue.
>...is likely no working script anymore.  Or in the case of
>ABNF in an Internet-Draft a translator better stays away from
>it, including comments.  I'd use xml:lang="" or maybe -- if I
>could take advantage of it with XSLT -- xml:lang="i-default".
>Theoretical example, in practice I care only about the ASCII
>output at the moment, meta-data details are a waste of time
>for this output style.  But it could interest translators, if
>they get hold of the XML source.
> > I still don't buy that "" != "und".
>I think it depends on the context.  In a context where either
>"und" or "" are allowed they are semantically identical to
>their syntactically invalid counterpart.
>In a context where both are allowed you're free to assign a
>meaning to "und" slightly different from "" (roughly RESET).
> > If "" != "und" then the RFC is buggy, since it clearly
> > requires that the latter not be used if a protocol permits
> > the former - a decision which only makes sense if they are
> > equivalent.
>I think SHOULD NOT means that you need a very good excuse for
>using it anyway.  One good excuse would be the NMTOKEN in the
>XHTML 1.0 DTDs, that's explicitly listed as MAY in the RFC.
>Another excuse could be "I need it for some CSS magic", but
>the RFC doesn't say that this is a _good_ excuse.  Obviously
>a good excuse are old documents written before the NMTOKEN
>was replaced by a CDATA in some version of XML.  Or any old
>tool designed to produce old XML.
>It's "only" a SHOULD NOT, so consumers are not permitted to
>crash and burn if somebody (ab)uses "und".  With a MUST NOT
>I'd agree that "und" would be always wrong, and "abort with
>error" would be a perfect implementation for a consumer.
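
As a rough illustration of the "don't crash and burn" point, here is a
minimal sketch of a lenient consumer. It takes the reading that "" and "und"
are equivalent (which is disputed in this thread) and never aborts on either.
The function names normalize_lang and describe are hypothetical, and the
lowercasing relies only on language tags being case-insensitive.

```python
from typing import Optional


def normalize_lang(tag: Optional[str]) -> Optional[str]:
    """Return a lowercased tag, or None when the language is undetermined.

    Assumption: a missing attribute, "" and "und" are all treated the same
    way. A stricter consumer could instead log or flag "und" for review.
    """
    if tag is None:
        return None
    cleaned = tag.strip().lower()
    if cleaned in ("", "und"):
        return None
    return cleaned


def describe(tag: Optional[str]) -> str:
    """Give a human-readable summary of what a tag value tells a consumer."""
    lang = normalize_lang(tag)
    if lang is None:
        return "undetermined"
    if lang == "zxx":
        return "no linguistic content"
    return f"language: {lang}"


for value in ("", "und", "zxx", "de"):
    print(repr(value), "->", describe(value))
```

With a MUST NOT in the RFC, the "und" branch could raise an error instead;
with a SHOULD NOT, accepting it quietly seems the safer choice.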

Received on Friday, 13 April 2007 16:52:53 UTC
