Re: For review: Tagging text with no language

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Fri, 13 Apr 2007 16:07:33 +0200
To: www-international@w3.org
Message-ID: <461F8EA5.3079@xyzzy.claranet.de>
Cc: ltru@lists.ietf.org

Jon Hanna wrote:

>> Tagging source code snippets as "zxx" would be barbaric.
> Source code snippets often do contain no linguistic data.
> However the following is in English:

> alert("This is English");//This comment is English too.

Maybe.  It also depends on what you're doing.  If your task
is "translation" then...

  alarm("Das ist deutsch");//Dieser Kommentar ist auch deutsch

...is likely no longer a working script.  Likewise, a translator
had better stay away from ABNF in an Internet-Draft, comments
included.  I'd use xml:lang="" or maybe -- if I could take
advantage of it with XSLT -- xml:lang="i-default".

This is a theoretical example.  In practice I care only about
the ASCII output at the moment, and metadata details are a waste
of time for this output style.  But they could interest
translators, if they get hold of the XML source.
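
A rough sketch of how a translation tool might honor such tagging.
It uses Python's standard xml.etree.ElementTree instead of XSLT, and
the sample document, the function name, and the choice to skip both
"" and "i-default" are assumptions made for illustration only:

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumption: these two values mean "hands off" for the translator.
DO_NOT_TRANSLATE = {"", "i-default"}


def translatable_text(elem, inherited=None):
    """Yield (language, text) pairs that a translator may touch."""
    lang = elem.get(XML_LANG, inherited)
    skip = lang is not None and lang.lower() in DO_NOT_TRANSLATE
    if not skip and elem.text and elem.text.strip():
        yield lang, elem.text.strip()
    for child in elem:
        yield from translatable_text(child, lang)
        # Text after a child's end tag is in the parent's language.
        if not skip and child.tail and child.tail.strip():
            yield lang, child.tail.strip()


# Hypothetical input: prose in English, a snippet tagged xml:lang="".
SAMPLE = (
    '<doc xml:lang="en"><p>Call it like this:</p>'
    '<pre xml:lang="">alert("This is English");</pre>'
    '<p>Done.</p></doc>'
)

for lang, text in translatable_text(ET.fromstring(SAMPLE)):
    print(lang, text)
```

Only the two paragraphs are offered for translation; the snippet
tagged xml:lang="" is left alone, string literal included.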

> I still don't buy that "" != "und".

I think it depends on the context.  In a context where only one
of "und" or "" is allowed, the allowed one is semantically
identical to its syntactically invalid counterpart.

In a context where both are allowed you're free to assign a
meaning to "und" slightly different from "" (roughly RESET).
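
A minimal sketch of that reading, with made-up names (Meaning,
interpret, and the two flags are not taken from any specification).
It only illustrates the idea that a distinction between the two
spellings is possible solely where both are permitted:

```python
from enum import Enum


class Meaning(Enum):
    RESET = "reset"                # "" where both spellings are allowed
    UNDETERMINED = "undetermined"  # no usable language information
    TAGGED = "tagged"              # any other language tag


def interpret(tag, empty_allowed, und_allowed):
    """Classify a language tag value under the reading sketched above."""
    value = tag.strip().lower()
    if value not in ("", "und"):
        return Meaning.TAGGED
    if empty_allowed and und_allowed:
        # Both spellings exist, so they may carry different meanings.
        return Meaning.RESET if value == "" else Meaning.UNDETERMINED
    # Only one spelling is available, so it stands in for the other.
    return Meaning.UNDETERMINED
```

With only one spelling permitted, interpret("und", False, True) and
interpret("", True, False) give the same result; with both permitted,
the two values come apart.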

> If "" != "und" then the RFC is buggy, since it clearly
> requires that the latter not be used if a protocol permits
> the former - a decision which only makes sense if they are
> equivalent.

I think SHOULD NOT means that you need a very good excuse for
using it anyway.  One good excuse would be the NMTOKEN in the
XHTML 1.0 DTDs, that's explicitly listed as MAY in the RFC.

Another excuse could be "I need it for some CSS magic", but
the RFC doesn't say that this is a _good_ excuse.  Obviously
good excuses are old documents written before the NMTOKEN
was replaced by a CDATA in some version of XML, or any old
tool designed to produce old XML.

It's "only" a SHOULD NOT, so consumers are not permitted to
crash and burn if somebody (ab)uses "und".  With a MUST NOT
I'd agree that "und" would always be wrong, and "abort with
error" would be a perfect implementation for a consumer.
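
A minimal sketch of the difference for a consumer.  The function
name and both flags are hypothetical; must_not=True models the
imagined MUST NOT wording, not anything the RFC actually says:

```python
import warnings


def check_language_tag(tag, empty_allowed, must_not=False):
    """Return the tag unchanged, complaining about a discouraged "und"."""
    if tag.lower() == "und" and empty_allowed:
        if must_not:
            # Under a MUST NOT, aborting with an error would be fine.
            raise ValueError('"und" used where "" is permitted')
        # Under SHOULD NOT the producer may have a good excuse,
        # so note it and carry on.
        warnings.warn('"und" used where "" is permitted', stacklevel=2)
    return tag
```

Under the default the tag is still accepted and the caller merely
sees a warning; only the must_not variant refuses the input.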

Frank
Received on Friday, 13 April 2007 14:15:11 GMT
