My point was intended to be that this started out as a debate on the
interpretation of xml:lang="". I don't see that any existing tag other
than "und" makes any sense as an alternate interpretation of
xml:lang="", so why the long debate and the discussion of the various
re-translations of what "und" means in Danish, Swedish, German,
Italian, et al.? (At the nuance level of this discussion, such
translations are ALWAYS imperfect.)
I have no objection to your interpretations of what happens if
someone explicitly says "mis", "mul", etc.
I would note that "und" makes no statement about whether
the content is in a single language or several, only that any/all language
information is "not defined" for the specified scope.
If the user says nothing or says xml:lang="", it can't mean
anything other than "undefined" (for which the tag is
"und") or "unspecified" (for which there is no tag;
regardless of whether the user intent was "I don't know" vs
"I don't care" vs "I don't want to guess" vs "I
don't want to say"; and regardless of whether it is all a single language
or a mixture of several languages). The tag "und" means that no
input regarding the language is (languages are) provided. Why not say that
a missing xml:lang specifier and xml:lang="" are both interpreted as
xml:lang="und", and be done with it?
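The proposed rule can be sketched as a small processing step. This is a minimal Python illustration, not taken from any specification; the names `resolve` and `effective_langs` are hypothetical, and it assumes xml:lang inheritance as in XML 1.0 (an element without its own xml:lang takes the nearest ancestor's value, while an explicit xml:lang="" overrides it).

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# ElementTree reports the xml:lang attribute under its expanded
# XML-namespace name, not as the literal string "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
UNDETERMINED = "und"


def resolve(xml_lang: Optional[str]) -> str:
    """Apply the proposed rule to a single xml:lang value.

    None (no specifier anywhere in scope) and "" (explicitly empty)
    both become "und"; any other value is passed through unvalidated.
    """
    return xml_lang if xml_lang else UNDETERMINED


def effective_langs(
    elem: ET.Element, inherited: Optional[str] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (tag, effective language) for elem and all its descendants."""
    # An element without its own xml:lang takes the nearest ancestor's value;
    # an explicit xml:lang="" overrides that and is inherited as "".
    declared = elem.get(XML_LANG, inherited)
    yield elem.tag, resolve(declared)
    for child in elem:
        yield from effective_langs(child, declared)


# Usage: one German paragraph, one with no specifier, one with xml:lang="".
doc = ET.fromstring(
    '<doc><p xml:lang="de">Hallo</p><p>?</p><p xml:lang="">x</p></doc>'
)
for tag, lang in effective_langs(doc):
    print(tag, lang)
```

Under this rule the reader never sees an empty or absent value, only "und" or a real tag, which is the "be done with it" outcome argued for above.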
It is quite clear that "zxx" means "I know this is not linguistic
content"; that "art" means it is "an artificial
language (invented or other non-natural language)"; that
"mul" means "there is a mixture of languages, which I may
or may not choose to identify at a lower level in the document"; and
that "mis" means "it is a language, but I have no better
identifier for it".
One can have a separate debate over whether "zxx" or
"art" should be used for computer programming languages, or
whether programming languages (as a group or individually) deserve their
own tag(s); but that is not an "Internationalization"
issue.
At 2007.04.12-18:15(-0700), Mark Davis wrote:
I think I agree with you in
spirit, but not in precise details. The tag "und" means
"undetermined", so when I encounter it I don't know whether the
content contains one language, many languages, or no language. The tag
"zxx" would mean that there is no language content,
"mis" would mean that there is at least some language content,
and "mul" would mean that there is language content, with more
than one language.
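Mark's reader-side reading of the four special codes amounts to a small lookup table. This is a minimal Python sketch; the names `Claim`, `SPECIAL_TAGS`, and `claim_for` are hypothetical, `None` stands for "the tag says nothing about this", and the table only describes what a correctly applied tag asserts.

```python
from typing import NamedTuple, Optional


class Claim(NamedTuple):
    """What a reader may assume about tagged content, if the tag is correct.

    None means the tag carries no information on that point.
    """
    has_language: Optional[bool]  # any language content at all?
    multiple: Optional[bool]      # more than one language?


# Reader-side reading of the special codes, following the summary above.
SPECIAL_TAGS = {
    "und": Claim(has_language=None, multiple=None),    # undetermined: nothing known
    "zxx": Claim(has_language=False, multiple=False),  # no language content
    "mis": Claim(has_language=True, multiple=None),    # at least some language content
    "mul": Claim(has_language=True, multiple=True),    # more than one language
}


def claim_for(tag: str) -> Optional[Claim]:
    """Return the claim for a special code, or None for ordinary tags like "sv"."""
    # Language tags are case-insensitive; only the primary subtag matters here.
    primary = tag.split("-", 1)[0].lower()
    return SPECIAL_TAGS.get(primary)
```

Note that nothing in such a table stops a reader from running its own analysis; as Mark points out below, real-world tags are often wrong.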
I think that trying to consider the tagger's motivations may
lead to misleading impressions. Assume for the moment that the tag is
correct. From the perspective of the tagger, using "und" could
mean, as you say, that the tagger doesn't know or care (or want to
communicate, or want to spend the time to determine) what the language is
or whether there is any language content there at all. There could be
quite a variety of motivations for the tagger's using "und";
the key is what the reader of "und" can assume about the
content, which is essentially nothing. With "mis", the
situation is similar, but slightly narrower. The tagger may still not
know much, or care much, but maybe cared enough to determine that there
was something there; or maybe there was language content there, but there
is no language code that correctly matches it (Proto-Germanic, perhaps).
Similarly, the "may not" wording is a bit too strong in your
phrase "Whereas zxx says I 'may not' apply any of those
language-based services because it is not a 'natural' language".
Having content tagged with "zxx" doesn't restrict me from doing
anything I want to; it just means if it was tagged correctly, it does not
contain any language content. (I might decide that the tagger was
mistaken -- when we at Google look at the tagging people actually do of
web content, there is a fairly high percentage of both invalid tags and
valid-but-incorrect tags.)
Mark
On 4/12/07, Stephen Deach <sdeach@adobe.com> wrote:
- I think much of this discussion is dealing with terminology
differences that are so narrow that one is discussing "the number of
angels who can dance on the head of a pin". (In other words, we are
debating theology, not practice.) In reality, specifications are worded
as carefully as possible, but interpretation is open to the reader's most
common definition/redefinition/translation of the exact
terminology. -- So rather than debate what the "exact
meaning" of a word/phrase is in each of these languages, maybe we
should take a looser interpretation of what is written and then clarify
the intent.
- My reading of the ISO spec is that "und/undetermined" means
"I don't know (or care, or am unwilling to state) what the language
is (and have no closer alternative language identifier given the
available options)". From a practical viewpoint, "und"
indicates I can't assume any specific/preferred linguistic definitions
for words in the content, nor can I assume any specific/preferred
pronunciation-, spelling-, hyphenation-, and/or grammar-rules on the
content; though I am allowed to attempt my own linguistic analysis to
guess at the language. (Whereas zxx says I 'may not' apply any of those
language-based services because it is not a 'natural' language and should
not attempt any linguistic analysis to guess at the language.) I can't
see any practical difference between "und" and ""
(except that "" is disallowed in some processing environments)
so why can't the documents simply say that 'a missing specification' or
'xml:lang=""' (should either occur) will be interpreted as
"und"?
- It has been a while since I considered myself fluent in Swedish (and
I intentionally ignored the lack of the dieresis in the original text as
an indication that the translations were "lossy"). I just
thought that some comment would force the necessary clarification of the
translations.
- At 2007.04.13-01:26(+0200), Kent Karlsson wrote:
- Stephen Deach wrote:
- > sv.xml: <language type="und">obestämt språk</language>
I thought "obestamt" was
"unstated".
"Obestämt" literally means "undetermined". "Unstated" would be
"osagt", "outtalat", or "ej angett" ("not given", closer to the
current German translation).
Though I would agree that xml:lang="" is closer to "unstated" than
"undetermined". I'm not sure that that nit-picking leads anywhere in
this case. But "unstated" is not the same as "undetermined"; it may
well be determined, but just not stated... So maybe there is a
difference worth bothering about.
/kent k
---Steve Deach
sdeach@adobe.com