Re: [Ltru] RE: For review: Tagging text with no language

The summary looks good. This discussion raises 2 items for the LTRU group.

Q1. What tag should be used where the content is definitely in a language,
but no code is available for that language yet? (This is an area where
ISO 15924 is ahead of ISO 639 (and 3166), since it has Zzzz: Code for
uncoded script.)

Q2. Clarify the wording around "und" vs "".
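
For concreteness, here is a minimal Python sketch of the four cases [a]-[d] in Richard's summary quoted below. The function name, its parameters, and the use of None to mark the still-open case [c] are hypothetical choices made for illustration; nothing here is defined by RFC 4646 or by the article under review.

```python
from typing import Optional


def choose_language_tag(determined: bool,
                        is_language: Optional[bool] = None,
                        subtag: Optional[str] = None,
                        empty_tag_allowed: bool = True) -> Optional[str]:
    """Pick a tag for cases [a]-[d]; None means case [c], which is still open (Q1)."""
    if not determined:
        # [d] undetermined: xml:lang="" if the format allows it, otherwise und
        return "" if empty_tag_allowed else "und"
    if is_language is False:
        # [b] determined, and not a language
        return "zxx"
    if subtag:
        # [a] determined, and a language for which a subtag exists
        return subtag
    # [c] determined, but no subtag exists: no agreed answer yet
    return None
```

Under these assumptions, choose_language_tag(True, True, "fr") gives "fr", and choose_language_tag(False, empty_tag_allowed=False) falls back to "und". Returning None for [c] only flags the gap raised in Q1; it does not propose a tag.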

Mark

On 4/12/07, Richard Ishida <ishida@w3.org> wrote:
>
> Ok, I just found some additional emails between John and Mark that had been
> scooped into my LTRU folder by my mail client because [LTRU] was prepended
> to the subject. They shift the baseline for discussion and agreement.
>
> What I now see as the summary of where we are is:
>
> [a] Determined, and a language (for which a subtag exists): <subtag(s)>
>
> [b] Determined, and not a language: zxx
>
> [c] Determined, but not a language for which a subtag exists: ???
>
> [d] Undetermined, and not sure whether it is a language or not:
>     xml:lang="" if available, otherwise und
>
> I have revised the article at [1] to make it clearer that whether text is
> linguistic or not is unimportant wrt use of '' and und.
>
>
> I'm still troubled, however, by the passage in RFC 4646, so I'll
> repeat those comments here and copy the LTRU folks:
>
> I'm also a little worried about the wording in section 4.1 of RFC 4646[2]
> about und, which quite clearly says that you shouldn't use und unless the
> *protocol* demands it, or sometimes when matching tags.  This doesn't make
> any distinction between specifying the language of a resource and turning
> off language declarations for a range of embedded text.  It seems that this
> suggests a way in which xml:lang='' and xml:lang="und" are not equivalent,
> since there are no such restrictions on xml:lang="". In my opinion, the text
> of RFC 4646 needs some work, both to relax the use of und in scenarios where
> 'undefined text' occurs in a context with a defined language, and to clarify
> the relationship of und to xml:lang=''.
>
> RI
>
> [1] http://www.w3.org/International/questions/qa-no-language#answer
>
> [2] http://www.rfc-editor.org/rfc/rfc4646.txt
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
>
>
>
> > -----Original Message-----
> > From: www-international-request@w3.org
> > [mailto:www-international-request@w3.org] On Behalf Of Richard Ishida
> > Sent: 12 April 2007 14:56
> > To: www-international@w3.org
> > Subject: RE: For review: Tagging text with no language
> >
> >
> > So it seems the alternatives that John is suggesting are:
> >
> > Determined, and a language (for which a subtag exists): <subtag(s)>
> > Determined, and not a language: zxx
> > Determined, but not a language for which a subtag exists: ???
> >
> > Undetermined, and not sure whether it is a language or not:
> > xml:lang="" (if available)
> > Undetermined, but sure that it's a language: und
> >
> > The implications of this for X/HTML are that there is no way to say that
> > text is undetermined if you are not sure whether it's a language or not.
> >
> > This is very different from Jon Hanna's proposal at
> > http://lists.w3.org/Archives/Public/www-international/2007JanMar/0178.html
> >
> > Can we please discuss this.  I'm particularly hoping for contributions from
> > John, Jon, Mark, Martin and Addison (though he's on vacation at the moment).
> >
> > For my part, having experienced, even when trying to write this email, how
> > difficult it is to succinctly talk about the difference between something
> > that is unidentified and may or may not be a language, I'm a little leery
> > about accepting the evidence in the mail below, John.  Can we be sure that
> > the people who drafted that text were consciously making the distinction
> > you mention rather than just being a little imprecise in wording?
> >
> > I'm also a little worried about the wording in section 4.1 of RFC 4646[1]
> > about und, which quite clearly says that you shouldn't use und unless the
> > *protocol* demands it, or sometimes when matching tags.  This doesn't make
> > any distinction between specifying the language of a resource and turning
> > off language declarations for a range of embedded text.  It seems that
> > this suggests another way in which xml:lang='' and xml:lang="und" are not
> > equivalent. In my opinion, the text of RFC 4646 needs some work, either to
> > relax the use of und in scenarios where undefined text occurs in a context
> > that is defined, or to clarify the relationship of und to xml:lang=''.
> >
> > RI
> >
> >
> > [1] http://www.rfc-editor.org/rfc/rfc4646.txt
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: www-international-request@w3.org
> > > [mailto:www-international-request@w3.org] On Behalf Of John Cowan
> > > Sent: 11 April 2007 21:24
> > > To: Mark Davis
> > > Cc: John Cowan; CE Whitehead; www-international@w3.org
> > > Subject: Re: For review: Tagging text with no language
> > >
> > >
> > > Mark Davis scripsit:
> > >
> > > > I believe that that is adding an interpretation to "und" which is not
> > > > borne out by either the source standards, nor in common usage.
> > >
> > > ISO 639-2 says merely "Undetermined", but this is placed in a
> > > column labeled "English name of language", so I think it's
> > > fair to read it as "Undetermined language".  But ISO 639-3
> > > is, I think, definitive.
> > > http://www.sil.org/iso639-3/scope.asp#S says (in part):
> > >
> > >     The identifier [und] (undetermined) is provided for those
> > >     situations in which a language or languages must be indicated
> > >     but the *language* cannot be identified [emphasis added].
> > >
> > > By contrast, "zxx" is explained in the next sentence thus:
> > >
> > >     The identifier [zxx] (no linguistic content) may be applied in a
> > >     situation in which a language identifier is required by system
> > >     definition, but the item being described does not actually
> > >     contain linguistic content.
> > >
> > > In any case, the document I'm commenting on says that "zxx"
> > > is non-linguistic content, and that "und" and "" are
> > > synonymous and represent linguistic content.  Whatever "und"
> > > may or may not mean, I think there's no doubt that "" can be
> > > applied to both linguistic and non-linguistic content.
> > >
> > > --
> > > You escaped them by the will-death              John Cowan
> > > and the Way of the Black Wheel.                 cowan@ccil.org
> > > I could not.  --Great-Souled Sam
> > > http://www.ccil.org/~cowan
> > >
> >
> >
>
>



-- 
Mark

Received on Thursday, 12 April 2007 16:13:16 UTC