Re: For review: Tagging text with no language

Just for the record, the history of the zxx semantic is as follows.

I needed a tag to indicate that the audio track of a film was "silent" 
while the subtitles of the film were in English, French, etc. In this 
scenario, the audio track has "no linguistic content", though the title 
cards that describe the words the silent film actors are supposed to be 
reading have their own language code. (These films were localized, so the 
audio track always had "no linguistic content" while the title cards may 
have been in French, German, etc.) At least, this was Peter Constable's 
interpretation, and after this discussion he applied for the tag. 

However, at the time, I disagreed with Peter and felt it was more useful 
in my industry to define a tag for our needs using the "Q" space of ISO 
639-2 (the qaa-qtz range reserved for local use). "Silent" is not the 
same thing as "no linguistic content" and has more meaning in my industry. 
This industry-specific tag does not belong in the ISO standard, so I did 
not make an application there. In my mind, a film with "no linguistic 
content" would be a film such as "Koyaanisqatsi", where there may be 
chants and songs, but there is no linguistic narrative advancing anything 
resembling your average Hollywood plot. 

I think that perhaps the semantic of this tag has changed slightly since 
its inception, so I'm sending this refresher on the history of the tag, 
for what it's worth.

Regards,

Karen Broome
Metadata Systems Designer
Sony Pictures Entertainment
310.244.4384

www-international-request@w3.org wrote on 04/11/2007 01:24:27 PM:

> 
> Mark Davis scripsit:
> 
> > I believe that that is adding an interpretation to "und" which is not
> > borne out by either the source standards, nor in common usage.
> 
> ISO 639-2 says merely "Undetermined", but this is placed in a column
> labeled "English name of language", so I think it's fair to read it
> as "Undetermined language".  But ISO 639-3 is, I think, definitive.
> http://www.sil.org/iso639-3/scope.asp#S says (in part):
> 
>    The identifier [und] (undetermined) is provided for those
>    situations in which a language or languages must be indicated
>    but the *language* cannot be identified [emphasis added].
> 
> By contrast, "zxx" is explained in the next sentence thus:
> 
>    The identifier [zxx] (no linguistic content) may be applied in a
>    situation in which a language identifier is required by system
>    definition, but the item being described does not actually
>    contain linguistic content.
> 
> In any case, the document I'm commenting on says that "zxx" is
> non-linguistic content, and that "und" and "" are synonymous and
> represent linguistic content.  Whatever "und" may or may not mean,
> I think there's no doubt that "" can be applied to both linguistic
> and non-linguistic content.
> 
> -- 
> You escaped them by the will-death              John Cowan
> and the Way of the Black Wheel.                 cowan@ccil.org
> I could not.  --Great-Souled Sam http://www.ccil.org/~cowan
> 
> 

Received on Wednesday, 11 April 2007 21:13:30 UTC