RE: [Ltru] Re: For review: Tagging text with no language

Peter wrote:

> Excluding operators and such, which clearly are not words of 
> English, the Microsoft C compiler recognizes this vocabulary:
> 
> #define #error #import #undef #elif #if #include
> #else #ifdef #line #endif #ifndef #pragma
> auto double int struct break else long switch
> case enum register typedef char extern return union
> const float short unsigned continue for signed void
> default goto sizeof volatile do if static while
> __asm dllimport __int8 naked __based __except
> __int16 __stdcall __cdecl __fastcall __int32
> thread __declspec __finally __int64 __try
> dllexport __inline __leave
> 
> Some of those may be borrowings from English, but this is not English.

Let's look at a real use case. Would you say this page is in "en" and 
"zxx" and that the sections of code have no linguistic value even though 
they are clearly intended to be read by humans and not machines? Or does 
context matter?

http://www.xml.com/pub/a/2000/11/29/schemas/part1.html?page=7

Interpreting "no linguistic content" as "not a human language, could be a
programming language" could cause some problems. There may be a use case
for programming languages to have their own tags, if this is deemed
appropriate for the 639 standards or the IANA registry. These languages
are different from, say, instrumental music in the Library of Congress or
a sound effects track in a film (both zxx, I'd say).

I think programming languages have specific identification and parsing 
needs and as such need to be treated differently. The code in the article 
above should be rendered in Braille, for example, so it must be parsed. 
This makes it different from non-linguistic content.

Are we sure we want to lump programming languages in with the "zxx" 
semantic? More and more people use programming language terms in their 
everyday speech.

How would you classify the page I cite?

Regards,

Karen Broome

Received on Monday, 16 April 2007 18:57:17 UTC