
Fwd: <code> element and scripting languages

From: Andrea Rendine <master.skywalker.88@gmail.com>
Date: Fri, 13 Mar 2015 23:18:57 +0100
Message-ID: <CAGxST9mQGKOyQbXoxUCsFUD1B4kiWFasWyc1ZD_8o8kZvZ1RmQ@mail.gmail.com>
To: public-html-comments@w3.org
Gannon, before a Librarians' Flash Mob crashes through my door and
whatever is inside (!), please tell me in layman's terms what in
ISO 639-1 and -2 would prevent what I have in mind.

With my proposal I referenced BCP (Best Current Practice) 47, "Tags for
Identifying Languages", as specified at
https://tools.ietf.org/html/bcp47 and referenced by all modern HTML
specifications:
*"The lang attribute (in no namespace) specifies the primary language for
the element's contents and for any of the element's attributes that contain
text. Its value must be a valid BCP 47 language tag, or the empty string.
Setting the attribute to the empty string indicates that the primary
language is unknown. [BCP47]"*

Now, this is what BCP 47 says about Private use subtags:
"Private use subtags are used to indicate distinctions in language that are
important in a given context by private agreement. The following rules
apply to private use subtags:

   1.  Private use subtags are separated from the other subtags defined in
this document by the reserved single-character subtag 'x'.

   2.  Private use subtags MUST conform to the format and content
constraints defined in the ABNF for all subtags; that is, they MUST consist
solely of letters and digits and not exceed eight characters in length.

   3.  Private use subtags MUST follow all primary language, extended
language, script, region, variant, and extension subtags in the tag.
Another way of saying this is that all subtags following the singleton 'x'
MUST be considered private use.  Example: The subtag 'US' in the tag
"en-x-US" is a private use subtag.

   4.  A tag MAY consist entirely of private use subtags.

   5.  No source is defined for private use subtags.  Use of private use
subtags is by private agreement only.

   6.  Private use subtags are NOT RECOMMENDED where alternatives exist or
for general interchange.  See Section 4.6 for more information on private
use subtag choice." (BCP 47, page 17, what luck!)

The previous paragraph covers extension subtags, which "provide a
mechanism for extending language tags for use in various applications.
They are intended to identify information that is commonly used in
association with languages or language tags but that is not part of
language identification." I am not considering registering an extension
subtag, because that would require a formal submission procedure that I
have no authority to propose, and also because extension subtags must
follow a primary language subtag (which wouldn't be preferable in our
case). Extension subtags are introduced by a singleton, which can be any
letter or digit EXCEPT x, because x identifies private use.

On pages 3-4 of the same spec (BCP 47), the rules for language tag
formation specify that a language tag may consist of private use
subtags alone, introduced by an "x" singleton and matching 1*("-"
(1*8alphanum)), which means, if I understand correctly, that
x-php
x-alpha1
are valid private use subtags. As a language tag may consist of this alone,
they're also valid language tags.
x-c++
x-javascript
would NOT be valid: the first uses characters outside the alphanumeric
range, and the second has a subtag longer than eight characters. Hence
the need for an agreement.
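As a rough check of this reading, the private-use production can be sketched in Python. The function name is_private_use_tag is hypothetical, and the pattern only covers tags made up entirely of private use subtags; it is not a full BCP 47 validator.

```python
import re

# privateuse = "x" 1*("-" (1*8alphanum)), as in the ABNF quoted above.
# Tags are case-insensitive, so both "x" and "X" are accepted as the singleton.
PRIVATE_USE_TAG = re.compile(r"[xX](-[A-Za-z0-9]{1,8})+")


def is_private_use_tag(tag: str) -> bool:
    """Return True if the whole tag matches the privateuse production."""
    return PRIVATE_USE_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["x-php", "x-alpha1", "x-c++", "x-javascript"]:
        print(candidate, is_private_use_tag(candidate))
    # x-php True
    # x-alpha1 True
    # x-c++ False          ("+" is not alphanumeric)
    # x-javascript False   ("javascript" has ten characters)
```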
As x- defines a private use, a string corresponding to an existing language
subtag is not to be interpreted in its original meaning, i.e.
x-US
is valid, but
en-x-US
has nothing to do with the United States (unless its underlying agreement
decides so, but even in that case, it isn't the same as en-US).
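To illustrate the point about the 'x' singleton, here is a minimal Python sketch that splits a tag into its ordinary subtags and its private use subtags. The helper name split_private_use is hypothetical, and the sketch assumes the tag is already well-formed; it does not validate anything.

```python
from typing import List, Tuple


def split_private_use(tag: str) -> Tuple[List[str], List[str]]:
    """Split a tag at the first 'x' singleton.

    Returns (ordinary subtags, private use subtags). Everything after the
    'x' singleton is private use, whatever it happens to look like.
    Assumes a well-formed tag; no validation is performed.
    """
    subtags = tag.split("-")
    for index, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            return subtags[:index], subtags[index + 1:]
    return subtags, []


if __name__ == "__main__":
    print(split_private_use("en-US"))    # (['en', 'US'], [])
    print(split_private_use("en-x-US"))  # (['en'], ['US'])
    print(split_private_use("x-US"))     # ([], ['US'])
```

In "en-US" the subtag US sits among the ordinary subtags, where it is read as a region; in "en-x-US" and "x-US" it lands in the private use list, so its meaning depends only on the private agreement.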

So in short, in your opinion, would
lang="x-perl", lang="en-US-x-perl"
be valid? If not, why?
Received on Friday, 13 March 2015 22:19:24 UTC
