- From: Gannon Dick <gannon_dick@yahoo.com>
- Date: Fri, 13 Mar 2015 12:17:48 -0700
- To: public-html-comments@w3.org, Andrea Rendine <master.skywalker.88@gmail.com>
- Cc: iso639@dkuug.dk
Hi Andrea,

I wouldn't want a flash mob showing up at your door with pitchforks and torches (the American kind that burn). Linguists and Librarians in mobs can be dangerous, doubly so when, because of their professional training, they grouse mob-like with big words and at great length.

The US Library of Congress is the registration authority for the ISO 639-[X] language codes [1].

ISO 639-1 (Terminology languages) has no default, i.e. no "Not a Language" value comparable to NaN ("Not a Number") for numbers.

ISO 639-2 (Bibliographic languages) has a default; the set is generally understood to be of human, or historical human, origin. The code is identical to the one in ISO 639-3.

ISO 639-3 has the default you seek, with the wiggle room in the registration requirements that you need:

  zxx  No linguistic content
  zxx  Not applicable

So, to avoid flash-mob disruptions, especially on Beer Friday, it is wise to use the proper ISO 639-3 codes. Depending on the HTML schema version and location you specify, this could be a validation anomaly: some schemas call for a two-letter (alpha-2) string, and a three-letter lowercase code will fail validation. There is no semantically correct two-letter ISO 639-1 code for this.

Be safe & Cheers,
--Gannon

[1] http://www.loc.gov/standards/iso639-2/iso639jac.html

--------------------------------------------
On Fri, 3/13/15, Andrea Rendine <master.skywalker.88@gmail.com> wrote:

 Subject: <code> element and scripting languages
 To: public-html-comments@w3.org
 Date: Friday, March 13, 2015, 1:19 PM

I came up with the idea I am going to describe after reading these lines:

"There is no formal way to indicate the language of computer code being marked up. Authors who wish to mark code elements with the language used, e.g. so that syntax highlighting scripts can use the right rules, can use the class attribute, e.g. by adding a class prefixed with "language-" to the element.
(http://www.w3.org/html/wg/drafts/html/master/semantics.html#the-code-element)"

I don't think this is the best way to identify code snippets: the @class attribute is not meant to convey any semantic meaning.

Separately, I had a funny experience some days ago while looking at an automated translation of a page into my language. The page contained PHP and JS code snippets, as well as a native scripting language, so it was full of control expressions such as "if ... else", "while", "function", "print" and so on. As you can easily imagine, these words in the snippets had been translated, which made the snippets themselves useless.

So I thought: @lang could be used for this purpose on code-snippet elements, and in HTML documents generally. Why @lang? I got the idea from the WHATWG HTML spec, which is written in a language denoted by lang="en-GB-x-hixie", and I thought of extending this concept. It would be a compact way to declare two things:

1. Apart from strings and comments, the core of the element is NOT in the same "language" as the surrounding text, and it is not meant to be translated. This doesn't stretch the meaning of the "lang" concept. First, it is always a matter of language, e.g. in non-English pages, because control expressions are generally in English (or in some natural language that must not be translated anyway). Second, code contains constructs and abstract terms that make it a real "language", distinct from the plain text.
2. The element can be identified by its programming language, e.g. for syntax highlighting.

As a side note, CSS has a selector based on language (:lang()), so jQuery-based highlighting plugins, and any other library built on CSS selectors, would benefit from this.

When I thought about this, I did not consider changing the BCP 47 spec, which I consider out of scope.
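The remark about the CSS language selector can be made concrete. Under CSS Selectors Level 3, :lang(C) matches an element whose language value equals C, or begins with C immediately followed by "-", compared case-insensitively. The Python function below is a minimal sketch of that rule only; the name css3_lang_matches is hypothetical, and inheritance of lang from ancestor elements is not modelled. It suggests that :lang(x-php) would select lang="x-php" but not a value such as lang="en-US-x-php".

```python
def css3_lang_matches(element_lang: str, c: str) -> bool:
    """Sketch of the CSS Selectors Level 3 rule for :lang(C).

    The element's language value matches if it equals C, or begins with C
    immediately followed by "-". Comparison is case-insensitive. Inheritance
    of lang from ancestor elements is not modelled here.
    """
    value = element_lang.lower()
    target = c.lower()
    return value == target or value.startswith(target + "-")


# Hypothetical lang values, including forms discussed in the message.
for lang in ["x-php", "X-PHP", "en-US-x-php", "en-GB-x-hixie"]:
    print(f"lang={lang!r}: :lang(x-php) -> {css3_lang_matches(lang, 'x-php')}, "
          f":lang(en) -> {css3_lang_matches(lang, 'en')}")
```

Under this rule, a highlighting script keyed only on :lang(x-php) would need some additional test, such as inspecting the private-use part of the tag, to catch values that also carry a natural-language prefix.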
So I looked at what BCP 47 already allows. It leaves room for partial customization through "private use subtags": the singleton "x-" followed by one or more subtags of up to 8 letters or digits each. This sequence can either follow a primary/regional language tag or stand alone.

Private use subtags are "private" by definition: they are meant to be used within limited groups, according to agreements specific to those groups. But nothing would prevent the HTML community from building such an "agreement" into the spec, so that a set of private use subtags such as "x-perl" or "x-php" could be used by Web authors. (An agreement would be necessary for language names such as JavaScript or C++, which are either too long or contain characters other than letters and digits.)

This means that a snippet in the form <code lang="x-php">, for example, would be easy to understand, easy to target for syntax-highlighting extensions, and able to tell its content apart from ancestor elements that define a language for the whole document. If, on the other hand, the snippet contains strings or comments in a natural language that should be taken into account, something like <code lang="en-US-x-php"> could be used.

On the "public-html" mailing list I received suggestions such as using translate="no" to prevent automated translation and, separately, creating private attributes such as data-code-lang, or proposing new attributes like programming-lang, to express the programming language. I would still add lang="" in that case, because, as said above, it is hard to treat a code snippet like a paragraph of natural language.

Each proposal has something really good in it, but each is partial: it addresses either preventing translation or indicating the programming language. Maybe there's a way to achieve both.

Please tell me what you think about it.

Thanks.
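BCP 47 (RFC 5646) defines the private-use sequence as the singleton "x" followed by one or more subtags of 1 to 8 ASCII letters or digits. The Python sketch below splits a lang value into its natural-language part and its private-use subtags, and rejects subtags that break those rules. The function name split_private_use is hypothetical, and the sketch does not check the natural-language part against the IANA Language Subtag Registry. It illustrates why names like "javascript" (10 characters) or "c++" would need an agreed shorter form.

```python
import re
from typing import List, Optional, Tuple

# One private-use subtag per BCP 47: 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_private_use(tag: str) -> Tuple[Optional[str], List[str]]:
    """Split a language tag into (natural-language part, private-use subtags).

    A minimal sketch: it only locates the "x" singleton and checks the
    length and character rules for the subtags that follow it.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" not in lowered:
        # No private-use sequence: the whole tag is the natural-language part.
        return tag, []
    i = lowered.index("x")
    private = subtags[i + 1:]
    if not private or not all(_SUBTAG.fullmatch(s) for s in private):
        raise ValueError(f"invalid private-use sequence in {tag!r}")
    # A stand-alone private-use tag such as "x-php" has no natural-language part.
    prefix = "-".join(subtags[:i]) or None
    return prefix, private


# Hypothetical values drawn from the proposal.
for value in ["x-php", "en-US-x-php", "en-GB-x-hixie", "x-javascript", "x-c++"]:
    try:
        print(value, "->", split_private_use(value))
    except ValueError as err:
        print(value, "->", err)
```

A highlighting script could use the returned private-use subtags to pick a grammar, while a translation tool could use the natural-language part (if any) to decide how to treat strings and comments.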
Received on Friday, 13 March 2015 19:18:16 UTC