Re: <code> element and scripting languages from Andrea Rendine on 2015-03-16 (public-html-comments@w3.org from March 2015)

From: Andrea Rendine <master.skywalker.88@gmail.com>
Date: Mon, 16 Mar 2015 23:52:49 +0100
To: "public-html-comments@w3.org" <public-html-comments@w3.org>
Message-ID: <CAGxST9=xmXQWtuMhhX2y8f0fc0=36igcZQvE0uAQnmOo9GFy5w@mail.gmail.com>
Hi Gannon, glad to receive your reply!
Now I understand better what you meant in your mail from some years ago
(before the weekend, if I'm correct ;) )
But still, does it affect private use subtags? I mean, aren't those rules
the ones that apply to 2- and 3-letter primary language subtags?
I know that I can't propose a language tag like, for example, cod-PHP as if
it were en-US, because such use is incorrect. But what about private use
subtags introduced by the "x" singleton? Would x-php be in conflict with the
spec? Or en-US-x-php? From what I see, some sites use such syntax. If you
look at the source code of the HTML spec, you'll see that the language is...
en-US-x-hixie. "x-hixie" is a private use subtag and it works by private
agreement. BCP 47 says that the language tag could also be just x-hixie.
Does this lead to validity issues? Would x-jscript or x-css or x-html be
incorrect? Note that I mean exactly what I write: the "x" singleton (which
cannot be used in extension subtags) followed by a hyphen, followed by 1-8
letters or digits.
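
To make the syntax I mean concrete, here is a minimal Python sketch of a
check for the private use part of a tag. It assumes the BCP 47 (RFC 5646)
rule privateuse = "x" 1*("-" (1*8alphanum)). The names PRIVATE_USE and
private_use_part are only illustrative, and whatever precedes "-x-" is
deliberately not validated against the subtag registry.

```python
import re

# Assumed grammar (RFC 5646): privateuse = "x" 1*("-" (1*8alphanum))
PRIVATE_USE = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE)


def private_use_part(tag):
    """Return the lowercased private use sequence of a tag, or None.

    Only the private use portion is checked. Anything before "-x-"
    (language, region, ...) is NOT validated here.
    """
    lowered = tag.lower()
    if lowered.startswith("x-"):
        # Whole tag is private use, e.g. "x-php".
        candidate = lowered
    else:
        # Private use sequence appended to a language tag, e.g. "en-US-x-php".
        idx = lowered.find("-x-")
        if idx == -1:
            return None
        candidate = lowered[idx + 1:]
    return candidate if PRIVATE_USE.fullmatch(candidate) else None


if __name__ == "__main__":
    for tag in ("x-php", "en-US-x-hixie", "x-javascript", "x-c++", "en-US"):
        print(tag, "->", private_use_part(tag))
```

Under these assumptions x-php, x-jscript and en-US-x-hixie pass, while
x-javascript (subtag longer than 8 characters) and x-c++ (non-alphanumeric
characters) are rejected.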

2015-03-16 19:46 GMT+01:00 Gannon Dick <gannon_dick@yahoo.com>:

> Hi Andrea,
>
> In addition to the properties of the @lang attribute mentioned below there
> is another, and it overshadows the rest ... Web Specifications are "for the
> tourists".  This is why EUROPA.EU has 24 languages, VATICAN.VA has 10 and
> why "Americans" speak 108 languages @Home, last time I counted the US
> Census report.
>
> The @lang attribute is appropriate to use in conjunction with the <code>
> element, IMHO, as long as the effect of a partisan implementer audience is
> neutralized.  This is especially so with semantic data.  This problem is
> 1000 years old - Iceland is in fact energy self-sufficient and Greenland,
> well let's just say Erik the Red earned his place in the PageRank Hall of
> Fame.
>
> ISO Standards have a different outlook with respect to the Open World
> Assumption - they would like it to work properly sometime before the
> tourists realize they have been played.  To do this they must account for
> seasonal parity shifts in time.  It is a matter of intellectual integrity
> for the specifications, if I didn't just answer my own question about the
> evolution of the WWW and GPS Navigation.
>
> The ISO language specification is by type:
> 1) Terminology 2 character codes "for the tourists".
> 2) Bibliographic 3 character codes for scholarship without Confirmation
> Bias or suggesting a BCP/best answer.
> 3) Miscellaneous, to discourage "obvious" conclusions about a finite set
> of language labels.  In the case of "zxx" the code does mean "not a human
> language". To be frank, it means to gadgets (user agents) that "they" have
> not located and identified intelligent life (language tool users).  The web
> of things has not passed its Turing Test yet, glowing reports of imminent
> success notwithstanding.
>
> --Gannon
>
>
>
>
> --------------------------------------------
> On Sun, 3/15/15, Stuart Wakefield <me@stuartwakefield.co.uk> wrote:
>
>  Subject: Re: <code> element and scripting languages
>  To: "Andrea Rendine" <master.skywalker.88@gmail.com>
>  Cc: "public-html-comments@w3.org" <public-html-comments@w3.org>
>  Date: Sunday, March 15, 2015, 3:33 AM
>
>  Hi Andrea,
>
>  Using the lang attribute to identify the programming language within an
>  element's text content does seem appropriate.
>
>  I couldn't find specific guidance in current recommendations. I do note
>  that the HTML 4.01 recommendation did have the following guidance in
>  section 8.1.1:
>
>  "The lang attribute's value is a language code that identifies a natural
>  language spoken, written, or otherwise used for the communication of
>  information among people. Computer languages are explicitly excluded
>  from language codes."
>
>  It is unclear whether the HTML 5 recommendations drop this guidance on
>  purpose. The HTML 4.01 guidance on the lang attribute does seem much
>  clearer in general in usage and intent than the corresponding advice in
>  the HTML5 recommendation:
>
>  "Language information specified via the lang attribute may be used by a
>  user agent to control rendering in a variety of ways. Some situations
>  where author-supplied language information may be helpful include:
>
>  - Assisting search engines
>  - Assisting speech synthesizers
>  - Helping a user agent select glyph variants for high quality typography
>  - Helping a user agent choose a set of quotation marks
>  - Helping a user agent make decisions about hyphenation, ligatures, and
>    spacing
>  - Assisting spell checkers and grammar checkers"
>
>  All would seem to be appropriate for this use case.
>
>  How would it be appropriate to handle, for example, comments in a
>  natural language within a section marked up as a machine-readable
>  language?
>
>  It would be useful to know what the initial reasons were for the HTML
>  4.01 authors to exclude computer / machine-readable languages in the
>  original recommendation, and whether the HTML5 recommendation omits
>  this advice intentionally.
>
>  In the interim my advice would be to use translate="no" and data
>  attributes. To the best of my knowledge this is the most widely
>  accepted way of achieving this, despite it failing to supply useful
>  metadata about the content in a meaningful way.
>
>  The improvements you've suggested would, if accepted, certainly
>  increase the semantic richness of this type of content over that
>  approach.
>
>  Stuart
>
>  On 13 Mar 2015, at 18:19, Andrea Rendine
>  <master.skywalker.88@gmail.com> wrote:
>
>  I came up with the idea I am going to describe after reading these lines:
>
>  "There is no formal way to indicate the language of computer code being
>  marked up. Authors who wish to mark code elements with the language
>  used, e.g. so that syntax highlighting scripts can use the right rules,
>  can use the class attribute, e.g. by adding a class prefixed with
>  "language-" to the element. (
>  http://www.w3.org/html/wg/drafts/html/master/semantics.html#the-code-element
>  )"
>
>  I don't think this is the best way to recognize code snippets. The @class
>  attribute is not meant to convey any semantic meaning.
>
>  On the other hand, I had a funny experience some days ago while looking
>  at an automated translation of a page in my language. This page
>  contained PHP and JS code snippets, as well as a native scripting
>  language. This means that it was full of control expressions such as
>  "if ... else", "while", "function", "print" and so on.
>
>  As you can easily imagine, these words in the snippets had been
>  translated, thus making the snippets themselves useless.
>
>  So I thought: @lang could be used for this purpose on code-snippet
>  elements and, generally speaking, in HTML documents.
>
>  Why @lang? Well, I took this idea from seeing the WHATWG HTML spec, which
>  is written in a language denoted by lang="en-GB-x-hixie", and I thought
>  of an extension of this concept. Actually, it would be a compact way to
>  declare 2 things:
>
>  1. Apart from strings and comments, the core of the related element is
>  NOT in the same "language" as the text and it is not meant to be
>  translated. It doesn't stretch the meaning of the "lang" concept: first
>  off, it's always a matter of language, e.g. in non-English pages,
>  because control expressions are generally in English (or in a natural
>  language which must not be translated, anyway); then, it contains
>  contraptions and abstract terms which define it as a real "language"
>  which is different from the plain text.
>
>  2. The element is to be identified according to its programming
>  language, e.g. for syntax highlighting. As a side note, there's a CSS
>  selector based on lang attributes, and jQuery-based highlighting
>  plugins, as well as any other library based on CSS selectors, would
>  benefit from this.
>
>  When I thought about this, I didn't think about a change in the BCP 47
>  spec, which I consider out of scope. Instead, I looked at that
>  specification.
>
>  It leaves room for partial customization through the use of "private
>  use subtags" in the form of a string consisting of "x-" followed by up
>  to 8 alphanumeric characters. This subtag can either follow a
>  primary/regional language tag, or be present as stand-alone.
>
>  Private use subtags are "private" by definition, and they are meant to
>  be used in limited groups according to agreements specific to these
>  groups. But nothing would prevent the HTML community from building such
>  an "agreement" into the spec, so that a series of "private use subtags"
>  such as "x-perl" or "x-php" can be used by Web authors (an agreement
>  would be necessary for language names such as Javascript or C++,
>  because they are either too long or contain non-alphanumeric
>  characters).
>
>  This means that a snippet in the form <code lang="x-php">, for example,
>  would be easy to understand, easy to target for syntax highlighting
>  extensions, and able to tell its content apart from parent elements
>  defining a language for the whole document. If, on the other hand,
>  strings or comments in a natural language within the snippet are to be
>  considered, something like <code lang="en-US-x-php"> could be used.
>
>  In the "public-html" mailing list I received suggestions such as using
>  translate="no" in order to prevent automated translation, and
>  separately creating private use attributes such as data-code-lang, or
>  proposing new attributes like programming-lang, to express the
>  programming language. I should add lang="" however, because, as said
>  above, it is difficult to consider a code snippet like a paragraph in
>  natural language.
>
>  The different proposals have something really good but they're partial:
>  they only focus on preventing translation or on indicating the
>  programming language. Maybe there's a way to achieve both.
>
>  Please tell me what you think about it.
>
>  Thanks.
>
>
Received on Monday, 16 March 2015 22:53:17 UTC