- From: Andrea Rendine <master.skywalker.88@gmail.com>
- Date: Mon, 16 Mar 2015 23:52:49 +0100
- To: "public-html-comments@w3.org" <public-html-comments@w3.org>
- Message-ID: <CAGxST9=xmXQWtuMhhX2y8f0fc0=36igcZQvE0uAQnmOo9GFy5w@mail.gmail.com>
Hi Gannon, glad to receive your reply!

Now I understand better what you meant with your mail from some years ago (before the weekend, if I'm correct ;) ). But still, does it affect private use subtags? I mean, don't those rules apply to 2- and 3-letter primary language subtags? I know that I can't propose a language like, for example, cod-PHP as if it were en-US, because such use is incorrect. But what about private use subtags introduced by the x singleton? Would x-php be in conflict with the spec? Or en-US-x-php? From what I see, some sites use such syntax. If you look at the source code of the HTML spec, you'll see that the language is... en-US-x-hixie. "x-hixie" is a private use subtag and it works on a private agreement. BCP-47 says that the language tag could also be just x-hixie. Does this lead to validity issues? Would x-jscript or x-css or x-html be incorrect?

Note that I mean exactly what I write: the "x" singleton (which cannot be used in extension subtags), followed by a hyphen, followed by 1-8 letters or digits.

2015-03-16 19:46 GMT+01:00 Gannon Dick <gannon_dick@yahoo.com>:
> Hi Andrea,
>
> In addition to the properties of the @lang attribute mentioned below there is another, and it overshadows the rest ... Web Specifications are "for the tourists". This is why EUROPA.EU has 24 languages, VATICAN.VA has 10 and why "Americans" speak 108 languages @Home, last time I counted the US Census report.
>
> The @lang attribute is appropriate to use in conjunction with the <code> element, IMHO, as long as the effect of a partisan implementer audience is neutralized. This is especially so with semantic data. This problem is 1000 years old - Iceland is in fact energy self-sufficient and Greenland, well let's just say Erik the Red earned his place in the PageRank Hall of Fame.
>
> ISO Standards have a different outlook with respect to the Open World Assumption - they would like it to work properly sometime before the tourists realize they have been played.
> To do this they must account for seasonal parity shifts in time. It is a matter of intellectual integrity for the specifications, if I didn't just answer my own question about the evolution of the WWW and GPS Navigation.
>
> The ISO language specification is by type:
> 1) Terminology 2 character codes "for the tourists".
> 2) Bibliographic 3 character codes for scholarship without Confirmation Bias or suggesting a BCP/best answer.
> 3) Miscellaneous, to discourage "obvious" conclusions about a finite set of language labels. In the case of "zxx" the code does mean "not a human language". To be frank, it means to gadgets (user agents) that "they" have not located and identified intelligent life (language tool users). The web of things has not passed its Turing Test yet, glowing reports of imminent success notwithstanding.
>
> --Gannon
>
> --------------------------------------------
> On Sun, 3/15/15, Stuart Wakefield <me@stuartwakefield.co.uk> wrote:
>
> Subject: Re: <code> element and scripting languages
> To: "Andrea Rendine" <master.skywalker.88@gmail.com>
> Cc: "public-html-comments@w3.org" <public-html-comments@w3.org>
> Date: Sunday, March 15, 2015, 3:33 AM
>
> Hi Andrea,
>
> Using the lang attribute to identify the programming language within an element's text content does seem appropriate.
>
> I couldn't find specific guidance in current recommendations, but I do note that the HTML 4.01 recommendation did have the following guidance in section 8.1.1: "The lang attribute's value is a language code that identifies a natural language spoken, written, or otherwise used for the communication of information among people. Computer languages are explicitly excluded from language codes."
>
> It is unclear whether the HTML5 recommendation drops this guidance on purpose.
>
> The HTML 4.01 guidance on the lang attribute does seem much clearer in general in usage and intent than the corresponding advice in the HTML5 recommendation: "Language information specified via the lang attribute may be used by a user agent to control rendering in a variety of ways. Some situations where author-supplied language information may be helpful include:
> - Assisting search engines
> - Assisting speech synthesizers
> - Helping a user agent select glyph variants for high quality typography
> - Helping a user agent choose a set of quotation marks
> - Helping a user agent make decisions about hyphenation, ligatures, and spacing
> - Assisting spell checkers and grammar checkers"
>
> All would seem to be appropriate for this use case.
>
> How would it be appropriate to handle, for example, comments in a natural language within a section marked up as a machine-readable language?
>
> It would be useful to know what the initial reasons were for the HTML 4.01 authors to exclude computer / machine-readable languages from the original recommendation, and whether the HTML5 recommendation omits this advice intentionally.
>
> In the interim my advice would be to use translate="no" and data attributes. To the best of my knowledge this is the most widely accepted way of achieving this, despite it lacking in supplying useful metadata about the content in a meaningful way.
>
> The improvements you've suggested would, given their acceptance, certainly increase the semantic richness of this type of content over that approach.
>
> Stuart
>
> On 13 Mar 2015, at 18:19, Andrea Rendine <master.skywalker.88@gmail.com> wrote:
>
> > I came up with the idea I am going to write about after reading these lines:
> >
> > "There is no formal way to indicate the language of computer code being marked up. Authors who wish to mark code elements with the language used, e.g. so that syntax highlighting scripts can use the right rules, can use the class attribute, e.g. by adding a class prefixed with "language-" to the element. (http://www.w3.org/html/wg/drafts/html/master/semantics.html#the-code-element)"
> >
> > I don't think this is the best way to recognize code snippets. The @class attribute is not meant to convey any semantic meaning.
> >
> > On the other hand, I had a funny experience some days ago while looking at an automated translation of a page into my language. This page contained PHP and JS code snippets, as well as a native scripting language. This means that it was full of control expressions such as "if ... else", "while", "function", "print" and so on. As you can easily imagine, these words in the snippets had been translated, thus making the snippets themselves useless.
> >
> > So I thought: @lang could be used for this purpose on code-snippet elements and, generally speaking, in HTML documents. Why @lang? Well, I took this idea from seeing the WHATWG HTML spec, which is written in a language denoted by lang="en-GB-x-hixie", and I thought of an extension of this concept. Actually, it would be a compact way to declare 2 things:
> >
> > 1. Apart from strings and comments, the core of the related element is NOT in the same "language" as the text and it is not meant to be translated. This doesn't stretch the meaning of the "lang" concept. First off, it's always a matter of language, e.g. in non-English pages, because control expressions are generally in English (or in a natural language which must not be translated, anyway). Then, it contains constructs and abstract terms which define it as a real "language" which is different from the plain text.
> > 2. The element is to be identified according to its programming language, e.g. for syntax highlighting. As a side note, there's a CSS selector based on lang attributes, and jQuery-based highlighting plugins, as well as any other library based on CSS selectors, would benefit from this.
> >
> > When I thought about this, I didn't think about a change in the BCP47 spec, which I consider out of scope. Instead, I looked at that specification. It leaves room for partial customization through the use of "private use subtags" in the form of a string consisting of "x-" followed by up to 8 alphanumeric characters. This subtag can either follow a primary/regional language tag, or be present on its own. Private use subtags are "private" by definition, and they are meant to be used in limited groups according to agreements specific to these groups. But nothing would prevent the HTML community from building such an "agreement" into the spec, so that a series of "private use subtags" such as "x-perl" or "x-php" could be used by Web authors. An agreement would be necessary for language names such as Javascript or C++, because they are either too long or contain non-alphanumeric characters. This means that a snippet in the form <code lang="x-php">, for example, would be easy to understand, easy to target for syntax highlighting extensions, and able to tell its content apart from parent elements defining a language for the whole document. If, on the other hand, strings or comments in a natural language within the snippet are to be taken into account, something like <code lang="en-US-x-php"> could be used.
> >
> > In the "public-html" mailing list I received suggestions such as using translate="no" in order to prevent automated translation, and separately creating custom attributes such as data-code-lang, or proposing new attributes like programming-lang, to express the programming language. I would still add lang="", however, because, as said above, it is difficult to consider a code snippet like a paragraph in natural language. The different proposals have something really good, but they're partial: each focuses only on preventing translation or on indicating the programming language. Maybe there's a way to achieve both.
> >
> > Please tell me what you think about it.
> > Thanks.
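
The private-use syntax discussed in this thread can be checked mechanically. The following Python code is a minimal sketch, not a full BCP 47 validator. It assumes only the BCP 47 "privateuse" production: the singleton "x" followed by one or more hyphen-separated subtags of 1 to 8 letters or digits. For matching, it assumes the case-insensitive prefix rule that CSS Selectors Level 3 gives for :lang(). The function names private_use_part and lang_prefix_match are hypothetical.

```python
import re
from typing import Optional

# BCP 47 "privateuse" production: "x" 1*("-" (1*8alphanum)).
# Language tags are case-insensitive, hence re.IGNORECASE.
PRIVATE_USE = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE)


def private_use_part(tag: str) -> Optional[str]:
    """Return the private-use part of a language tag, or None.

    Hypothetical helper: it finds the first "x" singleton and checks only the
    subtags after it. The rest of the tag is NOT checked against the IANA
    Language Subtag Registry.
    """
    subtags = tag.split("-")
    for i, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            candidate = "-".join(subtags[i:])
            # Reject e.g. "x-javascript" (subtag > 8 chars) or "x-c++".
            return candidate if PRIVATE_USE.fullmatch(candidate) else None
    return None


def lang_prefix_match(lang_value: str, lang_range: str) -> bool:
    """Case-insensitive prefix match, as CSS Selectors Level 3 defines :lang(C):
    the value equals C, or begins with C immediately followed by "-"."""
    value, rng = lang_value.lower(), lang_range.lower()
    return value == rng or value.startswith(rng + "-")


if __name__ == "__main__":
    for tag in ["x-php", "en-US-x-php", "en-GB-x-hixie", "x-javascript", "x-c++"]:
        print(tag, "->", private_use_part(tag))
    for value in ["x-php", "en-US-x-php"]:
        print(value, "matches x-php:", lang_prefix_match(value, "x-php"))
```

Under this prefix rule, lang="en-US-x-php" is matched by a range of "en" or "en-US" but not by "x-php". A highlighter keyed on :lang() alone would therefore treat the two proposed forms differently. Later selector drafts adopt the wildcard-capable extended filtering of RFC 4647, which this sketch does not model.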
Received on Monday, 16 March 2015 22:53:17 UTC