Re: <code> element and scripting languages from Andrea Rendine on 2015-03-30 (public-html-comments@w3.org from March 2015)

From: Andrea Rendine <master.skywalker.88@gmail.com>
Date: Mon, 30 Mar 2015 13:59:27 +0200
To: Alexandre Louis Marc Morgaut <alexandre.morgaut@gmail.com>
Cc: "public-html-comments@w3.org" <public-html-comments@w3.org>, blink-dev@chromium.org, "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>
Message-ID: <CAGxST9k5GOUS7iWr=5DVvDFfxNt0DYug+0+MQs7yVnLeUPknjg@mail.gmail.com>
I definitely love the idea of using extensions (or, why not, one day even
native browser commands) to save snippets as files! This is a very good use
case in which a stricter language definition would be needed. In that
case, should the value of the language notation be the file extension or the
language name? In the latter case, it will be the responsibility of browser
extensions or user agents to store a list of programming languages
associated with their common file extensions.
In either case, relying on @class means that any extension dedicated to
this job will have to fetch the attribute value, examine every single token
and decide whether any of them defines a language, possibly without causing
misunderstandings (what if there's another class matching a language?).
This would be avoided by a more complex syntax like "language-****", but even
in that case, for the sake of interoperability, I guess there's a need for
a strong, separate language notation (and in this scenario @lang, my
original proposal, is utterly pointless...). The same goes for WAI-ARIA. Of
course a programming language has rendering issues. Of course it also has
accessibility requirements. But I guess there should be a native HTML
notation capable of serving more than one purpose.
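To make the ambiguity concrete, here is a minimal sketch of the token
examination described above. It is written in Python only for brevity (a
real extension would be JavaScript), and the LANGUAGE_EXTENSIONS table, the
function names and the allow_bare_tokens switch are all hypothetical, not
taken from any existing tool or spec:

```python
# Hypothetical table: language name -> common file extension.
LANGUAGE_EXTENSIONS = {
    "css": "css",
    "html": "html",
    "javascript": "js",
    "php": "php",
    "python": "py",
}

PREFIX = "language-"


def detect_language(class_value, allow_bare_tokens=False):
    """Return the language named by a class attribute value, or None."""
    tokens = class_value.split()
    # First pass: explicit "language-*" tokens are unambiguous.
    for token in tokens:
        if token.startswith(PREFIX) and len(token) > len(PREFIX):
            return token[len(PREFIX):].lower()
    # Optional second pass: bare tokens that happen to match a known
    # language. This is where misunderstandings can creep in.
    if allow_bare_tokens:
        for token in tokens:
            if token.lower() in LANGUAGE_EXTENSIONS:
                return token.lower()
    return None


def file_extension(language):
    """Map a language name to a file extension, or None if unknown."""
    return LANGUAGE_EXTENSIONS.get(language)


print(detect_language("example language-css wide"))
print(detect_language("python tutorial", allow_bare_tokens=True))
```

With the prefix, the first pass is enough. Without it, the second pass has
to guess: a class like "python" on a page about snakes would be reported as
a programming language.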
I'll just end with a note trying to save poor ol' <pre>. Its use is
completely and definitely presentational, it's true. But there are cases
where a text has to be represented along with its line breaks and spaces, and
this does not matter only for code. For instance, think about futurist
poetry, ASCII art, or quoted email text. In all these cases, you either
have to add a parsing algorithm substituting line breaks with <br />
elements, or you keep those line breaks as they are and find a way to tell
UAs that those line breaks, spaces and so on have to be rendered because
they are "part of content" and therefore semantically relevant.
This is the role of an element or an attribute, then. You should either
have something like a boolean @pre attribute (but it should be global), or
you use <pre>, occasionally *as a container* for other semantic elements.
Also consider that one of my examples above, as you have surely noticed,
applies syntax highlighting to <pre> rather than <code>. This is wrong from
any point of view, but think what would happen to hundreds of pages, were
<pre> no longer conforming.
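As an illustration of the two options for whitespace-sensitive text (either
substituting line breaks with <br /> or keeping the text as it is inside
<pre>), here is a minimal sketch in Python. The function names and the
sample text are made up for the example; only html.escape comes from the
standard library:

```python
import html


def newlines_to_br(text):
    """Option 1: make line breaks explicit with <br /> elements."""
    escaped = html.escape(text, quote=False)
    # Normalize Windows and old Mac line endings before substituting.
    normalized = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "<br />\n")


def wrap_in_pre(text):
    """Option 2: keep the text untouched and let <pre> preserve it."""
    return "<pre>" + html.escape(text, quote=False) + "</pre>"


poem = "zang   tumb\n  tuuum"
print(newlines_to_br(poem))
print(wrap_in_pre(poem))
```

Note that the first option only rescues line breaks: the runs of spaces are
still in the markup, but a UA collapses them when rendering with default
styles, so the author still needs <pre> or an equivalent way to mark them as
significant.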

2015-03-30 9:31 GMT+02:00 Alexandre Louis Marc Morgaut <
alexandre.morgaut@gmail.com>:

>
> I am copying this answer to the WAI and Chromium blink-dev mailing lists,
> as I think they should probably be concerned.
> In short, for them: the issue raised in this thread is that the
> way to tell HTML which programming language is shown in <code> elements is
> probably too loosely defined, or at least too loosely respected in practice.
> And, by extension, worse, <code> elements are sometimes entirely omitted
> in favor of <pre> elements.
>
>
> I agree with Andrea that the current situation is far from ideal, with such
> widely inconsistent practices (@data-language=css, @class=language-css,
> @class=css, ...)
>
> It has an impact on how translation tools can handle such content, but also
> on how speech accessibility tools can render it as audio
>
> I'm less categorical about the claim that @class can't do the job
> It is true that, while it is already encouraged, only very few
> syntax highlighter libs respect the "language-*" value pattern
>
> If we stay with @class, a few actions might help:
> 1) At the very least, add a W3C Note, referenced in the <code> element spec,
> explaining why the "language-*" pattern should be respected
> 2) Also add, if it does not already exist, an accessibility-related Note to
> the WAI documentation and refer to it from the <code> element spec
> 3) Maybe check with the Microformats community whether it could add such a
> specification on its end, as a "source", "snippet", or related microformat
>
> Of course, as I said before, using @language could be an interesting
> alternative
> It may still benefit from a W3C Note and an Accessibility recommendation
>
> Another solution that I didn't mention, and haven't seen mentioned yet,
> could be to go with an "aria-*" attribute
> In which case such a decision would not rest with the HTML working group but
> with the WAI-ARIA one
>
> I may be going a little off topic regarding this thread, but while talking
> about the <code> element and mentioning the overuse, if not abuse, of
> the <pre> element
> to render programming languages, I think it would make a lot of sense if
> the <code> element could have these default CSS rules:
>
> code {
>     white-space: pre;
>     font-family: monospace;
> }
>
> I don't know of any programming language that would suffer from that (tell
> me if there are any), and the <pre> element, to me, only exists for
> presentational purposes, which have been widely pushed to the responsibility
> of CSS. The good thing is also that it maintains the possibility to use it
> either inline or as a block, depending on inner line breaks.
> As has been mentioned in this thread, we have reached a point where the
> <pre> element is also used as a replacement for the <code> element,
> which is a big problem when it comes
> to good semantic support of such content by tools (accessibility,
> translation, automation, ...)
>
>
> I'll end with another automation use case, to highlight the importance of
> <code> over <pre> and of the right way to define the programming language
>
> I can easily imagine useful Chrome / Firefox extensions that would parse
> pages showing snippets in <code> elements, and save all of them in files
> whose extensions would be determined based on the "programming language"
> information, and whose names would come either from:
> - the @id of the <code> element
> - or its label (via <label> or @aria-labelledby)
> - or the best related <Hn> defined title (potentially with a suffix number
> if shared by several <code> elements)
> - and/or a generic name (like "script", "style", "source") followed by a
> suffix number if many
> potentially all grouped in a zip archive
>
> I'm pretty sure some other developers would love that too
>
>
> Alexandre Morgaut
> http://about.me/amorgaut
>
> On Mar 27, 2015, at 2:05 PM, Andrea Rendine <master.skywalker.88@gmail.com>
> wrote:
>
> Martin, IDK about plang but I guess Michael Pieters wasn't serious when
> suggesting that (was he?)
> However, @language is well known to authors, as I said before. There was a
> non-canonical (and pretty useless) habit of specifying @language on
> <script> elements, with "javascript" + the intended version, in order to
> hide higher-version JS scripts from legacy user agents. I.e.
> <script type="text/javascript" language="JavaScript1.3"> would have hidden
> this script from UAs whose compatibility wasn't above JS 1.2.
> So I guess that nobody would use @language for "en-US". Also consider that
> the spec uses class="language-python" as an example of programming language
> markup.
>
> As for why the way things are now is not good (in my opinion), look:
> Prism.js => uses class="language-****" as per spec suggestion
> SyntaxHighlighter 3.0 (by Alex Gorbatchev) => class="brush: js" (and I'll
> spare you the other parameters in the example for your sanity) (also, it
> applies to <pre> but not to <code>, which is pointless from a strictly
> semantic POV)
> SyntaxHighlighter Evolved (WP plugin, so completely different approach,
> but still worth mentioning) => "wrap your code in [language], such as
> [php]code" (as per documentation). Outputs a series of
> <code class="htmlscript plain"> elements (its use of <code> is impressively
> wrong...)
> highlightjs.org => class="html"
> Do you notice any consistency? I'm not only speaking about authors changing
> highlighters, though that would be worth considering. I'm also talking about
> semantic value: a feature that UAs can look at to know, consistently and
> interoperably, what kind of script we are talking about.
>
>
>
Received on Monday, 30 March 2015 12:00:00 UTC