Re: [Selectors4] Semantic Pseudo Elements from Eduard Pascual on 2011-05-10 (www-style@w3.org from May 2011)

From: Eduard Pascual <herenvardo@gmail.com>
Date: Tue, 10 May 2011 22:01:21 +0200
To: Christoph Päper <christoph.paeper@crissov.de>
Cc: W3C style mailing list <www-style@w3.org>
Message-ID: <BANLkTi=juBy0ZBW0z3-P+1oiTA8C5Kx0wA@mail.gmail.com>
On Tue, May 10, 2011 at 7:11 PM, Christoph Päper
<christoph.paeper@crissov.de> wrote:
>
> Btw., to get this straight, Eduard Pascal is an alter ego of Tab Atkins Jr. (or vice versa)?
Nope, I'm not. This is not the first time Tab and I have shown very
similar ideas and views on web standards, so the confusion is
understandable.

To clarify this: Tab has been a member of the W3C working group for
some years (since before I had ever posted to a mailing list). I used
to be quite an active contributor to this list and the HTML5 public
mailing list (much less active now due to schedule constraints); but I
have never formally been a member of any working group.

As a matter of fact, I just deleted a half-written reply as I saw that
Tab had already replied saying almost everything I was going to say.

There is a point, however, I'd like to dig a bit deeper:

> But ‘em’ (and possibly ‘emphasis’ too) is an element name in and therefore selector for at least one other markup language. It is
> much cleaner to provide a generic mechanism.

Why would it matter, when styling Markdown code, whether something is
an element in some other language?

Let's look at a clearer example: in programming languages, there has
never been any issue with "if" being a keyword in both C and Perl
(among plenty of others). Let me push that further with the "continue"
keyword: the meaning it has in Perl is entirely unrelated to what it
does in C and other C-ish languages, yet there is no conflict. This is
because a compiler knows pretty well whether it is compiling Perl, or
C, or whatever else.

This can also be applied to markup languages: when parsing Markdown,
for example, the parser couldn't care less about what elements exist
in HTML, or DocBook, or any other markup language (you may care
*later* if you are actually converting it to HTML for rendering, but
by then parsing will be long done). If _sometext_ stands for emphasis,
then it's perfectly reasonable to describe it as an "emphasis node"
with the text "sometext". Let's take a more elaborate example:
_This text is underlined_, this text is not, but _this_one_is_.

== This may be a headline ==
== This is a headline, elsewhere
Yet another headline format
===========================
That could be parsed into a tree like this:
DOCUMENT
+- PARAGRAPH
   +- UNDERLINE
      +- "This text is underlined"
   +- ", this text is not, but "
   +- UNDERLINE
      +- "this_one_is"
   +- "."
+- HEADLINE
   +- "This may be a headline"
+- HEADLINE
   +- "This is a headline, elsewhere"
+- HEADLINE
   +- "Yet another headline format"
(note that I could have labelled the nodes differently, such as "P",
"EM", "H", etc). Once the parser feeds that tree to a CSS processor,
the CSS doesn't care at all how the structure was represented in the
source markup. It has "element" nodes (with their names) and text
nodes nicely arranged in a tree structure. If you toss in a stylesheet
like this, things should just work:
PARAGRAPH { display: block; margin-bottom: 1em; }
HEADLINE { font: large "Arial"; }
UNDERLINE { text-decoration: underline; }
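
To make that a bit more concrete, here is a minimal sketch in Python of the tree above being matched against that stylesheet. Everything in it (the Node class, the STYLESHEET table, the styled_elements function) is made up for illustration; it only handles plain element-name selectors and is in no way a real CSS engine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An element node; children are Node objects or plain str text nodes."""
    name: str
    children: list = field(default_factory=list)


# The tree from the example, built by hand instead of by a Markdown parser.
document = Node("DOCUMENT", [
    Node("PARAGRAPH", [
        Node("UNDERLINE", ["This text is underlined"]),
        ", this text is not, but ",
        Node("UNDERLINE", ["this_one_is"]),
        ".",
    ]),
    Node("HEADLINE", ["This may be a headline"]),
    Node("HEADLINE", ["This is a headline, elsewhere"]),
    Node("HEADLINE", ["Yet another headline format"]),
])

# Hypothetical stand-in for a parsed stylesheet: element name -> declarations.
STYLESHEET = {
    "PARAGRAPH": {"display": "block", "margin-bottom": "1em"},
    "HEADLINE": {"font": 'large "Arial"'},  # size before family
    "UNDERLINE": {"text-decoration": "underline"},
}


def styled_elements(node, depth=0):
    """Yield (depth, element name, matched declarations) in document order."""
    yield depth, node.name, STYLESHEET.get(node.name, {})
    for child in node.children:
        if isinstance(child, Node):  # text nodes carry no name to match on
            yield from styled_elements(child, depth + 1)


result = list(styled_elements(document))
for depth, name, declarations in result:
    print("  " * depth + name, declarations)
```

Note that nothing in the matching step knows (or needs to know) that the tree came from Markdown.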

So you can actually style Markdown code with CSS. All you need is a
CSS processor and a Markdown parser that generates the tree and passes
it to the processor. Scripts that turn Markdown into HTML for
rendering are exactly that: parsers that generate a tree from the
source and pass it to the browser's CSS engine. The only issue they
have is that, in order to "inject" the tree into the browser's engine,
they serialize it into what the browser understands best, HTML. That
step bloats the process quite a bit, but it's just a workaround for the
fact that they are relying on a third-party CSS engine (the browser's),
and thus they need to do some marshalling to pass it the tree.
Keep in mind that the notation I have used (the "+-", the indentation,
and the quotes) is entirely arbitrary: I'm just trying to convey
through an e-mail something that would be represented as an in-memory
data structure.
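
The marshalling step mentioned above (serializing the tree into HTML so the browser's engine can take it) can be sketched in a few lines of Python. The nested (name, children) tuples and the TAGS mapping are assumptions of mine, chosen only for the example; a real converter would pick its own node names and HTML elements.

```python
from html import escape

# Hypothetical mapping from the made-up node names to HTML elements.
TAGS = {"DOCUMENT": "div", "PARAGRAPH": "p", "UNDERLINE": "u", "HEADLINE": "h1"}


def to_html(node):
    """Serialize a (name, children) tuple or a text string into HTML."""
    if isinstance(node, str):
        return escape(node)  # text nodes only need escaping
    name, children = node
    tag = TAGS[name]
    inner = "".join(to_html(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


tree = ("PARAGRAPH", [("UNDERLINE", ["this_one_is"]), " & more."])
html_text = to_html(tree)
print(html_text)  # <p><u>this_one_is</u> &amp; more.</p>
```

The browser then parses that string right back into a tree, which is exactly the round trip a dedicated CSS processor would not need.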

On to your other request, about highlighting source code. I'm making my
own example this time, so I can show off some more possibilities: look
at this C# statement:
foreach(byte b in mySocket.getDataStream())
    process(b);
This could be parsed (actually, not even parsed, tokenizing is enough)
into something like this:
+- KEYWORD (kind='flow')
   +- "foreach"
+- OPERATOR (pair='opener')
   +- "("
+- KEYWORD (kind='type-alias')
   +- "byte"
+- WHITESPACE
+- SYMBOL
   +- "b"
+- WHITESPACE
+- KEYWORD (kind='contextual')
   +- "in"
+- WHITESPACE
+- SYMBOL
   +- "mySocket"
+- OPERATOR
   +- "."
+- SYMBOL (framework-reference='http://msdn.microsoft.com/...')
   +- "getDataStream"
+- OPERATOR (pair='opener')
   +- "("
+- OPERATOR (pair='closer')
   +- ")"
+- OPERATOR (pair='closer')
   +- ")"
+- WHITESPACE
+- SYMBOL
   +- "process"
+- OPERATOR (pair='opener')
   +- "("
+- SYMBOL
   +- "b"
+- OPERATOR (pair='closer')
   +- ")"
+- STATEMENT-TERMINATOR

The "STATEMENT-TERMINATOR" could perfectly well have been described as
an "OPERATOR", but it serves as an example of how some operators could
be special-cased if the need arises.
Now, you can go and put something like this together:
OPERATOR, STATEMENT-TERMINATOR { color: blue; }
KEYWORD { color: blue; font-weight: bold; }
KEYWORD[kind='contextual'] { font-weight: normal; } /* Overrides the
declaration above for contextual keywords */
SYMBOL { color: #093; font-style: italic; }
...
There you go: CSS-based syntax highlighting.
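
For the curious, a rough Python sketch of the tokenizing step and of matching those rules follows. The keyword table, the regular expression and the function names are all invented for the example, and the tokenizer only knows the handful of tokens that appear in the statement above (a real one would treat "in" as a keyword only in context, and would not silently skip characters it doesn't recognize). Rules are applied in source order, which for these particular rules gives the same result as the real cascade.

```python
import re

# Hypothetical keyword table: keyword -> value of its "kind" attribute.
KEYWORDS = {"foreach": "flow", "byte": "type-alias", "in": "contextual"}
PAIRS = {"(": "opener", ")": "closer"}

TOKEN_RE = re.compile(
    r"(?P<ws>\s+)|(?P<word>[A-Za-z_]\w*)|(?P<term>;)|(?P<op>[().])"
)


def tokenize(source):
    """Turn source text into a flat list of (name, attributes, text) nodes."""
    nodes = []
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if match.lastgroup == "ws":
            nodes.append(("WHITESPACE", {}, text))
        elif match.lastgroup == "word":
            if text in KEYWORDS:
                nodes.append(("KEYWORD", {"kind": KEYWORDS[text]}, text))
            else:
                nodes.append(("SYMBOL", {}, text))
        elif match.lastgroup == "term":
            nodes.append(("STATEMENT-TERMINATOR", {}, text))
        else:
            attrs = {"pair": PAIRS[text]} if text in PAIRS else {}
            nodes.append(("OPERATOR", attrs, text))
    return nodes


# (element name, required attributes, declarations), in stylesheet order.
RULES = [
    ("OPERATOR", {}, {"color": "blue"}),
    ("STATEMENT-TERMINATOR", {}, {"color": "blue"}),
    ("KEYWORD", {}, {"color": "blue", "font-weight": "bold"}),
    ("KEYWORD", {"kind": "contextual"}, {"font-weight": "normal"}),
    ("SYMBOL", {}, {"color": "#093", "font-style": "italic"}),
]


def style_for(name, attrs):
    """Merge the declarations of every matching rule; later rules win."""
    style = {}
    for rule_name, required, declarations in RULES:
        if rule_name == name and all(
            attrs.get(key) == value for key, value in required.items()
        ):
            style.update(declarations)
    return style


source = "foreach(byte b in mySocket.getDataStream())\n    process(b);"
tokens = tokenize(source)
for name, attrs, text in tokens:
    print(name, attrs, repr(text), style_for(name, attrs))
```

The attribute selector only does its job because the tokenizer bothered to record the "kind" of each keyword.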

I have used the example to also introduce some arbitrary keywords (let
me insist that the notation is arbitrary and irrelevant, we are just
describing in-memory data structures). Of course, a better parser
could create "PARENTHESIS-EXPRESSION" and "BLOCK" elements and make
the relevant stuff descend from those, creating a more "tree-ish"
tree instead of something so linear (so, for example, you could add
borders to block boxes from your CSS). However, I'm not in the mood for
defining a C# parser right now :P

At this point, I hope you can see that, once the parser has done its
job, CSS is perfectly capable of dealing with the rest. In fact, the
better the parser does its job, the more options you have later to
define your styles, because the CSS engine will know more about the
structure of the content being styled (but it neither knows nor cares
how such structure was defined and/or represented in the source).
As a matter of fact, if browsers didn't do something like that with
the HTML markup, CSS couldn't style such markup, because CSS is
independent of the actual document language used.

>
> >> How would the selector for ‘[[Foo]]’ or ‘{Bar}’ look like?
> >
> > … a parser could make an element named "link" or "a" or whatever.
>
> ‘:link’ already exists.
But that doesn't matter. To begin with, if the parser made a "link"
element, the selector would be just "link", not ":link". And it
doesn't matter that an element named "link" already exists in HTML,
because:
1) A wiki-markup parser (I'm guessing that example was wiki-markup,
but this would apply to anything else as well) will not try to parse
wiki-markup as if it were HTML.
2) CSS only deals with the in-memory tree representation, and couldn't
care less about where that tree came from. It doesn't need to care:
the tree itself is all it needs.

An even better example: look at this chunk of XML and CSS:
<table>
    <data-row>
        <data-entry>Whatever</data-entry>
        <data-entry>Something</data-entry>
    </data-row>
    <data-row>
        <data-entry>The meaning of life</data-entry>
        <data-entry type="number">42</data-entry>
    </data-row>
    ...
</table>

table { display: table; }
data-row { display: table-row; }
data-entry { display: table-cell; }
data-entry[type='number'] { text-align: right; }
...

You see, these elements are just made up on the fly. Who knows if
HTML6 will include some of those? Nobody. Who cares? Nobody. It
doesn't matter what HTML6 (or any other version) defines, because that
example is just not HTML. The parser needs to know what the source is,
so it can produce a tree and feed it to the CSS renderer. The CSS only
needs the tree, and doesn't care at all about how it was produced.
In fact, <table> already exists in HTML, but I had to give it the
"display: table;" anyway. The only reason tables are rendered as such
in HTML in the absence of a stylesheet is that, in fact, there is a
stylesheet: the browser stylesheet that defines default styles such as
"table { display: table; }" for HTML. And CSS doesn't care at all
about what "table" elements are. If "display" is "table", then render
as a table; otherwise render as whatever the display property
indicates.
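
That last point can be sketched in a few lines of Python. The HTML_DEFAULTS table is a hypothetical, heavily abridged stand-in for a browser's default stylesheet, and computed_display is an invented helper; the one hard fact used is that the initial value of "display" in CSS is "inline".

```python
# Initial value of the display property, used when no rule sets it.
INITIAL = {"display": "inline"}

# Hypothetical, heavily abridged browser defaults for HTML documents.
HTML_DEFAULTS = {
    "table": {"display": "table"},
    "tr": {"display": "table-row"},
    "td": {"display": "table-cell"},
}

# The author stylesheet from the XML example above.
AUTHOR = {
    "table": {"display": "table"},
    "data-row": {"display": "table-row"},
    "data-entry": {"display": "table-cell"},
}


def computed_display(element, defaults, author):
    """Initial value first, then browser defaults, then author rules."""
    style = dict(INITIAL)
    style.update(defaults.get(element, {}))
    style.update(author.get(element, {}))
    return style["display"]


# HTML: the browser stylesheet alone already makes <table> a table.
print(computed_display("table", HTML_DEFAULTS, {}))
# Arbitrary XML: no defaults apply, so without author rules it is inline...
print(computed_display("table", {}, {}))
# ...and only the author stylesheet turns it into a table.
print(computed_display("table", {}, AUTHOR))
```

The element name "table" means nothing by itself here; only the declarations that end up applying to it do.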

I hope these examples help clarify things.
Received on Tuesday, 10 May 2011 20:02:09 UTC