- From: Eduard Pascual <herenvardo@gmail.com>
- Date: Tue, 10 May 2011 22:01:21 +0200
- To: Christoph Päper <christoph.paeper@crissov.de>
- Cc: W3C style mailing list <www-style@w3.org>
On Tue, May 10, 2011 at 7:11 PM, Christoph Päper <christoph.paeper@crissov.de> wrote:
>
> Btw., to get this straight, Eduard Pascal is an alter ego of Tab Atkins Jr. (or vice versa)?

Nope, I'm not. This is not the first time Tab and I have shown very similar ideas and views on web standards, so the confusion is understandable. To clarify this: Tab has been a member of the W3C working group for some years (since before I had ever posted to a mailing list). I used to be quite an active contributor to this list and the HTML5 public mailing list (much less active now due to schedule constraints), but I have never formally been a member of any working group.

As a matter of fact, I just deleted a half-written reply when I saw that Tab had already replied saying almost everything I was going to say. There is one point, however, that I'd like to dig into a bit deeper:

> But ‘em’ (and possibly ‘emphasis’ too) is an element name in and therefore selector for at least one other markup language. It is
> much cleaner to provide a generic mechanism.

Why would it matter, when styling Markdown code, whether something is an element in some other language? Let's look at a clearer example: in programming languages, there has never been any issue with "if" being a keyword in both C and Perl (among plenty of others). Let me push that further with the "continue" keyword: the meaning it has in Perl is entirely unrelated to what it does in C and other C-ish languages, yet there is no conflict. This is because a compiler knows perfectly well whether it is compiling Perl, or C, or whatever else.

The same applies to markup languages: when parsing Markdown, for example, the parser couldn't care less about what elements exist in HTML, or DocBook, or any other markup language (you may care *later* if you are actually converting it to HTML for rendering, but by then parsing will be long done). If _sometext_ stands for emphasis, then it's perfectly reasonable to describe it as an "emphasis node" with the text "sometext".
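A rough Python sketch of that last point, under stated assumptions: the `Node` class, the `EMPHASIS` pattern and the `parse_inline` function are hypothetical names made up for illustration, and the `_text_` rule is deliberately simplistic (it would split something like `_this_one_is_` in the wrong place). The only thing it is meant to show is that the parser picks the node name on its own, without consulting any other markup language.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """An element node: a name chosen by the parser, plus its children."""
    name: str
    children: list = field(default_factory=list)  # child Nodes or plain strings


# Hypothetical inline rule: text wrapped in underscores is emphasis.
EMPHASIS = re.compile(r"_(.+?)_")


def parse_inline(text):
    """Split text into plain strings and EMPHASIS nodes, in document order."""
    parts, pos = [], 0
    for match in EMPHASIS.finditer(text):
        if match.start() > pos:
            parts.append(text[pos:match.start()])       # text before the match
        parts.append(Node("EMPHASIS", [match.group(1)]))  # the emphasized text
        pos = match.end()
    if pos < len(text):
        parts.append(text[pos:])                         # trailing plain text
    return parts
```

Calling `parse_inline("some _sometext_ here")` gives a plain string, an `EMPHASIS` node holding "sometext", and another plain string. Renaming the node to "EM" or anything else would only change what the stylesheet has to select on.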
Let's take a more elaborate example:

    _This text is underlined_, this text is not, but _this_one_is_.

    == This may be a headline ==

    == This is a headline, elsewhere

    Yet another headline format
    ===========================

That could be parsed into a tree like this:

    DOCUMENT
       +- PARAGRAPH
          +- UNDERLINE
             +- "This text is underlined"
          +- ", this text is not, but"
          +- UNDERLINE
             +- "this_one_is"
          +- "."
       +- HEADLINE
          +- "This may be a headline"
       +- HEADLINE
          +- "This is a headline, elsewhere"
       +- HEADLINE
          +- "Yet another headline format"

(note that I could have labelled the nodes differently, such as "P", "EM", "H", etc).

Once the parser feeds that tree to a CSS processor, the CSS doesn't care at all how the structure was represented in the source markup. It has "element" nodes (with their names) and text nodes nicely arranged in a tree structure. If you toss in a stylesheet like this, things should just work:

    PARAGRAPH { display: block; margin-bottom: 1em; }
    HEADLINE { font: large "Arial"; }
    UNDERLINE { text-decoration: underline; }

So you can actually style Markdown code with CSS. All you need is a CSS processor and a Markdown parser that generates the tree and passes it to the processor. Scripts that turn Markdown into HTML for rendering are exactly that: parsers that generate a tree from the source and pass it to the browser's CSS engine. The only issue they have is that, in order to "inject" the tree into the browser's engine, they serialize it into what the browser understands best: HTML. That step bloats the process quite a bit, but it's just a workaround for the fact that they are relying on a third-party CSS engine (the browser's), and thus need to do some marshalling to pass it the tree.

Keep in mind that the notation I have used (the "+-", the indentation, and the quotes) is entirely arbitrary: I'm just trying to convey through an e-mail something that would be represented as an in-memory data structure.

Onto your other request, about highlighting source code.
I'm making my own example this time, so I can show off some more possibilities. Look at this C# statement:

    foreach(byte b in mySocket.getDataStream()) process(b);

This could be parsed (actually, not even parsed; tokenizing is enough) into something like this:

    +- KEYWORD (kind='flow')
       +- "foreach"
    +- OPERATOR (pair='opener')
       +- "("
    +- KEYWORD (kind='type-alias')
       +- "byte"
    +- WHITESPACE
    +- SYMBOL
       +- "b"
    +- WHITESPACE
    +- KEYWORD (kind='contextual')
       +- "in"
    +- WHITESPACE
    +- SYMBOL
       +- "mySocket"
    +- OPERATOR
       +- "."
    +- SYMBOL (framework-reference='http://msdn.microsoft.com/...')
       +- "getDataStream"
    +- OPERATOR (pair='opener')
       +- "("
    +- OPERATOR (pair='closer')
       +- ")"
    +- OPERATOR (pair='closer')
       +- ")"
    +- WHITESPACE
    +- SYMBOL
       +- "process"
    +- OPERATOR (pair='opener')
       +- "("
    +- SYMBOL
       +- "b"
    +- OPERATOR (pair='closer')
       +- ")"
    +- STATEMENT-TERMINATOR

The "STATEMENT-TERMINATOR" could perfectly well have been described as an "OPERATOR", but it serves as an example of how some operators could be special-cased if the need arises. Now, you can go and put something like this together:

    OPERATOR, STATEMENT-TERMINATOR { color: blue; }
    KEYWORD { color: blue; font-weight: bold; }
    KEYWORD[kind='contextual'] { font-weight: normal; } /* Overrides the declaration above for contextual keywords */
    SYMBOL { color: #093; font-style: italic; }
    ...

There you go: CSS-based syntax highlighting. I have also used the example to introduce some arbitrary keywords (let me insist that the notation is arbitrary and irrelevant; we are just describing in-memory data structures).

Of course, a better parser could create "PARENTHESIS-EXPRESSION" and "BLOCK" elements and make the relevant stuff descend from those, creating a more "tree-ish" tree instead of something so linear (so, for example, you could add borders to block boxes from your CSS).
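For illustration only, here is a minimal Python sketch of the flat tokenizing step and of matching the rules above against its output. Everything in it is an assumption made for the example: the `KEYWORDS` table, the `TOKEN` pattern, and the `tokenize` and `style_for` functions are hypothetical, the tokenizer only knows the handful of tokens in the statement above (it treats "in" as a keyword everywhere and skips the framework-reference attribute), and the matcher applies rules in source order rather than implementing the real CSS cascade.

```python
import re

# Hypothetical lookup tables, covering only the example statement.
KEYWORDS = {"foreach": "flow", "byte": "type-alias", "in": "contextual"}
PAIRS = {"(": "opener", ")": "closer"}

# Whitespace runs, identifiers, the statement terminator, and a few operators.
TOKEN = re.compile(r"\s+|[A-Za-z_]\w*|;|[().]")


def tokenize(source):
    """Turn source text into a flat list of (name, attributes, text) tuples."""
    tokens = []
    for match in TOKEN.finditer(source):
        text = match.group()
        if text.isspace():
            tokens.append(("WHITESPACE", {}, text))
        elif text == ";":
            tokens.append(("STATEMENT-TERMINATOR", {}, text))
        elif text in PAIRS:
            tokens.append(("OPERATOR", {"pair": PAIRS[text]}, text))
        elif text == ".":
            tokens.append(("OPERATOR", {}, text))
        elif text in KEYWORDS:
            tokens.append(("KEYWORD", {"kind": KEYWORDS[text]}, text))
        else:
            tokens.append(("SYMBOL", {}, text))
    return tokens


# Each rule: (element names, required attribute values, declarations).
# The attribute is spelled 'kind', as in the token list.
RULES = [
    ({"OPERATOR", "STATEMENT-TERMINATOR"}, {}, {"color": "blue"}),
    ({"KEYWORD"}, {}, {"color": "blue", "font-weight": "bold"}),
    ({"KEYWORD"}, {"kind": "contextual"}, {"font-weight": "normal"}),
    ({"SYMBOL"}, {}, {"color": "#093", "font-style": "italic"}),
]


def style_for(name, attrs):
    """Collect declarations from every matching rule; later rules win."""
    style = {}
    for names, required, declarations in RULES:
        if name in names and all(attrs.get(k) == v for k, v in required.items()):
            style.update(declarations)
    return style
```

Note that `style_for` only ever sees names and attributes. Nothing in it knows, or needs to know, that the tokens came from C# source.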
However, I'm not in the mood to define a C# parser right now :P

At this point, I hope you can see that, once the parser has done its job, CSS is perfectly capable of dealing with the rest. In fact, the better the parser does its job, the more options you have later to define your styles, because the CSS engine will know more about the structure of the content being styled (but it doesn't know, or even care, how that structure was defined and/or represented in the source). As a matter of fact, if browsers didn't do something like that with HTML markup, CSS couldn't style such markup, because CSS is independent of the actual document language used.

>>> How would the selector for ‘[[Foo]]’ or ‘{Bar}’ look like?
>>
>> … a parser could make an element named "link" or "a" or whatever.
>
> ‘:link’ already exists.

But that doesn't matter. To begin with, if the parser made a "link" element, the selector would be just "link", not ":link". And it doesn't matter that an element named "link" already exists in HTML, because:

1) A wiki-markup parser (I'm guessing that example was wiki-markup, but this would apply to anything else as well) will not try to parse wiki-markup as if it were HTML.
2) CSS only deals with the in-memory tree representation, and couldn't care less about where that tree came from. It doesn't need to care: the tree itself is all it needs.

An even better example: look at this chunk of XML and CSS:

    <table>
      <data-row>
        <data-entry>Whatever</data-entry>
        <data-entry>Something</data-entry>
      </data-row>
      <data-row>
        <data-entry>The meaning of life</data-entry>
        <data-entry type="number">42</data-entry>
      </data-row>
      ...
    </table>

    table { display: table; }
    data-row { display: table-row; }
    data-entry { display: table-cell; }
    data-entry[type='number'] { text-align: right; }
    ...

You see, these elements are just made up on the fly. Who knows if HTML6 will include some of those? Nobody. Who cares? Nobody.
It doesn't matter what HTML6 (or any other version) defines, because that example is simply not HTML. The parser needs to know what the source is, so it can produce a tree and feed it to the CSS renderer. The CSS only needs the tree, and doesn't care at all about how it was produced.

In fact, <table> already exists in HTML, but I had to give it the "display: table;" anyway. The only reason tables are rendered as such in HTML in the absence of a stylesheet is that, in fact, there is a stylesheet: the browser stylesheet that defines default styles such as "table { display: table; }" for HTML. And CSS doesn't care at all about what "table" elements are. If "display" is "table", then render as a table; otherwise render as whatever the display property indicates.

I hope these examples help clarify things.
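The point about the browser stylesheet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `HTML_DEFAULTS` is a hypothetical, heavily reduced stand-in for a browser's default stylesheet, `computed_display` is a made-up helper, and the lookup handles nothing but plain element-name selectors.

```python
# Hypothetical, heavily reduced stand-in for a browser's default HTML stylesheet.
HTML_DEFAULTS = {
    "table": {"display": "table"},
    "tr": {"display": "table-row"},
    "td": {"display": "table-cell"},
}


def computed_display(element_name, author_rules, default_rules=None):
    """Author rules win over default rules; with neither, use 'inline'."""
    for rules in (author_rules, default_rules or {}):
        declarations = rules.get(element_name, {})
        if "display" in declarations:
            return declarations["display"]
    return "inline"  # the initial value of the 'display' property


# The made-up XML vocabulary from the example, styled by its own stylesheet.
XML_RULES = {
    "table": {"display": "table"},
    "data-row": {"display": "table-row"},
    "data-entry": {"display": "table-cell"},
}
```

With the defaults in play, `computed_display("table", {}, HTML_DEFAULTS)` yields "table". Without any default stylesheet, `computed_display("table", {})` falls back to "inline", which is why the XML example has to say "display: table" itself. The element name alone never decides the rendering.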
Received on Tuesday, 10 May 2011 20:02:09 UTC