grammars and parsing for regular and presentation attributes from Cameron McCormack on 2013-01-19 (www-svg@w3.org from January 2013)

From: Cameron McCormack <cam@mcc.id.au>
Date: Sat, 19 Jan 2013 12:08:35 +1100
To: "www-svg@w3.org" <www-svg@w3.org>
CC: "Tab Atkins Jr." <jackalmage@gmail.com>
Message-ID: <50F9F213.8010403@mcc.id.au>

We have already resolved (and it's also in our list of requirements for
SVG 2) to have the spec require case-insensitive parsing of presentation
attributes, e.g. fill="Red" should work just like style="fill:Red" would.

I couldn't see an issue in the tracker about allowing all CSS syntax in 
presentation attributes, e.g. fill="/**/red", although I'm sure we've 
brought it up before.  IE and Chrome both support this, while Opera and 
Firefox do not.  (Chrome also supports fill="re\64", while IE does not.)
I think it's a natural progression to parse these attributes entirely
with the CSS parser.  What are people's current thoughts on this?

I am also wondering what to do about the parsing of non-presentation 
attributes.  I find it strange that, for example, the definitions of both
a <rect> element's x="" attribute and, say, the stroke-width property
refer to the <length> type, given that we currently have non-presentation
attribute parsing defined using EBNF grammars while properties use the
CSS grammar syntax.  I want to remove the definitions of the data types
like <length> in types.html and, for properties at least, reference
css3-values, but the question of what non-presentation attributes should
refer to remains.

I don't think we want to duplicate the CSS <length> syntax (which 
includes calc() expressions now) in EBNF.  Maybe we can still utilise 
the CSS parser and invoke it with some flags that disable escapes and 
comments?

Should <rect x=" 10"> be valid by the way?

And have we decided to allow calc() expressions in lengths in
non-presentation attributes, like <rect x="calc(10px + 20%)"> (even
setting aside the general plan to convert attributes like this to
properties)?


I was wondering if we could eliminate the EBNF in the spec entirely and 
rely only on CSS grammar syntax, but that's probably not feasible, at 
least for complicated attributes like <path d="">.  I think what we need 
is a defined way for EBNF grammars to refer to CSS grammar 
non-terminals.  (We could visually differentiate EBNF and CSS grammar 
non-terminals to try to avoid confusion in how they're parsed.)

Let's take <text x=""> as an example.  That's a 
white-space-or-comma-separated list of <length> values.  If we use EBNF 
for the attribute as a whole, we could write:

   list-of-lengths ::= <length> | <length> comma-wsp list-of-lengths
   comma-wsp ::= (wsp+ ","? wsp*) | ("," wsp*)
   wsp ::= (#x20 | #x9 | #xD | #xA)

So angle-bracketed non-terminals reference CSS grammar symbols, and bare 
identifiers refer to EBNF non-terminals rather than being literals as in 
CSS grammar syntax.
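
As a rough illustration of how the grammar above behaves, here is a
minimal Python sketch.  The LENGTH pattern is a hypothetical, heavily
simplified stand-in for the CSS <length> symbol (a number with an
optional unit or percent sign, no calc(), escapes or comments), and the
function name is made up for this example; only COMMA_WSP is taken
directly from the EBNF.

```python
import re

# ASSUMPTION: simplified stand-in for CSS <length>; the real production
# (css3-values) is richer and would come from the CSS parser.
LENGTH = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:em|ex|px|in|cm|mm|pt|pc|%)?")

# comma-wsp ::= (wsp+ ","? wsp*) | ("," wsp*)
# wsp       ::= (#x20 | #x9 | #xD | #xA)
COMMA_WSP = re.compile(r"[ \t\r\n]+,?[ \t\r\n]*|,[ \t\r\n]*")


def parse_list_of_lengths(text):
    """Return the lengths in text as strings, or None if text is not a
    list-of-lengths according to the grammar."""
    pos = 0
    lengths = []
    while True:
        item = LENGTH.match(text, pos)
        if item is None:
            return None          # a <length> is required here
        lengths.append(item.group())
        pos = item.end()
        if pos == len(text):
            return lengths       # input ended right after a <length>
        sep = COMMA_WSP.match(text, pos)
        if sep is None:
            return None          # junk after a <length>
        pos = sep.end()          # a separator must be followed by a <length>
```

With this sketch, "10 20px,30%" yields three lengths, while " 10" and
"10, " are rejected, because the grammar as written allows white space
only between items, not before the first or after the last.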

(We should also consider aligning our set of white space characters with 
those in CSS, where CR isn't supported, remembering that XML will 
normalize CRs to LFs when parsing.)
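
To make the XML side of this concrete, the following small Python sketch
shows what an attribute parser would actually receive.  It uses the
standard library's xml.etree.ElementTree; the helper name and the sample
markup are made up for the example.  In attribute values, XML parsers go
one step beyond line-end normalization and turn literal line breaks into
spaces, so a CR can only reach the attribute grammar when it is written
as a character reference.

```python
import xml.etree.ElementTree as ET


def attribute_value(xml_source, name):
    """Parse one element and return the attribute value as reported by
    the XML parser (hypothetical helper for this example)."""
    return ET.fromstring(xml_source).get(name)


# A literal CR LF pair inside the attribute, plus a CR written as the
# character reference &#xD;.
source = '<text x="10\r\n20&#xD;30"/>'
value = attribute_value(source, "x")

# True only if a CR survived XML parsing and reaches the attribute grammar.
has_cr = "\r" in value
```

Here the literal CR LF pair arrives as a single space, and only the CR
from the character reference is still present in the value.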

There might be some trickiness with white space parsing within the 
<length>, as I think CSS parsing will normally consume any trailing 
white space, which we might not want to do here.

What do people think (and Tab in particular, since he's the css3-syntax 
editor)?

Received on Saturday, 19 January 2013 01:08:29 UTC