Re: SVG requires a parser in a parser?

Rainer Prosi wrote:
> 
> Hello SVGers!
> 
> I read the SVG spec for the first time now and was surprised about the
> implementation of paths.
> 
> using a construct of
> <path d="M 10 10 L40 0" style="fill:none; stroke:black; stroke-width:100">
>      <data d="L 20 20"/>
>      <data d="L 0 20"/>
>      <data d="L 20 0"/>
>      <data d="z"/>
> </path>
> 
> is more inconvenient to validate and parse than e.g.:
> <path closed="true" fill="none" stroke="black" strokeWidth="100">
>      <M>10 10</M>
>      <L>40 0</L>
>      <L>20 20</L>
>      <L>0 20</L>
>      <L>20 0</L>
> </path>
> 
> The number of bytes used is also only insignificantly higher 

True, in that case. However, this is much shorter:

<path style="fill:none; stroke:black; stroke-width:100"
 d="M 10 10 L40 0 L20,20 0,20 20,0z"/>

and moving on to actual real-world graphics, rather than brief examples,
the difference becomes very marked indeed.

The vast majority of SVG file size in real examples ends up being path
data. One of the things that the SVG group noticed very early on was
that PGML, which had a point-oriented syntax similar to the one you
describe, was very verbose and VML, which had a shorter path-like
syntax, was 50% smaller or more. SVG is more compact still, and also has
been carefully designed to compress well using deflate compression
(which HTTP/1.1 can do on the fly). The WG has done experiments with
different syntaxes before settling on the current one.
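As a rough illustration of the size argument (not one of the WG's
experiments), here is a minimal Python sketch that writes the same
points both ways and reports raw and deflated sizes. The 1000 points
are synthetic, the helper names element_form, attribute_form and sizes
are made up for this example, and zlib adds a small header and checksum
on top of the raw deflate stream.

```python
import zlib

# Synthetic polyline: 1000 made-up points, not taken from any real drawing.
POINTS = [(i * 7 % 500, i * 13 % 300) for i in range(1000)]

def element_form(points):
    """One child element per point, as in the proposal quoted above."""
    rows = ['<path closed="true">']
    for i, (x, y) in enumerate(points):
        tag = "M" if i == 0 else "L"
        rows.append(f"<{tag}>{x} {y}</{tag}>")
    rows.append("</path>")
    return "\n".join(rows)

def attribute_form(points):
    """All points in a single d attribute: one L followed by coordinate pairs."""
    (x0, y0), rest = points[0], points[1:]
    tail = " ".join(f"{x},{y}" for x, y in rest)
    return f'<path d="M{x0},{y0} L{tail}z"/>'

def sizes(text):
    """Return (raw byte count, byte count after zlib/deflate compression)."""
    raw = text.encode("ascii")
    return len(raw), len(zlib.compress(raw, 9))

for name, build in (("elements", element_form), ("attribute", attribute_form)):
    raw, deflated = sizes(build(POINTS))
    print(f"{name:10s} raw={raw:6d} deflated={deflated:6d}")
```

The exact figures depend on the data, so treat the output as indicative
only; real artwork with curves and fractional coordinates will behave
differently from this toy polyline.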

It has become apparent over the years that file size is something which
Web designers care about passionately, and a verbose format would be a
non-starter. Of course, for learning examples and the like, a widely
spaced, indented and generally pretty-printed layout can still be used.

> and all entities can be accessed and validated by generic XML 
> parsers without having to parse attributes again.

In fact your example would not allow this, since you put the coordinates
as content, although it could be reformulated to do so:

<path closed="true" fill="none" stroke="black" strokeWidth="100">
     <M x="10" y="10"/>
     <L x="40" y="0"/>
     <L x="20" y="20"/>
     <L x="0"  y="20"/>
     <L x="20" y="0"/>
</path>

This would allow checking that each M or L element had exactly one pair
of coordinates; it would not, however, detect the errors in this example:

<path>
 <L x="an" y="example"/>
 <M x="hello" y="world"/>
</path>

The EBNF for path syntax allows a much closer degree of checking to be
applied; it also allows a DOM implementation to not bother parsing the
path data if it does not need to (for example, text-only display; text
editing; link validation, and suchlike tasks).
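To make the two points above concrete, here is a minimal Python sketch.
It is not the EBNF from the spec: check_path_data is a hypothetical
helper that covers only the M, L and z commands (upper or lower case),
has no exponent notation, and is deliberately loose about where commas
may appear. It shows that a generic XML parser accepts the nonsense
coordinates without complaint, while a grammar-based check rejects
them, and that the check is only run if the caller asks for it.

```python
import re
import xml.etree.ElementTree as ET

# One token: a command letter or a number, after optional separators.
TOKEN = re.compile(r"\s*,?\s*([MmLlZz]|[+-]?(?:\d+\.?\d*|\.\d+))")
COMMANDS = set("MmLlZz")

def args_ok(command, count):
    """M and L need one or more coordinate pairs; z takes no arguments."""
    if command is None:
        return True
    if command in "Zz":
        return count == 0
    return count >= 2 and count % 2 == 0

def check_path_data(d):
    """Return True if d fits the simplified M/L/z grammar sketched here."""
    text, pos, tokens = d.strip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return False          # something that is neither command nor number
        tokens.append(match.group(1))
        pos = match.end()
    if not tokens or tokens[0] not in ("M", "m"):
        return False              # path data must start with a moveto
    command, count = None, 0
    for tok in tokens:
        if tok in COMMANDS:
            if not args_ok(command, count):
                return False
            command, count = tok, 0
        else:
            count += 1
    return args_ok(command, count)

# A generic XML parser is satisfied: the document is well-formed.
bad = ET.fromstring('<path><L x="an" y="example"/><M x="hello" y="world"/></path>')
print([child.attrib for child in bad])

# The d attribute stays an opaque string until someone chooses to check it.
good = ET.fromstring('<path d="M 10 10 L40 0 L20,20 0,20 20,0z"/>')
print(check_path_data(good.get("d")))
print(check_path_data("L an example"))
```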

 
> The same holds for this construct:
> <image x="100" y="100" style="width: 100px; height: 100px" (...)/>
> 
> which IMO deserves a
> <image x="100" y="100" width="100" height="100" (...)/>.

Recall that style can be specified on parent elements, in a style
element, or in an external style sheet, and will cascade in accordance
with the normal CSS rules. This is much easier to accomplish when using
the normal DOM 2 CSS Object Model, with a single style attribute which
is just one part of the complete cascade.

> What are the reasons for the syntax as chosen by the SVG group?

I hope these comments answer your question. It was not a case of
oversight, but of deliberate design. 

In the end, it comes down to "what information is being modelled by the
XML structure?". In SVG, the basic unit is a graphical object (a path,
for example) not a point or a coordinate. Similarly, in most textual
markup languages, the basic unit is a paragraph or a phrase; markup
rarely descends to the detail of individual nouns, verbs, adverbs and so
on.

--
Chris

Received on Thursday, 22 July 1999 12:22:40 UTC