SVG and proper XML design

Quick summary:

  The fact that the "d" attribute of the "path" element requires an
unnecessarily compressed string syntax violates good XML grammar
design.

The long version:

  I've been using SVG for a few years and have been a proponent of it
for just about as long. Recently, I started writing a native Python
SVG parser using the SAX module. Things were going smoothly until I
made a shocking discovery:

  In critically important parts of the XML structure, SVG falls back
on long strings with their own ad hoc syntax!

  Why is this a problem?
  Consider the perspective of the language implementer (i.e. someone
who writes a program to parse and/or modify SVG). If an XML grammar is
properly designed, the implementer can easily use simple, standard
tools, such as SAX, to parse the language quickly and efficiently--
this is the strength of an XML language. In SVG, however, many crucial
parts of the implementation are hidden behind arbitrary and
confounding syntax. This means that the implementer will have to
change their design paradigm to accommodate multiple arbitrary string
formats.

  Here is an example.
  The "path" tag is perhaps the most versatile shape definition. Every
"basic shape" is really just a subset of all possible paths, but I
understand their redundancy for ease of design. However, I cannot
understand why the definition of a path's shape is completely
contained within the undescriptive "d" attribute. Furthermore, the
format of this string is hard to predict and extremely painful to
parse. It wouldn't be so bad if we could count on a single tokenizing
character such as a space or comma, but according to the
recommendation, to minimize the amount of space taken up by a path
definition, separators are sometimes optional!

  This means I can no longer use a simple string tokenizer, but
instead have to write a state-machine style parser. This totally
violates the purpose of an XML language. XML grammars are not about
saving space! They sacrifice meaningless savings in space for the
benefit of ease-of-use and ease-of-implementation.
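
  To illustrate the difficulty, here is a minimal Python sketch of a
tokenizer for path data. It is only an illustration, not a complete
implementation: the function name tokenize_path and the sample string
are made up for this example, and the compact string is one I believe
the path grammar permits. It also does not handle the arc command's
flags, which may be written without separators.

```python
import re

# A token is either a command letter or a number. A number has an
# optional sign, digits with an optional fraction (or a bare fraction
# such as ".5"), and an optional exponent.
TOKEN = re.compile(
    r"([MmZzLlHhVvCcSsQqTtAa])"
    r"|([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)

def tokenize_path(d):
    """Split path data into command letters and floats."""
    tokens = []
    for command, number in TOKEN.findall(d):
        tokens.append(command if command else float(number))
    return tokens

compact = "M10-20L30,40.5.5z"

# Splitting on whitespace finds nothing to split on.
print(compact.split())         # ['M10-20L30,40.5.5z']

# The regex has to know where a number ends in order to find the next.
print(tokenize_path(compact))  # ['M', 10.0, -20.0, 'L', 30.0, 40.5, 0.5, 'z']
```

  Note that "40.5.5" has to be read as two numbers, 40.5 and .5, and
"10-20" as 10 and -20. No single delimiter would reveal that.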

  So what is my recommendation?
  Simple: Create an XML grammar definition for paths and other
arbitrary string formats such as style. It would be much easier to
parse the language if the grammar included something like:

<path>
  <moveto point="1,2"/>
  <lineto point="1,2"/>
  <bezier order="2" points="1,2 3,4 5,6"/>
  <closepath/>
</path>

  Naturally, this is much easier to parse using standard XML tools as
well as simple string tokenizers.
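
  As a rough sketch of what that could look like in Python, the
handler below reads the proposed elements with the standard xml.sax
module. The element and attribute names come from the example above
and are not part of any recommendation; the class name PathHandler is
made up for this illustration.

```python
import xml.sax

class PathHandler(xml.sax.ContentHandler):
    """Collect (command, points) tuples from the proposed child elements."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def startElement(self, name, attrs):
        if name in ("moveto", "lineto", "bezier"):
            # Hypothetical attributes: "point" holds one pair, "points" several.
            raw = attrs.get("point") or attrs.get("points") or ""
            # Pairs are separated by spaces, coordinates by commas.
            points = [tuple(float(n) for n in pair.split(","))
                      for pair in raw.split()]
            self.segments.append((name, points))
        elif name == "closepath":
            self.segments.append((name, []))

doc = b"""<path>
  <moveto point="1,2"/>
  <lineto point="1,2"/>
  <bezier order="2" points="1,2 3,4 5,6"/>
  <closepath/>
</path>"""

handler = PathHandler()
xml.sax.parseString(doc, handler)
print(handler.segments)
```

  Here the XML parser does the structural work, and the only string
handling left is splitting on one space and one comma.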

  Why does this matter?
  I believe that SVG would be much more widely adopted if it were
easier to implement. For the longest time, I was discouraged by the
lack of implementations of SVG parsers and renderers-- which is why I
started to write my own. But as I got more involved, I discovered that
SVG is not as easy to handle as other XML languages, and requires a
lot of unnecessary extra work. Hence, I have stopped work on that
module out of frustration and have taken time to write this post.

  Please reply with your comments. I'm sorry if this topic has been
discussed at length before, but I think it is a really important
subject and would really like to explore the possibility of change in
the recommendation.

  Bill Dwyer

Received on Tuesday, 10 April 2007 04:13:51 UTC