From: Bill Dwyer <themadcreator@gmail.com>
Date: Mon, 9 Apr 2007 16:58:05 -0700
To: www-svg@w3c.org
Quick summary: The fact that the "d" data attribute in the "path" tag requires an unnecessarily compressed string syntax completely violates good XML grammar design.

The long version: I've been using SVG for a few years and have been a proponent of it for just about as long. Recently, I started writing a native Python SVG parser using the SAX module. Things were going smoothly until I made a shocking discovery: in critically important parts of the XML structure, SVG reverts to long-string syntax!!!!

Why is this a problem? Consider the perspective of the language implementer (i.e. someone who writes a program to parse and/or modify SVG). If an XML grammar is properly designed, the implementer can use simple, standard tools such as SAX to parse the language quickly and efficiently; this is the strength of an XML language. In SVG, however, many crucial parts of the implementation are hidden behind arbitrary and confounding string syntax. This means the implementer has to change their design paradigm to accommodate multiple arbitrary string formats.

Here is an example. The "path" tag is perhaps the most versatile shape definition. Every "basic shape" is really just a subset of all possible paths, but I understand their redundancy for ease of design. However, I cannot understand why the definition of a path's shape is entirely contained within the undescriptive "d" data attribute. Furthermore, the format of this string is unpredictable and extremely painful to parse. It wouldn't be so bad if we could count on a single delimiter character such as a space or comma, but according to the recommendation, to minimize the amount of space taken up by a path definition, spaces are sometimes optional! This means I can no longer use a simple string tokenizer, but instead have to write a state-machine style parser. This totally violates the purpose of an XML language. XML grammars are not about saving space!
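To make the tokenizing complaint concrete, here is a minimal Python sketch of what even a simplified path-data tokenizer has to do once spaces are optional. The function name `tokenize_path_data` is hypothetical, and the number pattern is an assumption that covers signs, decimals, and exponents but not every corner of the SVG path grammar (for example, packed arc flags). It is also lenient about repeated commas and does not check argument counts per command.

```python
import re

# One regex alternation plays the role of the state machine: at each position
# it recognizes either a single command letter or a number.
_TOKEN = re.compile(
    r"[MmZzLlHhVvCcSsQqTtAa]"                       # a path command letter
    r"|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"   # a number (simplified)
)
# Separators are optional: zero or more whitespace characters or commas
# (lenient; the real grammar allows at most one comma between numbers).
_SEPARATORS = re.compile(r"[\s,]*")


def tokenize_path_data(d):
    """Split an SVG path "d" string into command letters and floats."""
    tokens = []
    pos = 0
    while pos < len(d):
        pos = _SEPARATORS.match(d, pos).end()   # skip any separators
        if pos >= len(d):
            break
        m = _TOKEN.match(d, pos)
        if m is None:
            raise ValueError(f"unexpected character {d[pos]!r} at offset {pos}")
        text = m.group()
        # Command letters stay as strings; everything else becomes a float.
        tokens.append(text if text.isalpha() else float(text))
        pos = m.end()
    return tokens
```

With this sketch, `tokenize_path_data("M1,2L3-4z")` and `tokenize_path_data("M 1 2 L 3 -4 z")` both return `['M', 1.0, 2.0, 'L', 3.0, -4.0, 'z']`, whereas `"M1,2L3-4z".split()` returns the whole string as one token. A plain whitespace split cannot replace a scanner like this one.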
Good XML grammars sacrifice meaningless savings in space for the benefit of ease of use and ease of implementation.

So what is my recommendation? Simple: create an XML grammar definition for paths and for other arbitrary string formats such as style. The language would be much easier to parse if the grammar included something like:

    <path>
      <moveto point="1,2"/>
      <lineto point="1,2"/>
      <bezier order="2" points="1,2 3,4 5,6"/>
      <closepath/>
    </path>

Naturally, this is much easier to parse using standard XML tools as well as simple string tokenizers.

Why does this matter? I believe that SVG would be much more widely adopted if it were easier to implement. For the longest time, I was discouraged by the lack of implementations of SVG parsers and renderers, which is why I started to write my own. But as I got more involved, I discovered that SVG is not as easy to handle as other XML languages and requires a lot of extra, unnecessary work. Hence, I have stopped work on that module out of frustration and have taken the time to write this post.

Please reply with your comments. I'm sorry if this topic has been discussed at length before, but I think it is a really important subject and I would really like to explore the possibility of change in the recommendation.

Bill Dwyer
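For comparison, here is a minimal sketch of how the element-based path grammar proposed in this message could be consumed with Python's standard `xml.sax` module. The element and attribute names (`moveto`, `lineto`, `bezier`, `closepath`, `point`, `points`, `order`) come from the hypothetical example in the message and are not part of any SVG recommendation. `ProposedPathHandler` and `parse_proposed_path` are likewise illustrative names.

```python
import xml.sax


class ProposedPathHandler(xml.sax.ContentHandler):
    """Collects segments from the *proposed* (hypothetical) path grammar."""

    def __init__(self):
        super().__init__()
        self.segments = []

    @staticmethod
    def _parse_points(text):
        # "1,2 3,4" -> [(1.0, 2.0), (3.0, 4.0)]: two plain splits, no state machine.
        return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]

    def startElement(self, name, attrs):
        # Each segment is its own element, so SAX hands us one segment per event.
        if name in ("moveto", "lineto"):
            self.segments.append((name, self._parse_points(attrs["point"])))
        elif name == "bezier":
            self.segments.append(
                (name, int(attrs["order"]), self._parse_points(attrs["points"]))
            )
        elif name == "closepath":
            self.segments.append((name,))
        # The enclosing <path> element (and anything else) is ignored here.


def parse_proposed_path(xml_text):
    """Parse a <path> written in the proposed grammar into a list of segments."""
    handler = ProposedPathHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.segments
```

Feeding it the sample `<path>` from the message yields `[('moveto', [(1.0, 2.0)]), ('lineto', [(1.0, 2.0)]), ('bezier', 2, [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]), ('closepath',)]`. The only string handling left is splitting coordinate pairs on spaces and commas, which is the "simple string tokenizer" case the message asks for.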
Received on Tuesday, 10 April 2007 04:13:51 UTC