Re: Trying out SVG and MathML parsing

On Jun 19, 2008, at 21:47, Philip Taylor wrote:

>
> Henri Sivonen wrote:
>> There was some discussion about SVG parsing on IRC today. Since I  
>> happened to have something almost ready, I figured I'd put a build  
>> out there before I head to Midsummer/St.John festivities (national  
>> holiday; big deal over here). [...]
>
> I tried this by taking a few hundred random SVG files from  
> Wikipedia, passing them through html2xml to produce XHTML output,  
> then visually comparing against the originals. The problems I  
> noticed are:

Thank you!

> * Lots have attributes like xlink:href and sodipodi:version and  
> i:vieworigin, which make html2xml's output ill-formed since it  
> doesn't provide an appropriate xmlns. (This may have masked other  
> problems from me, since it made most of the images unviewable.)

Sorry about that. I have made another build that has a layer that  
masks these behaviors. I really should write an XML serializer.
http://about.validator.nu/htmlparser/htmlparser-svg-demo2.zip

> * HTML5's treatment of <font> (i.e. exiting from the SVG mode)  
> breaks a number of images:

Yeah. This is not cool.

> http://upload.wikimedia.org/wikipedia/en/b/b5/Lindos5.svg

This file isn't Web-compatible. :-/

> http://upload.wikimedia.org/wikipedia/en/1/17/Nilt-Political_Attitudes-NIRELAND-2006.svg
> http://upload.wikimedia.org/wikipedia/en/b/be/PersCorpINtax_wi_5.svg
> http://upload.wikimedia.org/wikipedia/en/4/40/Telecom.svg

In all these, <font> appears as a child of <defs>. At minimum,  
breaking out of foreign content should be adjusted not to happen on  
<font> when the current element is <defs> in the SVG namespace.

Hixie, would that work given the data you looked at when deciding  
which start tags break out of foreign content?
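
To illustrate the proposed adjustment, here is a minimal Python sketch.
The function name and the BREAKOUT_TAGS set are hypothetical; only <font>
and <defs> come from the discussion above, and this is not the parser's
actual code.

```python
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical and deliberately incomplete set of start tags that break out
# of foreign content; only "font" is taken from this thread.
BREAKOUT_TAGS = {"font"}


def breaks_out_of_foreign_content(tag_name, current_ns, current_local_name):
    """Return True if a start tag should leave foreign (SVG/MathML) content."""
    if tag_name not in BREAKOUT_TAGS:
        return False
    # Proposed exception: <font> directly inside an SVG <defs> stays in SVG.
    if tag_name == "font" and current_ns == SVG_NS and current_local_name == "defs":
        return False
    return True
```

With this check, <font> as a child of <defs> would stay in the SVG
namespace, while <font> under any other current element would still break
out.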

> * In many cases, Illustrator's fancy doctype tricks like:
>
>  <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
>    "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" [
>      <!ENTITY ns_svg "http://www.w3.org/2000/svg">
>      <!ENTITY ns_xlink "http://www.w3.org/1999/xlink">
>  ]>
>  <svg xmlns="&ns_svg;" xmlns:xlink="&ns_xlink;" ...
>
> make the text "]>" appear in the <body> (because HTML5 breaks out of  
> the doctype when it sees the first '>').

This is by design, since the algorithm is meant to handle the kind of  
SVG fragments one would paste in the middle of XHTML, and one wouldn't  
paste that kind of Illustrator cruft in the middle of XHTML.
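
The effect Philip describes can be modelled roughly as follows. This is a
simplified Python sketch, not the tokenizer itself: the function name is
made up, and it only captures the "doctype ends at the first '>'" behavior
mentioned above, using a shortened version of the Illustrator doctype.

```python
def split_at_doctype_end(markup):
    """Split markup at the first '>' after '<!DOCTYPE' (simplified model)."""
    start = markup.upper().find("<!DOCTYPE")
    if start == -1:
        return "", markup          # no doctype at all
    end = markup.find(">", start)
    if end == -1:
        return markup, ""          # doctype never closed
    return markup[:end + 1], markup[end + 1:]


sample = (
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"\n'
    '  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" [\n'
    '  <!ENTITY ns_svg "http://www.w3.org/2000/svg">\n'
    ']>\n'
    '<svg xmlns="&ns_svg;"></svg>'
)

doctype, rest = split_at_doctype_end(sample)
```

In this model the doctype stops at the '>' that closes the ENTITY
declaration, so the "]>" is left over in `rest` and ends up being treated
as ordinary content.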

> (Also it triggers <http://bugzilla.validator.nu/show_bug.cgi?id=255>.)

Fixed. (Good catch! In case you are curious: the code for returning  
from a bogus character reference in an attribute value incorrectly  
adjusted the text coalescing buffer position in a way that is only  
proper when returning from a bogus character reference in text content.)

> * Often the sizes seem to get broken so the SVG-in-XHTML image is  
> tiny or huge, e.g.
> <http://upload.wikimedia.org/wikipedia/en/6/69/CDGlogo.svg> vs
> <http://philip.html5.org/misc/CDGlogo.xhtml>. (I don't know  
> enough about SVG sizing to understand why this problem occurs.)

This has the same cause as the next bug: the attribute name is  
lowercased, so viewBox becomes viewbox, which doesn't work:

> * The "gradientUnits" attribute is converted into "gradientunits"  
> which doesn't work, breaking
> <http://upload.wikimedia.org/wikipedia/en/5/54/Microsoft_Windows_XP_Logo.svg>.
> (<http://bugzilla.validator.nu/show_bug.cgi?id=256>.)

Fixed.
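
For readers wondering what such a fix can look like, one possible approach
is a lookup table that restores the camelCase names after the tokenizer has
lowercased them. The Python sketch below is an illustration under that
assumption; the table is partial, the names of the table and function are
hypothetical, and it is not the code from the fixed build.

```python
# Partial, illustrative mapping from lowercased names to SVG's camelCase names.
SVG_ATTRIBUTE_CASE = {
    "viewbox": "viewBox",
    "gradientunits": "gradientUnits",
    "gradienttransform": "gradientTransform",
    "preserveaspectratio": "preserveAspectRatio",
}


def adjust_svg_attributes(attrs):
    """Return a copy of attrs with known lowercased SVG names restored."""
    return {SVG_ATTRIBUTE_CASE.get(name, name): value
            for name, value in attrs.items()}
```

Names that are not in the table, such as "fill", pass through unchanged.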

Thank you.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 23 June 2008 14:33:11 UTC