Some notes on SVG parsing in HTML 5

Hello WG.

I’ve taken a brief look at the commented-out SVG parsing language in
HTML 5.  Below are some pertinent notes for ACTION-2395.  Where I say
something is a “parse error”, that means the document is non-conforming.


Parsing this document:

  <svg xmlns='http://www.w3.org/2000/svg'>
    <circle r='100'/>
  </svg>

as text/html would be non-conforming, since it doesn't begin with an
<html> tag, but would be parsed into this document:

  <html><head></head><body><svg xmlns='http://www.w3.org/2000/svg'>
    <circle r='100'/>
  </svg></body></html>


Parsing this document:

  <!DOCTYPE html>
  <html>
    <head>
      <title></title>
    </head>
    <body>
      <svg>
        <circle r='100'/>
      </svg>
    </body>
  </html>

would be non-conforming, since the <svg> tag is missing an
xmlns='http://www.w3.org/2000/svg' attribute.  That element would,
however, be an {http://www.w3.org/2000/svg}svg element.


Parsing:

  <!DOCTYPE html>
  <html>
    <head>
      <title></title>
    </head>
    <body>
      <svg xmlns='http://www.w3.org/2000/svg'>
        <a xlink:href='somewhere'/>
      </svg>
    </body>
  </html>

would be conforming, despite not having an xmlns:xlink="" declaration in
scope.


An SVG element with an xmlns:xlink="" attribute whose value isn't the
XLink namespace makes the document non-conforming.
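
As a rough illustration of the namespace declaration rules noted so far,
here is a small Python sketch.  It is only my reading of the notes
above, not spec text: the function name and the dictionary
representation of a start tag's attributes are made up for the example.

```python
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

def svg_namespace_attrs_conform(attrs):
    """Check the xmlns declarations on an <svg> start tag.

    attrs maps lowercased attribute names to values (a hypothetical
    representation chosen for this sketch).
    """
    # xmlns must be present and must be the SVG namespace.
    if attrs.get("xmlns") != SVG_NS:
        return False
    # xmlns:xlink may be omitted, but if present it must be the
    # XLink namespace.
    if "xmlns:xlink" in attrs and attrs["xmlns:xlink"] != XLINK_NS:
        return False
    return True
```

Note that a False result only says the document is non-conforming; the
element is still created in the SVG namespace either way.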


Including:

  <SVG XMLNS='http://www.w3.org/2000/svg'>
    <AniMatemotion/>
  </svg>

in an appropriate place is conforming, despite the case differences in
the element and attribute names.  Further, these attribute and element
names are mapped to the appropriate case when the element/attribute is
created.  So the above is equivalent to:

  <svg xmlns='http://www.w3.org/2000/svg'>
    <animateMotion/>
  </svg>
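
A minimal Python sketch of that element name mapping, assuming the
parser first lowercases the tag name and then looks it up in a table.
The table below is only an illustrative subset of the mixed-case SVG 1.1
element names, and the function name is hypothetical.

```python
# Illustrative subset only; the HTML 5 table of case mappings is longer.
SVG_ELEMENT_CASE = {
    name.lower(): name
    for name in (
        "altGlyph", "animateMotion", "animateTransform", "clipPath",
        "foreignObject", "linearGradient", "radialGradient", "textPath",
    )
}

def adjust_svg_element_name(name):
    """Map a tag name written in any case to the case used on creation."""
    lowered = name.lower()
    # Known mixed-case names get their canonical case back; anything
    # else stays lowercase.
    return SVG_ELEMENT_CASE.get(lowered, lowered)
```

With this, adjust_svg_element_name("AniMatemotion") gives
"animateMotion" and adjust_svg_element_name("SVG") gives "svg".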

Any element names that aren't recognised as SVG or HTML ones will be
created as plain Element objects (I think).  The names of any such
elements, and of attributes that aren't recognised as one of the SVG
attributes that are not all lowercase, will be case folded to
lowercase.  Thus:

  <svg xmlns='http://www.w3.org/2000/svg'>
    <g myCustomAttribute='abc'/>
  </svg>

will be parsed as:

  <svg xmlns='http://www.w3.org/2000/svg'>
    <g mycustomattribute='abc'/>
  </svg>
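
The attribute side of this can be sketched the same way in Python.  The
three mixed-case attribute names in the table are an assumed subset
picked for illustration (I haven't checked them against the spec's
list), and the function name is hypothetical.

```python
# Assumed, illustrative subset of mixed-case SVG attribute names.
SVG_ATTRIBUTE_CASE = {
    name.lower(): name
    for name in ("viewBox", "preserveAspectRatio", "gradientUnits")
}

def adjust_svg_attributes(attrs):
    """Return a copy of attrs with names case-adjusted as described."""
    adjusted = {}
    for name, value in attrs.items():
        lowered = name.lower()
        # Recognised mixed-case names are restored; unrecognised
        # names are simply folded to lowercase.
        adjusted[SVG_ATTRIBUTE_CASE.get(lowered, lowered)] = value
    return adjusted
```

For example, {"myCustomAttribute": "abc", "VIEWBOX": "0 0 1 1"} comes
out as {"mycustomattribute": "abc", "viewBox": "0 0 1 1"}.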

No foreign content elements imply any start or end tags.


XML-style CDATA sections are supported.


A <!DOCTYPE> in an SVG fragment will make the document non-conforming,
as will an XML declaration.


SVG Tiny 1.2 elements aren't considered, and so <textArea> will parse
as an HTML <textarea> element and break out of foreign content mode.
<font> is recognised as an <svg:font> element when in foreign content
mode, however.  I have checked that all mixed case SVG 1.1 element names
are in the HTML 5 table of case mappings.  (I haven't checked the list
of attribute case mappings.)


Prefixed SVG elements cannot be used, and prefixes other than xlink
cannot be used for the XLink attributes.  xml:base and friends will
parse in the same way.


There's a comment <!--XXXSVG need to define processing for </script> to
match HTML5's </script> processing --> but I'm not sure what processing
this means.


Inside a <foreignObject>, <desc> or <title> element, direct child
elements will be parsed as if they weren’t in a foreign content context.
So for example, a <font> child of <title> will become an HTML <font>,
and a <rect> child would be parsed as an element
{http://www.w3.org/1999/xhtml}rect, but an <svg> child would be parsed
as a {http://www.w3.org/2000/svg}svg again.
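
To make that rule concrete, here is a hedged Python sketch that picks a
namespace for a direct child of an SVG element.  It models only the
paragraph above (it ignores the HTML start tags that are parse errors in
foreign content), and the names are made up for the example.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# SVG elements whose direct children are parsed as if not in foreign
# content.
HTML_CHILD_PARENTS = {"foreignObject", "desc", "title"}

def child_namespace(svg_parent, child_tag):
    """Namespace given to a child start tag under an SVG parent."""
    if svg_parent in HTML_CHILD_PARENTS:
        # An <svg> start tag switches back into foreign content;
        # every other tag is treated as HTML, known or not.
        if child_tag.lower() == "svg":
            return SVG_NS
        return HTML_NS
    # Simplification: elsewhere in foreign content, stay in SVG.
    return SVG_NS
```

So child_namespace("title", "rect") is the XHTML namespace, while
child_namespace("title", "svg") and child_namespace("g", "rect") are
both the SVG namespace.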


In foreign content mode, a <font> start tag with a color, face or size
attribute will cause the document to be non-conforming.  <!-- the
attributes here are required so that SVG <font> will go through as SVG
but legacy <font>s won't -->  I'll note that color is a valid attribute
to use on <font> in SVG (being the presentation attribute for the color
property), but that it would be extremely uncommon, although not without
some effect: arbitrary content glyphs could be using currentColor as
fill/stroke on some shapes, which thus reference the color property set
on the <font>.
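
The <font> check amounts to something like the following Python sketch
(hypothetical names; attrs is assumed to be a mapping of attribute names
to values for the start tag).

```python
# Attributes that mark a <font> start tag as a legacy HTML one.
LEGACY_FONT_ATTRS = {"color", "face", "size"}

def is_legacy_font_start_tag(tag, attrs):
    """True if a start tag in foreign content is a legacy <font>."""
    if tag.lower() != "font":
        return False
    # Any one of color, face or size is enough to trigger the error.
    return any(name.lower() in LEGACY_FONT_ATTRS for name in attrs)
```

A <font horiz-adv-x='1000'> start tag passes through as SVG under this
check, whereas <font color='red'> does not, which is the uncommon but
valid SVG case mentioned above.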


The following start tags cause a parse error inside foreign content:
b, big, blockquote, body, br, center, code, dd, div, dl, dt, em, embed,
h1, h2, h3, h4, h5, h6, head, hr, i, img, li, listing, menu, meta, nobr,
ol, p, pre, ruby, s, small, span, strong, strike, sub, sup, table, tt,
u, ul and var.  <!-- this list was determined empirically by studying
over 6,000,000,000 pages that were specifically not XML pages -->
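
For anyone wanting to test against that list, here it is as a Python
set with a lookup helper.  This is just a transcription of the list as I
read it, with a hypothetical function name; it says nothing about what
the parser does after flagging the error.

```python
PARSE_ERROR_START_TAGS = frozenset(
    """
    b big blockquote body br center code dd div dl dt em embed
    h1 h2 h3 h4 h5 h6 head hr i img li listing menu meta nobr
    ol p pre ruby s small span strong strike sub sup table tt
    u ul var
    """.split()
)

def is_parse_error_in_foreign_content(tag):
    """True if this start tag is a parse error inside foreign content."""
    # Tag names are matched case-insensitively, as elsewhere in HTML.
    return tag.lower() in PARSE_ERROR_START_TAGS
```

For instance, is_parse_error_in_foreign_content("DIV") is True and
is_parse_error_in_foreign_content("circle") is False.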


Any SVG element can use the self-closing <syntax/>.


All of the character references that can be used in PCDATA and RCDATA
sections can be used in foreign content, too.


All of the attribute syntaxes allowed on HTML elements (double quoted,
single quoted, unquoted and minimized) are allowed on foreign content.


Various other kinds of invalid syntax that cause a parse error but
still parse will behave the same in foreign content (such as correction
of mis-nested open/close tags).


-- 
Cameron McCormack ≝ http://mcc.id.au/

Received on Tuesday, 17 February 2009 12:24:32 UTC