Re: Some notes on SVG parsing in HTML 5

Hi, Cam-

Thanks for doing this.

Cameron McCormack wrote (on 2/17/09 11:23 PM):
> Hello WG.
> 
> I’ve taken a brief look at the commented out SVG parsing language in
> HTML 5.  Below are some pertinent notes for ACTION-2395.  Something
> being a “parse error” means that the document is non-conforming.
> 
> 
> Parsing this document:
> 
>   <svg xmlns='http://www.w3.org/2000/svg'>
>     <circle r='100'/>
>   </svg>
> 
> as text/html would be non-conforming, since it doesn't begin with an
> <html> tag, but would be parsed into this document:
> 
>   <html><head></head><body><svg xmlns='http://www.w3.org/2000/svg'>
>     <circle r='100'/>
>   </svg></body></html>

So, it would still display?


> Parsing this document:
> 
>   <!DOCTYPE html>
>   <html>
>     <head>
>       <title></title>
>     </head>
>     <body>
>       <svg>
>         <circle r='100'/>
>       </svg>
>     </body>
>   </html>
> 
> would be non-conforming, since the <svg> tag is missing an
> xmlns='http://www.w3.org/2000/svg' attribute.  That element would,
> however, be an {http://www.w3.org/2000/svg}svg element.

I favor a solution that does magic-namespaces myself, but don't feel
strongly.
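
To illustrate what "magic namespaces" could look like, here is a minimal Python sketch. The function name and its two-argument shape are hypothetical, and it covers only the <svg> start tag, not the full tree-construction rules:

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def element_namespace(tag, parent_ns):
    """Pick a namespace for a start tag without consulting any xmlns attribute.

    Hypothetical rule: an <svg> start tag always enters the SVG namespace;
    every other element inherits the namespace of its parent.
    """
    if tag.lower() == "svg":
        return SVG_NS
    return parent_ns


# <svg> inside <body> becomes an SVG element even with no xmlns attribute,
# and <circle> inside it inherits the SVG namespace.
svg_ns = element_namespace("svg", HTML_NS)
circle_ns = element_namespace("circle", svg_ns)
print(svg_ns, circle_ns)
```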


> Parsing:
> 
>   <!DOCTYPE html>
>   <html>
>     <head>
>       <title></title>
>     </head>
>     <body>
>       <svg xmlns='http://www.w3.org/2000/svg'>
>         <a xlink:href='somewhere'/>
>       </svg>
>     </body>
>   </html>
> 
> would be conforming, despite not having an xmlns:xlink="" declaration in
> scope.

This seems inconsistent with the restriction above.


> Having on an SVG element an xmlns:xlink="" attribute whose value isn't
> the XLink namespace makes the document non-conforming.
> 
> 
> Including:
> 
>   <SVG XMLNS='http://www.w3.org/2000/svg'>
>     <AniMatemotion/>
>   </svg>
> 
> in an appropriate place is conforming, despite the case differences in
> the element and attribute names.  Further, these attribute and element
> names are mapped to the appropriate case when the element/attribute is
> created.  So the above is equivalent to:
> 
>   <svg xmlns='http://www.w3.org/2000/svg'>
>     <animateMotion/>
>   </svg>

I don't have a problem with the casing error-correction, but it should
be flagged as such (that is, it should raise some sort of warning for the
error console during parsing).
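
A rough Python sketch of case correction that also records a warning, along the lines suggested above. The mapping table is a small hypothetical excerpt (the HTML 5 table is longer), and the function and variable names are made up for illustration:

```python
# Hypothetical excerpt of a lowercase -> canonical case-mapping table.
SVG_ELEMENT_CASE = {
    "animatemotion": "animateMotion",
    "foreignobject": "foreignObject",
    "lineargradient": "linearGradient",
    "textpath": "textPath",
}


def fix_element_case(name, warnings):
    """Return the canonical element name; log a warning if the case changed."""
    lowered = name.lower()
    canonical = SVG_ELEMENT_CASE.get(lowered, lowered)
    if canonical != name:
        # This is the "error console" message: the input parses, but is flagged.
        warnings.append(f"element name {name!r} corrected to {canonical!r}")
    return canonical


warnings = []
print(fix_element_case("AniMatemotion", warnings))
print(fix_element_case("circle", warnings))
print(warnings)
```

Correctly cased input such as "circle" passes through without adding a warning, so only authoring mistakes show up in the log.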


> Any element names that aren't recognised as SVG or HTML ones will be
> created as plain Element objects (I think).  Any such elements, or
> attributes that aren't recognised as one of the SVG attributes that
> is not all lowercase, will be case folded to lowercase. Thus:
> 
>   <svg xmlns='http://www.w3.org/2000/svg'>
>     <g myCustomAttribute='abc'/>
>   </svg>
> 
> will be parsed as:
> 
>   <svg xmlns='http://www.w3.org/2000/svg'>
>     <g mycustomattribute='abc'/>
>   </svg>

This is a big problem.  This means that the most recent HTML
specification, and not the implementations, is the bottleneck for
development of new SVG elements and attributes.  This is one of the
flaws with the case-folding.  We can somewhat ameliorate this by using
lower-case names in the future (which is fine with me), and for
attributes this should work fine (since they would be in the null NS),
but would cause a problem for new elements.

I propose instead that any unrecognized unprefixed elements and
attributes inside an SVG block retain their case, and be placed in the
SVG NS.  This may still cause some mistakes, but in general will be the
right thing to do.
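
The difference between the two behaviours can be sketched in Python as follows. The attribute table is a tiny hypothetical excerpt and both function names are invented for this example:

```python
# Hypothetical excerpt of known mixed-case SVG attribute names.
KNOWN_SVG_ATTRS = {
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
}


def fold_unknown(name):
    """Behaviour described in the notes: unknown names are folded to lowercase."""
    lowered = name.lower()
    return KNOWN_SVG_ATTRS.get(lowered, lowered)


def retain_unknown(name):
    """Proposed behaviour: fix the case of known names, leave unknown names alone."""
    return KNOWN_SVG_ATTRS.get(name.lower(), name)


for attr in ("myCustomAttribute", "VIEWBOX"):
    print(attr, fold_unknown(attr), retain_unknown(attr))
```

With only mixed-case names in the table, retain_unknown would also leave a miscased known lowercase attribute untouched, which is one of the mistakes this approach may still cause; a fuller table would need the all-lowercase names as well.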


> No foreign content elements imply any start or end tags.

Not sure what this means.


> XML-style CDATA sections are supported.
> 
> 
> A <!DOCTYPE> in an SVG fragment will make the document non-conforming,
> as will an XML declaration.

That's fine... but will it stop it from rendering?


> SVG Tiny 1.2 elements aren't considered, and so <textArea> will parse
> as an HTML <textarea> element and break out of foreign content mode.

Why would this be?


> <font> is recognised as an <svg:font> element when in foreign content
> mode, however.  I have checked that all mixed case SVG 1.1 element names
> are in the HTML 5 table of case mappings.  (I haven't checked the list
> of attribute case mappings.)
> 
> 
> Prefixed SVG elements cannot be used, and prefixes other than xlink
> cannot be used for the XLink attributes.  xml:base and friends will
> parse in the same way.

There is existing content (though maybe not much) that uses prefixed SVG
elements, so this should be allowed.

Since this is legal in XHTML, consistency would dictate that it should
be allowed in text/html as well.  What is the rationale for not allowing it?


> There's a comment <!--XXXSVG need to define processing for </script> to
> match HTML5's </script> processing --> but I'm not sure what processing
> this means.

Yeah, it would be good to clarify that.  In general, I agree with the
sentiment.


> Inside a <foreignObject>, <desc> or <title> element, direct child
> elements will be parsed as if they weren’t in a foreign content context.
> So for example, a <font> child of <title> will become an HTML <font>,
> and a <rect> child would be parsed as an element
> {http://www.w3.org/1999/xhtml}rect, but an <svg> child would be parsed
> as a {http://www.w3.org/2000/svg}svg again.
>
>
> In foreign content mode, a <font> start tag with a color, face or size
> attribute will cause the document to be non-conforming.  <!-- the
> attributes here are required so that SVG <font> will go through as SVG
> but legacy <font>s won't -->  I'll note that color is a valid attribute
> to use on <font> in SVG (being the presentation attribute for the color
> property), but that it would be extremely uncommon, although not without
> some effect: arbitrary content glyphs could be using currentColor as
> fill/stroke on some shapes, which thus reference the color property set
> on the <font>.

Seems fine.
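
A rough Python sketch of the <font> check as described in the quoted notes. The function name is hypothetical, and treating attribute names case-insensitively is an assumption:

```python
# Attributes that mark a <font> start tag as legacy HTML rather than SVG.
LEGACY_FONT_ATTRS = {"color", "face", "size"}


def font_is_parse_error(attrs):
    """Return True if a <font> start tag in foreign content carries a
    color, face or size attribute (attrs maps attribute names to values)."""
    return any(name.lower() in LEGACY_FONT_ATTRS for name in attrs)


print(font_is_parse_error({"horiz-adv-x": "1000"}))
print(font_is_parse_error({"color": "red"}))
```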


> The following start tags cause a parse error inside foreign content:
> b, big, blockquote, body, br, center, code, dd, div, dl, dt, em, embed,
> h1, h2, h3, h4, h5, h6, head, hr, i, img, li, listing, menu, meta, nobr,
> ol, p, pre, ruby, s, small, span, strong, strike, sub, sup, table, tt,
> u, ul and var.  <!-- this list was determined empirically by studying
> over 6,000,000,000 pages that were specifically not XML pages -->

It seems odd that text/html content in SVG in text/html would not be
allowed.  Should this be relaxed?
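
For reference, the check in the quoted list amounts to a simple set lookup. This Python sketch assumes that "string" and "tbale" in the list are typos for strong and table, and that tag names are matched case-insensitively; the function name is hypothetical:

```python
# Start tags that cause a parse error inside foreign content, per the quoted
# list (with "string" and "tbale" read as strong and table).
BREAKOUT_TAGS = frozenset("""
    b big blockquote body br center code dd div dl dt em embed
    h1 h2 h3 h4 h5 h6 head hr i img li listing menu meta nobr
    ol p pre ruby s small span strong strike sub sup table tt
    u ul var
""".split())


def is_parse_error_in_foreign_content(tag):
    """Return True if this start tag is on the list above."""
    return tag.lower() in BREAKOUT_TAGS


for tag in ("div", "circle", "TABLE"):
    print(tag, is_parse_error_in_foreign_content(tag))
```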


> Any SVG element can use the self-closing <syntax/>.

I would say MUST.  I don't mind error correction here, in some sane
cases, but again it should raise a parsing error for the error console.


> All of the character references that can be used in PCDATA and RCDATA
> sections can be used in foreign content, too.
> 
> 
> All of the attribute syntaxes allowed on HTML elements (double quoted,
> single quoted, unquoted and minimized) are allowed on foreign content.

Fine, but this seems inconsistent with the stricture against non-XHTML
elements (above).

If by "foreign content" you mean SVG content, then I don't mind the
quote error-correction, as long as it is logged as an error, as
mentioned above.


> Various other kinds of invalid syntax which cause a parse error but
> still parse will behave the same in foreign content (such as correction
> of mis-nested open/close tags).

Ok.

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs

Received on Tuesday, 17 February 2009 22:27:26 UTC