Re: several messages about New Vocabularies in text/html

Ian Hickson wrote on 04/02/2008 07:56:43 PM:
>
> On Wed, 2 Apr 2008, Sam Ruby wrote:
> > >
> > >    http://wiki.whatwg.org/wiki/Extensions
> >
> > I have now contributed to that page.  Feel free to identify where the
> > proposal is not detailed enough or to identify any flaws that may, or
> > may not, prove fatal.
>
> The proposal seems to be "do what Microsoft documented in their namespaces
> whitepaper as being the IE8 Beta 1 behaviour". However, the whitepaper
> doesn't actually say what the processing model is, and IE8 beta 1 doesn't
> seem to implement anything like what the whitepaper implies should happen
> anyway.
>
> If you could describe in your own words what the processing model you are
> proposing is, that would be something I could evaluate.

To explain the motivation, it helps if I start at the beginning.

You haven't said that HTML 5 will support SVG in the HTML5 serialization of
the format.  I am going to make a working assumption that this will
ultimately be the case.  Feel free to challenge that assumption.

Compatibility with existing graphics packages is a desirable goal.  I note
that this is not yet listed as accepted on the New_Vocabularies page on the
wiki.  Feel free to challenge that assumption.

From my experience, Inkscape seems to be a popular tool for generating SVG
images.  I base that on random samplings of Wikipedia and other sources.
Feel free to challenge that assumption.

The top Google search result for "inkscape filetype:svg" at the moment is
http://en.wikipedia.org/wiki/Image_talk:Bart-logo.svg

I will assert that that logo, while not terribly interesting by itself, is
not atypical of the types of output that inkscape produces.  Feel free to
challenge that assumption.

It would be helpful if the DOM produced by processing whatever HTML5 syntax
for SVG is adopted would be the same as the one produced by processing the
XML serialization of inline SVG inside XHTML today.  Feel free to challenge
that assumption.

With all these unknowns and variables and assumptions in mind...

If you take that document (i.e., "Bart-logo") and process it using an HTML5
parser as HTML 5 is defined today, the biggest structural discrepancy
between the DOM produced by the HTML5 parser and the one produced by an
XHTML5 parser is that empty elements such as <defs/> are not processed as
void elements, but instead are processed as open tags, so everything that
follows becomes their descendant.  This problem is common enough that
users would be extremely frustrated if this were not addressed.  Feel free
to challenge that assumption.
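
To make the discrepancy concrete, here is a minimal sketch in Python.  It is
not an HTML5 parser: it uses the standard library's html.parser with the
trailing slash deliberately ignored, as a stand-in for the behaviour described
above, and compares the result with a real XML parse.  The SNIPPET markup and
the helper names are made up for illustration and are not taken from the
Bart-logo file.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely in the style of Inkscape output.
SNIPPET = ('<svg xmlns="http://www.w3.org/2000/svg">'
           '<defs/><path d="M0 0"/><g/></svg>')


class SlashIgnoringBuilder(HTMLParser):
    """Toy tree builder that treats <x/> as a plain start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the currently open elements
        self.parents = {}  # element name -> parent name (None at the root)

    def handle_starttag(self, tag, attrs):
        self.parents[tag] = self.stack[-1] if self.stack else None
        self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Assumption being illustrated: the "/" is ignored, so the
        # element is left open instead of being closed immediately.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close everything back to the nearest matching open element.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def html_like_parents(markup):
    """Parent of each element when self-closing slashes are ignored."""
    builder = SlashIgnoringBuilder()
    builder.feed(markup)
    builder.close()
    return builder.parents


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix from a tag name."""
    return tag.split('}')[-1]


def xml_parents(markup):
    """Parent of each element according to a real XML parser."""
    root = ET.fromstring(markup)
    parents = {local_name(root.tag): None}
    for parent in root.iter():
        for child in parent:
            parents[local_name(child.tag)] = local_name(parent.tag)
    return parents


if __name__ == "__main__":
    print("slash ignored:", html_like_parents(SNIPPET))
    print("XML parse:    ", xml_parents(SNIPPET))
```

With the slash ignored, <path> ends up nested inside <defs> and <g> inside
<path>, whereas the XML parse makes all three siblings under <svg>.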

The number of unique occurrences (in terms of element names) of empty
elements in that small document alone indicates that an exhaustive list of
all possible void elements is not practical.  The fact that <defs/> is
included in that list in this document, an element which typically is a
container of other elements, also argues against the feasibility of such a
list.  Feel free to challenge that assumption.

All of this argues for a new processing "state" for handling SVG content.
Perhaps such a state could also be useful for MathML.  And perhaps even for
vocabularies that might be created in the future.
But that's outside the scope of this email for the moment.

If there is a new state, there needs to be a trigger.  You've rejected
<math> as a simple trigger.  Perhaps <svg> would work.  Perhaps not.  In
prior discussions you rejected xmlns attributes as a trigger.  The
overwhelming number of problematic xmlns attributes at that time had a value
of "".  Others had values that I did not recognize.  Or the XHTML
namespace.

Any trigger has the potential to generate false positives.  In
the case of MathML and SVG, it might be useful to see if xmlns attributes
with the specific values specified for those standards generate any false
positives.  In particular, it would clearly be problematic if such a
tag were not closed.  Let's proceed under the assumption that such a
trigger can be found.  And for now, let's make a working assumption that the
presence of an xmlns attribute on an element not recognized as being valid
in HTML5 and with a value in a short list of (possibly browser and/or
installation dependent) known namespaces is an acceptable trigger.  Feel
free to challenge that assumption.  In particular, it opens up the
possibility that one browser may support MathML and another one may not.
This reduces interoperability.  But realistically, one can't force any
particular browser to natively support MathML, a reality that will have to
be dealt with somehow.  Onward...
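
As a rough sketch of that working assumption (in Python, purely for
illustration): the function name, the abbreviated HTML_ELEMENTS set, and the
idea of passing the known-namespace list as a parameter are all my own
placeholders.  Only the two namespace URIs are the real ones from the SVG and
MathML specifications.

```python
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# The short list of known namespaces; possibly browser and/or
# installation dependent, so it is passed in as a parameter below.
KNOWN_NAMESPACES = frozenset({SVG_NS, MATHML_NS})

# Deliberately abbreviated stand-in for "elements valid in HTML5".
HTML_ELEMENTS = frozenset({
    "html", "head", "body", "div", "p", "span", "a", "img", "table",
})


def triggers_new_state(tag, attrs, known_namespaces=KNOWN_NAMESPACES):
    """Return True if this start tag should switch the parser state.

    tag   -- element name as written in the markup
    attrs -- dict mapping attribute names to values
    """
    if tag.lower() in HTML_ELEMENTS:
        return False  # recognized HTML elements never trigger
    # Only an unprefixed xmlns with a known value counts; "" and the
    # XHTML namespace are not in the list, so they never trigger.
    return attrs.get("xmlns") in known_namespaces


if __name__ == "__main__":
    print(triggers_new_state("svg", {"xmlns": SVG_NS}))
    print(triggers_new_state("div", {"xmlns": SVG_NS}))
    print(triggers_new_state("foo", {"xmlns": ""}))
    # A browser without MathML support would pass a shorter list.
    print(triggers_new_state("math", {"xmlns": MATHML_NS},
                             known_namespaces={SVG_NS}))
```

The last call shows the interoperability cost mentioned above: the same
markup triggers the new state in one configuration and not in another.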

Given this one document and that one trigger and that one additional
parsing mode which has exactly one deviation from the current HTML5 parsing
rules, the DOM that is produced is now vaguely recognizable.

The one key remaining difference is that the value of the namespace for all
these elements is wrong.  If, in this new mode, the value of attributes
named 'xmlns' (note: without a colon, i.e., not an xmlns: prefix) were
respected on that element, and all DOM nodes which are created in this
parsing phase which do NOT have an xmlns attribute simply copy the value of
the namespace of the parent, then you will have a DOM tree that, while
incorrect for RDF and DC and SODIPODI and INKSCAPE elements and attributes,
may produce the desired graphic.  Feel free to challenge that assumption.
But if this turns out to be true, then the amount of specific parsing logic
necessary to support this document is fairly small and surgical.
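
The namespace rule just described can be sketched in a few lines of Python.
The dict-based node representation, the helper names, and the sample tree are
hypothetical and exist only to illustrate the rule: respect an unprefixed
xmlns attribute, otherwise copy the parent's namespace.

```python
SVG_NS = "http://www.w3.org/2000/svg"


def element(tag, attrs=None, children=None):
    """Build a toy node; a stand-in for whatever the parser creates."""
    return {"tag": tag, "attrs": attrs or {}, "children": children or []}


def assign_namespaces(node, parent_ns=None):
    """Return (tag, namespace) pairs in document order.

    An unprefixed 'xmlns' attribute sets the namespace for that element;
    every other element copies its parent's namespace.  Prefixed
    declarations such as xmlns:rdf are ignored entirely.
    """
    ns = node["attrs"].get("xmlns", parent_ns)
    assigned = [(node["tag"], ns)]
    for child in node["children"]:
        assigned.extend(assign_namespaces(child, ns))
    return assigned


# Hypothetical tree in the general shape of Inkscape output.
doc = element("svg", {"xmlns": SVG_NS}, [
    element("metadata", children=[element("rdf:RDF")]),
    element("path", {"d": "M0 0"}),
])

if __name__ == "__main__":
    for tag, ns in assign_namespaces(doc):
        print(tag, ns)
```

Every node, including rdf:RDF, comes out in the SVG namespace.  That is the
"incorrect for RDF" result noted above, and the graphic elements themselves
get the namespace a renderer would look for.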

If this approach is adopted, then we can begin evaluating it against
existing documents.  And explore whether it would be useful for handling
MathML.  And see how it would handle nesting of SVG inside of MathML.  We
can also explore whether it is necessary to get those other elements and
attributes right.  There is one case (xlink) where this is likely to be
important for SVG.  Mixed case attribute and element names will ultimately
be an issue.  CDATA may be worth handling in this processing mode.
Ideally, such would also be handled in an equally surgical manner.

But before going too far into the weeds, it would be helpful if some more
of the requirements and assumptions which are identified not only in this
email but also on the New_Vocabularies page were verified.

- Sam Ruby

Received on Thursday, 3 April 2008 02:38:08 UTC