Re: several messages about New Vocabularies in text/html from Ian Hickson on 2008-04-03 (public-html@w3.org from April 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 3 Apr 2008 03:07:37 +0000 (UTC)
To: Sam Ruby <rubys@us.ibm.com>
Cc: public-html@w3.org, www-math@w3.org
Message-ID: <Pine.LNX.4.62.0804030243180.24456@hixie.dreamhostps.com>

On Wed, 2 Apr 2008, Sam Ruby wrote:
> 
> To explain the motivation, it helps if I start at the beginning.

I understand the problem. It's the solution I don't understand.


> You haven't said that HTML 5 will support SVG in the HTML5 serialization 
> of the format.  I am going to make a working assumption that this will 
> ultimately be the case.  Feel free to challenge that assumption.

Assuming we can find a way to make it work, it's certainly one of the more 
obvious options for resolving the use case of embedding diagrams in HTML.


> Compatibility with existing graphics packages is a desirable goal.  I 
> note that this is not yet listed as accepted on the New_Vocabularies 
> page on the wiki.  Feel free to challenge that assumption.

The wiki was misleading. I've removed the headers which were really just 
there to help organise my work. I agree that "Compatibility with existing 
graphics packages" is a desirable goal. (Whether we can meet it is another 
issue.)


> From my experience, inkscape seems to be a popular tool for generating 
> SVG images.  I base that on random samplings of wikipedia and other 
> sources. Feel free to challenge that assumption.

It seems reasonable, though I expect Adobe Illustrator is far more 
popular, and that would be a better target today.


> The top google search result for "inkscape filetype:svg" at the moment 
> is http://en.wikipedia.org/wiki/Image_talk:Bart-logo.svg
> 
> I will assert that that logo, while not terribly interesting by itself, 
> is not atypical of the types of output that inkscape produces.  Feel 
> free to challenge that assumption.

That seems like a good assumption.


> It would be helpful if the DOM produced by processing whatever HTML5 
> syntax for SVG is adopted would be the same as the one produced by 
> processing the XML serialization of inline SVG inside XHTML today.  
> Feel free to challenge that assumption.

I agree that it would be helpful.


> With all these unknowns and variables and assumptions in mind...
> 
> If you take that document (i.e., "Bart-logo") and process it using an 
> HTML5 parser as HTML 5 is defined today, the biggest structural 
> discrepancy between the document that is produced by the HTML5 parser 
> and an XHTML5 parser is that empty elements are not processed as void 
> elements, but instead are processed as open tags.

I don't know that that's the biggest problem, but it certainly is a big 
problem, yes.


> This problem is common enough that users would be extremely frustrated 
> if this were not addressed.  Feel free to challenge that assumption.

I think this issue, along with several others, would be big issues, yes.


> The number of unique occurrences (in terms of element names) of empty 
> elements in that small document alone indicates that an exhaustive list 
> of all possible void elements is not practical.

For SVG, I agree. The SVG language doesn't really have some elements that 
are always empty and some that always have children. Typical XML 
serialisations of SVG use the /> syntax in a manner that cannot be 
predicted from the tag name and which is not similar to how HTML generally 
works today.
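
To make the structural effect concrete, here is a minimal sketch in 
Python. It is a toy tag scanner, not the HTML5 tree construction 
algorithm: the TAG pattern, the nesting() helper and the sample markup 
are all made up for illustration. It only shows how the recorded depth 
of each element changes once the trailing "/" is ignored.

```python
import re

# Toy pattern (an assumption, not a real tokenizer): optional leading "/",
# a tag name, any attributes, and an optional trailing "/".
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^<>]*?(/?)>")

def nesting(markup, honor_self_closing):
    """Return (depth, name) for every start tag found in the markup."""
    depth, seen = 0, []
    for closing, name, slash in TAG.findall(markup):
        if closing:
            # Naive rule: any end tag pops one level, whatever its name.
            depth = max(depth - 1, 0)
            continue
        seen.append((depth, name))
        # When "/>" is not honoured, the element stays open.
        if not (slash and honor_self_closing):
            depth += 1
    return seen

sample = '<svg><g><path d="M0 0"/><rect/></g><circle/></svg>'
as_xml = nesting(sample, honor_self_closing=True)
as_open_tags = nesting(sample, honor_self_closing=False)
print(as_xml)
print(as_open_tags)
```

With the slash honoured, path and rect are siblings; without it, rect 
ends up nested inside path, and circle is no longer a direct child of 
svg.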


> All of this argues for a new processing "state" for handling SVG 
> content. Perhaps such a state could also be useful for MathML.  And 
> perhaps even for future vocabularies that might be created, either now 
> or in the future.

I agree that such a hypothetical mode, if one could be found, would be a 
superior way of solving the problem of embedding SVG and MathML into 
text/html than the current working proposal of hard-coding tag names into 
the tree constructor.


> If there is a new state, there needs to be a trigger.  You've rejected 
> <math> as a simple trigger.  Perhaps <svg> would work.  Perhaps not.  
> In prior discussions you rejected xmlns attributes as a trigger.  The 
> overwhelming number of problematic xmlns attributes at that time had a 
> value of "".  Others had values that I did not recognize.  Or the XHTML 
> namespace.
>
> Any trigger has the potential to generate false positives.  
> In the case of MathML and SVG, it might be useful to see if xmlns 
> attributes with the specific values specified for those standards 
> generate any false positives.  In particular, it would clearly be 
> problematic if such a tag were not closed.  Let's proceed under the 
> assumption that such a trigger can be found.

This is the assumption that I have the most trouble with in your e-mail.

Even if we find such a trigger, it doesn't solve the problem, because if 
someone (author A) using a new browser writes a page that uses this 
feature, and then someone (author B) using an old browser copies and 
pastes from A's page into his page, accidentally including the trigger, 
the second page will look fine to most users, but to the users of the new 
browser, it will be broken.

Say the trigger is <newsyntax>. Now assume someone writes:

 <p>foo <newsyntax> ... </newsyntax> bar </p>

...and that such a page works well in new browsers. Given how people copy 
and paste content on the Web, especially how people copy and paste _new_ 
syntax on the Web, even before it is implemented, it is very likely that 
someone will copy just the "foo" part, accidentally including the 
<newsyntax> bit:

 <p>bla bla foo <newsyntax> bla bla </p>

This will now effectively "poison" the <newsyntax> idea, since the pages 
that result from this cargo-cult copy-and-paste attitude will render badly 
in browsers that support the new syntax.
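
The two snippets above can be run through a small Python model of the 
two kinds of browser. Everything here is an assumption made for the 
sake of illustration: html_text() is a hypothetical helper, and the 
rule that an unclosed <newsyntax> swallows the rest of the input is 
just one plausible behaviour for a mode switch, not something any spec 
defines.

```python
import re

def html_text(markup, supports_newsyntax):
    """Toy model: return the text still handled as ordinary HTML."""
    if supports_newsyntax:
        # Assumption: the new mode runs to </newsyntax>, or to the end
        # of the input when the trigger is never closed.
        markup = re.sub(r"<newsyntax>.*?(</newsyntax>|\Z)", "", markup,
                        flags=re.S)
    # An old browser just ignores the unknown tag and shows its text.
    return " ".join(re.sub(r"<[^<>]*>", " ", markup).split())

page_a = "<p>foo <newsyntax> ... </newsyntax> bar </p>"
page_b = "<p>bla bla foo <newsyntax> bla bla </p>"

for page in (page_a, page_b):
    print("old:", html_text(page, supports_newsyntax=False))
    print("new:", html_text(page, supports_newsyntax=True))
```

In this model author B's page keeps all of its text in the old browser, 
while in the new browser everything after the stray trigger is taken 
out of normal HTML processing.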


> And for now, let's make a working assumption that the presence of an 
> xmlns attribute on an element not recognized as being valid in HTML5 and 
> with a value in a short list of (possibly browser and/or installation 
> dependent) known namespaces is an acceptable trigger.

Many unknown elements on the Web today have xmlns="" attributes, so this 
may also not be a good trigger. But let's ignore that for the purposes of 
your e-mail...


> Given this one document and that one trigger and that one additional 
> parsing mode which has exactly one deviation from the current HTML5 
> parsing rules, the DOM that is produced is now vaguely recognizable.
>
> The one key remaining difference is that the value of the namespace for 
> all these elements is wrong.  If, in this new mode, the value of 
> attributes named 'xmlns' (note: without a colon, i.e., not an xmlns: 
> prefix) were respected on that element, and all DOM nodes which are 
> created in this parsing phase which do NOT have an xmlns attribute 
> simply copy the value of the namespace of the parent, then you will have 
> a DOM tree that, while incorrect for RDF and DC and SODIPODI and 
> INKSCAPE elements and attributes, may produce the desired graphic.  
> Feel free to challenge that assumption. But if this turns out to be 
> true, then the amount of specific parsing logic necessary to support 
> this document is fairly small and surgical.

As you yourself point out, your proposal would need to be additionally 
changed to support CDATA blocks and case-sensitive tag and attribute names 
in the parser, presumably triggered on the presence of this 
aforementioned mythical trigger. For SVG we would also need to somehow 
support namespaced attributes, though I don't know how we'd do that 
(xlink:href in particular). We'd also need to define how to process HTML 
or other namespaces nested in the block covered by this trigger, as well 
as defining how to handle mis-nested tags in this mode.
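
For reference, this is the DOM shape that an XML parser produces today, 
and which any text/html mode would have to match. The sketch uses 
Python's standard xml.etree.ElementTree; the tiny document is made up 
for illustration. It shows the default namespace being inherited by 
children that carry no xmlns attribute of their own, and xlink:href 
ending up as a namespaced attribute.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical sample: only the root declares namespaces.
doc = (
    f'<svg xmlns="{SVG}" xmlns:xlink="{XLINK}">'
    '<g><use xlink:href="#a"/></g>'
    '</svg>'
)

root = ET.fromstring(doc)

# ElementTree writes qualified names as "{namespace}local".
tags = [el.tag for el in root.iter()]
use = root[0][0]
attrs = dict(use.attrib)

print(tags)
print(attrs)
```

The g and use elements come out in the SVG namespace even though only 
the root has an xmlns attribute, which is the inheritance rule quoted 
above. The href attribute is keyed by the XLink namespace, which is the 
part that a rule based only on unprefixed xmlns does not cover.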


I understand the problem. It's the solution I don't understand. Even what 
you propose here, even after ignoring the several assumptions which 
present what I consider fatal problems, is incomplete. I have yet to see a 
full workable solution proposed. I also can't ignore the above assumptions 
in the final solution.


> But before going too far into the weeds, it would be helpful if some 
> more of the requirements and assumptions which are identified not only 
> in this email but also on New_Vocabularies page were verified.

If there's a problem on New_Vocabularies, let me know. The page reflects 
everything that I've been able to find in the 400+ e-mails I've read so 
far on the subject.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 3 April 2008 03:08:18 UTC