HTML/XML Task Force Minutes, 21 Dec 2010



                                   - DRAFT -

                              XML/HTML Task Force

Meeting 1, 21 Dec 2010

   See also: [2]IRC log


           Norm, James, Mike Champion, Yves, Michael Kay, Henri

           Noah, Robin




     * [3]Topics
     * [4]Summary of Action Items


   Chair apologizes for lack of agenda and careful planning in prep for this
   meeting. Expresses goal as simply an initial meeting, continuing the
   conversations that have started on the list about our goals as a task
   force.

   MKay: Could you give us some background?

   Norm attempts to describe some of the background of the task force. It
   arose from the TAG issue [5]HTML-XML-Divergence-67.

   MC: TV Raman led a discussion on AC-Forum back in the April time-frame.

   James: Perhaps someone could make that discussion public, as I don't have
   member access.

   MC: It may all have been copied to www-tag

   <scribe> ACTION: Norm to review the ac-forum mail and see if he can
   summarize what wasn't made public. [recorded in

   <hsivonen> see also the tag list (as opposed to www-tag)

   Scribe struggles to work out the right level of detail for scribing this
   meeting. Probably unsuccessfully.

   Some discussion of what we imagine the TAG's goal to have been in creating
   the task force.

   Henri observes that there are two plausible goals: adding namespaces to
   HTML and making it possible to parse HTML with an XML parser.

   Henri: It appears that the popularity of namespaces is waning even in the
   XML community, so it doesn't make sense to add it to HTML.
   ... And it seems unlikely that the majority of HTML authors are going to
   produce XML-well-formed content, so that's not likely to be broadly

   <jcowan> +1 to Henri's points

   Henri: I think something like tagsoup or my HTML5 parser that exposes an
   XML stream from HTML5 is an approach more likely to be successful.

   <hsivonen> for the record, I think neither goal is "plausible" as a goal
   to pursue. they are goals I've heard from TAG members. :-)

   James: Two goals expressed to me: figure out how to use an XML toolchain
   to produce web pages and in the future how to reduce the divergence.
   ... Looking forward ten or twelve years, I think we should be thinking
   about how to make things better in the long run.

   <jcowan> We already know how people process HTML as XML: they use TagSoup
   or Tidy or NekoHTML.
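
   As a rough illustration of the kind of repair such tools perform, the
   sketch below turns unclosed HTML into a well-formed tree using only the
   Python standard library. It is not TagSoup, Tidy, NekoHTML, or the HTML5
   parsing algorithm; the names SoupToTree, VOID, and soup_to_xml, and the
   close-on-mismatch rule, are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never take an end tag (a subset, assumed for this sketch).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class SoupToTree(HTMLParser):
    """Hypothetical helper: build a well-formed tree from sloppy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # names of elements still awaiting an end tag

    def handle_starttag(self, tag, attrs):
        # A valueless attribute such as "disabled" gets its own name as value.
        fixed = {name: (name if value is None else value) for name, value in attrs}
        self.builder.start(tag, fixed)
        if tag in VOID:
            self.builder.end(tag)  # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close everything opened since the matching start tag.
        while True:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.builder.data(data)

    def finish(self):
        self.close()
        while self.open_tags:  # close whatever the input left open
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def soup_to_xml(html):
    """Serialize the repaired tree as XML text."""
    parser = SoupToTree()
    parser.feed(html)
    return ET.tostring(parser.finish(), encoding="unicode")
```

   For input such as <div><p>one<br>two<p>three</div> the result can be read
   by any XML parser, although the second p ends up nested inside the first,
   which is not the tree an HTML5 parser would build.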

   JCowan: I think convergence has a use beyond parsing the wild web; it's
   true it only works in closed contexts, but there are a lot of those.
   ... the ability to embed HTML as a rich text island in "data XML" is a
   valuable thing and I think there should be a standard way to do this.
   ... Polyglot documents focus on XML validity which I'm inclined to think
   is less valuable than it used to be. I'm more interested in XML
   well-formedness and HTML validity.

   Yves: During the last TAG f2f we discussed the issue. I remember Raman
   saying that having two different stacks, one for XML and one for HTML, was
   costing a lot to all parties involved.
   ... He wanted more compatibility between tools and libraries.
   ... At least that was my understanding.

   Henri: Two points: first, it sounds like the existence of XHTML5 is
   getting forgotten. The HTML5 WG is already defining XHTML5 alongside
   HTML5. There's already a way to express the whole HTML5 vocabulary in XML.
   ... The main difference is that you can have namespaces that the parser
   can't output. There are some fringe differences that you can have in HTML
   but not in XML, for example the FF character is whitespace in HTML but not
   in XML.
   ... So you can do distributed extensibility with HTML and you can embed
   HTML in XML with XHTML5.
   ... Second, the question about software stacks, I think the problem is
   that people think that we're adding stuff when they see HTML5. But it
   doesn't add a stack, it documents the existing stack.
   ... XML is the second stack, but it's not useful to point fingers about
   which is first or second, except to recognize that HTML5 isn't adding
   a stack.
   ... Both stacks are more than a decade old, so neither is being added. One
   is simply being documented at this point. I think it's way past the point
   of avoiding adding a second stack.
   ... There are already at least three stacks and different communities:
   HTML, XML, and RDF. Treating the situation as if something is being added
   isn't really productive, I don't think.
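
   Henri's form feed example can be checked directly. HTML counts U+000C
   among its whitespace characters, while XML 1.0 excludes it from the legal
   character range, so an XML parser rejects it even inside text. The sketch
   below uses Python's expat-based ElementTree; the names HTML_WHITESPACE,
   XML_WHITESPACE, and xml_accepts are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Whitespace as defined by HTML (space, tab, LF, FF, CR) and by the
# XML 1.0 production S (space, tab, LF, CR).
HTML_WHITESPACE = {" ", "\t", "\n", "\f", "\r"}
XML_WHITESPACE = {" ", "\t", "\n", "\r"}


def xml_accepts(text):
    """Return True if the XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The only character HTML treats as whitespace that XML does not.
only_in_html = HTML_WHITESPACE - XML_WHITESPACE

# The same paragraph, once with a space and once with a form feed.
with_space = xml_accepts("<p>a b</p>")
with_form_feed = xml_accepts("<p>a\fb</p>")
```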

   JCowan: While those are all valid points, it seems to me that
   characterizing browser behavior as a stack makes it a kind of truncated
   stack. It simply renders. There's no transformation facility or other
   post-processing steps that can intervene.

   Henri: The situation before the HTML5 spec is that IE was implementing DOM
   Level 1 so IE didn't recognize DOM Level 2 in the implementation sense.
   But gecko, presto, and webkit were implementing DOM Level 2.
   ... So in all browsers except IE, the view to the data model has been the
   same for years. There were inconsistencies across the XML/HTML data
   models, especially with respect to namespaces.
   ... HTML5 has codified the resolution of these inconsistencies. Now the
   data model is the same for XML or HTML, with a few small differences in
   the details.
   ... Once the parser is done, the data model is the same now. That's
   something that's an achievement of HTML5. The same approach already
   existed on the non-browser side.
   ... First tagsoup and now HTML5 conformant parsers provide the same kind
   of API for both XML and HTML5. So I think we've gone a long way to unify
   the data model.
   ... This means that as far as the stack goes, we've already done much of
   the unification. You can, for example, use an XSLT engine on HTML5 using
   the output of my HTML5 parser. It just works, whether the input is XML or
   HTML.
   ... I think it's a win that the stack is shallow, limited just to the
   parser and the serializer.
   ... The question is can we unify the parser and the serializer? I think we
   could unify the serializer, but it seems unlikely to me that we can get
   more unification on the parser side. It would do violence to one side or
   the other.

   Norm: I sometimes struggle to see what we should do. On the one hand,
   long-term harmonization seems like it would be good; on the other, in the
   short term Henri's HTML5 parser and an HTML5 serializer do sort of "fix"
   the problem of how to read/write HTML5/XML together.

   JCowan: That makes me think that a possible outcome is a set of
   recommendations for the XML toolset to be able to serialize HTML5 instead
   of the current HTML serializer which is incomplete.

   <hsivonen> XSLT should definitely get an HTML5 output mode

   Norm: Yes, clearly the XML serialization spec could/would/should/will get
   an "HTML5" serialization method.

   MKay: Yes. We decided a year ago that it was too early to start looking at
   that; if we looked again now we might feel differently.
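
   The serialization gap discussed here shows up even with older tooling.
   The sketch below serializes one tree twice with Python's ElementTree, once
   with its XML output method and once with its HTML output method. That HTML
   method predates HTML5 and is not an HTML5 serializer; it is used only to
   show that the same tree needs different syntax on the wire. The sample
   markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# One in-memory tree, built here from well-formed XML.
root = ET.fromstring("<p>one<br/>two</p>")

# XML output: the empty element is written in self-closing form.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# HTML output: void elements such as br get no end tag and no "/>".
as_html = ET.tostring(root, encoding="unicode", method="html")
```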

   James: I don't agree with Henri; I think there's plenty that one can do to
   make things better. But the way to go forward on that is probably to make
   some concrete use cases as Noah suggested.

   Norm: Yes, perhaps some use cases would be a good work item.

   MKay: I think one of the use cases is the one John Cowan mentioned, that
   is, handling files that are data rich but include rich textual parts.
   ... The other is the inverse of that: rich textual files that contain
   data, either XML or RDF, whether in an existing XML vocabulary, a new one,
   or a user-defined one.
   ... An important part of that is looking not just at the formats on the
   wire but also at the programming experience: both in generation and
   ... We need to look at that whole picture from the perspective of
   processing, not just syntax on the wire.

   Henri: Do you mean browsers providing a way to edit non-HTML data
   natively? Or do you mean JavaScript that might provide editing for the
   private data?

   MKay: I mean the whole spectrum from wikis and form-based data across the
   whole spectrum.

   Henri: The editing story for HTML is actually rather bad in terms of what
   actually works. I wouldn't expect browsers to be interested in addressing
   problems beyond editing HTML5 and perhaps SVG for a long time because
   they've already got lots of issues.

   MKay: So there's room for improvement?

   Henri: Yes, but I wouldn't expect generic editing to become part of the
   browser feature set anytime soon beyond what comes along naturally.

   MKay: Perhaps architecturally what we'll see is editors, as client tools,
   becoming a separate kind of tool from browsers.

   Henri: I'd expect editing in the browser to be custom JavaScript.

   Norm: What can we glean from the past 40 minutes or so for next steps?
   ... use cases seems like a possibility.

   MChampion: I had some good conversations at TPAC about some specific
   ... Could we write down and triage some of those?

   Henri: Terminology-wise, "foreign" means MathML and SVG.

   Norm: Is there a term for random XML?

   Henri: No, because it's not possible in text/html.
   ... The specific issue that David Carlisle mentioned is about
   non-intuitive error handling.
   ... If you stick to the cases where HTML5 is expected in foreign markup,
   then things work ok now.
   ... The error handling isn't intuitive if you put them elsewhere.

   JCowan: And is it too late to fix this in HTML5?

   Henri: It's not a bug, it's a feature. It minimizes the risk in getting
   MathML and SVG support deployed in browsers.
   ... There is existing web content that contains math or svg tags. In order
   to keep those pages more-or-less backwards compatible, we have to have the
   current rules.

   Henri: The counter-intuitive behavior only arises if the document is in
   error. If you try to do sensible stuff, you don't see this behavior.
   ... Even if we decided it was a problem, it would be too late to fix it.
   It's already shipping in Chrome and will ship in Firefox 4.

   James: I'm troubled by this idea that there's nothing that can be changed
   in HTML5. HTML5 is a WD; if the W3C process means anything, the idea that
   something is frozen and static before it gets into last call is off base.
   ... I also completely disagree that one has to be constrained by what
   existing browsers do. There used to be two modes but folks have judged
   that that's not good. But the case could be made for the other decision.
   ... The idea that there should be one mode and standards mode should be
   quirky is very disappointing.

   JCowan: I think there's a distinction between prospective and
   retrospective standardization. This is retrospective standardization and
   that does make things less fixable.
   ... This may come to an end at some point, but I don't think it's
   appropriate to complain that they're not behaving like a prospective
   standardization group. They aren't because that's not where we are.

   Henri: As far as the process goes, I think the W3C process is out of touch
   with reality as far as the implementation overlap with the specification
   process goes.
   ... In theory you're supposed to start implementing after CR. But in
   practice, for something as complex as a browser, you need to have a
   constant feedback cycle.
   ... It's unfortunate that the process document doesn't recognize this.
   ... It seems that the HTML5 WG gets more scrutiny on this point; I think
   the problem isn't the WG but the process document.
   ... About the modes: there's a big difference between browser vendors on
   this point. In IE8, there are 4 modes; I think there are 7 in IE9. Other
   vendors, with the experience of having 2.5 or 3 modes, have been pushing to
   remove modes.

   <hsivonen> [7]

   Henri: I think it's unrealistic for a WG or process to impose modes. Doing
   HTML5 with no new modes is how it has to be.

   <MikeK> I regret I have to leave you now for another call. I'll stick
   around on IRC

   MChampion: I think to address Henri's point. This is implementation
   feedback, this is rapid integration with the waterfall model. There's a
   problem with real use cases. This isn't even a LC WD, in principle it
   should be open to a bug report from the XML community saying that this
   isn't going to work, especially if a reasonable fix was proposed.
   ... I think it would be reasonable for this TF to triage the problem
   report. Does it affect enough users? Is it worth fixing, even if it
   introduces some churn in the HTML5 spec?
   ... I wouldn't propose or preclude any particular solution. The mission
   I'd like to see for this TF is to assess how severe the problem is and to
   see if a solution can be proposed.
   ... It may be too hard to change, but I don't think we should make that
   decision a priori.

   Norm: We're losing folks.


Summary of Action Items

   [NEW] ACTION: Norm to review the ac-forum mail and see if he can summarize
   what wasn't made public. [recorded in

   [End of minutes]


    Minutes formatted by David Booth's [9]scribe.perl version 1.135 ([10]CVS
    $Date: 2010/12/21 20:35:20 $



Received on Tuesday, 21 December 2010 21:10:28 UTC