- From: Norman Walsh <ndw@nwalsh.com>
- Date: Tue, 21 Dec 2010 16:09:53 -0500
- To: public-html-xml@w3.org
- Message-ID: <m2wrn2esta.fsf@nwalsh.com>
See http://www.w3.org/2010/html-xml/12/21-minutes
[1]W3C
- DRAFT -
XML/HTML Task Force
Meeting 1, 21 Dec 2010
See also: [2]IRC log
Attendees
Present
Norm, James, Mike Champion, Yves, Michael Kay, Henri
Regrets
Noah, Robin
Chair
Norm
Scribe
Norm
Contents
* [3]Topics
* [4]Summary of Action Items
--------------------------------------------------------------------------
Chair apologizes for lack of agenda and careful planning in prep for this
meeting. Expresses goal as simply an initial meeting, continuing the
conversations that have started on the list about our goals as a task
force.
MKay: Could you give us some background?
Norm attempts to describe some of the background of the task force. It
arose from the TAG issue [5]HTML-XML-Divergence-67.
MC: TV Raman led a discussion on AC-Forum back in the April time-frame.
James: Perhaps someone could make that discussion public, as I don't have
member access.
MC: It may all have been copied to www-tag
<scribe> ACTION: Norm to review the ac-forum mail and see if he can
summarize what wasn't made public. [recorded in
[6]http://www.w3.org/2010/12/21-html-xml-minutes.html#action01]
<hsivonen> see also the tag list (as opposed to www-tag)
Scribe struggles to work out the right level of detail for scribing this
meeting. Probably unsuccessfully.
Some discussion of what we imagine the TAG's goal to have been in creating
the task force.
Henri observes that there are two plausible goals: adding namespaces to
HTML and making it possible to parse HTML with an XML parser.
Henri: It appears that the popularity of namespaces is waning even in the
XML community, so it doesn't make sense to add it to HTML.
... And it seems unlikely that the majority of HTML authors are going to
produce XML-well-formed content, so that's not likely to be broadly
successful.
<jcowan> +1 to Henri's points
Henri: I think something like tagsoup or my HTML5 parser that exposes an
XML stream from HTML5 is an approach more likely to succeed.
<hsivonen> for the record, I think neither goal is "plausible" as a goal
to pursue. they are goals I've heard from TAG members. :-)
James: Two goals expressed to me: figure out how to use an XML toolchain
to produce web pages and in the future how to reduce the divergence.
... Looking forward ten or twelve years, I think we should be thinking
about how to make things better in the long run.
<jcowan> We already know how people process HTML as XML: they use TagSoup
or Tidy or NekoHTML.
JCowan: I think convergence has a use beyond parsing the wild web; it's
true it only works in closed contexts, but there are a lot of those.
... the ability to embed HTML as a rich text island in "data XML" is a
valuable thing and I think there should be a standard way to do this.
... Polyglot documents focus on XML validity which I'm inclined to think
is less valuable than it used to be. I'm more interested in XML
well-formedness and HTML validity.
Yves: During the last TAG f2f we discussed the issue. I remember that Raman
said that having two different stacks, one for XML and one for HTML, was
costing a lot to all parties involved.
... He wanted more compatibility between tools and libraries.
... At least that was my understanding.
Henri: Two points: first, it sounds like the existence of XHTML5 is
getting forgotten. The HTML5 WG is already defining XHTML5 alongside
HTML5. There's already a way to express the whole HTML5 vocabulary in XML.
... The main difference is that you can have namespaces that the parser
can't output. There are some fringe differences that you can have in HTML
but not in XML, for example the FF character is whitespace in HTML but not
XML.
... So you can do distributed extensibility with HTML and you can embed
HTML in XML with XHTML5.
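The form feed difference Henri mentions can be illustrated with a minimal
sketch using only the Python standard library. It assumes Python's
html.parser (which is lenient but is not an HTML5-conformant parser) as a
stand-in for the HTML side and ElementTree for the XML side; the names
TextCollector, markup, html_text and xml_accepts are hypothetical. Note
that XML 1.0 does not merely decline to treat U+000C as whitespace: the
character is not allowed in a document at all.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

FORM_FEED = "\x0c"  # U+000C, the "FF" character
markup = "<p>one" + FORM_FEED + "two</p>"


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the character data seen by the HTML parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# HTML side: the lenient parser passes the form feed through as ordinary text.
collector = TextCollector()
collector.feed(markup)
collector.close()
html_text = "".join(collector.chunks)

# XML side: U+000C is outside the XML 1.0 Char production, so parsing fails.
try:
    ET.fromstring(markup)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False
```

Under these assumptions the same byte sequence is acceptable input on the
HTML side and a fatal error on the XML side, which is the kind of fringe
difference a polyglot document has to avoid.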
... Second, the question about software stacks, I think the problem is
that people think that we're adding stuff when they see HTML5. But it
doesn't add a stack, it documents the existing stack.
... XML is the second stack, but it's not useful to point fingers about
which is first or second, except to recognize that HTML5 isn't adding
stuff.
... Both stacks are more than a decade old, so neither is being added. One
is simply being documented at this point. I think it's way past the point
of avoiding adding a second stack.
... There are already at least three stacks and different communities:
HTML, XML, and RDF. Treating the situation as if something is being added
isn't really productive, I don't think.
JCowan: While those are all valid points, it seems to me that
characterizing browser behavior as a stack makes it a kind of truncated
stack. It simply renders. There's no transformation facility or other
post-processing steps that can intervene.
Henri: The situation before the HTML5 spec is that IE was implementing DOM
Level 1 so IE didn't recognize DOM Level 2 in the implementation sense.
But gecko, presto, and webkit were implementing DOM Level 2.
... So in all browsers except IE, the view to the data model has been the
same for years. There were inconsistencies across the XML/HTML data
models, especially with respect to namespaces.
... HTML5 has codified the resolution of these inconsistencies. Now the
data model is the same for XML or HTML, with a few small differences in
the details.
... Once the parser is done, the data model is the same now. That's
something that's an achievement of HTML5. The same approach already
existed on the non-browser side.
... First tagsoup and now HTML5 conformant parsers provide the same kind
of API for both XML and HTML5. So I think we've gone a long way to unify
the data model.
... This means that as far as the stack goes, we've already done much of
the unification. You can, for example, use an XSLT engine on HTML5 using
the output of my HTML5 parser. It just works, whether the input is XML or
HTML5.
... I think it's a win that the stack is shallow, limited just to the
parser and the serializer.
... The question is can we unify the parser and the serializer? I think we
could unify the serializer, but it seems unlikely to me that we can get
more unification on the parser side. It would do violence to one side or
the other.
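Henri's point that an HTML parser can hand downstream tools the same tree
API as an XML parser can be sketched as follows. This is a toy under
stated assumptions, not Henri's parser and not the HTML5 parsing
algorithm: it uses Python's standard html.parser, knows only about void
elements, and does not imply end tags or do HTML5 error recovery. The
names HtmlTreeBuilder, html_to_tree and VOID are hypothetical, and
ElementTree's path queries stand in for a real XML toolchain such as XSLT.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HtmlTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from HTML parse events."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")  # synthetic wrapper element
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. "disabled") arrive as None.
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_tree(source):
    builder = HtmlTreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


# Not well-formed XML: unquoted attribute value, <br> without an end tag.
tree = html_to_tree("<p class=intro>Hello<br>world</p>")

# Once parsed, ordinary XML tooling applies to the result.
intro = tree.find("p[@class='intro']")
serialized = ET.tostring(tree, encoding="unicode")
reparsed = ET.fromstring(serialized)
```

The design choice mirrors the argument in the minutes: the difference
between the two stacks is confined to the parser (and serializer), so
everything after html_to_tree works on the same data model whichever
syntax the input used.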
Norm: I sometimes struggle to see what we should do. On the one hand, long
term harmonization seems like it would be good; on the other, in the short
term Henri's HTML5 parser and an HTML5 serializer do sort of "fix" the
problem of how to read/write HTML5/XML together.
JCowan: That makes me think that a possible outcome is a set of
recommendations for the XML toolset to be able to serialize HTML5 instead
of the current HTML serializer which is incomplete.
<hsivonen> XSLT should definitely get an HTML5 output mode
Norm: Yes, clearly the XML serialization spec could/would/should/will get
an "HTML5" serialization method.
MKay: Yes. We decided a year ago that it was too early to start looking at
that, if we looked again now we might feel differently.
James: I don't agree with Henri; I think there's plenty that one can do to
make things better. But the way to go forward on that is probably to make
some concrete use cases as Noah suggested.
Norm: Yes, perhaps some use cases would be a good work item.
MKay: I think one of the use cases is the one John Cowan mentioned, that
is handling files that are data rich but include rich textual parts.
... The other is the inverse of that, rich textual files that contain
data, either XML or RDF, whether it's an existing XML vocabulary, a new
one, or a user-defined one.
... An important part of that is looking not just at the formats on the
wire but also at the programming experience: both in generation and
consuming/rendering.
... We need to look at that whole picture from the perspective of
processing, not just syntax on the wire.
Henri: Do you mean browsers providing a way to edit non-HTML data
natively? Or do you mean JavaScript that might provide editing for the
private data?
MKay: I mean the whole spectrum from wikis and form-based data across the
whole spectrum.
Henri: The editing story for HTML is actually rather bad in terms of what
actually works. I wouldn't expect browsers to be interested in addressing
problems beyond editing HTML5 and perhaps SVG for a long time because
they've already got lots of issues.
MKay: So there's room for improvement?
Henri: Yes, but I wouldn't expect generic editing to become part of the
browser feature set anytime soon beyond what comes along naturally.
MKay: Perhaps architecturally what we'll see is editors as a client tool
become a separate kind of tool from browsers.
Henri: I'd expect editing in the browser to be custom JavaScript.
Norm: What can we glean from the past 40 minutes or so for next steps?
... use cases seems like a possibility.
MChampion: I had some good conversations at TPAC about some specific
problems.
... Could we write down and triage some of those?
Henri: Terminology-wise, "foreign" means MathML and SVG.
Norm: Is there a term for random XML?
Henri: No, because it's not possible in text/html.
... The specific issue that David Carlisle mentioned is about
non-intuitive error handling.
... If you stick to the cases where HTML5 is expected in foreign markup,
then things work ok now.
... The error handling isn't intuitive if you put them elsewhere.
JCowan: And is it too late to fix this in HTML5?
Henri: It's not a bug, it's a feature. It minimizes the risk to getting
mathml and svg support deployed in browsers.
... There is existing web content that contains math or svg tags. In order
to keep those pages more-or-less backwards compatible, we have to have the
current rules.
Henri: The counter-intuitive behavior only arises if the document is in
error. If you try to do sensible stuff, you don't see this behavior.
... Even if we decided it was a problem, it would be too late to fix it.
It's already shipping in Chrome and will ship in Firefox 4.
James: I'm troubled by this idea that there's nothing that can be changed
in HTML5. HTML5 is a WD; if the W3C process means anything, the idea that
something is frozen and static before it gets into last call is off base.
... I also completely disagree that one has to be constrained by what
existing browsers do. There used to be two modes but folks have judged
that that's not good. But the case could be made for the other decision.
... The idea that there should be one mode and standards mode should be
quirky is very disappointing.
JCowan: I think there's a distinction between prospective and
retrospective standardization. This is retrospective standardization and
that does make things less fixable.
... This may come to an end at some point, but I don't think it's
appropriate to complain that they're not behaving like a prospective
standardization group. They aren't because that's not where we are.
Henri: As far as the process goes, I think the W3C process is out of touch
with reality as far as the implementation overlap with the specification
process goes.
... In theory you're supposed to start implementing after CR. But in
practice, for something as complex as a browser, you need to have a
constant feedback cycle.
... It's unfortunate that the process document doesn't recognize this.
... It seems that the HTML5 WG gets more scrutiny on this point; I think
the problem isn't the WG but the process document.
... About the modes: there's a big difference between browser vendors on
this point. In IE8, there are 4 modes; I think there are 7 in IE9. Other
vendors, with the experience of having 2.5 or 3 modes, have been pushing
to remove modes.
<hsivonen> [7]http://hsivonen.iki.fi/doctype/#ie8
Henri: I think it's unrealistic for a WG or process to impose modes. Doing
HTML5 with no new modes is how it has to be.
<MikeK> I regret I have to leave you now for another call. I'll stick
around on IRC
MChampion: I think to address Henri's point. This is implementation
feedback, this is rapid integration with the waterfall model. There's a
problem with real use cases. This isn't even a LC WD, in principle it
should be open to a bug report from the XML community saying that this
isn't going to work, especially if a reasonable fix was proposed.
... I think it would be reasonable for this TF to triage the problem
report. Does it affect enough users? Is it worth fixing, even if it
introduces some churn in the HTML5 spec?
... I wouldn't propose or preclude any particular solution. The mission
I'd like to see for this TF is to assess how severe the problem is and to
see if a solution can be proposed.
... It may be too hard to change, but I don't think we should make that
decision a priori.
Norm: We're losing folks.
Adjourned.
Summary of Action Items
[NEW] ACTION: Norm to review the ac-forum mail and see if he can summarize
what wasn't made public. [recorded in
[8]http://www.w3.org/2010/12/21-html-xml-minutes.html#action01]
[End of minutes]
--------------------------------------------------------------------------
Minutes formatted by David Booth's [9]scribe.perl version 1.135 ([10]CVS
log)
$Date: 2010/12/21 20:35:20 $
References
1. http://www.w3.org/
2. http://www.w3.org/2010/12/21-html-xml-irc
3. http://www.w3.org/2010/html-xml/12/21-minutes#agenda
4. http://www.w3.org/2010/html-xml/12/21-minutes#ActionSummary
5. http://www.w3.org/2001/tag/group/track/issues/67
6. http://www.w3.org/2010/12/21-html-xml-minutes.html#action01
7. http://hsivonen.iki.fi/doctype/#ie8
8. http://www.w3.org/2010/12/21-html-xml-minutes.html#action01
9. http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
10. http://dev.w3.org/cvsweb/2002/scribe/
Received on Tuesday, 21 December 2010 21:10:28 UTC