Re: Revised Agenda for XML Core WG face-to-face meeting of 2012 October 29/30

Norman Walsh <ndw@nwalsh.com> writes:
> We had an XML Core WG face-to-face meeting at W3C TPAC in
> Lyon, France, on Monday, 29 October and Tuesday, 30 October.
>
> Minutes
> =======

Present: Liam, Henry, Jirka, Norm
Regrets: Paul, Glenn, John, Daniel

> 1. Accepting the minutes from the last telcon [3] and
>   the current task status [2] (have any questions, comments,
>   or corrections ready by the beginning of the meeting).

Any questions or comments about the minutes of the last meeting?

None heard, accepted.

Any questions or comments about today's revised agenda?

Jirka asks to add an agenda item to discuss names that begin with xml,
such as "xml-data" in one of the Microsoft formats.

> 2. Miscellaneous administrivia and document reviews.

There's a W3C developer meetup this evening.

No document reviews mentioned.

> 3. XInclude 1.1--see http://www.w3.org/XML/Group/Core#xinclude
>
>    Consider the substantive changes hinted at by the note in
>    section 4.5, namely using MIME content-types for the value
>    of the parse attribute and associating the fragment
>    identifier syntax with the MIME content type.

Norm reviews a summary of Daniel's comments[4].

General nods of agreement.

Some discussion of how we deal with MIME content type values.

We need to say that XInclude will attempt to treat the resource as the specified
content type. The purpose of finer granularity in parse types is to allow additional
fragment identifier syntaxes to be used.

Appeal to media type hierarchy: you may know the media type or you may know the
suffix or you may know the family.

That's the fallback story for media types; we already have a fallback story for
fragment identifier syntaxes that aren't understood.

Media types for which you can't understand fallback are treated as recoverable
errors.
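
A minimal sketch of the fallback order discussed above (exact media type,
then suffix, then family), in Python. The KNOWN table, the "family/*" key
convention, and the function names are assumptions made for illustration;
none of them come from the XInclude draft.

```python
def fallback_chain(media_type):
    """List lookup keys from most to least specific: type, +suffix, family."""
    media_type = media_type.strip().lower()
    family, _, subtype = media_type.partition("/")
    chain = [media_type]
    if "+" in subtype:
        # e.g. "image/svg+xml" also tries the structured suffix "+xml"
        chain.append("+" + subtype.rsplit("+", 1)[1])
    chain.append(family + "/*")
    return chain


# Hypothetical table of what one particular processor understands.
KNOWN = {
    "application/xml": "xml",
    "+xml": "xml",
    "text/*": "text",
}


def resolve_parse(media_type):
    """Return how to treat the resource, or None if nothing in the chain is known.

    None stands in for the recoverable error case, where the caller
    would go on to ordinary fallback processing.
    """
    for key in fallback_chain(media_type):
        if key in KNOWN:
            return KNOWN[key]
    return None
```

With this table, "image/svg+xml" is handled through its suffix and
"text/csv" through its family, while "application/octet-stream" matches
nothing and ends up in the recoverable-error case.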

ACTION: Norm to revise the draft to use media types for the parse attribute.

>    Consider the backwards/forwards compatibility story.

We think the use of media types finesses this issue. It certainly
seems a better compromise than a new namespace or a version attribute.

> 4. xml-stylesheet and HTML5

Some discussion of the existence of test cases for the stylesheet PI and
what those tests would mean. Absence of a concrete spec which answers questions
such as https://www.w3.org/Bugs/Public/show_bug.cgi?id=14689#c8 is possibly
a stumbling block.

Liam: I guess my question is, what's the minimum change needed for us to
be happy with this version? For me, I'd be ok with a non-normative statement
that the xml-stylesheet PI may also be used to point to XSLT stylesheets.

Henry: We need to find out where the xml-stylesheet PI is even mentioned
in the spec.

Liam: What we need to avoid is a normative reference to CSS without a
reference to XSLT. I think we need to make sure that the editor understands
that we need to say something at a higher level. There's a lot of work that
could be done to improve interoperability, and that's where test cases would
be involved, but that doesn't need to be in the first recommendation.

Henry: I believe that from the HTML perspective, the relevant spec is
the CSS Object Model, http://dev.w3.org/csswg/cssom/

Norm: So are we happy if the mention of XSLT is in the CSS OM spec?

Henry: I think that's up to them.

Norm: If we're content that the reference can go in this spec, then I think
simply saying that CSS and XSLT are among the possible "supported styling
languages" might be enough.

Henry: It's very odd that the only place XSLT is mentioned is in a
section titled "CSS". This looks like, "for CSS processors, here's what
we say about the xml-stylesheet PI". It's badly scoped to have a section 
that talks about stylesheet languages in general in a section titled CSS.

Some further discussion of the various documents.

Henry: The CSS OM document has a model of "merging stylesheets", but
we don't do that. There are three layers: pointing to stylesheets;
selection: among those that you've pointed to, how do you select a
subset; and if that subset has more than one element, how do you combine
them?

... We only care about the first two. It doesn't make any sense at all
to try to do combination. The relevant question is: if there's more
than one, then which one do you use. Right now, the implementations
are split on first or last.
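
A rough sketch of the first two layers (pointing and selection), assuming
Python's standard xml.dom.minidom. The SAMPLE document, the XSLT_TYPES set,
and the "first"/"last" policy switch are illustrative assumptions; the
policy only mirrors the implementation split described above.

```python
import re
from xml.dom import minidom, Node

# Pseudo-attributes inside the PI data, e.g. type="text/xsl" href="a.xsl"
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

# Assumed set of type values that a processor treats as XSLT.
XSLT_TYPES = {"text/xsl", "application/xslt+xml"}


def stylesheet_pis(xml_text):
    """Pointing: collect the xml-stylesheet PIs that appear in the prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node.nodeType != Node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target != "xml-stylesheet":
            continue
        attrs = {}
        for match in PSEUDO_ATTR.finditer(node.data):
            name, double_quoted, single_quoted = match.groups()
            attrs[name] = double_quoted if double_quoted is not None else single_quoted
        found.append(attrs)
    return found


def pick(pis, wanted_types, policy="first"):
    """Selection: keep matching types, then take the first or the last."""
    matches = [pi for pi in pis if pi.get("type") in wanted_types]
    if not matches:
        return None
    return matches[0] if policy == "first" else matches[-1]


SAMPLE = (
    '<?xml-stylesheet type="text/xsl" href="a.xsl"?>'
    '<?xml-stylesheet type="text/xsl" href="b.xsl"?>'
    "<doc/>"
)
```

Calling pick(stylesheet_pis(SAMPLE), XSLT_TYPES, "first") selects a.xsl,
and the "last" policy selects b.xsl. No combination step is attempted.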

Liam: I think it's up to the XSLT WG to decide what it means if you
have more than one stylesheet.

Norm: So the CSS OM is not where this reference belongs, agreed?

Henry: Well. The CSS OM spec should really be called the Stylesheet
Object Model. Until you get to combination there's very little that's
CSS specific. It covers all the ways there are of getting stylesheets
into the object model, it covers how you select from them, and then it
goes on to talk about how you combine them.

... I don't mind what the spec is called, as long as it's clear at
the end of the day.

Liam: I don't think we'll get objections to adding a sentence to the
CSS OM spec to say that XSLT is one of the possible stylesheet
languages. That's the pragmatic position.

Henry: I'd still like to get something normative and more substantial.

>    How hard do we want to press the HTML5 WG?

Norm: I'm not sure where things stand now.

Liam: The HTML spec does refer to the xml-stylesheet PI. It doesn't
clearly mention that you can get both CSS and XSLT out of it. It
seems that what we'd like is something normative said about what
it means if a stylesheet type of XSLT comes back.

Henry: The HTML5 spec deals with these things called stylesheet objects.
That's what the CSS OM spec defines. That spec and the HTML5 spec agree
that stylesheet objects have a stylesheet type.

Liam: HTML5 is done by the HTML WG. CSS OM is done by the CSS WG.

Henry: We don't need any change to CSS OM at all. It tells us what we
need to know: if we give it an xml-stylesheet PI with a type of
"application/xslt" we get a stylesheet object with a stylesheet type
of XSLT. Then the question is, where in the HTML5 spec does it say
what happens if you have a stylesheet object of that type?

... I can't figure out where stylesheets of *any* type get their bite
in the HTML5 spec.

Some discussion of the relationship between browsing contexts,
stylesheet objects, event loops, rendering, etc.

Henry: The breadcrumbs necessary to answer this question are not easy
to follow. The stylesheet object is referred to obliquely in a list
in a section that talks about dependencies, 2.2.2.

Norm: The problem is that what CSS does and what XSLT does are very different.

Jirka: I think the most logical place is to put it in 5.6.3, page load
processing model for XML files.

More discussion of what and where we might say something.

Henry: Isn't this parallel to the syndication feed case in the paragraph
after the note? I think we want another one of these.

... We need to back up to 5.6.1 and look at step 19. We're still going
to have to do something that's not allowed here. What we want is to follow
the steps in the HTML case.

Liam: It seems to me that in practice, they go to the next step.

Further discussion. It seems we want 5.6.3 then 5.6.2 or 5.6.3.

Henry: Bear with me, suppose we were going to say that we were going to
render this by thinking of the XSLT process as a plugin. I think 5.6.7 is
closer to what actually happens than anything else.

Norm: Maybe what we need is a new peer to section 5.6.7 called
"Page load processing model for content that uses XSLT stylesheets".

Henry: But 5.6.7 looks like the best model.

>    Can we coordinate a discussion with HTML5 folks this week?

Liam has talked to Mike Smith. Norm will talk to the editor.

The HTML WG isn't going to accept normative spec changes, so we should
just work on a non-normative note.

Henry: I think we should work on some use cases and see if we can help
get some normative text moving forward for at least some future draft.

Liam: I think we've probably outlined what we think the best
long-term solution is, in broad strokes [see above, --scribe]. We're
not likely to get the WG as a whole to agree to another normative
section at the moment. If that's the case, I think we should try to
figure out what the section should look like. By the time we've
figured it out, HTML will be even closer to being frozen. In the
meantime, I think we should try to craft a non-normative sentence that
broadly describes what current behavior is.

Liam: Section 10 is a non-normative section about rendering with CSS,
so I think we should be able to have a similar statement about XSLT
at the same level of conformance.

Norm: I think the green "Note" sections are non-normative. In 5.6.3,
how about we propose:

  Note: Many existing user agents support the 'text/xsl' (or
  'application/xslt+xml') style sheet type, with XSLT [ref] as the
  relevant supported styling language. When the browsing context has a
  StyleSheet of that style sheet type, such agents transform the
  current XML document using the XSLT stylesheet retrieved from the
  style sheet location (typically supplied via an xml-stylesheet
  processing instruction) and render (or otherwise process) the
  document that results from that transformation. The precise details
  of this process will be defined in a future specification.

General agreement that this is ok.

ACTION: Norm to pass this note along to the HTML5 WG.

Henry: I'd like you to include the fact that we'll continue to help
provide additional test cases to aid in the development of the future
specification.

> 5. Error recovery note
>
>    Consider Liam's suggestion to document error recovery.
>    See http://lists.w3.org/Archives/Public/public-xml-core-wg/2012Sep/0002

Liam: I looked at the Amsterdam Web Corpus for a Balisage 2012 paper.
After allowing for various sorts of errors, I looked at what percentages
of various XML content types were not well formed. Most were RSS and Atom.

... I'd be happy if web browsers were to treat RSS and HTML specially.
They already treat HTML specially.

... Most XML on the web is well-formed. If you except RSS and HTML, we
get to more than 90% well formed. Of the rest, some is pastebins
and such where you don't expect it to be well formed.

... I don't think well-formedness is a big problem on the web.

... There were 11,000 bad RSS documents and about 58,000 good ones.

Henry: Roughly a sixth are bad.

Liam: To put that in an interesting and useful perspective, for urlset
documents (Google sitemap) there are 41,700 good ones and 491 bad ones.
Because there's economic incentive to fix urlset documents. And because
RSS readers fix errors.
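
The arithmetic behind these figures, as a small Python sketch. The counts
are the approximate ones quoted above; the helper name bad_fraction is an
illustrative assumption.

```python
def bad_fraction(bad, good):
    """Share of documents that are not well formed."""
    return bad / (bad + good)


rss = bad_fraction(11_000, 58_000)   # approximate RSS counts quoted above
urlset = bad_fraction(491, 41_700)   # urlset (sitemap) counts quoted above

print(f"RSS:    {rss:.1%}")     # prints "RSS:    15.9%", roughly a sixth
print(f"urlset: {urlset:.1%}")  # prints "urlset: 1.2%"
```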

Liam: There are also a bunch of XML documents that the Amsterdam corpus
labels as broken that are in fact ok if you get the encoding correct.

Liam: My proposal is not to change XML. We all know that a document with
a well-formedness error is not an XML document. And we also know that the
XML Recommendation doesn't apply to documents that aren't XML. Except
that you can't call a non-wellformed document "XML" because it isn't.

... I think the answer is: a web browser that gets something that
isn't well formed needs to give an indication, for example in the
developer console, along the lines of "document was not XML, doing
recovery". And it may then, at that point, process the document in any
non-XML way it likes, including generating a conceptually new XML
document that is well-formed.

... What you must not do is say that the original resource was XML.
That might mean, for example, taking away the XML base property.
I don't know. Marking the DOM in some clear way. And of course issuing
a warning message.
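
One way this behaviour might look, sketched in Python with the standard
xml.etree.ElementTree parser. LoadResult, its was_xml flag, and the toy
wrap_as_text recovery function are hypothetical names invented for this
sketch; the minutes do not say how a DOM should be marked.

```python
import warnings
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoadResult:
    root: ET.Element
    was_xml: bool              # False means the original resource was not XML
    diagnostic: Optional[str]  # message that would be shown to the developer


def load(text: str, recover: Callable[[str], ET.Element]) -> LoadResult:
    """Parse strictly; on a well-formedness error, warn and then recover."""
    try:
        return LoadResult(ET.fromstring(text), True, None)
    except ET.ParseError as err:
        message = f"document was not XML, doing recovery: {err}"
        warnings.warn(message)  # never recover silently
        # The recovered tree is a conceptually new document, marked as such.
        return LoadResult(recover(text), False, message)


def wrap_as_text(text: str) -> ET.Element:
    """Toy recovery: keep the raw input as the text of a new element."""
    element = ET.Element("recovered")
    element.text = text
    return element
```

Calling load("<a>", wrap_as_text) issues a warning and returns a result
with was_xml set to False, so scripts and other applications can tell the
recovered tree apart from one parsed from well-formed XML.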

Some discussion of how a DOM could be marked as not-XML.

Henry: Why is this better than doing nothing?

Liam: My concern, purely with my XML activity hat on, is I don't want
to see error recovery being used in a non-interactive world. Even though
it would be great for browser vendors, I don't want to change XML to
allow error correction. I don't think it's appropriate for the majority
of XML use cases.

Jirka: So you don't want an XML-ER spec?

Liam: I'm worried about it. I want the developer to know that there
was a syntax error. I know it could easily get lost, but I'd still
like it to be there.

... you can't silently correct errors, but you can if you give a
warning.

... I'm suggesting a working group note explaining the bounds of
possibilities, explaining what the spec does and doesn't say.

Some discussion of how this relates to the XML-ER work.

Liam: The only thing I really wanted to try to head off was parsers
that don't maintain the distinction between well-formed and
not-well-formed.

Liam: Having an error message and marking it in the DOM gives
JavaScript and other applications a chance to do something useful with
this information.

Norm: So you're proposing a Note and you're willing to edit it?

Liam: Yes, but I'm willing to do something else. We now have a wiki.
We could make it a wiki page.

Norm: So how about this: create a wiki page with roughly the sort of
text you'd like to see in a Note, if we did a Note, and then see where
we come out.

Liam: I'm happy with that.

Henry: I'm skeptical about how helpful that will be. My feeling is
that your proposal will be, in practice, that the XML community has
endorsed error recovery for XML. I think that that's worse than the
status quo.

... But this space is evolving and it's unclear where we are at the
moment with respect to whether there are problems that need to be
solved or not.

Liam: I hear you and it may be that we decide collectively that it's
not necessary or useful to do what I'm suggesting.

[ Recess for lunch ]

> 6. RFC 3023bis and LEIRIs
>
>    Any progress to discuss?

Henry: I've been added as an editor and there is now a new draft,
draft-lilley-xml-mediatypes-00 at
http://tools.ietf.org/html/draft-lilley-xml-mediatypes-00

From the status section:

   Major differences from [RFC3023] are alignment of charset handling
   for text/xml and text/xml-external-parsed-entity with application/
   xml, the addition of XPointer and XML Base as fragment identifiers
   and base URIs, respectively, mention of the XPointer Registry, and
   updating of many references.

Henry: Most of these are not new. There are two major, recent changes.
One is that as a result of sensible movement within the IETF community
text/xml and text/xml-external-parsed-entity are no longer deprecated.
(Because the underlying specs have changed or are changing so that
ASCII and ISO Latin 1 are no longer the defaults for text/ MIME types.)

... The other change is to deal with the fragment ID issues better.
Wherever you have a suffixed type (e.g., +xml), there's one in the
background, the one that applies to the suffix, in this case
application/xml.

There are three relevant specs:

  1. foo/baz+suffix
  2. +suffix
  3. and the spec from which +suffix is derived, e.g. application/suffix

The key move was to say that it is possible, for barenames at least,
to make the fallback on a per-link basis. So "...#foo" for RDF/XML
can have RDF semantics even if other links have different semantics.
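
A sketch of one reading of the per-link fallback, in Python: each
barename is tried against the type's own rules first and only falls back
to the suffix and base type if that fails. The resolver tables and all
names here are assumptions made for illustration, not text from the draft.

```python
def fragment_spec_chain(media_type):
    """Specs to consult, in order: the type itself, its +suffix, the base type."""
    media_type = media_type.strip().lower()
    _, _, subtype = media_type.partition("/")
    chain = [media_type]
    if "+" in subtype:
        suffix = subtype.rsplit("+", 1)[1]
        chain += ["+" + suffix, "application/" + suffix]
    return chain


def resolve_barename(media_type, name, resolvers):
    """Resolve one link's barename; fallback is decided link by link."""
    for spec in fragment_spec_chain(media_type):
        resolver = resolvers.get(spec)
        if resolver is None:
            continue
        target = resolver(name)
        if target is not None:
            return spec, target
    return None


# Hypothetical lookup tables standing in for real fragment processing.
rdf_nodes = {"foo": "rdf-node-foo"}
xml_ids = {"foo": "element-foo", "bar": "element-bar"}
resolvers = {
    "application/rdf+xml": rdf_nodes.get,
    "application/xml": xml_ids.get,
}
```

Under these assumptions "#foo" on an application/rdf+xml resource gets
RDF semantics, while "#bar" on the same resource falls back to the
application/xml rules.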

Next action is to get this through the IETF process.

For IRIs, that one's boiled up again and now the IETF and WHAT WG[5] have
conflicting specs for resource identifiers. That will have to be
resolved in some way.

> 7. MicroXML
>
>    Is there anything we need or want to say?

Norm: It's such a small subset that I don't think it's very
interesting: no namespaces, no processing instructions, no colons in
names, no general entities, and *only* UTF-8.

Henry: Getting rid of general entities will simplify the parser and
getting rid of namespaces will make it faster. Fixing charsets really
does make it the work of a competent graduate student to write a
parser for it.

Henry: The only thing I'd say is that if they do decide to standardize
it, we should do it. We do own that name. I wouldn't have a problem
with publishing such a spec.

Henry: A pointy-bracket alternative to JSON is presumably one of
the goals.

> 8. XML-ER
>
>    (Lack of) status report and any discussion.

Norm: Not much has happened, it's not clear that the community
interest persists. Liam's ideas are a kind of counter-proposal. I
think the ball is in my court but it's not clear how much I'll be able
to contribute in the immediate future.

9. Names that begin [Xx][Mm][Ll]

Henry: The XML spec does say "Names beginning with the string "xml",
or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')),
are reserved for standardization in this or future versions of this
specification."
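
The pattern quoted from the spec can be checked mechanically. A small
Python sketch follows; the function name is_reserved_name is an
illustrative assumption.

```python
import re

# Same language as (('X'|'x') ('M'|'m') ('L'|'l')), anchored at the start
# of the name by re.match.
RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def is_reserved_name(name):
    """True if the name begins with 'xml' in any mix of upper and lower case."""
    return RESERVED_PREFIX.match(name) is not None
```

Here is_reserved_name("xml-data") and is_reserved_name("XMLfoo") are both
True, while is_reserved_name("myxml") is False because the match is
anchored at the start of the name.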

... Two things follow from that: anything that approaches
standardization of those names has to go through us, and if you use
such a name and don't try to standardize through us, you may find
yourself stepped all over by us.

Jirka: On the other hand, this is before namespaces. And there are already
thousands of documents that start with "XML" so...

... I think there are two points of confusion: some applications emit
warnings for elements that start with "xml". We could say that we
don't mean that. Secondly, we may think about revising this
limitation. Every couple of years there's some XML format that is
quite widely used that has some element or attribute that starts with
XML.

... So we could keep it and say it's a mistake or we could remove it
because there are namespaces and namespaces can be used to create
elements and attributes with special meanings.

Henry: I expect that there's no contention that an element or
attribute that starts with "xml" is not a well-formedness error. Also,
I'm always skeptical when I see a direct negative assertion in a spec.
The only justification for negatives is if the rest of the spec is so
badly written that it's easier to make it clear in this way.

... As for going the next step, saying that this statement doesn't
license warnings or stylistic changes for XML processors, it seems
entirely reasonable for me to write warnings about this. Warnings
about using a reserved name not yet defined are applicable.

Norm: proposed erratum

  s/in this or future versions of this specification/
   /in this or future specifications from the XML Core WG or its successors/

Henry: I'm perfectly happy to entertain a motion to remove this from
this specification and retain the "xml:" prefix only for elements and
attributes and "xml-" only for PI targets.

Norm: I'd prefer to make explicit that you *can* write names that
begin "xml", but doing so exposes you to being walked on in the
future. So don't do that.

Jirka: I can go either way, it's always been a restriction, users
should know better, but there are lots of documents that use it, so we
should adapt to common practice.

ACTION: Jirka to start an email thread about this issue on the Core WG.

10. Any other business

None heard.

Any reason to reconvene tomorrow?

None heard.

Meeting adjourned.

> [1] http://www.w3.org/XML/Group/Core
> [2] http://www.w3.org/XML/Group/Core#tasks
> [3] http://lists.w3.org/Archives/Public/public-xml-core-wg/2012Oct/0009.html
[4] http://lists.w3.org/Archives/Public/public-xml-core-wg/2012Oct/0008.html
[5] http://url.spec.whatwg.org/

Received on Monday, 29 October 2012 14:19:26 UTC