- From: Norman Walsh <ndw@nwalsh.com>
- Date: Mon, 29 Oct 2012 15:18:53 +0100
- To: public-xml-core-wg@w3.org
- Message-ID: <m2y5iptl82.fsf@nwalsh.com>
Norman Walsh <ndw@nwalsh.com> writes:
> We had an XML Core WG face-to-face meeting at W3C TPAC in
> Lyon, France, for Monday, 29 October and Tuesday, 30 October.
>
> Minutes
> =======

Present: Liam, Henry, Jirka, Norm

Regrets: Paul, Glenn, John, Daniel

> 1. Accepting the minutes from the last telcon [3] and
> the current task status [2] (have any questions, comments,
> or corrections ready by the beginning of the meeting).

Any questions or comments about the minutes of the last meeting?

None heard, accepted.

Any questions or comments about today's revised agenda?

Jirka asks to add an agenda item to discuss names that begin with xml, such as "xml-data" in one of the Microsoft formats.

> 2. Miscellaneous administrivia and document reviews.

There's a W3C developer meetup this evening.

No document reviews mentioned.

> 3. XInclude 1.1--see http://www.w3.org/XML/Group/Core#xinclude
>
> Consider the substantive changes hinted at by the note in
> section 4.5, namely using MIME content-types for the value
> of the parse attribute and associating the fragment
> identifier syntax with the MIME content type.

Norm reviews a summary of Daniel's comments [4]. General nods of agreement.

Some discussion of how we deal with MIME content type values. We need to say that XInclude will attempt to treat the resource as the specified content type. The purpose of finer granularity in parse types is to allow additional fragment identifier syntaxes to be used.

Appeal to media type hierarchy: you may know the media type or you may know the suffix or you may know the family. That's the fallback story for media types; we already have a fallback story for fragment identifier syntaxes that aren't understood. Media types for which you can't understand fallback are treated as recoverable errors.

ACTION: Norm to revise the draft to use media types for the parse attribute.

> Consider the backwards/forwards compatibility story.

We think the use of media types finesses this issue.
It certainly seems a better compromise than a new namespace or a version attribute.

> 4. xml-stylesheet and HTML5

Some discussion of the existence of test cases for the stylesheet PI and what those tests would mean. Absence of a concrete spec which answers questions such as https://www.w3.org/Bugs/Public/show_bug.cgi?id=14689#c8 is possibly a stumbling block.

Liam: I guess my question is, what's the minimum change needed for us to be happy with this version? For me, I'd be ok with a non-normative statement that the xml-stylesheet PI may also be used to point to XSLT stylesheets.

Henry: We need to find out where the xml-stylesheet PI is even mentioned in the spec.

Liam: What we need to avoid is a normative reference to CSS without a reference to XSLT. I think we need to make sure that the editor understands that we need to say something at a higher level. There's a lot of work that could be done to improve interoperability, and that's where test cases would be involved, but that doesn't need to be in the first recommendation.

Henry: I believe that from the HTML perspective, the relevant spec is the CSS Object Model, http://dev.w3.org/csswg/cssom/

Norm: So are we happy if the mention of XSLT is in the CSS OM spec?

Henry: I think that's up to them.

Norm: If we're content that the reference can go in this spec, then I think simply saying that CSS and XSLT are among the possible "supported styling languages" might be enough.

Henry: It's very odd that the only place XSLT is mentioned is in a section titled "CSS". This looks like, "for CSS processors, here's what we say about the xml-stylesheet PI". It's badly scoped to have a section that talks about stylesheet languages in general in a section titled CSS.

Some further discussion of the various documents.

Henry: The CSS OM document has a model of "merging stylesheets", but we don't do that.
There are three layers: pointing to stylesheets; selection: among those that you've pointed to, how do you select a subset; and if that subset has more than one element, how do you combine them?
... We only care about the first two. It doesn't make any sense at all to try to do combination. The relevant question is: if there's more than one, then which one do you use. Right now, the implementations are split on first or last.

Liam: I think it's up to the XSLT WG to decide what it means if you have more than one stylesheet.

Norm: So the CSS OM is not where this reference belongs, agreed?

Henry: Well. The CSS OM spec should really be called the Stylesheet Object Model. Until you get to combination there's very little that's CSS specific. It covers all the ways there are of getting stylesheets into the object model, it covers how you select from them, and then it goes on to talk about how you combine them.
... I don't mind what the spec is called, as long as it's clear at the end of the day.

Liam: I don't think we'll get objections to adding a sentence to the CSS OM spec to say that XSLT is one of the possible stylesheet languages. That's the pragmatic position.

Henry: I'd still like to get something normative and more substantial.

> How hard do we want to press the HTML5 WG?

Norm: I'm not sure where things stand now.

Liam: The HTML spec does refer to the xml-stylesheet PI. It doesn't clearly mention that you can get both CSS and XSLT out of it. It seems that what we'd like is something normative said about what it means if a stylesheet type of XSLT comes back.

Henry: The HTML5 spec deals with these things called stylesheet objects. That's what the CSS OM spec defines. That spec and the HTML5 spec agree that stylesheet objects have a stylesheet type.

Liam: HTML5 is done by the HTML WG. CSS OM is done by the CSS WG.

Henry: We don't need any change to CSS OM at all.
It tells us what we need to know: if we give it an xml-stylesheet PI with a type of "application/xslt" we get a stylesheet object with a stylesheet type of XSLT. Then the question is, where in the HTML5 spec does it say what happens if you have a stylesheet object of that type?
... I can't figure out where stylesheets of *any* type get their bite in the HTML5 spec.

Some discussion of the relationship between browsing contexts, stylesheet objects, event loops, rendering, etc.

Henry: The breadcrumbs necessary to answer this question are not easy to follow. The stylesheet object is referred to obliquely in a list in a section that talks about dependencies, 2.2.2.

Norm: The problem is that what CSS does and what XSLT does are very different.

Jirka: I think the most logical place is to put it in 5.6.3, page load processing model for XML files.

More discussion of what and where we might say something.

Henry: Isn't this parallel to the syndication feed case in the paragraph after the note? I think we want another one of these.
... We need to back up to 5.6.1 and look at step 19. We're still going to have to do something that's not allowed here. What we want is to follow the steps in the HTML case.

Liam: It seems to me that in practice, they go to the next step.

Further discussion. It seems we want 5.6.3 then 5.6.2 or 5.6.3.

Henry: Bear with me, suppose we were going to say that we were going to render this by thinking of the XSLT process as a plugin. I think 5.6.7 is closer to what actually happens than anything else.

Norm: Maybe what we need is a new peer to section 5.6.7 called "Page load processing model for content that uses XSLT stylesheets".

Henry: But 5.6.7 looks like the best model.

> Can we coordinate a discussion with HTML5 folks this week?

Liam has talked to Mike Smith. Norm will talk to the editor.

The HTML WG isn't going to accept normative spec changes, so we should just work on a non-normative note.

Henry: I think we should work on some use cases and see if we can help get some normative text moving forward for at least some future draft.

Liam: I think we've probably outlined what we think the best long-term solution is, in broad strokes [see above, --scribe]. We're not likely to get the WG as a whole to agree to another normative section at the moment. If that's the case, I think we should try to figure out what the section should look like. By the time we've figured it out, HTML will be even closer to being frozen. In the meantime, I think we should try to craft a non-normative sentence that broadly describes what current behavior is.

Liam: Section 10 is a non-normative section about rendering with CSS, so I think we should be able to have a similar statement about XSLT at the same level of conformance.

Norm: I think the green "Note" sections are non-normative. In 5.6.3, how about we propose:

    Note: Many existing user agents support the 'text/xsl' (or 'application/xslt+xml') style sheet type, with XSLT [ref] as the relevant supported styling language. When the browsing context has a StyleSheet of that style sheet type, such agents transform the current XML document using the XSLT stylesheet retrieved from the style sheet location (typically supplied via an xml-stylesheet processing instruction) and rendering (or otherwise processing) the document that results from that transformation. The precise details of this process will be defined in a future specification.

General agreement that this is ok.

ACTION: Norm to pass this note along to the HTML5 WG.

Henry: I'd like you to include the fact that we'll continue to help provide additional test cases to aid in the development of the future specification.

> 5. Error recovery note
>
> Consider Liam's suggestion to document error recovery.
> See http://lists.w3.org/Archives/Public/public-xml-core-wg/2012Sep/0002

Liam: I looked at the Amsterdam Web Corpus for a Balisage 2012 paper.
After allowing for various sorts of errors, I looked at what percentages of various XML content types were not well formed. Most were RSS and Atom.
... I'd be happy if web browsers were to treat RSS and HTML specially. They already treat HTML specially.
... Most XML on the web is well-formed. If you except RSS and HTML, more than 90% is well formed. Of the rest, some is pastebins and such where you don't expect it to be well formed.
... I don't think well-formedness is a big problem on the web.
... There were 11,000 bad RSS documents and about 58,000 good ones.

Henry: Roughly a sixth are bad.

Liam: To put that in an interesting and useful perspective, for urlset documents (Google sitemap) there are 41,700 good ones and 491 bad ones. Because there's economic incentive to fix urlset documents. And because RSS readers fix errors.

Liam: There are also a bunch of XML documents that the Amsterdam corpus labels as broken that are in fact ok if you get the encoding correct.

Liam: My proposal is not to change XML. We all know that a document with a well-formedness error is not an XML document. And we also know that the XML Recommendation doesn't apply to documents that aren't XML. Except that you can't call a non-wellformed document "XML" because it isn't.
... I think the answer is: a web browser that gets something that isn't well formed needs to give an indication, for example in the developer console, along the lines of "document was not XML, doing recovery". And it may then, at that point, process the document in any non-XML way it likes, including generating a conceptually new XML document that is well-formed.
... What you must not do is say that the original resource was XML. That might mean, for example, taking away the XML base property. I don't know. Marking the DOM in some clear way. And of course issuing a warning message.

Some discussion of how a DOM could be marked as not-XML.

Henry: Why is this better than doing nothing?

Liam: My concern, purely with my XML activity hat on, is I don't want to see error recovery being used in a non-interactive world. Even though it would be great for browser vendors, I don't want to change XML to allow error correction. I don't think it's appropriate for the majority of XML use cases.

Jirka: So you don't want an XML-ER spec?

Liam: I'm worried about it. I want the developer to know that there was a syntax error. I know it could easily get lost, but I'd still like it to be there.
... you can't silently correct errors, but you can if you give a warning.
... I'm suggesting a working group note explaining the bounds of possibilities, explaining what the spec does and doesn't say.

Some discussion of how this relates to the XML-ER work.

Liam: The only thing I really wanted to try to head off was parsers that don't maintain the distinction between well-formed and not-well-formed.

Liam: Having an error message and marking it in the DOM gives JavaScript and other applications a chance to do something useful with this information.

Norm: So you're proposing a Note and you're willing to edit it?

Liam: Yes, but I'm willing to do something else. We now have a wiki. We could make it a wiki page.

Norm: So how about this: create a wiki page with roughly the sort of text you'd like to see in a Note, if we did a Note, and then see where we come out.

Liam: I'm happy with that.

Henry: I'm skeptical about how helpful that will be. My feeling is that your proposal will be, in practice, that the XML community has endorsed error recovery for XML. I think that that's worse than the status quo.
... But this space is evolving and it's unclear where we are at the moment with respect to whether there are problems that need to be solved or not.

Liam: I hear you and it may be that we decide collectively that it's not necessary or useful to do what I'm suggesting.

[ Recess for lunch ]

> 6. RFC 3023bis and LEIRIs
>
> Any progress to discuss?

Henry: I've been added as an editor and there is now a new draft, draft-lilley-xml-mediatypes-00 at http://tools.ietf.org/html/draft-lilley-xml-mediatypes-00

From the status section:

    Major differences from [RFC3023] are alignment of charset handling for text/xml and text/xml-external-parsed-entity with application/xml, the addition of XPointer and XML Base as fragment identifiers and base URIs, respectively, mention of the XPointer Registry, and updating of many references.

Henry: Most of these are not new. There are two major, recent changes. One is that as a result of sensible movement within the IETF community text/xml and text/xml-external-parsed-entity are no longer deprecated. (Because the underlying specs have changed or are changing so that ASCII and ISO Latin 1 are no longer the defaults for text/ MIME types.)
... The other change is to deal with the fragment ID issues better. Wherever you have a suffixed type (e.g., +xml), there's one in the background, the one that applies to the suffix, in this case application/xml. There are three relevant specs:

1. foo/baz+suffix
2. +suffix
3. and the spec from which +suffix is derived, e.g. application/suffix

The key move was to say that it is possible, for barenames at least, to make the fallback on a per-link basis. So "...#foo" for RDF/XML can have RDF semantics even if other links have different semantics.

Next action is to get this through the IETF process.

For IRIs, that one's boiled up again and now the IETF and WHAT WG[5] have conflicting specs for resource identifiers. That will have to be resolved in some way.

> 7. MicroXML
>
> Is there anything we need or want to say?

Norm: It's such a small subset that I don't think it's very interesting: no namespaces, no processing instructions, no colons in names, no general entities, and *only* UTF-8.

Henry: Getting rid of general entities will simplify the parser and getting rid of namespaces will make it faster.
Fixing charsets really does make it the work of a competent graduate student to write a parser for it.

Henry: The only thing I'd say is that if they do decide to standardize it, we should do it. We do own that name. I wouldn't have a problem with publishing such a spec.

Henry: A pointy-bracket alternative to JSON is presumably one of the goals.

> 8. XML-ER
>
> (Lack of) status report and any discussion.

Norm: Not much has happened, it's not clear that the community interest persists. Liam's ideas are a kind of counter-proposal. I think the ball is in my court but it's not clear how much I'll be able to contribute in the immediate future.

9. Names that begin [Xx][Mm][Ll]

Henry: The XML spec does say

    "Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification."

... Two things follow from that: anything that approaches standardization of those names has to go through us, and if you use such a name and don't try to standardize through us, you may find yourself stepped all over by us.

Jirka: On the other hand, this is before namespaces. And there are already thousands of documents that start with "XML" so...
... I think there are two points of confusion: some applications emit warnings for elements that start with "xml". We could say that we don't mean that. Secondly, we may think about revising this limitation. Every couple of years there's some XML format that is quite widely used that has some element or attribute that starts with XML.
... So we could keep it and say it's a mistake or we could remove it because there are namespaces and namespaces can be used to create elements and attributes with special meanings.

Henry: I expect that there's no contention that an element or attribute that starts with "xml" is not a well-formedness error. Also, I'm always skeptical when I see a direct negative assertion in a spec.
The only justification for negatives is if the rest of the spec is so badly written that it's easier to make it clear in this way.
... As for going the next step, saying that this statement doesn't license warnings or stylistic changes for XML processors, it seems entirely reasonable for me to write warnings about this. Warnings about using a reserved name not yet defined are applicable.

Norm: proposed erratum

    s/in this or future versions of this specification/in this or future specifications from the XML Core WG or its successors/

Henry: I'm perfectly happy to entertain a motion to remove this from this specification and retain the "xml:" prefix only for elements and attributes and "xml-" only for PI targets.

Norm: I'd prefer to make explicit that you *can* write names that begin "xml", but doing so exposes you to being walked on in the future. So don't do that.

Jirka: I can go either way, it's always been a restriction, users should know better, but there are lots of documents that use it, so we should adapt to common practice.

ACTION: Jirka to start an email thread about this issue on the Core WG.

10. Any other business

None heard.

Any reason to reconvene tomorrow?

None heard.

Meeting adjourned.

> [1] http://www.w3.org/XML/Group/Core
> [2] http://www.w3.org/XML/Group/Core#tasks
> [3] http://lists.w3.org/Archives/Public/public-xml-core-wg/2012Oct/0009.html
[4] http://lists.w3.org/Archives/Public/public-xml-core-wg/2012Oct/0008.html
[5] http://url.spec.whatwg.org/
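
As a small illustration of the reserved-name rule quoted under item 9, the sketch below checks whether a name begins with a string matching (('X'|'x') ('M'|'m') ('L'|'l')). It is only an assumption about how a tool might surface the warning Henry describes: the function names `is_reserved_name` and `reserved_name_warning` and the wording of the message are hypothetical and do not come from any specification or existing processor. Consistent with the discussion, a match yields a warning, not a well-formedness error.

```python
import re

# Pattern equivalent to the production quoted in the minutes:
# (('X'|'x') ('M'|'m') ('L'|'l')), applied at the start of the name.
RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def is_reserved_name(name):
    """Return True if the name begins with "xml" in any letter case."""
    # re.match only matches at the beginning of the string.
    return RESERVED_PREFIX.match(name) is not None


def reserved_name_warning(name):
    """Hypothetical helper: return a warning string for a reserved
    name, or None when the name does not begin with the reserved prefix."""
    if is_reserved_name(name):
        return 'warning: "{}" begins with a reserved prefix'.format(name)
    return None


if __name__ == "__main__":
    for candidate in ("xml-data", "XMLSchema", "data-xml", "para"):
        print(candidate, is_reserved_name(candidate))
```

Note that this check also flags names the XML specifications already define, such as "xml:lang"; a real tool would presumably keep an allow-list for those, which is left out here.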
Received on Monday, 29 October 2012 14:19:26 UTC