- From: liorean <liorean@gmail.com>
- Date: Thu, 17 Apr 2008 08:25:46 +0200
On 17/04/2008, William F Hammond <hammond at csc.albany.edu> wrote:
> Previously:
>
> Yes, but the point is, once a user agent begins to sniff, there's no
> rational excuse for it not to recognize compliant xhtml+(mathml|svg).

Yes there is. Live content relies on even perfectly well formed XHTML to have the HTML behaviours of CSS and the DOM. It also relies on all elements having #PCDATA content. Thus scripts and style sheets would be given an incompatible parsing that changes the meaning of '&', '<' and XML comments within scripts, just to take one example. That is, a script which is well formed and valid XML and which is XML well formedness-compatible and proper HTML may have entirely textual content. (The subset of live XHTML content that uses embedded scripts which are also XML well formed without using explicit CDATA wrapping is very small, though.)

> >> What obstacles to this exist?
> >
> > The Web.
>
> Really!?!

Really.

> And then:
>
> >>> The Web.
> >>
> >> Really!?!
> >
> > Yes, see for instance:
> >
> > http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html
>
> Taylor's comment is mainly about what happens when a user agent
> confuses tag soup with good xhtml.
>
> It is a different question how a user agent decides what it is looking
> at.
>
> Whether there is one mimetype or two, erroneous content will need
> handling. The experiment begun around 2001 of "punishing" bad
> documents in application/xhtml+xml seems to have led to that mime type
> not being much used.

We don't know how big a factor the draconianness of XML parsing really is. The fact is, the single biggest consumer of those documents has not begun supporting XHTML yet. Internet Explorer supports HTML and XML but not the XHTML namespace in XML, nor the XHTML content type. This alone makes everybody reluctant to serve application/xhtml+xml.
Sure, there are other complications from the XML draconianness than this, but my point is that these are all compounded, so it's hard to tell how effectively they have been put to the test. If you could run the test again with Internet Explorer's non-support taken out of the equation, then you would be able to say something about it. As it is currently, you can't know either way.

> So user agents need to learn how to recognize the good and the bad
> in both mimetypes.
>
> Otherwise you have Gresham's Law: the bad documents will drive out the
> good.
>
> The logical way to go might be this:
>
> If it has a preamble beginning with "^<?xml " or a sensible
> xhtml DOCTYPE declaration or a first element "<html xmlns=...>",
> then handle it as xhtml unless and until it proves to be non-compliant
> xhtml (e.g, not well-formed xml, unquoted attributes, munged handling
> of xml namespaces, ...). At the point it proves to be bad xhtml reload
> it and treat it as "regular" html.

Doesn't work. We need DOM and CSS treatment as in HTML, not as in XHTML, to be compatible with live content for those circumstances too.

> So most bogus xhtml will then be 1 or 2 seconds slower than good xhtml.
> Astute content providers will notice that and then do something about it.
> It provides a feedback mechanism for making the web become better.

So, you argue that a document with an XHTML structure as text/html should change semantics in ways that will affect functionality, behaviour and presentation because of e.g. a single unescaped ampersand in a URI or a single character that breaks because of encoding?

My opinion: Any feedback mechanism that directly hurts the user and only indirectly hurts the publisher, as opposed to a feedback mechanism that directly notifies the publisher, is totally backwards. Fail early. Compile time is better than run time because that's instantly obvious to the programmer - the build isn't compiling, so there's no working but buggy build to give users.
The analogy for web content is that you should fail at publishing time instead of viewing time if possible, because then you HAVE to correct your documents before you can serve them to the user.

If you want to serve XML to users on the web, you should make sure your tools cannot possibly serve malformed XML, by making absolutely certain that the content has correct encoding (any defaulting must confirm that the content actually conforms to the default encoding), has a specified content type (defaulting is acceptable for fragments here, but e.g. uploading raw files should require specifying the type) and is a well formed fragment or document at publishing time, loudly rejecting any content that is malformed. (And by publishing I include all sources: design templates, content producers, information from the database, advertisements, comments, trackbacks etc.)

--
David "liorean" Andersson
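
To make the publishing-time check described above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any existing tool: the names `check_publishable` and `PublishError` are hypothetical, the well-formedness test uses the standard library's `xml.etree.ElementTree` parser, and the content is assumed to be a complete document or a single-rooted fragment.

```python
import xml.etree.ElementTree as ET
from typing import Optional


class PublishError(ValueError):
    """Hypothetical error type: the content must not be published."""


def check_publishable(raw: bytes, encoding: str,
                      content_type: Optional[str]) -> ET.Element:
    """Reject content loudly at publishing time instead of at viewing time."""
    # A content type must be specified explicitly (e.g. for uploaded files).
    if not content_type:
        raise PublishError("no content type specified")

    # The bytes must actually conform to the claimed (or defaulted) encoding.
    # bytes.decode() is strict by default, so bad bytes raise an error.
    try:
        text = raw.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        raise PublishError(f"content is not valid {encoding}: {exc}") from exc

    # The content must be well-formed XML; a stray '&' or '<' fails here.
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        raise PublishError(f"not well-formed XML: {exc}") from exc


if __name__ == "__main__":
    samples = [
        b"<p>Tom &amp; Jerry</p>",          # well formed
        b'<a href="/?a=1&b=2">link</a>',    # unescaped ampersand in a URI
        b"<p>caf\xe9</p>",                  # Latin-1 byte claimed as UTF-8
    ]
    for sample in samples:
        try:
            check_publishable(sample, "utf-8", "application/xhtml+xml")
            print("accepted:", sample)
        except PublishError as err:
            print("rejected:", sample, "->", err)
```

The same gate would have to sit in front of every source that feeds the page (templates, database content, advertisements, comments, trackbacks), since a single unchecked source is enough to let malformed XML reach the user.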
Received on Wednesday, 16 April 2008 23:25:46 UTC