- From: Shane McCarron <shane@aptest.com>
- Date: Fri, 01 Aug 2008 17:05:15 -0500
- To: Simon Pieters <simonp@opera.com>
- CC: Al Gilman <Alfred.S.Gilman@ieee.org>, Michael Cooper <cooper@w3.org>, Richard Schwerdtfeger <schwer@us.ibm.com>, XHTML WG <public-xhtml2@w3.org>
(removed WAI-PF - they wanted out of this discussion ;-)

Sorry for the delay in responding, I got distracted with child rearing.

Simon Pieters wrote:
>
> On Wed, 30 Jul 2008 19:32:29 +0200, Shane McCarron <shane@aptest.com>
> wrote:
>>> You're causing grief for browser vendors. See
>>> http://annevankesteren.nl/2007/12/xml-entities
>> The XHTML Family supports a well defined collection of entities. You
>> can 1) dereference the DTD from the DOCTYPE declaration to learn them,
>
> No, we can't really do this. Doing so would be a massive distributed
> denial of service on w3.org and make w3.org a single point of failure.
> See http://hsivonen.iki.fi/no-dtd/

Well.... not really. I mean, you could pre-cache well known ones, cache
new ones as you encounter them, and fall back to normal XML entity
processing rules if the DTD were not retrievable... However, you could
not just cache based upon FPI - it would have to be the combination of
FPI and SYSTEM id. Otherwise someone could do a pretty neat
man-in-the-middle attack, masquerading their weird version of
XHTML-whatever as the real one.

>> or 2) you can say "that's a well known FPI, I know what that means" or
>
> Yeah, that's what we're doing now, but a browser can't know about ones
> that are minted after the browser shipped...

Of course. But it could learn them. Or *you* could learn them and notify
your installed base as part of what would effectively be similar to an
anti-virus "signature update". Periodically update the collection of
known, supported FPIs and their entity collections. Assuming it is only
the entity collections you care about. That surprises me, but whatever.

>> 3) you can say "That FPI matches the pattern for XHTML Family FPIs,
>> which is well know, and I know what that means".
>
> Could you elaborate on how this would work, exactly? See if the FPI
> starts with "-//W3C//DTD XHTML" and, if so, feeding all the entity
> declarations?

Well, I think that would be legitimate.
All XHTML Family Markup Languages include the same base collection of
defined entities. However, that is not actually mandated. If it would
make the browser vendors' lives easier, I think the XHTML 2 Working
Group would be open to requiring that those entity sets are a part of
all XHTML Family languages.

>> Finally, you could do 1) when 2) or 3) was not true, but then learn
>> the FPI and treat it as 2) from then on. Or, you could just do what
>> XHTML M12N says you should do in this case, which is in production 6
>> of clause 3.5 of XHTML M12N.
>
> Do you have a pointer?

Of course. See
http://www.w3.org/TR/xhtml-modularization/conformance.html#s_conform_user_agent -
clause 6 clearly says how a user agent should behave when it doesn't
know an entity. This is consistent with the requirements for a
non-validating XML processor as defined in the XML spec section 4.4.3.
The XML parser tells the "application", the user agent, that an entity
reference was not expanded. The user agent is then free to do
something.... a la what is recommended in the M12N spec.

>> There are lots of solutions to the XML Entity problem.
>
> The one I like best is to add all HTML and MathML entities as
> predefined ones in XML so they can be used with any doctype or no
> doctype at all.

I honestly don't think that is very forward looking. It's a fine
default, but if I provide a user agent with a DOCTYPE and a SYSTEM
identifier, and it doesn't know what that is, it should try to load it
and use it. IMHO.

>> But that problem is not really relevant to the question of whether
>> FPIs have meaning and whether creating new ones is problematic.
>
> Well for browsers in XML, the only meaning FPIs have is whether or not
> to load a bunch of entity declarations.

For some browsers I suppose that is true. Some user agents and
processors do validation on the fly (WAP gateways, for example). They
pay attention to these.

>> Let me put this another way. If there were no DOCTYPE - no
>> declaration of any type about what version of what markup language a
>> document was written in, what would a browser do?
>
> The same as it would if there was a doctype, except don't load in any
> entity declarations...
>
>
>> I mean, I know what it would do today if it were served as
>> text/html... use some broken tag soup parser that has been around for
>> ages and try to guess what I meant. Reverse engineer a DOM tree that
>> is probably right, but maybe not. 'cause it guessed. Makes me
>> crazy. Makes most people crazy. It's the best reason for the HTML5
>> spec. Lock down the broken behavior so it makes people predictably
>> crazy.
>>
>> But what if it were XML and served as application/xhtml+xml? What
>> would a user agent do then? Presumably it would follow the rules as
>> set forth in XML for parsing, the XML DOM rules for DOM generation,
>> and for elements in the XHTML namespace, it would look to XHTML M12N
>> for behavior. Probably with some arcane knowledge based upon
>> historical practice, because that's just how programmers work. But
>> in general it would follow the rules for behavioral requirements for
>> XHTML... And that's exactly what it should do.
>>
>> If there were a DOCTYPE declaration, what would it do differently?
>
> If the FPI is unknown, the browser could opt to not be fatal when
> encountering an undeclared entity reference (like Opera does). If the
> FPI is known it would load in a bunch of entity declarations. And
> that's all.

Actually, I think it MUST opt to not be fatal unless the XML says
'standalone="yes"' but I am not 100% certain.

>> If that DOCTYPE adheres to the naming requirements in M12N and
>> matches the pattern for XHTML Family, then it should do the same thing.
>
> Are there conformance criteria in M12N (or elsewhere) that UAs should
> match against a pattern for "XHTML Family"?

Yes - see the reference above.
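
For what it's worth, the three cases discussed so far (a well known FPI, an FPI that merely matches the family pattern, and everything else) could be told apart roughly as below. This is only a sketch under assumptions: the prefix is the one Simon proposed above, not something M12N spells out in this form, and `KNOWN_FPIS`, `FpiKind` and `classify_fpi` are hypothetical names invented for illustration.

```python
from enum import Enum

# Hypothetical table of FPIs a user agent shipped with. The two entries are
# real XHTML public identifiers, but the table is illustrative, not complete.
KNOWN_FPIS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}

# Prefix suggested in the quoted text; an assumption, not a normative pattern.
XHTML_FAMILY_PREFIX = "-//W3C//DTD XHTML"


class FpiKind(Enum):
    KNOWN = "known"                # case 2: built-in knowledge
    XHTML_FAMILY = "xhtml-family"  # case 3: matches the family pattern
    UNKNOWN = "unknown"            # neither: fetch the DTD or apply the default


def classify_fpi(fpi):
    """Classify a DOCTYPE public identifier (None when there is no DOCTYPE)."""
    if fpi is None:
        return FpiKind.UNKNOWN
    # Collapse whitespace runs and trim, as XML does for public identifiers.
    normalized = " ".join(fpi.split())
    if normalized in KNOWN_FPIS:
        return FpiKind.KNOWN
    if normalized.startswith(XHTML_FAMILY_PREFIX):
        return FpiKind.XHTML_FAMILY
    return FpiKind.UNKNOWN
```

With this, `classify_fpi("-//W3C//DTD XHTML+RDFa 1.0//EN")` falls into the family case even though the table has never heard of it, which is the point of option 3.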

>> If it has some inbuilt knowledge about some XHTML family document
>> types and wants to do something special for them, I suppose that's
>> okay too.
>
> Such as loading in entity declarations?

Yes, that is certainly permitted. Or knowing the content model ahead of
time, permissible datatypes for elements.... whatever. If you are
caching FPI related information, you should cache whatever your user
agent needs to grok the elements and attributes that markup language
uses.

>> But the default rules can and should apply in all cases for all
>> unknown family members.
>
> How does one know if an unknown FPI is an XHTML family member or not?

See above.

>> That's what the recommendations say,
>
> Where?

Again, see above.

>> and I am pretty sure we meant it when we wrote them.
>>
>> What's the alternative? Have inbuilt knowledge about a handful of
>> predefined markup languages based upon FPIs, and then fail to process
>> new members of the XHTML Family?
>
> This is not about support for the language -- that's dispatched on
> namespaces -- but merely about entities.
>
> Mozilla, Opera and WebKit have knowledge about a handful of FPIs that
> will load in the HTML or HTML+MathML entity declarations. If you use a
> different FPI, only the 5 XML entities can be used -- others will
> fail, either gracefully (Opera) or fatally (Mozilla/WebKit).

Well - a fatal failure is a bug. XML doesn't permit that as I read it.
And switching on namespaces is not really safe - namespaces are
overloaded. But regardless, if that is what people do...

>> That is certainly a violation of the spirit of the requirements for
>> user agent conformance in M12N.
>
> Well, loading in entity declarations based on pattern matching on the
> FPI and hoping that the referenced DTD contained those same
> declarations is certainly a violation of the XML spec, AFAICT.

Not if there is a superseding standard that permits it. But your point
is well taken. You should really be loading them, not guessing.
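
To make "loading, not guessing" a bit more concrete, here is a minimal sketch of the caching approach described earlier in this message: pre-cache the well known DTDs, cache new ones as they are encountered, key the cache on the FPI and SYSTEM id together, and fall back to plain XML processing when the DTD cannot be retrieved. `EntityCache` and the `fetch` callable are hypothetical names; a real user agent would fetch and parse the DTD where `fetch` is called.

```python
class EntityCache:
    """Entity declarations cached per (FPI, SYSTEM id) pair, never per FPI alone."""

    def __init__(self, fetch):
        # fetch(system_id) is assumed to return a dict mapping entity names to
        # replacement text, or to raise OSError when the DTD is unreachable.
        self._fetch = fetch
        self._cache = {}

    def preload(self, fpi, system_id, entities):
        """Seed the cache with a well known DTD shipped with the user agent."""
        self._cache[(fpi, system_id)] = dict(entities)

    def entities_for(self, fpi, system_id):
        """Return the entity declarations for this exact DOCTYPE."""
        key = (fpi, system_id)
        if key not in self._cache:
            try:
                self._cache[key] = dict(self._fetch(system_id))
            except OSError:
                # DTD not retrievable: no extra declarations, so the parser
                # applies normal XML rules. The failure itself is not cached.
                return {}
        return self._cache[key]
```

Because the key includes the SYSTEM id, a document that reuses a well known FPI but points at a different DTD does not inherit the cached entities of the real one; it gets whatever its own DTD declares, or nothing.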

>> What a user agent that claims to support XHTML must do is say "oh,
>> this is XHTML family. I know the rules for that" and just deal with
>> it. FWIW, the XHTML2 Working Group "mints" new FPIs all the time.
>
> Well so long as they aren't used on the Web we're not really affected.
> :-) But it would be nice if new FPIs weren't minted, at least until we
> change XML to get more default entities.

Understood. But the truth is that we are crafting new variations of
XHTML all the time. Moreover, that was the whole point of XHTML M12N.
Perhaps what we should be doing is telling people to not use named
entities at all. That would sort of solve your problem. Or fix your
implementations so they work in a way consistent with the standard. I
mean..... for whom is this easier? The millions of web content authors
who want to use entities in their markup, or the 5 or 6 user agent
development groups who need to make their agents understand them. Easy
choice.

--
Shane P. McCarron     Phone: +1 763 786-8160 x120
Managing Director     Fax:   +1 763 786-8180
ApTest Minnesota      Inet:  shane@aptest.com
Received on Friday, 1 August 2008 22:06:38 UTC