Minting new FPIs and the entity problem in browsers (was: Re: Agenda+ Review XHTML module for ARIA)

On Wed, 30 Jul 2008 19:32:29 +0200, Shane McCarron <shane@aptest.com>  
wrote:

> I have added the XHTML2 working group to this thread, since it is a good  
> discussion for that group.  My comments on this message are at the end.
>
> [...]


>> You're causing grief for browser vendors. See  
>> http://annevankesteren.nl/2007/12/xml-entities
> The XHTML Family supports a well defined collection of entities.  You  
> can 1) dereference the DTD from the DOCTYPE declaration to learn them,

No, we can't really do this. Doing so would be a massive distributed  
denial of service on w3.org and make w3.org a single point of failure. See  
http://hsivonen.iki.fi/no-dtd/


> or 2) you can say "that's a well known FPI, I know what that means" or

Yeah, that's what we're doing now, but a browser can't know about ones  
that are minted after the browser shipped...


> 3) you can say "That FPI matches the pattern for XHTML Family FPIs,  
> which is well known, and I know what that means".

Could you elaborate on how this would work, exactly? Check whether the FPI
starts with "-//W3C//DTD XHTML" and, if so, load all the XHTML entity
declarations?
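
To make the question concrete, here is a minimal Python sketch of what such
a prefix check might look like. The function names are hypothetical, and
Python's html.entities table is only a stand-in for the XHTML entity sets;
nothing here is taken from M12N or from any browser.

```python
from html.entities import name2codepoint

# Prefix taken from the question above; whether M12N defines the
# "XHTML Family" pattern this way is an assumption.
XHTML_FAMILY_PREFIX = "-//W3C//DTD XHTML"


def looks_like_xhtml_family(fpi: str) -> bool:
    """Return True if the public identifier starts with the XHTML prefix."""
    return fpi.startswith(XHTML_FAMILY_PREFIX)


def entities_to_preload(fpi: str) -> dict:
    """Map entity names to replacement text, or return an empty dict."""
    if not looks_like_xhtml_family(fpi):
        return {}
    # Stand-in entity set. The DTD named in the DOCTYPE is never fetched,
    # so this cannot confirm that the DTD declares these same entities.
    return {name: chr(cp) for name, cp in name2codepoint.items()}
```

Note that a check like this accepts any FPI with that prefix, including ones
minted after the browser shipped and ones nobody at the W3C ever minted.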


> Finally, you could do 1) when 2) or 3) was not true, but then learn the  
> FPI and treat it as 2) from then on.  Or, you could just do what XHTML  
> M12N says you should do in this case, which is in production 6 of clause  
> 3.5 of XHTML M12N.

Do you have a pointer?


> There are lots of solutions to the XML Entity problem.

The one I like best is to add all HTML and MathML entities as predefined  
ones in XML so they can be used with any doctype or no doctype at all.


> But that problem is not really relevant to the question of whether FPIs  
> have meaning and whether creating new ones is problematic.

Well, for browsers parsing XML, the only meaning an FPI has is whether or
not to load a set of entity declarations.


> Let me put this another way.  If there were no DOCTYPE - no declaration  
> of any type about what version of what markup language a document was  
> written in, what would a browser do?

The same as it would if there were a doctype, except that it wouldn't load
any entity declarations...


> I mean, I know what it would do today if it were served as text/html...  
> use some broken tag soup parser that has been around for ages and try to  
> guess what I meant.  Reverse engineer a DOM tree that is probably right,  
> but maybe not.  'cause it guessed.  Makes me crazy.  Makes most people  
> crazy.  It's the best reason for the HTML5 spec.  Lock down the broken  
> behavior so it makes people predictably crazy.
>
> But what if it were XML and served as application/xhtml+xml?  What would  
> a user agent do then?  Presumably it would follow the rules as set forth  
> in XML for parsing, the XML DOM rules for DOM generation, and for  
> elements in the XHTML namespace, it would look to XHTML M12N for  
> behavior. Probably with some arcane knowledge based upon historical  
> practice, because that's just how programmers work.  But in general it  
> would follow the rules for behavioral requirements for XHTML...  And  
> that's exactly what it should do.
>
> If there were a DOCTYPE declaration, what would it do differently?

If the FPI is unknown, the browser could choose not to treat an undeclared
entity reference as a fatal error (as Opera does). If the FPI is known, it
would load a set of entity declarations. And that's all.
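
As a rough illustration of that behavior, here is a Python sketch of entity
expansion keyed on the FPI. The contents of KNOWN_FPIS, the function and
exception names, and the choice to leave an unresolved reference as literal
text in the lenient case are all assumptions made for the example, not a
description of any browser's code.

```python
from html.entities import name2codepoint

# The five entities that XML predefines.
XML_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}

# Stand-in for the HTML entity declarations a browser ships with.
HTML_ENTITIES = {name: chr(cp) for name, cp in name2codepoint.items()}

# Illustrative only; the real lists in browsers differ.
KNOWN_FPIS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}


class UndeclaredEntity(Exception):
    """Raised when a reference cannot be resolved and errors are fatal."""


def expand_entity(name: str, fpi: str, lenient: bool) -> str:
    """Expand one entity reference according to the rules sketched above."""
    table = dict(XML_PREDEFINED)
    if fpi in KNOWN_FPIS:
        table.update(HTML_ENTITIES)  # known FPI: load the declarations
    if name in table:
        return table[name]
    if lenient:
        return f"&{name};"  # assumed non-fatal handling: keep it as text
    raise UndeclaredEntity(name)
```

With a known FPI, expand_entity("nbsp", ...) resolves; with an unknown one,
the result depends only on whether the caller asked for lenient handling.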


> If that DOCTYPE adheres to the naming requirements in M12N and matches  
> the pattern for XHTML Family, then it should do the same thing.

Are there conformance criteria in M12N (or elsewhere) requiring UAs to
match FPIs against a pattern for "XHTML Family"?


> If it has some inbuilt knowledge about some XHTML family document types  
> and wants to do something special for them, I suppose that's okay too.

Such as loading in entity declarations?


> But the default rules can and should apply in all cases for all unknown  
> family members.

How does one know if an unknown FPI is an XHTML family member or not?


> That's what the recommendations say,

Where?


> and I am pretty sure we meant it when we wrote them.
>
> What's the alternative?  Have inbuilt knowledge about a handful of  
> predefined markup languages based upon FPIs, and then fail to process  
> new members of the XHTML Family?

This is not about support for the language -- that's dispatched on  
namespaces -- but merely about entities.

Mozilla, Opera and WebKit recognize a handful of FPIs that cause the HTML
or HTML+MathML entity declarations to be loaded. If you use a different
FPI, only the five predefined XML entities can be used -- others will fail,
either gracefully (Opera) or fatally (Mozilla/WebKit).
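
The fatal case is easy to reproduce with any XML parser that has no extra
entity declarations loaded. The sketch below uses Python's standard
xml.etree.ElementTree as a stand-in; it is not a browser, and the helper
name and sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Uses only the five entities that XML predefines.
PREDEFINED_ONLY = f"<p xmlns='{XHTML_NS}'>&amp; &lt; &gt; &apos; &quot;</p>"

# Uses an HTML entity that nothing in the document declares.
WITH_NBSP = f"<p xmlns='{XHTML_NS}'>a&nbsp;b</p>"


def parses(doc: str) -> bool:
    """Return True if the document parses, False on a fatal parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses(PREDEFINED_ONLY))  # the predefined entities are always usable
print(parses(WITH_NBSP))        # the undeclared entity is a fatal error
```

The second document fails regardless of namespace, which is the point:
language support is dispatched on the namespace, entities are not.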


> That is certainly a violation of the spirit of the requirements for user  
> agent conformance in M12N.

Well, loading entity declarations based on pattern matching on the FPI and
hoping that the referenced DTD contains those same declarations is
certainly a violation of the XML spec, AFAICT.


> What a user agent that claims to support XHTML must do is say "oh, this  
> is XHTML family. I know the rules for that" and just deal with it. FWIW,  
> the XHTML2 Working Group "mints" new FPIs all the time.

Well, so long as they aren't used on the Web, we're not really affected. :-)
But it would be nice if new FPIs weren't minted, at least until we change
XML to get more default entities.


> I am saddened to learn that this is a problem for browser vendors.

-- 
Simon Pieters
Opera Software

Received on Wednesday, 30 July 2008 18:11:37 UTC