Re: numeric character references and Unicode surrogate pairs: part of my review of 8 The HTML syntax from Anne van Kesteren on 2007-08-20 (public-html@w3.org from August 2007)

From: Anne van Kesteren <annevk@opera.com>
Date: Mon, 20 Aug 2007 14:54:03 +0200
To: "Robert Burns" <rob@robburns.com>, "public-html WG" <public-html@w3.org>
Message-ID: <op.txczgd2c64w2qv@annevk-t60.oslo.opera.com>

On Sun, 19 Aug 2007 12:05:15 +0200, Robert Burns <rob@robburns.com> wrote:
> I believe this is not consistent with existing browser behavior. That
> is, while surrogate pairs expressed as pairs of numeric character
> references are not supposed to resolve to the non-BMP character,
> browsers do it anyway.

Do you have any tests to demonstrate that?
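
As a rough illustration of what such a test would have to distinguish,
here is a minimal sketch in Python rather than in a browser. The helper
combine_surrogates and the sample character U+1D504 are assumptions made
for illustration, not anything from the draft. Python's html.unescape is
documented as following the HTML5 rules for character references, so it
stands in here for the "do not combine" behavior; the helper shows what a
browser that does combine the pair would produce instead.

```python
import html


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point.

    Hypothetical helper: models a browser that (leniently) treats two
    surrogate numeric character references as a single character.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D504 is a non-BMP character; in UTF-16 it is the pair D835 DD04.
PAIR_AS_NCRS = "&#xD835;&#xDD04;"   # the form under discussion
DIRECT_NCR = "&#x1D504;"            # the conforming way to write it

# Lenient behavior: the pair resolves to the non-BMP character.
lenient = chr(combine_surrogates(0xD835, 0xDD04))

# HTML5-style behavior: each surrogate reference is replaced on its own.
strict = html.unescape(PAIR_AS_NCRS)

# The direct reference resolves to the character either way.
direct = html.unescape(DIRECT_NCR)

print(f"lenient: U+{ord(lenient):04X}")
print("strict: ", [f"U+{ord(c):04X}" for c in strict])
print(f"direct:  U+{ord(direct):04X}")
```

A browser test would feed the same two strings to the parser and check
whether the resulting text node holds one character or two replacement
characters.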


> So while I think we should count this as a parse error, we may want to  
> include it in a list of parse errors that are handled differently by  
> different browsers.

The specification should define only a single conformant behavior.


> I think this would be the best procedure for our WG to follow. For every  
> parse error in the draft, we should maintain a list. Then we should  
> produce results for how this error is currently handled in top-of-tree  
> versions of the various browsers. Then I think we'll be in a better  
> position to decide how HTML5 should recommend interoperable  
> error-handling in each case. Obviously we may still have to decide  
> between conflicting implementations, but at least we can do that through  
> proper deliberation and consensus building steps.

This is how the text is produced. As people produce testcases and
implement the parsing specification in browsers, any problems they find
are reported back.


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Monday, 20 August 2007 12:54:16 UTC