Re: IRI normalization and escaping test results for DOM and HTTP

On 7/12/2011 2:39 PM, Chris Weber wrote:
> On 7/10/2011 5:45 PM, Bjoern Hoehrmann wrote:
>> * Chris Weber wrote:
>>> I ran some tests to produce the following observations.
>>>
>>> 1) Safari applies NFC normalization to the path, query, and fragment.
>>> 2) Chrome applies NFC normalization to the fragment.
>>> 3) MSIE sends raw, unescaped UTF-8 bytes in the query of an HTTP GET
>>> request.
>>
>> My http://lists.w3.org/Archives/Public/www-html/2002Oct/0002.html would
>> add results to yours, although they are nine years old and I have not
>> checked them recently. I note that you don't say how you arrived at your
>> conclusions. Does it happen in the address bar, XMLHttpRequest results,
>> when clicking links, which encoding did you use or configure, and so on?
>
> My conclusions were based on reviewing the following set of results
> associated with each test case.
>
> 1) the DOM property values for the anchor element, which included an
> individual TestCase alongside an <img> element which included the same
> TestCase.
> 2) the raw HTTP GET request (for the img) as sniffed off the wire using
> WinPcap.
>
> The spreadsheet tab "Normalization Results" includes the HTML fragment
> containing each TestCase, and each TestCase is included inline in the
> table of results. This fragment was included in the <body> of an <html>
> page with no DOCTYPE, so Quirks mode was tested using the UTF-8 charset
> as set by the HTTP header.
>
>>
>>> https://spreadsheets.google.com/spreadsheet/ccc?key=0AifoWoA0trUndEZSTlRRNnd5MzE3N3RYOVlIVFFMREE&hl=en_US#gid=3
>>>
>>> https://spreadsheets.google.com/spreadsheet/ccc?key=0AifoWoA0trUndEZSTlRRNnd5MzE3N3RYOVlIVFFMREE&hl=en_US#gid=5
>>>
>>
>> I would encourage you to post your findings in a more portable format,
>> like Microsoft Excel files or Java applets or something like that. I'm
>> afraid the browser I use the most is not one "Google" "supports".
>
> I've copied the Google docs spreadsheet to an excel file located at
> <https://github.com/cweb/iri-tests/blob/master/results/IRI%20Testing%20Results.xls?raw=true>.
> I'll continue to update this file as I update the test plan and results.
>
> I've sent a different message on this topic
> <http://lists.w3.org/Archives/Public/public-iri/2011Jul/0038.html> but
> it would seem that having a test plan in place that described some goals
> and methods for testing would allow us to discuss conclusions and
> results without having to question how we arrived at them. Do you agree?
>
> Thank you for the feedback,
> Chris
>

I explained the test setup but not the test cases.  I used some of the 
character sequences from Unicode Standard Annex 15 "Unicode 
Normalization Forms" <http://www.unicode.org/reports/tr15/> and others 
from RFC 3987.  From TR15 I used a Singleton from Figure 3 - U+212B which 
normalizes to U+00C5 under NFC.  I also used multiple combining marks 
from Figure 5, U+1E0B U+0323, and the sequence U+0032 U+2075 from Figure 
6 Compatibility Composites.  Through those few tests we can see NFC in 
Safari, and rule out NFD, NFKC, and NFKD.
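
For anyone who wants to see why these few sequences are enough to tell 
the four forms apart, here is a rough sketch using Python's standard 
unicodedata module.  It is not part of my test harness; the names CASES, 
codepoints, normalize_all and matching_forms are made up for this 
illustration, and I am assuming the Figure 5 sequence is U+1E0B U+0323 
as printed in TR15.  The idea is to normalize each input under every 
form and keep only the forms whose output matches what was observed.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Test inputs taken from UAX #15 (labels are illustrative only).
CASES = {
    "singleton (Figure 3)": "\u212B",                      # ANGSTROM SIGN
    "multiple combining marks (Figure 5)": "\u1E0B\u0323",  # assumed sequence
    "compatibility composite (Figure 6)": "2\u2075",        # 2 + SUPERSCRIPT FIVE
}


def codepoints(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def normalize_all(s):
    """Return the result of each normalization form applied to s."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}


def matching_forms(original, observed):
    """List the forms whose output for `original` equals `observed`."""
    return [f for f, v in normalize_all(original).items() if v == observed]


if __name__ == "__main__":
    for label, text in CASES.items():
        print(f"{label}: {codepoints(text)}")
        for form, result in normalize_all(text).items():
            print(f"  {form}: {codepoints(result)}")
```

A composed result for U+212B is consistent only with NFC or NFKC, while 
U+0032 U+2075 coming back unchanged is consistent only with NFC or NFD, 
so the two observations together leave NFC as the only candidate.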

Best regards,
Chris

Received on Wednesday, 13 July 2011 06:38:53 UTC