- From: John Lumley <john@saxonica.com>
- Date: Tue, 30 May 2023 11:23:54 +0100
- To: Norm Tovey-Walsh <norm@saxonica.com>, public-ixml@w3.org

On 29/05/2023 17:11, Norm Tovey-Walsh wrote:
>> I got the first submission to my processor this week with a UTF-8
>> encoding error, which managed to hang the processor.
> Curiously, I have no trouble with the grammar. But I also haven’t
> provided any way for the user to specify an encoding, so I’m not sure
> what Java is doing.

My processor seems to have the replacement character (U+FFFD, decimal 65533) substituted for the #b7. The file I got through email contains the incorrect byte sequence (i.e. no #c2 before the #b7), but it looks as if, when the file is injected into the browser context (using JavaScript FileReader.readAsText(file, 'UTF-8')), the errant byte is converted to the replacement character.

John Lumley
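
The substitution described above can be reproduced outside the browser. The following is a minimal sketch, not the processor's actual code: it uses the standard `TextDecoder` as a stand-in for `FileReader.readAsText`, on the assumption that both apply the same lenient UTF-8 decoding in this case, which matches the behaviour reported in the message. The byte arrays and variable names are made up for illustration.

```javascript
// A lone 0xB7 is a UTF-8 continuation byte with no lead byte before it,
// so the first sequence is malformed; the second has the expected 0xC2 lead.
const malformed = new Uint8Array([0x61, 0xb7, 0x62]);        // "a", stray #b7, "b"
const wellFormed = new Uint8Array([0x61, 0xc2, 0xb7, 0x62]); // "a", #c2 #b7, "b"

// Default (non-fatal) decoding replaces malformed input with U+FFFD.
const lenient = new TextDecoder('utf-8');
const decodedMalformed = lenient.decode(malformed);
const decodedWellFormed = lenient.decode(wellFormed);

// Code points of the middle character in each decoded string.
const replacementCodePoint = decodedMalformed.codePointAt(1); // 65533 (U+FFFD)
const middleDotCodePoint = decodedWellFormed.codePointAt(1);  // 183 (U+00B7)

// With { fatal: true } the decoder throws a TypeError instead of substituting,
// which is one way to surface the encoding error rather than hide it.
function decodeStrict(bytes) {
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

let strictRejected = false;
try {
  decodeStrict(malformed);
} catch (e) {
  strictRejected = e instanceof TypeError;
}

console.log(replacementCodePoint, middleDotCodePoint, strictRejected);
```

A strict decoder of this kind would let a processor report the bad byte to the user instead of silently parsing a replacement character.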
Received on Tuesday, 30 May 2023 10:24:18 UTC