- From: John Cowan <cowan@mercury.ccil.org>
- Date: Mon, 17 Sep 2012 20:16:59 -0400
- To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
- Cc: liam@w3.org, public-xml-core-wg <public-xml-core-wg@w3.org>
Henry S. Thompson scripsit:
> Of the 392 files with PIs, 40 were not well-formed (that is, 10.2%),
> with the following problems as reported by rxp [1]:
Probably only about half that many are real well-formedness (wf) errors.
> Error: Document ends too soon
> Error: EOE in PI [3 of these]
> Error: Expected ; after entity name, but got = [4 of these]
> Error: Expected > at end of entity declaration, but got -
> Error: Expected name, but got & for entity
> Error: Expected whitespace or tag end in start tag
Can't argue with those.
> Error: Input error: Illegal UTF-8 byte 2 <0x20>
> Error: Input error: Illegal UTF-8 byte 2 <0x20>
> Error: Input error: Illegal UTF-8 byte 2 <0x2e>
> Error: Input error: Illegal UTF-8 byte 2 <0x65>
> Error: Input error: Illegal UTF-8 start byte <0xa0>
Probably the result of blind transcoding, of believing a junk Content-Type
header, or of other screwups. These are only technically not-wf.
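To illustrate how that kind of mislabeling produces errors like the ones quoted, here is a rough Python sketch. It has nothing to do with rxp; the helper name utf8_problem and the sample bytes are made up for illustration. Latin-1 text read as UTF-8 typically fails at the first accented letter, because the byte after it (a space, a period, an ordinary letter) is not a valid continuation byte, and a bare 0xa0 (NBSP in Latin-1) is not a valid start byte.

```python
def utf8_problem(data: bytes):
    """Return (offset, reason) for the first UTF-8 decoding error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the byte that begins the bad sequence.
        return err.start, err.reason
    return None


# Hypothetical samples: Latin-1 bytes that get treated as UTF-8.
accented = b"caf\xe9 au lait"   # 0xe9 (e-acute) followed by 0x20 (space)
nbsp = b"\xa0leading NBSP"      # 0xa0 cannot start a UTF-8 sequence

print(utf8_problem(accented))
print(utf8_problem(nbsp))
print(utf8_problem(b"plain ASCII"))
```

In the first sample the decoder sees 0xe9 as the lead byte of a multi-byte sequence and then rejects the 0x20 that follows, which matches the shape of the "Illegal UTF-8 byte 2 <0x20>" report.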
> Error: Input error: Illegal character <0x0> [11 of these]
> Error: Mismatched end tag: expected </abbr>, got </a>
Can't argue with these either.
> Error: Unknown declared encoding GB2312
> Error: Unknown declared encoding ISO8859-1
> Error: Unknown declared encoding TIS-620
> Error: Unknown declared encoding gb2312
> Error: Unknown declared encoding uft-8 [2 of these]
> Error: Unknown declared encoding windows-1251 [2 of these]
> Error: Unknown declared encoding windows-1252 [2 of these]
> Error: Unknown declared encoding x-user-defined
Those aren't wf errors, just limitations on rxp's ability to cope
with random encodings, though I grant that uft-8 is probably not
a legitimate encoding.
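A quick way to see the difference between "this parser doesn't know the encoding" and "this isn't an encoding" is to ask a different implementation. The sketch below uses Python's codec registry as a stand-in; the function name is_known_encoding is made up for illustration, and what Python accepts says nothing about what rxp or any other parser accepts.

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Report whether this Python installation has a codec for the given name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Encoding names taken from the error list quoted above.
declared = [
    "GB2312", "ISO8859-1", "TIS-620", "gb2312", "uft-8",
    "windows-1251", "windows-1252", "x-user-defined",
]

for name in declared:
    status = "known" if is_known_encoding(name) else "unknown"
    print(f"{name}: {status}")
```

A name that one implementation rejects and another accepts points to a coverage gap in the parser rather than a defect in the document.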
--
John Cowan    cowan@ccil.org    http://www.ccil.org/~cowan
I don't know half of you half as well as I should like, and I like less
than half of you half as well as you deserve.
        --Bilbo
Received on Tuesday, 18 September 2012 00:17:22 UTC