Re: XML on the web

Henry S. Thompson scripsit:

> Of the 392 files with PIs, 40 were not well-formed (that is, 10.2%),
> with the following problems as reported by rxp [1]:

Probably only about half that many are real well-formedness (wf) errors.

>    Error: Document ends too soon
>    Error: EOE in PI [3 of these]
>    Error: Expected ; after entity name, but got = [4 of these]
>    Error: Expected > at end of entity declaration, but got -
>    Error: Expected name, but got & for entity
>    Error: Expected whitespace or tag end in start tag

Can't argue with those.

>    Error: Input error: Illegal UTF-8 byte 2 <0x20>
>    Error: Input error: Illegal UTF-8 byte 2 <0x20>
>    Error: Input error: Illegal UTF-8 byte 2 <0x2e>
>    Error: Input error: Illegal UTF-8 byte 2 <0x65>
>    Error: Input error: Illegal UTF-8 start byte <0xa0>

Probably the result of blind transcoding, of believing a junk Content-Type
header, or of other screwups.  These are only technically not-wf.
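
A minimal sketch in Python (not rxp) of how errors of that shape can arise:
text saved as ISO-8859-1 but read as UTF-8.  The sample strings and the
helper name utf8_problem are made up for illustration.

```python
# Hypothetical samples: Latin-1 bytes that a parser has been told are UTF-8.
samples = {
    "e-acute followed by space": "café au lait".encode("latin-1"),  # 0xe9 0x20
    "leading no-break space": "\u00a0indented".encode("latin-1"),   # 0xa0 first
}


def utf8_problem(data: bytes):
    """Return (offset, reason) for the first UTF-8 decoding error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


for label, raw in samples.items():
    print(label, utf8_problem(raw))
```

In the first sample 0xe9 looks like the start of a multi-byte sequence, so
the 0x20 after it is an illegal second byte; in the second, 0xa0 can never
start a sequence.  The markup itself is untouched in both cases.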

>    Error: Input error: Illegal character <0x0> [11 of these]
>    Error: Mismatched end tag: expected </abbr>, got </a>

Can't argue with these either.

>    Error: Unknown declared encoding GB2312
>    Error: Unknown declared encoding ISO8859-1
>    Error: Unknown declared encoding TIS-620
>    Error: Unknown declared encoding gb2312
>    Error: Unknown declared encoding uft-8 [2 of these]
>    Error: Unknown declared encoding windows-1251 [2 of these]
>    Error: Unknown declared encoding windows-1252 [2 of these]
>    Error: Unknown declared encoding x-user-defined

Those aren't wf errors, just limitations on rxp's ability to cope
with random encodings, though I grant that uft-8 is probably not
a legitimate encoding.
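
To illustrate that support for a declared encoding is a property of the
processor rather than of the document, here is a sketch that asks a
different decoder, Python's codec registry, about the same names.  Its
answers say nothing about rxp, and is_known is a made-up helper.

```python
import codecs

# Encoding names exactly as declared in the files reported above.
declared = ["GB2312", "ISO8859-1", "TIS-620", "gb2312", "uft-8",
            "windows-1251", "windows-1252", "x-user-defined"]


def is_known(name: str) -> bool:
    """True if this Python installation has a codec registered under name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in declared:
    print(f"{name:16} {'known' if is_known(name) else 'unknown'}")
```

Which names come back as known depends on the decoder and the build; the
misspelled uft-8 should come back unknown from any of them.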

-- 
I don't know half of you half as well           John Cowan
as I should like, and I like less than half     cowan@ccil.org
of you half as well as you deserve.             http://www.ccil.org/~cowan
        --Bilbo

Received on Tuesday, 18 September 2012 00:17:22 UTC