On 9/10/2012 9:27 PM, David Lee wrote:
>
> I can buy that ... it's splitting hairs at this point, and consistency
> with both Unicode and legacy XML wins.
>
> What is at the back of my mind is the pain point of JSON allowing a
> greater range of codepoints, which really bites sometimes even though
> those characters are invalid. For example, in my tests slurping
> Twitter feeds, I get about 1 in 1000 documents with a character that
> is invalid in XML (but "valid" in JSON ... well, "valid" in the sense
> that the Twitter feed produces "JSON" and the character got in
> there ...).
>
> The character itself is usually bogus and not "useful", but what is
> painful is typical bulk-processing XML tools dying a flaming death at
> that point ... But I digress. MicroXML must suffer/benefit from the
> same decision w.r.t. Unicode as XML ... although, to put the camel's
> nose in the tent, we might want to open the issue a "waffer thin"
> crack to allow processors to toss or substitute invalid characters
> instead of dropping dead.
>
>
Yes - some kind of recovery process would be a boon; +1 for allowing
parsers to replace these disallowed codepoints with the special Unicode
character reserved to mean "unknown or unrepresentable character":
U+FFFD.

-Mike
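
[A minimal sketch of that substitution in Python, assuming the XML 1.0
"Char" production as the validity test (MicroXML may restrict the set
further); the names _XML_INVALID and scrub are illustrative, not from
any spec:]

    import re

    # XML 1.0 "Char" production: #x9 | #xA | #xD | [#x20-#xD7FF]
    # | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. JSON strings can carry
    # codepoints outside this set (e.g. most C0 controls), which is
    # exactly what makes bulk XML tooling drop dead.
    _XML_INVALID = re.compile(
        '[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD'
        '\U00010000-\U0010FFFF]'
    )

    def scrub(text):
        """Replace XML-invalid codepoints with U+FFFD instead of dying."""
        return _XML_INVALID.sub('\uFFFD', text)

    # e.g. a vertical tab smuggled in via a Twitter JSON payload:
    scrub('tweet\x0Bhere')   # -> 'tweet\ufffdhere'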