Re: some ERB decisions

On Thu, 17 Oct 1996 14:40:06 -0400 Gavin Nicol said:
>>  - all XML processors must accept documents in UTF-8 and UCS-2 (or
>>    optionally UTF-16) form
>I'm not a great fan of UTF-16, and am worried about the connotations
>of "accept". Does that mean parse, process, or just accept and die?

I think it means parse successfully.  Some processing may involve
resources (e.g. fonts) over which neither an XML implementor nor a user
may have control, so it'd be hard to guarantee or require that.  (Users
at my university, for example, do not control which X11 fonts are
available on our Unix systems.  I'm not sure our staff does either, but
that's another question.)  Accept-and-die doesn't fall within the
range of meanings I associate with 'accept', but if there's any doubt
the spec should make it clear.

>>  - XML processor may provide a user option which causes them to accept
>>    documents in other coded character sets (e.g. ISO 8859 or JIS 0208)
>>    or other encodings of 10646 or other coded character sets (e.g.
>>    Extended Unix Code) -- this behavior must be optional (i.e. the user
>>    must be able to turn it off, so that documents not in UTF-8 or
>>    UCS-2 raise errors).
>OK. I can live with this, but am not overly happy about the "must be
>optional" clause.

Motivation is simple:  if I want to ensure that document D is in a
form that all XML processors can parse, then I need a way, using the
processor in front of me, to check that.  It's the same idea as in
the Pascal spec:  conforming Pascal implementations *must* provide a
user option turning off all local extensions and flagging as errors
anything the spec says is an error.

There was some sentiment, when we discussed the topic again this
morning, to restrict this rule to validating processors, and allow
non-validating processors to do it whether the user likes it or not.
This seems to me a pointless complication of a simple rule, but it
may go into the spec.

-C. M. Sperberg-McQueen