Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

Phillips, Addison wrote:
>> I couldn't care less about non-normalizability of XML names per se,
>> 
> 
> That is the restriction on XML, though.

Right.  I don't view this restriction as one that should stop us here,
if we decide to normalize anything at all.

>> but you indicated that any data that will be communicated to the
>> server can't be normalized.
> 
> I don't think I said that it can't be normalized under any
> circumstances. It is possible that any data that will be communicated
> to the server might not be normalized.

You seem to have misunderstood the issue.  The question is "Is it OK for 
the browser to normalize its input data if the resulting normalized text 
might then end up being communicated back to the server?"

> - is it okay to send non-normalized data? I think the answer to this
> is emphatically yes. There is actually no way to prevent it.

Sure there is, for the browser doing the sending....

> - is it okay to normalize non-normalized data at the server (or
> elsewhere) for some process? I think the answer to this is
> emphatically yes, although whether one wants to or not depends on the
> context of what "some process" is.

That sounds closer to my question, though not identical to it.

Again, my questions are the following:

1)  Is it OK for a web browser to normalize all character data it gets
     from the network at the time when it performs the
     bytes-to-characters conversion (and possibly re-normalize again
     after handling escapes)?
2)  Is it OK for a web browser to normalize any text data that will be
     exposed to DOM APIs (including form input values which the user may
     have typed)?  If not, then what makes this case different from the
     case of other editors?
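
To make question 1 concrete, here is a rough sketch in Python of what "normalize at bytes-to-characters conversion, then re-normalize after handling escapes" could look like. It is an illustration only, not a description of any browser: the choice of NFC, the UTF-8 input, the helper names, and the use of html.unescape as a stand-in for escape handling are all assumptions.

```python
import html
import unicodedata


def decode_and_normalize(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode network bytes, then normalize to NFC."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


def unescape_and_renormalize(text: str) -> str:
    """Hypothetical helper: expand character references, then normalize again.

    Expanding an escape can produce a new combining sequence, so text that
    was normalized before this step is not necessarily normalized after it.
    """
    return unicodedata.normalize("NFC", html.unescape(text))


# One literal "e" + U+0300 pair and one pair written with an escape.
raw = "<p>e\u0300 and e&#x300;</p>".encode("utf-8")

step1 = decode_and_normalize(raw)        # composes only the literal pair
step2 = unescape_and_renormalize(step1)  # composes the escaped pair too

print(ascii(step1))
print(ascii(step2))
```

The second pass is what catches the escaped form: after the first pass the escape is still plain ASCII, so there is nothing for the normalizer to compose until the reference has been expanded.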

>> Any data exposed via a DOM API, whether it be form input values or 
>> XML/HTML tag localNames can be sent to the server, right?
> 
> Yes. The question is, when you then *select* that data, what comes
> back?
...
> Both are semantically equivalent and normalize to U+00E9. I can send
> either to the server in my request and get the appropriate
> (normalized) value in return.

That's an issue of server behavior and doesn't affect browser behavior.

> Conversely, I should be able to select:
> 
> <p>&#x65;&#x300;</p>
> 
> ... using either form. I might be returned the original
> (non-normalized) sequence in the result.

You _might_, or you _have_to_be_?  That's the question, as I see it.

> The point is that processes
> that are normalization sensitive must behave as if the data were
> normalized. Why is that a contradiction?

There are two possible ways to behave as if the data were normalized.
One is to normalize the data up front, when you first read your input.
The other is to do on-the-fly normalization at all points where
comparisons of any sort are performed.

The former is by far the simpler approach.  Is it viable?
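
The two approaches can be sketched side by side in Python. This is a minimal illustration, assuming NFC as the normalization form; the function names are made up for the example and do not come from any browser or specification.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize a string to NFC (the form is an assumption of this sketch)."""
    return unicodedata.normalize("NFC", s)


composed = "\u00e8"     # e with grave accent, precomposed
decomposed = "e\u0300"  # "e" followed by COMBINING GRAVE ACCENT

# Approach 1: normalize up front, when the input is first read.
# Every later comparison is a plain ==, but the original sequence is gone.
stored_up_front = nfc(decomposed)
up_front_match = stored_up_front == composed
original_preserved = stored_up_front == decomposed


# Approach 2: keep the input untouched and normalize at each comparison.
# The original sequence survives, but every comparison site must remember
# to call the normalizing helper instead of using == directly.
def equal_normalized(a: str, b: str) -> bool:
    return nfc(a) == nfc(b)


stored_as_is = decomposed
on_the_fly_match = equal_normalized(stored_as_is, composed)
naive_match = stored_as_is == composed  # a comparison site that forgot

print(up_front_match, original_preserved, on_the_fly_match, naive_match)
```

The trade-off shows up in the last two variables: the up-front approach cannot hand back the original non-normalized sequence, while the on-the-fly approach gives a wrong answer at any comparison that bypasses the helper.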

-Boris

Received on Monday, 2 February 2009 22:06:22 UTC