[whatwg] [wf2] More late comments and questions on Web Forms 2.0 from Ian Hickson on 2006-08-15 (public-whatwg-archive@w3.org from August 2006)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 15 Aug 2006 07:30:12 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0608150703010.5340@dhalsim.dreamhost.com>
On Sun, 12 Mar 2006, Henri Sivonen wrote:
> 
> 3.6.1
> Item 10. There's a comma missing after '"[")' and before "a modifier".

Fixed.


> 3.6.1
> Example in item 11. Double quote missing in '"[n? string'.

Fixed.


> 5.
> Step 5. When XML submission is used, characters that are not XMLChars as per
> XML 1.0 need to be dealt with. I suggest dropping them.

I prefer converting them to U+FFFD. Dropping characters can be the source 
of very hard-to-debug security problems. Done.
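
As a rough illustration of the replacement approach (a sketch only, not spec text; the function name is made up for this example), anything outside the XML 1.0 Char production can be mapped to U+FFFD like so:

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NON_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def replace_non_xml_chars(value: str) -> str:
    """Hypothetical helper: swap each non-XML character for U+FFFD."""
    return _NON_XML_CHAR.sub("\uFFFD", value)
```

Replacing keeps the string the same length in code points, so adjacent characters never get joined up the way they can when characters are silently dropped.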


> Also, when XML submission is used, CRLF line breaks on the data level 
> are weird, because the CR would have to be escaped in order to preserve it 
> in XML. I suggest using LF line breaks in XML submission. LF line breaks 
> in XML may be serialized as literal (unescaped) LF, CR or CRLF.

Done.
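
A minimal sketch of what that normalization might look like before the value is serialized as XML (the helper name is an assumption for illustration):

```python
def to_lf_line_breaks(value: str) -> str:
    """Hypothetical helper: collapse CRLF and lone CR into LF."""
    # Handle CRLF first so that it becomes one LF rather than two.
    return value.replace("\r\n", "\n").replace("\r", "\n")
```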


> 5.
> Step 5. I think NFC normalization should be applied before using legacy
> encodings as well. E.g. Windows-1252 can encode many precomposed European
> characters but cannot encode the decomposed versions without precomposing
> first. However, in some special cases like Windows-1258 (Vietnamese) it is
> necessary to separate some diacritics from the base characters after the NFC
> step. (But I imagine Windows-1258 encoders do that themselves.)

I don't see that this is a WF2 problem. It's up to the encoding 
specifications to specify how to encode Unicode characters.
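
For reference, the effect Henri describes can be seen with Python's standard codecs. This is only an illustration of the problem, under the assumption that the UA normalizes before encoding; the function name is made up:

```python
import unicodedata

def encode_after_nfc(text: str, encoding: str = "cp1252") -> bytes:
    """Hypothetical helper: apply NFC, then encode in a legacy encoding."""
    return unicodedata.normalize("NFC", text).encode(encoding)

# "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
decomposed = "e\u0301"
# Encoding the decomposed form directly raises UnicodeEncodeError,
# because Windows-1252 has no combining acute accent.
# After NFC the pair becomes U+00E9, which Windows-1252 can encode.
precomposed_bytes = encode_after_nfc(decomposed)
```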


> 5.
> Step 8. What happens if a 204 response changes the character encoding
> metadata? Or Content-Type in general for that matter?

This is the realm of the HTTP specification.


> 5.2.
> "Note that a string containing the codepoint's value itself (for example, the
> six-character string "U+263A" or the seven-character string "&#9786;") is not
> considered to be human readable and must not be used as a transliteration."
> 
> I agree with the sentiment, but changing that behavior is not
> backwards-compatible.

Backwards compatible with what? IE's behaviour is broken (there's no way 
to submit a literal "&#9786;" followed by a U+263A character). It could 
even be a security risk in certain instances.
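
To make the ambiguity concrete, here is a sketch that uses Python's "xmlcharrefreplace" error handler as a stand-in for the IE-style behaviour (an approximation for illustration, not a description of IE's actual code):

```python
def encode_with_char_refs(text: str, encoding: str = "cp1252") -> bytes:
    """Stand-in for the legacy behaviour: unencodable characters
    are sent as decimal numeric character references."""
    return text.encode(encoding, errors="xmlcharrefreplace")

typed_literally = encode_with_char_refs("&#9786;")  # user typed seven ASCII characters
typed_smiley = encode_with_char_refs("\u263a")      # user typed U+263A
# Both submissions produce identical bytes, so the server cannot tell them apart.
```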


> 5.3. & 5.5.
> "The submission character encoding is selected from the form's accept-charset
> attribute. UAs must use the encoding that most completely covers the
> characters found in the form data set of the encodings specified. If the
> attribute is not specified, then the client should use either the page's
> character encoding, or, if that cannot encode all the characters in the form
> data set, UTF-8."
> 
> I think sending UTF-8 to unsuspecting form handlers is worse than losing some
> unencodable characters. Sending UTF-8 to programs that don't expect it amounts
> to garbage in which increases the global amount of garbage out.

If they haven't specified an encoding, then using the page's encoding is 
as much a guess as using UTF-8. The server hasn't said what it expects; it 
should use the encoding metadata in the submission to deal with this.

The sooner we switch to a full-UTF-8/16 solution the better.
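
A rough sketch of the selection rule quoted above, assuming the form data set has been flattened to one string and that every encoding name is known to the codec library. The function names and the tie-breaking rule (first listed wins) are assumptions, not spec text:

```python
def _can_encode(ch: str, encoding: str) -> bool:
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def pick_submission_encoding(data: str, accept_charset: list[str],
                             page_encoding: str) -> str:
    """Hypothetical helper implementing the quoted rule."""
    def coverage(encoding: str) -> int:
        # Number of characters in the data that the encoding can represent.
        return sum(1 for ch in data if _can_encode(ch, encoding))

    if accept_charset:
        # Most complete coverage wins; max() keeps the first of any ties.
        return max(accept_charset, key=coverage)
    if coverage(page_encoding) == len(data):
        return page_encoding
    return "utf-8"
```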


> 5.4.
> Can the presence of the accept-charset attribute be considered non-conforming
> when the XML submission type is specified?

Seems like a fine thing to warn about. I don't know if it should be an 
error; what if the page changes the enctype around?


> 5.6. and elsewhere
> Minor typographical nit: Em dash used with spaces on both sides as opposed to
> either em dash without spaces or en dash with spaces.

Em-dash without spaces is ugly, and en-dash is too short. IMHO. :-)


> 5.6.
> "The value of the enctype attribute must be dispatched using a case-
> insensitive literal comparison."
> 
> "case-insensitive" marked up as code. Still worried about considering 
> Turkish i conforming.

Yeah... I think HTML5 might switch pure-ASCII attributes' case-folding to 
ASCII-only. Not sure yet.
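
For what it's worth, an ASCII-only comparison might look like the following sketch (the function names are made up, and nothing here is specced):

```python
def ascii_lower(value: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in value)

def enctype_matches(value: str, expected: str) -> bool:
    """Hypothetical ASCII-case-insensitive comparison."""
    return ascii_lower(value) == ascii_lower(expected)

# Full Unicode case mapping treats U+0131 (dotless i) as a lowercase "I",
# so a Unicode-aware comparison would accept this value:
dotless = "mult\u0131part/form-data"
unicode_match = dotless.upper() == "MULTIPART/FORM-DATA"
# The ASCII-only comparison does not:
ascii_match = enctype_matches(dotless, "multipart/form-data")
```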


> 6.1.
> "(Even if importing into a text/html document, the newly imported nodes will
> still be namespaced.)"
> 
> But will tagName return in upper case?

->HTML5. (Yes. But that isn't specced yet.)


> General DOM
> Will localName return the name in lower case in HTML DOM?

->HTML5. (Depends on whether the Document is an "HTML" or "XML" Document.)


> 6.1.
> "The following script has only one possible valid outcome:"
> 
> "Valid" used loosely. :-)

Fixed.


> 7.10.
> Does "mirror" mean "reflect"?

Changed to reflect, but note that neither term is well-defined in WF2.


> B.
> Is the presence of inapplicable attributes in the input element non-
> conforming? (I think it would be useful to make inapplicable attributes
> non-conforming.)

It should warn, for sure, but I don't know that making it non-conforming 
is useful. As mentioned previously, I don't like making things 
non-conforming unless they are very clearly wrong.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 15 August 2006 00:30:12 UTC