Conformance requirements for accept-charset from Henri Sivonen on 2007-11-22 (public-html@w3.org from November 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 22 Nov 2007 14:07:00 +0200
To: "public-html@w3.org Tracking WG" <public-html@w3.org>
Message-Id: <6D28AC7B-6073-4DD4-8B47-BFD1DD35F69B@iki.fi>

Currently, the accept-charset attribute on <form> is defined by  
reference to HTML4.

What should conformance checkers do?

Is leading and trailing whitespace allowed? Leading or trailing commas  
hopefully aren't.

The separators for the charset names must be at least one character  
long and contain zero or more space characters and at most one comma,  
right?
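
A minimal sketch of how a checker might apply the separator rule as stated above, assuming "space characters" means space, tab, LF, FF and CR. The names SEPARATOR and split_accept_charset are hypothetical. The sketch also rejects leading and trailing whitespace, which the question above leaves open.

```python
import re

# Assumed set of "space characters": space, tab, LF, FF, CR.
WS = r"[ \t\n\f\r]"

# One separator: optional spaces around exactly one comma, or spaces alone.
# The comma alternative comes first so " , " is consumed as one separator.
SEPARATOR = re.compile(rf"{WS}*,{WS}*|{WS}+")


def split_accept_charset(value):
    """Split an accept-charset value into charset names.

    Raises ValueError if any token is empty, which happens for an empty
    value, a leading or trailing separator, or two commas in a row.
    """
    tokens = SEPARATOR.split(value)
    if any(token == "" for token in tokens):
        raise ValueError("empty token: leading, trailing, or doubled separator")
    return tokens
```

With this sketch, "utf-8, iso-8859-1" and "utf-8 iso-8859-1" both split into the same two names, while ",utf-8" and "utf-8,,iso-8859-1" are rejected.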

Charset names should presumably match the mime-charset production from  
RFC 2978. Should the names also be checked against the IANA list of  
encodings? Or a shorter list of encodings that actually work? Are non- 
preferred IANA names errors?
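
For the syntactic part of that check, the mime-charset production in RFC 2978 is one or more characters drawn from ASCII letters, digits and a fixed set of punctuation. A rough sketch of that test alone follows. The function name is_mime_charset is hypothetical, and the sketch does not consult the IANA registry.

```python
import re

# mime-charset = 1*mime-charset-chars (RFC 2978, section 2.3):
# ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "+" / "-" /
# "^" / "_" / "`" / "{" / "}" / "~"
MIME_CHARSET = re.compile(r"[A-Za-z0-9!#$%&'+\-^_`{}~]+")


def is_mime_charset(name):
    """Return True if name matches the mime-charset production."""
    return MIME_CHARSET.fullmatch(name) is not None
```

This accepts any well-formed token, including names that are not registered at all, so it answers only the first of the questions above.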

Opinion comment: Allowing a comma in the separator is a design bug, IMO,
considering that the general design pattern for separating spaceless
tokens is to separate them with whitespace. It is probably not worthwhile
to make the commas non-conforming, though. Also, considering that
UTF-8 ends the need to keep the list of character encodings
extensible, I think it would make sense to define a closed list of
known-to-work legacy encodings (plus UTF-8) to check against.
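
As a sketch of what checking against such a closed list could look like: the contents of KNOWN_ENCODINGS below are placeholders chosen only for illustration, not a proposed list, and unknown_charsets is a hypothetical name. Comparison is case-insensitive, since charset names are case-insensitive per RFC 2978.

```python
# Placeholder list for illustration only; the actual list is not defined here.
KNOWN_ENCODINGS = frozenset({
    "utf-8",
    "iso-8859-1",
    "windows-1252",
    "shift_jis",
    "euc-jp",
})


def unknown_charsets(names):
    """Return the names that are not on the closed list, in input order."""
    return [name for name in names if name.lower() not in KNOWN_ENCODINGS]
```

A checker could report every name returned by unknown_charsets as an error, or as a warning if unlisted names remain conforming.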

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 22 November 2007 12:07:41 UTC