- From: Michael Sokolov <sokolov@ifactory.com>
- Date: Sat, 1 Aug 2009 08:23:18 -0400
- To: "'Norman Walsh'" <ndw@nwalsh.com>
- Cc: "'XProc Dev'" <xproc-dev@w3.org>
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1 backs you up, although the default it specifies for text/* media types received via HTTP is ISO-8859-1 rather than US-ASCII.

It's interesting to note that browsers don't actually comply with this specification. They all allow the user to override the server-specified character set, although they do follow the spec in their initial default behavior.

Here's another relevant quote from the W3C i18n site: http://www.w3.org/International/O-HTTP-charset

  It is very important to always label Web documents explicitly. HTTP 1.1 says that the default charset is ISO-8859-1. But there are too many unlabeled documents in other encodings, so browsers use the reader's preferred encoding when there is no explicit charset parameter.

---------------

My takeaway is that, whatever the spec may say, relying on the default charset being ISO-8859-1 has proven unworkable in practice, and that user agents have found their way to a more sensible practice, which is to let users override it.

I'm not sure exactly what the implication for XProc is, but I feel a general principle is to define ideal behavior in a specification while also providing, in implementations, whatever tools are needed to deal with the inevitable realities. An open theoretical question is whether the spec would be better written with "should" or "ought" rather than "must" if implementations are inevitably going to be forced to deviate from it...

-Mike

> -----Original Message-----
> From: normanwalsh@gmail.com [mailto:normanwalsh@gmail.com] On
> Behalf Of Norman Walsh
> Sent: Saturday, August 01, 2009 7:01 AM
> To: Mike Sokolov
> Cc: XProc Dev
> Subject: Re: Dealing with encodings and file: URIs
>
> On Wednesday, July 29, 2009, Mike Sokolov
> <sokolov@ifactory.com> wrote:
> >
> > It also might be nice if an HTTP server reporting "text/plain" (no
> > character set) didn't override the client specification of
> > "text/plain; charset=windows-1252", since they don't in fact conflict: the
> > server is simply providing less information.
>
> I believe, though it isn't convenient for me to check right
> now, that the absence of an encoding decl in an http response
> is a declaration that the text is US-ASCII.
>
> > -Mike
> >
> > Norman Walsh wrote:
> >
> > "Toman_Vojtech@emc.com" writes:
> >
> > Um. What kind of filesystem are you using that tells you the
> > content type?
> >
> > I suppose, on reflection, that that gives me justification to ignore
> > the content type reported by java.net.URL. There really isn't a server
> > involved.
> >
> > Be seeing you,
> > norm
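For implementers, here is a minimal Java sketch of the precedence the W3C i18n note quoted above describes. An explicit charset parameter from the server wins. Failing that, the reader's (or pipeline author's) preferred encoding applies. Failing that, the HTTP 1.1 default of ISO-8859-1 applies to text/* types. The class and method names (CharsetResolver, resolve, charsetParam) are hypothetical, and the precedence order is one possible policy, not anything XProc specifies. Browsers go further and let the user override even an explicit charset, which this sketch does not do. The parameter parsing is also simplified: it does not handle every case in the RFC 2616 grammar, such as semicolons inside quoted strings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper illustrating one charset-resolution policy; not an XProc or JDK API.
final class CharsetResolver {

    // Returns the value of a "charset" parameter in a Content-Type value, if present.
    // Simplified parsing: splits on ';' and strips surrounding double quotes.
    static Optional<String> charsetParam(String contentType) {
        if (contentType == null) return Optional.empty();
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String v = p.substring(eq + 1).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1);
                }
                return v.isEmpty() ? Optional.empty() : Optional.of(v);
            }
        }
        return Optional.empty();
    }

    // True if the media type is in the text/* family.
    static boolean isText(String contentType) {
        return contentType != null
            && contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/");
    }

    // Assumed policy: explicit server charset, then the user's preference,
    // then the RFC 2616 section 3.7.1 default (ISO-8859-1) for text/* types.
    // Unknown charset names make Charset.forName throw.
    static Optional<Charset> resolve(String serverContentType, String userPreference) {
        Optional<String> explicit = charsetParam(serverContentType);
        if (explicit.isPresent()) return Optional.of(Charset.forName(explicit.get()));
        if (userPreference != null) return Optional.of(Charset.forName(userPreference));
        if (isText(serverContentType)) return Optional.of(StandardCharsets.ISO_8859_1);
        return Optional.empty(); // non-text type with no label: no charset applies
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/plain", null));                          // Optional[ISO-8859-1]
        System.out.println(resolve("text/plain", "windows-1252"));                // Optional[windows-1252]
        System.out.println(resolve("text/plain; charset=UTF-8", "windows-1252")); // Optional[UTF-8]
    }
}
```

Under this policy a server that reports a bare "text/plain" supplies less information and does not clobber a client-side windows-1252 preference, which is the behavior Mike asks for in the quoted July 29 message.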
Received on Saturday, 1 August 2009 12:24:02 UTC