RE: Dealing with encodings and file: URIs from Michael Sokolov on 2009-08-01 (xproc-dev@w3.org from August 2009)

From: Michael Sokolov <sokolov@ifactory.com>
Date: Sat, 1 Aug 2009 08:23:18 -0400
To: "'Norman Walsh'" <ndw@nwalsh.com>
Cc: "'XProc Dev'" <xproc-dev@w3.org>
Message-Id: <200908011157.n71BvVm5016313@hades.falutin.net>

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1 backs you up.

It's interesting to note that browsers don't actually comply with this
specification.  They all allow the user to override the server-specified
character set, although they do follow the spec in their initial default
behavior.

Here's another relevant quote from the w3 i18n site:

http://www.w3.org/International/O-HTTP-charset
It is very important to always label Web documents explicitly. HTTP 1.1 says
that the default charset is ISO-8859-1. But there are too many unlabeled
documents in other encodings, so browsers use the reader's preferred
encoding when there is no explicit charset parameter. 

---------------

My takeaway is that whatever the spec may say, it has proven unworkable in
practice to rely on the default charset being iso-8859-1, and that
"user-agents" have found their way to a more sensible practice, which is to
allow users to override.
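
Purely as an illustration (nothing here is specified by XProc, HTTP, or any
browser), the precedence described above could be sketched in Python roughly
as follows. The function names, the `user_override` and `client_hint`
parameters, and the exact ordering are assumptions made for this sketch:

```python
from email.message import Message

# Default that HTTP 1.1 (RFC 2616, section 3.7.1) assigns to unlabeled text/*.
RFC2616_DEFAULT = "iso-8859-1"


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def resolve_charset(server_content_type, user_override=None, client_hint=None):
    """Pick a charset for decoding a response body.

    Assumed precedence for this sketch (hypothetical, not from any spec):
      1. an explicit user override, as browsers allow;
      2. the charset the server actually declared;
      3. a client-supplied hint, used only when the server gave no charset;
      4. the RFC 2616 default.
    """
    if user_override:
        return user_override.lower()
    declared = charset_from_content_type(server_content_type)
    if declared:
        return declared
    if client_hint:
        return client_hint.lower()
    return RFC2616_DEFAULT
```

With this ordering, a server that reports a bare "text/plain" does not
override a client hint such as windows-1252, since the server is simply
providing less information; only an explicit server charset or a user
override takes priority over the hint.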

Not sure what the implication for xproc is exactly, but I feel like a
general principle is to define ideal behavior in a specification while at
the same time providing, in implementation, whatever tools may be needed to
deal with the inevitable realities.  An open theoretical question is whether
the spec would be better written as saying "should" or "ought" rather than
"must" if implementations are inevitably going to be forced to break off...

-Mike

> -----Original Message-----
> From: normanwalsh@gmail.com [mailto:normanwalsh@gmail.com] On 
> Behalf Of Norman Walsh
> Sent: Saturday, August 01, 2009 7:01 AM
> To: Mike Sokolov
> Cc: XProc Dev
> Subject: Re: Dealing with encodings and file: URIs
> 
> On Wednesday, July 29, 2009, Mike Sokolov 
> <sokolov@ifactory.com> wrote:
> >
> > It also might be nice if an HTTP server reporting "text/plain" (no 
> > character set) didn't override the client specification of 
> > "text/plain; windows/1252", since they don't in fact conflict: the 
> > server is simply providing less information.
> 
> I believe, though it isn't convenient for me to check right 
> now, that the absence of an encoding decl in an http response 
> is a declaration that the text is US-ASCII.
> >
> > -Mike
> >
> > Norman Walsh wrote:
> >
> >   "Toman_Vojtech@emc.com" <Toman_Vojtech@emc.com> writes:
> >
> >
> >     Um. What kind of filesystem are you using that tells you the 
> > content type?
> >
> >
> >
> > I suppose, on reflection, that that gives me justification 
> to ignore 
> > the content type reported by java.net.URL. There really 
> isn't a server 
> > involved.
> >
> >                                         Be seeing you,
> >                                           norm
> >
> >
> >
> >
> >
> >
>

Received on Saturday, 1 August 2009 12:24:02 UTC