
Dealing with encodings and file: URIs

From: Norman Walsh <ndw@nwalsh.com>
Date: Wed, 29 Jul 2009 08:34:00 -0400
To: XProc Dev <xproc-dev@w3.org>
Message-ID: <m21vnzis7b.fsf@nwalsh.com>
Hello world,

Imagine that windows-1252.txt contains some data encoded in Windows
CP1252.

You load that data into your pipeline with p:data:

  <p:data href="windows-1252.txt"/>

And you get botched content because no one knew that the encoding was
windows-1252.
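To make "botched" concrete, here is a minimal Python sketch. Python is used purely for illustration, the sample text is made up, and the UTF-8 fallback is an assumption about the failure mode rather than a statement about what any particular XProc processor does:

```python
# Sample text (hypothetical) containing characters that CP1252 encodes
# in the 0x80-0x9F range: the euro sign and curly double quotes.
original = "Price: 10\u20ac \u201capprox\u201d"

# The bytes as they would sit on disk in windows-1252.txt.
raw = original.encode("windows-1252")

# Assumed failure mode: no charset is known, so the bytes are read as UTF-8.
# 0x80, 0x93 and 0x94 are not valid UTF-8 in these positions, so each one
# is replaced with U+FFFD (the replacement character).
botched = raw.decode("utf-8", errors="replace")

# With the correct charset, the text round-trips unchanged.
correct = raw.decode("windows-1252")
```

A processor that assumed ISO-8859-1 instead would fail differently (C1 control characters rather than replacement characters), but the content would be wrong either way.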

So you try again:

  <p:data href="windows-1252.txt"
          content-type="text/plain; charset=windows-1252"/>

And much to your surprise, you get a botched file again.

Why?

Because the implementation of p:data opens windows-1252.txt and the
filesystem reports that the content type is "text/plain", with no
charset parameter (how could the filesystem tell?).

"Server" metadata is authoritative so the charset that you specified
is discarded.

My short-term workaround in Calabash is:

  If the URI is a file URI and the server doesn't return a charset and
  the server-supplied content type is the same as the user-supplied
  content type, then apply the user's charset parameter.

This (a) does what the user expects but (b) is clearly a violation of
the rule that says server metadata is authoritative.
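The workaround rule above can be sketched roughly as follows. This is an illustrative Python sketch, not the Calabash code: the function names `parse_content_type` and `effective_charset` are hypothetical, and it assumes the href has already been resolved to an absolute URI and that the "server" content type is available as a plain header-style string.

```python
from email.message import Message
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split 'text/plain; charset=x' into ('text/plain', 'x' or None)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


def effective_charset(uri: str, server_ct: str,
                      user_ct: Optional[str]) -> Optional[str]:
    """Pick the charset to decode with, or None if nothing is known."""
    server_type, server_charset = parse_content_type(server_ct)

    # Server metadata stays authoritative whenever it names a charset.
    if server_charset is not None:
        return server_charset
    if user_ct is None:
        return None

    user_type, user_charset = parse_content_type(user_ct)

    # The exception: a file: URI, no server charset, and matching
    # media types. Only then is the user's charset applied.
    if urlsplit(uri).scheme == "file" and server_type == user_type:
        return user_charset
    return None
```

For example, `effective_charset("file:///tmp/windows-1252.txt", "text/plain", "text/plain; charset=windows-1252")` returns `"windows-1252"`, while the same call with an `http:` URI returns `None`, so the exception stays confined to file URIs.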

Am I overlooking a better solution?

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | The art of living is more like
http://nwalsh.com/            | wrestling than dancing.--Marcus Aurelius

Received on Wednesday, 29 July 2009 12:34:46 GMT
