
RE: Creating a PDF file with UTF-8 encoding through Servlet

From: Addison Phillips <addison.phillips@quest.com>
Date: Tue, 24 May 2005 08:01:28 -0700
Message-ID: <634978A7DF025A40BFEF33EB191E13BC0B7F3C12@irvmbxw01.quest.com>
To: "Bruno Girin" <Bruno.Girin@cambista.com>, <www-international@w3.org>

No, that's not right. The PDF file is a binary file. The text *INSIDE* the file (i.e. the text being encoded by the PDF library) has an encoding. But PDF files themselves do not have or need a charset parameter. Putting a charset parameter on a Content-Type of "application/*" is just silly.

Your browser does not read the text in a PDF. It calls the Acrobat plug-in, which reads the PDF file.
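
To make the point concrete, here is a minimal sketch of what happens when binary data is pushed through a character Writer instead of being written to the OutputStream directly. It uses only the JDK; no servlet container or PDFlib is involved, and the class name, method names, and sample bytes are made up for illustration (the bytes merely imitate the start of a PDF file, with four bytes above 0x7F).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryVersusWriter {

    // Hypothetical stand-in for the first bytes of a PDF file: the ASCII
    // header line followed by a comment line holding four bytes above 0x7F.
    public static byte[] sampleBytes() {
        return new byte[] {
            '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
            '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
        };
    }

    // Writes the bytes unchanged, the way a binary body should be written.
    public static byte[] viaOutputStream(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(data, 0, data.length);
        return sink.toByteArray();
    }

    // Treats every byte as a character and sends it through a UTF-8 Writer.
    // Each value above 0x7F is re-encoded as two bytes, so the data changes.
    public static byte[] viaUtf8Writer(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (byte b : data) {
                writer.write(b & 0xFF);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = sampleBytes();
        byte[] direct = viaOutputStream(original);
        byte[] encoded = viaUtf8Writer(original);

        System.out.println("original length:   " + original.length);
        System.out.println("direct identical:  " + Arrays.equals(original, direct));
        System.out.println("via Writer length: " + encoded.length);
        System.out.println("Writer identical:  " + Arrays.equals(original, encoded));
    }
}
```

In this sketch the direct write returns the same 15 bytes, while the Writer path returns 19 bytes because each of the four high bytes becomes a two-byte UTF-8 sequence. The same thing would happen to a real PDF stream, which is why the body should go straight to the response's OutputStream with a plain "application/pdf" content type.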

Addison

Addison P. Phillips
Globalization Architect, Quest Software
Chair, W3C Internationalization Core Working Group

Internationalization is not a feature.
It is an architecture. 

> -----Original Message-----
> From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Bruno Girin
> Sent: 2005-05-24 4:34
> To: www-international@w3.org
> Subject: FW: Creating a PDF file with UTF-8 encoding through Servlet
> 
> Sorry, sent this message to Khurram only, not the list.
> 
> 
> -----Original Message-----
> From: Bruno Girin
> Sent: Tue 5/24/2005 11:39 AM
> To: Khurram Ilyas
> Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
> 
> Addison, that's the whole point of Sourav's question: a PDF file is a binary
> file that contains text data. As a consequence, you need to specify the
> encoding of the text data so that the computer that will read the PDF can
> properly read the binary stream and translate it into the correct
> characters to display.
> 
> To achieve this, you need 3 things:
> 1. the servlet needs to encode the binary stream using an encoding that is
> able to encode the totality of the character set used in the document. If
> it is Japanese, the best encoding is probably UTF-8.
> 2. the servlet needs to specify that same encoding in the content type
> 3. the PDF file presumably needs to contain encoding data so that the file
> can be re-read by a PDF viewer independently of the download
> 
> To do 1, you need to enclose the output stream into an OutputStreamWriter
> that specifies the encoding, such as:
> Writer wout = new OutputStreamWriter(out, "UTF-8"); // out being the
> output stream obtained in Sourav's step 2
> then you call wout.write() and other Writer methods
> 
> To do 2, you just specify the encoding as part of the content type:
> response.setContentType("application/pdf; charset=utf-8");
> 
> 3 is dependent on the API you're using to create your PDF file. I don't
> know PDFlib so can't tell you what the call is.
> 
> Good luck with this.
> 
> Bruno Girin
> Chief Technical Architect
> Cambista Technologies Ltd
> 
> 
> -----Original Message-----
> From: www-international-request@w3.org on behalf of Khurram Ilyas
> Sent: Fri 5/20/2005 11:04 PM
> To: addison.phillips@quest.com; SOURAVM@infosys.com; www-international@w3.org
> Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
> 
> Instead of
> 
> response.setContentType("application/pdf");
> 
> 
> 
> try
> 
> response.setContentType("application/download");
> 
> 
> 
> 
> 
> 
> Best Regards,
> Khurram Ilyas
> 
> 
> 
> 
> >From: "Addison Phillips" <addison.phillips@quest.com>
> >To: "souravm" <SOURAVM@infosys.com>,<www-international@w3.org>
> >Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
> >Date: Fri, 20 May 2005 09:14:10 -0700
> >
> >
> >PDF files are binary, not text, objects.
> >
> >Addison
> >
> >Addison P. Phillips
> >Globalization Architect, Quest Software
> >Chair, W3C Internationalization Core Working Group
> >
> >Internationalization is not a feature.
> >It is an architecture.
> >
> > > -----Original Message-----
> > > From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of souravm
> > > Sent: 2005-05-20 6:13
> > > To: www-international@w3.org
> > > Subject: Creating a PDF file with UTF-8 encoding through Servlet
> > >
> > >
> > > Hi All,
> > >
> > > I need to create and return a PDF file from a Servlet as a response to
> > > an HTTP request (typical download functionality).
> > >
> > > Now for this purpose I'm -
> > >
> > > 1. First setting the following fields in the response object -
> > > response.setContentType("application/pdf");
> > > response.setHeader("Pragma", "");
> > > response.setHeader("Cache-Control", "");
> > > response.setDateHeader("Expires", 0);
> > >
> > > 2. After that I'm creating an OutputStream object from the response object.
> > >
> > > 3. Using that OutputStream object I'm writing the content of the PDF file
> > > (using APIs of PDFlib). Using PDFDocument.open(OutputStream) to create the
> > > document object.
> > >
> > > 4. After writing the content of the PDF I'm closing the PDF file
> > > (PDFDocument.close()).
> > >
> > > In this context, I'd like to know: don't I need to specify the encoding
> > > of the PDF document through the setContentType API? Say I'm creating a
> > > PDF file with Japanese content and I want the encoding of the file to be
> > > Shift_JIS.
> > >
> > > Any pointer/information on this would be highly appreciated.
> > >
> > > Regards,
> > > Sourav
> > >
> > >
> >
> >
> >
Received on Tuesday, 24 May 2005 15:01:32 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 June 2009 19:17:05 GMT