
Re: Creating a PDF file with UTF-8 encoding through Servlet

From: Vinod Balakrishnan <vinod@adobe.com>
Date: Tue, 24 May 2005 10:05:55 -0700
To: Bruno Girin <Bruno.Girin@cambista.com>, Addison Phillips <addison.phillips@quest.com>, WWW International <www-international@w3.org>
Message-id: <BEB8AD03.122AF%vinod@adobe.com>
> 1. How do you specify a character encoding in a PDF file when you create it
> using PDFlibs?

> The text object section of the PDF spec has all the details about specifying
> fonts, encodings, etc. If you have any problem displaying Japanese text, you
> may want to read this section. Again, if you are embedding the fonts (only
> fonts that are allowed to be embedded), please make sure to use the glyph IDs.
> 
> -Vinod
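
As a rough illustration of the "use the glyph IDs" advice, here is a minimal sketch. It assumes a composite (Type 0) font with the Identity-H encoding and an identity CID-to-GID mapping, so that each 2-byte code in a shown string is a glyph ID. The class name, the method name, and the glyph IDs in main are made up for illustration; real IDs come from the embedded font's own tables, and a PDF library normally does this step for you.

```java
class GlyphHexString {
    /** Formats glyph IDs as a PDF hex string of 2-byte big-endian codes. */
    static String toHexString(int[] glyphIds) {
        StringBuilder sb = new StringBuilder("<");
        for (int gid : glyphIds) {
            if (gid < 0 || gid > 0xFFFF) {
                throw new IllegalArgumentException("glyph ID out of 16-bit range: " + gid);
            }
            sb.append(String.format("%04X", gid)); // four hex digits per glyph
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // Hypothetical glyph IDs; they are specific to one font file.
        int[] glyphIds = {0x0A3F, 0x0B12};
        // Operand for the Tj (show text) operator inside a text object.
        System.out.println(toHexString(glyphIds) + " Tj");
    }
}
```

The point of the sketch is that the bytes in the content stream are codes interpreted through the font, not UTF-8 or Shift_JIS text.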
> 
> From: Bruno Girin <Bruno.Girin@cambista.com>
> Date: Tue, 24 May 2005 17:07:56 +0100
> To: Addison Phillips <addison.phillips@quest.com>, www-international@w3.org
> Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
> Resent-From: www-international@w3.org
> Resent-Date: Tue, 24 May 2005 16:07:15 +0000
> 

> Addison, you're absolutely right. My mistake. But that does not explain how
> you can include non-Latin characters in a PDF file. Somewhere along the line,
> you need to specify the encoding of the characters inside the PDF file. I
> suppose that the PDF format has a way to specify the encoding of the
> characters inside the file, similar to what HTML does. So the original
> question from Sourav boils down to:
> 
> 1. How do you specify a character encoding in a PDF file when you create it
> using PDFlibs?
>         
> 2. How do you produce content encoded with the encoding specified in the PDF
> file; I believe you can do this with the Java OutputStreamWriter class as
> explained below.
> 
> The content type is then secondary information that tells the browser to fire
> Acrobat or any other reader when loading the PDF and should be
> "application/pdf".
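
To make the header side concrete, here is a small sketch of the response headers for a finished PDF. It is servlet-free so that it runs on its own; the class and method names are hypothetical, and Content-Disposition is an addition that the thread does not discuss. In a servlet the same values would go through response.setContentType(), response.setContentLength() and response.setHeader().

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PdfResponseHeaders {
    /** Builds headers for a finished PDF. No charset parameter is attached. */
    static Map<String, String> forPdf(byte[] pdfBytes, String fileName) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/pdf");
        headers.put("Content-Length", Integer.toString(pdfBytes.length));
        // Not from the thread: asks the browser to save the file under this name.
        headers.put("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
        return headers;
    }

    public static void main(String[] args) {
        byte[] fakePdf = new byte[1234]; // placeholder for bytes produced by a PDF library
        forPdf(fakePdf, "report.pdf")
            .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```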
> 
> For information, the solution we use in my company is to produce XSL-FO with
> the proper XML encoding specification and use Jakarta FOP to produce the PDF.
> It works just fine to produce Russian (or English) output, as long as you
> configure FOP properly so that it can find fonts that contain glyphs for all
> the characters present in the file.
> 
> Bruno Girin
> Chief Technical Architect
> Cambista Technologies Ltd
> 
> 
> -----Original Message-----
> From: Addison Phillips [mailto:addison.phillips@quest.com]
> Sent: Tue 5/24/2005 4:01 PM
> To: Bruno Girin; www-international@w3.org
> Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
> 
> No, that's not right. The PDF file is a binary file. The text *INSIDE* the
> file (i.e. the text being encoded by the PDF library) has an encoding. But PDF
> files themselves do not have or need a charset parameter. Putting a charset
> parameter on a Content-Type of "application/*" is just silly.
> 
> Your browser does not read the text in a PDF. It calls the Acrobat plug-in
> which reads the Acrobat file.
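
The following sketch shows why this matters in practice: bytes of a binary file that are pushed through a UTF-8 Writer do not come out unchanged. The sample bytes are the kind of prefix many PDF files start with (the header line plus a comment line of bytes above 127); the class and method names are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryThroughWriterDemo {
    // "%PDF-1.4" followed by a comment line of four high-bit bytes.
    static final byte[] PDF_PREFIX = {
        '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
        '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
    };

    /** Writes the bytes directly, as a servlet would on its response output stream. */
    static byte[] writeAsBytes(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data, 0, data.length);
        return out.toByteArray();
    }

    /** Pushes each byte through a UTF-8 Writer as if it were a character. */
    static byte[] writeThroughUtf8Writer(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            for (byte b : data) {
                writer.write(b & 0xFF); // the byte value is treated as a code point
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] direct = writeAsBytes(PDF_PREFIX);
        byte[] viaWriter = writeThroughUtf8Writer(PDF_PREFIX);
        System.out.println("direct intact:     " + Arrays.equals(direct, PDF_PREFIX));
        System.out.println("via Writer intact: " + Arrays.equals(viaWriter, PDF_PREFIX));
        System.out.println("lengths: " + direct.length + " vs " + viaWriter.length);
    }
}
```

Each of the four high-bit bytes becomes two bytes in UTF-8, so the Writer path no longer produces the original file. Character encoding is applied to the text by the PDF library while it builds the document; the finished document is then written as raw bytes.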
> 
> Addison
> 
> Addison P. Phillips
> Globalization Architect, Quest Software
> Chair, W3C Internationalization Core Working Group
> 
> Internationalization is not a feature.
> It is an architecture.
> 
>> > -----Original Message-----
>> > From: www-international-request@w3.org
>> > [mailto:www-international-request@w3.org] On Behalf Of Bruno Girin
>> > Sent: 24 May 2005 4:34
>> > To: www-international@w3.org
>> > Subject: FW: Creating a PDF file with UTF-8 encoding through Servlet
>> >
>> > Sorry, sent this message to Khurram only, not the list.
>> >
>> >
>> > -----Original Message-----
>> > From: Bruno Girin
>> > Sent: Tue 5/24/2005 11:39 AM
>> > To: Khurram Ilyas
>> > Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
>> >
>> > Addison, that's the whole point of Sourav's question: a PDF file is a
>> > binary file that contains text data. As a consequence, you need to specify
>> > the encoding of the text data so that the computer that will read the PDF
>> > can properly read the binary stream and translate it into the correct
>> > characters to display.
>> >
>> > To achieve this, you need 3 things:
>> > 1. the servlet needs to encode the binary stream using an encoding that is
>> > able to encode the totality of the character set used in the document. If
>> > it is Japanese, the best encoding is probably UTF-8.
>> > 2. the servlet needs to specify that same encoding in the content type
>> > 3. the PDF file presumably needs to contain encoding data so that the file
>> > can be re-read by a PDF viewer independently of the download
>> >
>> > To do 1, you need to wrap the output stream in an OutputStreamWriter
>> > that specifies the encoding, such as:
>> >     // out is the output stream obtained in Sourav's step 2
>> >     Writer wout = new OutputStreamWriter(out, "UTF-8");
>> > Then you call wout.write() and other Writer methods.
>> >
>> > To do 2, you just specify the encoding as part of the content type:
>> > response.setContentType("application/pdf; charset=utf-8");
>> >
>> > 3 is dependent on the API you're using to create your PDF file. I don't
>> > know PDFlib so can't tell you what the call is.
>> >
>> > Good luck with this.
>> >
>> > Bruno Girin
>> > Chief Technical Architect
>> > Cambista Technologies Ltd
>> >
>> >
>> > -----Original Message-----
>> > From: www-international-request@w3.org on behalf of Khurram Ilyas
>> > Sent: Fri 5/20/2005 11:04 PM
>> > To: addison.phillips@quest.com; SOURAVM@infosys.com;
>> > www-international@w3.org
>> > Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
>> >
>> > Instead of
>> >
>> > response.setContentType("application/pdf");
>> >
>> > try
>> >
>> > response.setContentType("application/download");
>> >
>> > Best Regards,
>> > Khurram Ilyas
>> >
>> >
>> >
>> >
>>> > >From: "Addison Phillips" <addison.phillips@quest.com>
>>> > >To: "souravm" <SOURAVM@infosys.com>,<www-international@w3.org>
>>> > >Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
>>> > >Date: Fri, 20 May 2005 09:14:10 -0700
>>> > >
>>> > >
>>> > >PDF files are binary, not text, objects.
>>> > >
>>> > >Addison
>>> > >
>>> > >Addison P. Phillips
>>> > >Globalization Architect, Quest Software
>>> > >Chair, W3C Internationalization Core Working Group
>>> > >
>>> > >Internationalization is not a feature.
>>> > >It is an architecture.
>>> > >
>>>> > > > -----Original Message-----
>>>> > > > From: www-international-request@w3.org
>>>> > > > [mailto:www-international-request@w3.org] On Behalf Of souravm
>>>> > > > Sent: 20 May 2005 6:13
>>>> > > > To: www-international@w3.org
>>>> > > > Subject: Creating a PDF file with UTF-8 encoding through Servlet
>>>> > > >
>>>> > > >
>>>> > > > Hi All,
>>>> > > >
>>>> > > > I need to create and return a PDF file from a Servlet as a response
>>>> > > > to an http request (typical download functionality).
>>>> > > >
>>>> > > > Now for this purpose I'm -
>>>> > > >
>>>> > > > 1. First setting the following fields in the response object -
>>>> > > > response.setContentType("application/pdf");
>>>> > > > response.setHeader("Pragma", "");
>>>> > > > response.setHeader("Cache-Control", "");
>>>> > > > response.setDateHeader("Expires", 0);
>>>> > > >
>>>> > > > 2. After that I'm creating an OutputStream object from the response
>>>> > > > object.
>>>> > > >
>>>> > > > 3. Using that OutputStream object I'm writing the content of the PDF
>>>> > > > file (using APIs of PDFlib). Using PDFDocument.open(OutputStream) to
>>>> > > > create the document object.
>>>> > > >
>>>> > > > 4. After writing the content of the PDF I'm closing the PDF file
>>>> > > > (PDFDocument.close()).
>>>> > > >
>>>> > > > In this context, I'd like to know, don't I need to specify the
>>>> > > > encoding of the PDF document through the setContentType API? Say, I'm
>>>> > > > creating a PDF file with Japanese content and I want the encoding of
>>>> > > > the file to be Shift_JIS.
>>>> > > >
>>>> > > > Any pointer/information on this would be highly appreciated.
>>>> > > >
>>>> > > > Regards,
>>>> > > > Sourav
>>>> > > >
>>>> > > >
>>> > >
>>> > >
>>> > >
>> >
>> >
>> >
>> >
>> >
>> >
>> > _____________________________________________________________________
>> > This e-mail and attachments has been scanned for viruses. Please email
>> > virus@cambista.net if you have detected a virus in this mail.
Received on Tuesday, 24 May 2005 17:07:01 GMT
