- From: Vinod Balakrishnan <vinod@adobe.com>
- Date: Tue, 24 May 2005 10:05:55 -0700
- To: Bruno Girin <Bruno.Girin@cambista.com>, Addison Phillips <addison.phillips@quest.com>, WWW International <www-international@w3.org>
- Message-id: <BEB8AD03.122AF%vinod@adobe.com>
>1. How do you specify a character encoding in a PDF file when you create it using PDFlibs?

> The text object section of the PDF spec has all the details about specifying
> the fonts, encoding etc. If you have any problem with displaying Japanese
> text, you may want to read this section. Again, if you are embedding the fonts
> (only allowed fonts), please make sure to use the glyph ids.
>
> -Vinod
>
> From: Bruno Girin <Bruno.Girin@cambista.com>
> Date: Tue, 24 May 2005 17:07:56 +0100
> To: Addison Phillips <addison.phillips@quest.com>, www-international@w3.org
> Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
> Resent-From: www-international@w3.org
> Resent-Date: Tue, 24 May 2005 16:07:15 +0000
>
> Addison, you're absolutely right. My mistake. But that does not explain how
> you can include non-Latin characters in a PDF file. Somewhere along the line,
> you need to specify the encoding of the characters inside the PDF file. I
> suppose that the PDF format has a way to specify the encoding of the
> characters inside the file, similar to what HTML does. So the original
> question from Sourav boils down to:
>
> 1. How do you specify a character encoding in a PDF file when you create it
> using PDFlibs?
>
> 2. How do you produce content encoded with the encoding specified in the PDF
> file? I believe you can do this with the Java OutputStreamWriter class as
> explained below.
>
> The content type is then secondary information that tells the browser to fire
> Acrobat or any other reader when loading the PDF and should be
> "application/pdf".
>
> For information, the solution we use in my company is to produce XSL-FO with
> the proper XML encoding specification and use Jakarta FOP to produce the PDF.
> It works just fine to produce Russian (or English) output, as long as you
> configure FOP properly so that it can find fonts that contain glyphs for all
> the characters present in the file.
>
> Bruno Girin
> Chief Technical Architect
> Cambista Technologies Ltd
>
>
> -----Original Message-----
> From: Addison Phillips [mailto:addison.phillips@quest.com]
> Sent: Tue 5/24/2005 4:01 PM
> To: Bruno Girin; www-international@w3.org
> Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
>
> No, that's not right. The PDF file is a binary file. The text *INSIDE* the
> file (i.e. the text being encoded by the PDF library) has an encoding. But PDF
> files themselves do not have or need a charset parameter. Putting a charset
> parameter on a Content-Type of "application/*" is just silly.
>
> Your browser does not read the text in a PDF. It calls the Acrobat plug-in,
> which reads the Acrobat file.
>
> Addison
>
> Addison P. Phillips
> Globalization Architect, Quest Software
> Chair, W3C Internationalization Core Working Group
>
> Internationalization is not a feature.
> It is an architecture.
>
>> > -----Original Message-----
>> > From: www-international-request@w3.org
>> > [mailto:www-international-request@w3.org] On Behalf Of Bruno Girin
>> > Sent: 2005年5月24日 4:34
>> > To: www-international@w3.org
>> > Subject: FW: Creating a PDF file with UTF-8 encoding through Servlet
>> >
>> > Sorry, sent this message to Khurram only, not the list.
>> >
>> >
>> > -----Original Message-----
>> > From: Bruno Girin
>> > Sent: Tue 5/24/2005 11:39 AM
>> > To: Khurram Ilyas
>> > Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
>> >
>> > Addison, that's the whole point of Sourav's question: a PDF file is a binary
>> > file that contains text data. As a consequence, you need to specify the
>> > encoding of the text data so that the computer that will read the PDF can
>> > properly read the binary stream and translate it into the correct
>> > characters to display.
>> >
>> > To achieve this, you need 3 things:
>> > 1. the servlet needs to encode the binary stream using an encoding that is
>> > able to encode the totality of the character set used in the document. If
>> > it is Japanese, the best encoding is probably UTF-8.
>> > 2. the servlet needs to specify that same encoding in the content type
>> > 3. the PDF file presumably needs to contain encoding data so that the file
>> > can be re-read by a PDF viewer independently of the download
>> >
>> > To do 1, you need to enclose the output stream into an OutputStreamWriter
>> > that specifies the encoding, such as:
>> > // out being the output stream obtained in Sourav's step 2
>> > Writer wout = new OutputStreamWriter(out, "UTF-8");
>> > then you call wout.write() and other Writer methods
>> >
>> > To do 2, you just specify the encoding as part of the content type:
>> > response.setContentType("application/pdf; charset=utf-8");
>> >
>> > 3 is dependent on the API you're using to create your PDF file. I don't
>> > know PDFlib so can't tell you what the call is.
>> >
>> > Good luck with this.
>> >
>> > Bruno Girin
>> > Chief Technical Architect
>> > Cambista Technologies Ltd
>> >
>> >
>> > -----Original Message-----
>> > From: www-international-request@w3.org on behalf of Khurram Ilyas
>> > Sent: Fri 5/20/2005 11:04 PM
>> > To: addison.phillips@quest.com; SOURAVM@infosys.com;
>> > www-international@w3.org
>> > Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
>> >
>> > Instead of
>> >
>> > response.setContentType("application/pdf");
>> >
>> > try
>> >
>> > response.setContentType("application/download");
>> >
>> >
>> > Best Regards,
>> > Khurram Ilyas
>> >
>> >
>>> > >From: "Addison Phillips" <addison.phillips@quest.com>
>>> > >To: "souravm" <SOURAVM@infosys.com>,<www-international@w3.org>
>>> > >Subject: RE: Creating a PDF file with UTF-8 encoding through Servlet
>>> > >Date: Fri, 20 May 2005 09:14:10 -0700
>>> > >
>>> > >
>>> > >PDF files are binary, not text, objects.
>>> > >
>>> > >Addison
>>> > >
>>> > >Addison P. Phillips
>>> > >Globalization Architect, Quest Software
>>> > >Chair, W3C Internationalization Core Working Group
>>> > >
>>> > >Internationalization is not a feature.
>>> > >It is an architecture.
>>> > >
>>>> > > > -----Original Message-----
>>>> > > > From: www-international-request@w3.org
>>>> > > > [mailto:www-international-request@w3.org] On Behalf Of souravm
>>>> > > > Sent: 2005年5月20日 6:13
>>>> > > > To: www-international@w3.org
>>>> > > > Subject: Creating a PDF file with UTF-8 encoding through Servlet
>>>> > > >
>>>> > > >
>>>> > > > Hi All,
>>>> > > >
>>>> > > > I need to create and return a PDF file from a Servlet as a response
>>>> > > > to an http request (typical download functionality).
>>>> > > >
>>>> > > > Now for this purpose I'm -
>>>> > > >
>>>> > > > 1. First setting the following fields in the response object -
>>>> > > > response.setContentType("application/pdf");
>>>> > > > response.setHeader("Pragma", "");
>>>> > > > response.setHeader("Cache-Control", "");
>>>> > > > response.setDateHeader("Expires", 0);
>>>> > > >
>>>> > > > 2. After that I'm creating an OutputStream object from the response
>>>> > > > object.
>>>> > > >
>>>> > > > 3. Using that OutputStream object I'm writing the content of the PDF
>>>> > > > file (using APIs of PDFlib). Using PDFDocument.open(OutputStream) to
>>>> > > > create the document object.
>>>> > > >
>>>> > > > 4. After writing the content of the PDF I'm closing the PDF file
>>>> > > > (PDFDocument.close()).
>>>> > > >
>>>> > > > In this context, I'd like to know, don't I need to specify the
>>>> > > > encoding of the PDF document through the setContentType API? Say, I'm
>>>> > > > creating a PDF file with Japanese content and I want the encoding of
>>>> > > > the file to be Shift_JIS.
>>>> > > >
>>>> > > > Any pointer/information on this would be highly appreciated.
>>>> > > >
>>>> > > > Regards,
>>>> > > > Sourav
>>>> > > >
>> >
>> > _____________________________________________________________________
>> > This e-mail and attachments has been scanned for viruses. Please email
>> > virus@cambista.net if you have detected a virus in this mail.
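
To illustrate Addison's point that a PDF is a binary object, here is a minimal Java sketch that is not from the thread. It assumes a hypothetical byte array, SAMPLE, standing in for what a PDF library might emit (a header line followed by the customary comment line of bytes above 127), and compares writing those bytes straight to an OutputStream with pushing them through an OutputStreamWriter. The class and method names are illustrative only; no PDFlib or servlet API is used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfBinaryDemo {
    // Hypothetical stand-in for PDF library output: "%PDF-1.4\n" followed by
    // a comment line containing four bytes with the high bit set.
    public static final byte[] SAMPLE = {
        '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
        '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
    };

    // Writes the bytes unchanged, as a servlet would when handing the
    // response OutputStream directly to the PDF library.
    public static byte[] writeRaw(byte[] pdf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(pdf, 0, pdf.length);
        return out.toByteArray();
    }

    // Treats the bytes as characters and re-encodes them as UTF-8 through a
    // Writer. Every byte above 127 becomes a two-byte sequence, so the
    // result is no longer the same file.
    public static byte[] writeThroughWriter(byte[] pdf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(new String(pdf, StandardCharsets.ISO_8859_1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = writeRaw(SAMPLE);
        byte[] viaWriter = writeThroughWriter(SAMPLE);
        System.out.println(raw.length);                       // 15
        System.out.println(viaWriter.length);                 // 19
        System.out.println(Arrays.equals(raw, SAMPLE));       // true
        System.out.println(Arrays.equals(viaWriter, SAMPLE)); // false
    }
}
```

Under these assumptions, the bytes that reach the client match what the library produced only on the raw path. The encoding of the text inside the PDF is a matter for the PDF library's font and encoding settings, not for a Writer wrapped around the response stream or a charset parameter on the content type.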
Received on Tuesday, 24 May 2005 17:07:01 UTC