Re: TIFF as accessibility option from Christophe Strobbe on 2007-03-12 (w3c-wai-ig@w3.org from January to March 2007)

From: Christophe Strobbe <christophe.strobbe@esat.kuleuven.be>
Date: Mon, 12 Mar 2007 12:13:58 +0100
To: w3c-wai-ig@w3.org
Message-Id: <6.2.5.6.2.20070312112118.03d91080@esat.kuleuven.be>

Hi David,

At 20:01 10/03/2007, David Woolley wrote:
>(...)
> > Option - TIFF Format". The PDF contains the text of the article in
> > the form of scanned images. There are no plain text or HTML-versions
>
>I believe the proper Adobe tools can produce an OCRed underlay for the
>scans.  Can you confirm that none has been included. (Note that
>modern PDFs can be flagged as allowing access to the text for
>accessibility, but not for cut and paste.)   Actually, most
>vaguely recently published journals are available as proper PDFs, so,
>if they are using scans, rather than PDF rendered to a bitmap, they
>may have very nobbled access to the originals.

I checked the document properties, which tell me that the "PDF producer"
is not Adobe Acrobat but iText 1.3 (a free PDF library in Java;
see <http://www.lowagie.com/iText/>).
The security tab in document properties says that printing, changing the
document, content copying or extraction, and content extraction for
accessibility are allowed.
I ran two such PDF files through the accessibility checker in Adobe Acrobat
Professional 7.0. For each page, it says: "1 image(s) with no alternate text".
The accessibility report also says that the document is not tagged and that
there are 7 text blocks with no language specified.
Searching for terms in the text yields no results at all.
After performing OCR, it was possible to search the text and to select spans
of text. The accessibility report still says: "1 image(s) with no alternate
text". So I assume that there was no text "behind" the images.
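
As an aside on the permissions shown in the security tab: in an encrypted PDF
they come from the /P integer of the standard security handler, where each
permission is a single bit. The sketch below decodes the four permissions
mentioned above. The bit positions follow my reading of the PDF specification;
the function name and the sample /P values are made up for illustration, and
this is not how Acrobat or iText necessarily implement it.

```python
# Permission bits in the /P entry of the PDF standard security handler.
# Bit positions are 1-based, counted from the low-order bit (assumption:
# positions taken from the PDF specification's user access permissions table).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_or_extract": 5,
    "extract_for_accessibility": 10,
}


def decode_permissions(p):
    """Return which of the four permissions above a /P value grants."""
    p &= 0xFFFFFFFF  # /P is a signed 32-bit integer; treat it as unsigned
    return {name: bool(p & (1 << (bit - 1)))
            for name, bit in PERMISSION_BITS.items()}


# Hypothetical /P values, for illustration only.
everything_allowed = decode_permissions(-4)
accessibility_only = decode_permissions(-32)
```

With these sample values, the first result grants all four permissions, while
the second grants extraction for accessibility but not printing, changing or
copying, which is the combination David describes.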

I couldn't find anything on the JSTOR site that said they use images because
journal publishers won't let them publish the articles as electronic text. At
<http://www.jstor.org/about/images.html> they say that they use images because
it is their goal to produce faithful replications.

Best regards,

Christophe Strobbe

-- 
Christophe Strobbe
K.U.Leuven - Departement of Electrical Engineering - Research Group 
on Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/ 

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

Received on Monday, 12 March 2007 11:14:47 UTC