RE: PDF - Text Extraction File from Andrews, David B (DEED) on 2025-10-17 (w3c-wai-ig@w3.org from October to December 2025)

From: Andrews, David B (DEED) <david.b.andrews@state.mn.us>
Date: Fri, 17 Oct 2025 17:26:05 +0000
To: "chagnon@pubcom.com" <chagnon@pubcom.com>, "'Richter,Susan'" <Susan.Richter@nscc.ca>, "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>
Message-ID: <SA9PR09MB5373EC369B53B687179096A9ECF6A@SA9PR09MB5373.namprd09.prod.outlook.com>

I have seen this file several times in the past couple of months when trying to read a PDF file online. It is difficult because you don't quite know what to do.

Dave

From: chagnon@pubcom.com <chagnon@pubcom.com>
Sent: Friday, October 17, 2025 12:15 PM
To: 'Richter,Susan' <Susan.Richter@nscc.ca>; w3c-wai-ig@w3.org
Subject: RE: PDF - Text Extraction File

Hi Susan,
I've been a PDF developer since the first beta of Acrobat & the PDF file format. I've never seen the exact error message you quoted, and have no idea what a "text extraction file" is.

Generally, screen readers pick up the live text in the PDF file itself. But since you mentioned that these are campus maps, they could be just images without any live text such as titles, headers, footers, etc. Usually in that case, other messages will appear that basically prompt the user to make the file accessible with the software's AI tools.

The PDF standards, ISO 32000 (PDF) and ISO 14289 (PDF/UA), do define that text must be extractable by technologies, including screen readers. But they do not reference the term "text extraction file." And there isn't an additional file: everything should be inside the one PDF file.
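
As a rough illustration of the "live text" point above, here is a minimal Python sketch that guesses whether a PDF contains any text-showing operators (Tj or TJ) in its content streams. It is a heuristic under stated assumptions, not a conformance check: it only handles uncompressed and Flate-compressed streams, it can be fooled by binary image data, and the function names are made up for this example.

```python
import re
import zlib

# Matches the body of each "stream ... endstream" section in the raw file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

# A string or hex-string operand followed by Tj, or an array followed by TJ.
# These are the standard PDF operators that paint text on a page.
SHOW_TEXT_RE = re.compile(rb"[)>]\s*Tj\b|\]\s*TJ\b")


def iter_stream_bodies(pdf_bytes):
    """Yield each stream body, inflated when it looks Flate-compressed."""
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # decompressobj tolerates a body whose final byte was trimmed.
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            yield body  # uncompressed, or a filter this sketch ignores


def seems_to_have_live_text(pdf_bytes):
    """Heuristic only: True if any stream appears to show text."""
    return any(SHOW_TEXT_RE.search(body) for body in iter_stream_bodies(pdf_bytes))
```

To try it, read a file in binary mode and pass the bytes to `seems_to_have_live_text`. A result of False on a campus map would be consistent with an image-only PDF, but it is not proof; a full PDF library or an accessibility checker would give a more reliable answer.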

Questions:

  1.  What specific screen reader or readers are you using?
  2.  What operating system?
  3.  What software are you trying to open and read the PDF with? Adobe Reader, Adobe Acrobat, Foxit, or any of the hundreds of other programs that can now open and read PDFs? Or is the PDF being opened by the web browser?

-Bevi

Bevi Chagnon | bevi.chagnon@PubCom.com

Member, ISO Committees for PDF & PDF/UA Standards

Adobe Community Expert

Media Designer, Author, Trainer, and Consultant

PubCom.com

Technologists for Accessible Design + Publishing

MS Office - Adobe InDesign & Acrobat - Editorial & Design - A11y Publishing Workflow

From: Richter,Susan <Susan.Richter@nscc.ca>
Sent: Wednesday, October 15, 2025 7:16 AM
To: w3c-wai-ig@w3.org
Subject: PDF - Text Extraction File

Hi All,

This is my first time posting here so please advise if this isn't the best group for this kind of assistance.

I've discovered that some PDFs on our site, when accessed via a screen reader, cause this message to be read out: "This PDF is inaccessible. Couldn't download text extraction files". The PDFs in question are large campus maps.

I'm not super familiar with creating accessible PDFs, but I'm trying to understand how a text extraction file is created and then attached to a PDF so when a user opens it on the page using a screen reader the text extraction file is available. Does it have to be a separate link on the page or is there some way to embed/tie it to the PDF itself that just triggers it when opened via a screen reader?

Thanks in advance.

Susan Richter
Senior Web Interface Developer
Digital Products & Experience
Nova Scotia Community College
Institute of Technology Campus
Web: nscc.ca<http://www.nscc.ca/?utm_source=email-sig&utm_medium=email&utm_campaign=email%20signature%20link>

Received on Friday, 17 October 2025 21:31:30 UTC