RE: PDF - Text Extraction File

An inline SVG with an option to print to PDF would be a neat solution, and plenty of applications, including Illustrator, can vectorise a raster image into an inline SVG … 
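
As a rough illustration of the inline SVG idea, here is a minimal Python sketch that builds an SVG carrying a title, a description, and live text labels. The function name, the element ids, and the label data are all hypothetical, and a real map would also contain the vectorised paths exported from a drawing tool.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_map_svg(title, description, labels, width=800, height=600):
    """Return an inline SVG string with a title, a description, and text labels.

    `labels` is a list of (x, y, text) tuples. All names and ids used here
    are hypothetical; nothing in this sketch comes from the original map.
    """
    # Serialise with a default xmlns instead of an "ns0:" prefix.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {
            "viewBox": f"0 0 {width} {height}",
            "aria-labelledby": "map-title map-desc",
        },
    )
    title_el = ET.SubElement(svg, f"{{{SVG_NS}}}title", {"id": "map-title"})
    title_el.text = title
    desc_el = ET.SubElement(svg, f"{{{SVG_NS}}}desc", {"id": "map-desc"})
    desc_el.text = description
    # <text> elements stay real text rather than pixels.
    for x, y, text in labels:
        label_el = ET.SubElement(
            svg, f"{{{SVG_NS}}}text", {"x": str(x), "y": str(y)}
        )
        label_el.text = text
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(build_map_svg(
        "Campus map",
        "Map of the campus showing building locations.",
        [(120, 80, "Library"), (400, 300, "Parking")],
    ))
```

Whether to also put role="img" on the svg element is a design choice: that role presents the whole graphic as a single image, so the individual labels would no longer be exposed separately. Either way, the result would need testing with real screen readers.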

 

 

 

From: Richter,Susan <Susan.Richter@nscc.ca> 
Sent: Monday, October 20, 2025 11:00 PM
To: w3c-wai-ig@w3.org
Subject: RE: PDF - Text Extraction File

 

Thank you everyone for all the feedback. You’ve given me a lot to go back to our content creators with regarding PDF creation. It is much appreciated.

 

Susan Richter
Senior Web Interface Developer
Digital Products & Experience

Nova Scotia Community College
Institute of Technology Campus
Web:  <http://www.nscc.ca/?utm_source=email-sig&utm_medium=email&utm_campaign=email%20signature%20link> nscc.ca



 

From: Joshua Hori <jhori@ucdavis.edu> 
Sent: Sunday, October 19, 2025 7:52 PM
To: Andrews, David B (DEED) <david.b.andrews@state.mn.us>; chagnon@pubcom.com; Richter,Susan <Susan.Richter@nscc.ca>; w3c-wai-ig@w3.org
Subject: Re: PDF - Text Extraction File

 


 


This error message occurs because the file is an image-only PDF of a campus map, so there is no text to interact with. When a browser displays a PDF, it extracts the text from the PDF and removes all tags, giving you a text-only PDF. In this case, the PDF contains only images. 
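
To check whether a given PDF is image-only, one crude approach is to scan its streams for text-showing operators. The sketch below uses only the Python standard library, and the function name is hypothetical. It is a heuristic under stated assumptions: it only understands Flate-compressed or uncompressed streams, ignores encryption and other filters, and can be fooled by binary image data. A proper PDF library would be more reliable.

```python
import re
import zlib

# Capture everything between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A Tj or TJ operator preceded by the end of a string or array operand.
TEXT_OP_RE = re.compile(rb"[\)\]>]\s*(?:Tj|TJ)\b")


def looks_image_only(pdf_bytes):
    """Crude heuristic: True if no stream contains a text-showing operator."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes after the data.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate-compressed; scan the bytes as they are
        if TEXT_OP_RE.search(data):
            return False
    return True


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        result = looks_image_only(handle.read())
    print("no text operators found" if result else "text operators found")
```

Finding text operators only shows that the file contains some text. It says nothing about whether the PDF is tagged or otherwise accessible.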

 

I was looking at DocAccess, an AI PDF remediation service that ties into AIRA.io, a company that hires humans to provide audio descriptions of content for visually disabled users. That is an excellent and responsible use of AI backed by humans. Reach out if you need introductions. 

 

Best, 

 

Joshua Hori

Accessible Technology Coordinator

Information Educational Technology

Academic Technology Services

50 Hutchison Dr.

Davis, CA 95616

530-752-2439 

 <https://calendly.com/d/ytt-hsj-vbn> Schedule a meeting via Calendly

 

From: Andrews, David B (DEED) <david.b.andrews@state.mn.us>
Date: Friday, October 17, 2025 at 2:36 PM
To: chagnon@pubcom.com, 'Richter,Susan' <Susan.Richter@nscc.ca>, w3c-wai-ig@w3.org
Subject: RE: PDF - Text Extraction File

I have seen this file several times in the past couple of months when trying to read a PDF file online. It is difficult because you don’t quite know what to do.

 

Dave

 

 

From: chagnon@pubcom.com
Sent: Friday, October 17, 2025 12:15 PM
To: 'Richter,Susan' <Susan.Richter@nscc.ca>; w3c-wai-ig@w3.org
Subject: RE: PDF - Text Extraction File

 

 

Hi Susan,

I’ve been a PDF developer since the first beta of Acrobat & the PDF file format. I’ve never seen the exact error message you quoted, and have no idea what a “text extraction file” is.

 

Generally, screen readers pick up the live text in the PDF file itself. But since you mentioned that these are campus maps, they could be just images without any live text such as titles, headers, or footers. Usually in that case, other messages appear that basically prompt you to make the file accessible with their AI tools.

 

The PDF standards, ISO 32000 (PDF) and ISO 14289 (PDF/UA), do require that text be extractable by technologies, including screen readers. But they do not use the term “text extraction file.” And there isn’t an additional file: everything should be inside the one PDF file.

 

Questions:

1. What specific screen reader or readers are you using?
2. What operating system?
3. What software are you trying to open and read the PDF with? Adobe Reader, Adobe Acrobat, Foxit, or any of the hundreds of other programs that can now open and read PDFs? Or is the PDF being opened by the web browser?

 

—Bevi

Bevi Chagnon | bevi.chagnon@PubCom.com

Member, ISO Committees for PDF & PDF/UA Standards

Adobe Community Expert

Media Designer, Author, Trainer, and Consultant

 

PubCom.com

Technologists for Accessible Design + Publishing

MS Office – Adobe InDesign & Acrobat – Editorial & Design – A11y Publishing Workflow

 

From: Richter,Susan <Susan.Richter@nscc.ca>
Sent: Wednesday, October 15, 2025 7:16 AM
To: w3c-wai-ig@w3.org
Subject: PDF - Text Extraction File

 

Hi All,

 

This is my first time posting here so please advise if this isn’t the best group for this kind of assistance.

 

I’ve discovered that some PDFs on our site, when accessed via a screen reader, trigger this message: “This PDF is inaccessible. Couldn’t download text extraction files”. The PDFs in question are large campus maps.

 

I’m not super familiar with creating accessible PDFs, but I’m trying to understand how a text extraction file is created and then attached to a PDF so that it is available when a user opens the PDF on the page using a screen reader. Does it have to be a separate link on the page, or is there some way to embed/tie it to the PDF itself so that it is triggered when the PDF is opened via a screen reader?

 

Thanks in advance.

 

Susan Richter



 







Received on Monday, 20 October 2025 22:07:49 UTC