RE: PDF - Text Extraction File from Karen Lewellen on 2025-10-20 (w3c-wai-ig@w3.org from October to December 2025)

From: Karen Lewellen <klewellen@shellworld.net>
Date: Mon, 20 Oct 2025 12:56:41 -0400 (EDT)
To: "Richter,Susan" <Susan.Richter@nscc.ca>
cc: "w3c-wai-ig@w3.org" <w3c-wai-ig@w3.org>
Message-ID: <Pine.LNX.4.64.2510201253290.456551@users.shellworld.net>
I found it later, Susan, outlining many of the issues others have noted.
At the end of the day, the question is this:
What are you trying to provide, and how can you provide that information
in various ways that help your students? It's not as if a shared label in
terms of a disability experience means a shared accommodation, as in one
size fits all.
Best,
Karen



On Mon, 20 Oct 2025, Richter,Susan wrote:

> Hi Karen,
>
> I shared the link previously, but perhaps you didn't receive it. Here is the link again:
> https://www.nscc.ca/docs/campuses/ivany/ivany-map.pdf
>
> Thanks
>
> Susan Richter
> Senior Web Interface Developer
> Digital Products & Experience
> Nova Scotia Community College
> Institute of Technology Campus
> Web: nscc.ca
>
>
> -----Original Message-----
> From: info@karlencommunications.com <info@karlencommunications.com>
> Sent: Monday, October 20, 2025 9:15 AM
> To: 'David Woolley' <forums@david-woolley.me.uk>; w3c-wai-ig@w3.org
> Subject: RE: PDF - Text Extraction File
>
>
>
> Morning!
>
> If the PDF could be shared, a lot of speculation could end.
>
> If it is a campus map, it is most likely a scanned graphic that is untagged.
>
> Without being able to look at the document, we can't determine the accessibility barrier.
>
> Several of us who have worked in the field of PDF creation and remediation have said that this sounds like a scanned graphic of a map that is not tagged, and therefore is not accessible.
>
> Performing OCR (optical character recognition) on a map can result in two unsatisfactory scenarios:
>
> 1. The map will be tagged as a graphic requiring Alt Text. Alt Text would be difficult to provide, as it would require a lot of unstructured text that could crash the screen reader buffer.
> 2. Any text on the map could be converted to text, which would also be an accessibility barrier because someone would hear a list of names, buildings, streets, or other bits of text with no context.
>
> We need to be able to open the PDF in a PDF Editor in order to examine what is going on and provide possible solutions. If it is a campus map, please send the link to download it.
>
> Cheers, Karen
>
> -----Original Message-----
> From: David Woolley <forums@david-woolley.me.uk>
> Sent: Saturday, October 18, 2025 8:30 PM
> To: w3c-wai-ig@w3.org
> Subject: Re: PDF - Text Extraction File
>
> On 19/10/2025 00:12, David Woolley wrote:
>> I'm wondering if the web server is serving it wrongly, possibly wrong
>> content-encoding, and some browsers are fixing that up.  I downloaded
>> it with Firefox ESR, from Debian 12, and ran the Debian 12 pdftotext
>> against it.
>>
>>
> There are quite a lot of response headers that are new to me, but the ones that actually describe the document are very straightforward, and say it is a PDF, with nothing special done to it.
>
> However, I wonder if you are using something other than a mainstream browser. It is possible that you are being served a substitute document intended for suspected crawlers, to stop, for example, AI being trained on the data, or, in the past, people building databases that could be used to undermine the site owner's business model, or simply deleting the adverts that pay for the site.
>
> Could you check the size of the file, which should be 3216271 bytes, and, if using Linux or FreeBSD, run the "file" utility against it? You should get:
>
> $ file ivany-map.pdf
> ivany-map.pdf: PDF document, version 1.5, 1 pages
> $
>
> (I'm wondering if you have been served a compressed version and it hasn't been uncompressed by the download tool. Maybe there is a Content-Encoding: gzip header in the response that is being ignored, although note that this wasn't present in the responses to either Firefox or wget.)
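Both checks David asks for (file size and file type) can also be done without the "file" utility by looking at the first bytes of the download: a PDF normally begins with `%PDF-`, while a gzip stream begins with the bytes 0x1f 0x8b. The Python sketch below is a rough approximation of that, not a replacement for "file"; it assumes the download has been saved to a local path, `classify_download` and `check_download` are hypothetical helper names, and the expected size is the 3216271 bytes David reported.

```python
import os
from typing import Tuple

# Size David reported for ivany-map.pdf, in bytes.
EXPECTED_SIZE = 3216271


def classify_download(path: str) -> str:
    """Return "pdf", "gzip", or "other" based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(5)
    if head.startswith(b"%PDF-"):
        return "pdf"    # starts like a normal PDF file
    if head[:2] == b"\x1f\x8b":
        return "gzip"   # gzip magic number: likely a compressed body never decoded
    return "other"      # e.g. a substitute HTML page served instead of the PDF


def check_download(path: str) -> Tuple[str, bool]:
    """Return (kind, size_matches) for a downloaded copy of the map."""
    return classify_download(path), os.path.getsize(path) == EXPECTED_SIZE
```

For an intact copy, `check_download("ivany-map.pdf")` should give `("pdf", True)`; a result of `"gzip"` would suggest the undecoded-compression scenario, and `"other"` a substitute document.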
>
> For reference, these are the HTTP headers I got (there is no more input from me after this):
>
> HTTP/1.1 200 OK
> Cache-Control: max-age=2592000
> Content-Type: application/pdf
> Last-Modified: Wed, 15 Oct 2025 00:48:42 GMT
> Accept-Ranges: bytes
> ETag: "19e2f4736d3ddc1:0"
> Server: Microsoft-IIS/10.0
> X-UA-Compatible: IE=edge
> Permissions-Policy: camera=(), fullscreen=(self), geolocation=(*), microphone=()
> Referrer-Policy: no-referrer-when-downgrade
> X-Content-Type-Options: nosniff
> X-Xss-Protection: 1; mode=block
> Content-Security-Policy-Report-Only: default-src 'self' *.nscc.ca; img-src 'self' *.nscc.ca *.gstatic.com *.fontawesome.com *.google.ca
>  --- much more of the same ---
>  ancestors 'self' *.nscc.ca:*;
> Content-Security-Policy: frame-ancestors 'self' *.nscc.ca:*;
> Date: Sat, 18 Oct 2025 23:24:06 GMT
> Content-Length: 3216271
> Strict-Transport-Security: max-age=157680000
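To check a pasted header dump like David's for the headers that matter here (Content-Type, Content-Encoding, Content-Length), a small parser is enough. This is a sketch for pasted text only; `parse_headers` and `summarize` are hypothetical helpers, and for live requests Python's own `urllib.request` and `http.client` already parse headers. It treats lines that begin with whitespace as continuations of the previous header, skips lines without a colon (such as the status line, or a wrapped fragment like `microphone=()` that starts at column 0), and keeps only the last value if a header repeats.

```python
from typing import Dict, Optional


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse a pasted block of HTTP response headers into a dict.

    Header names are lower-cased. Lines starting with whitespace are
    appended to the previous header (old-style line folding). Lines with
    no colon, such as "HTTP/1.1 200 OK", are skipped. Last value wins.
    """
    headers: Dict[str, str] = {}
    last: Optional[str] = None
    for line in raw.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and last is not None:
            headers[last] += " " + line.strip()  # folded continuation line
            continue
        if ":" not in line:
            continue  # status line or stray text
        name, value = line.split(":", 1)
        last = name.strip().lower()
        headers[last] = value.strip()
    return headers


def summarize(headers: Dict[str, str]) -> Dict[str, object]:
    """Pick out the headers that describe the response body."""
    length = headers.get("content-length")
    return {
        "content_type": headers.get("content-type"),
        # No Content-Encoding header means the body is sent as-is.
        "content_encoding": headers.get("content-encoding", "none"),
        "content_length": int(length) if length is not None else None,
    }
```

Run against the headers David quoted, `summarize` should report `content_encoding` as `"none"` and `content_length` as 3216271, consistent with his finding that no Content-Encoding header was sent.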
Received on Monday, 20 October 2025 16:56:51 UTC