- From: David Woolley <forums@david-woolley.me.uk>
- Date: Sun, 19 Oct 2025 01:29:31 +0100
- To: w3c-wai-ig@w3.org
On 19/10/2025 00:12, David Woolley wrote:
> I'm wondering if the web server is serving it wrongly, possibly wrong
> content-encoding, and some browsers are fixing that up. I downloaded it
> with Firefox ESR, from Debian 12, and ran the Debian 12 pdftotext,
> against it.
>

There are quite a lot of response headers that are new to me, but the ones that actually describe the document are very straightforward, and say it is PDF, with nothing special done to it.

However, I wonder if you are using something other than a mainstream browser. It is possible that you are being served a substitute document, intended for suspected crawlers, to stop, for example, AI being trained on the data, or, in the past, people building databases that could be used to undermine the site owner's business model, or simply to delete the adverts that pay for the site.

Could you check the size of the file, which should be 3216271 bytes, and, if using Linux or FreeBSD, run the "file" utility against it; you should get:

$ file ivany-map.pdf
ivany-map.pdf: PDF document, version 1.5, 1 pages
$

(I'm wondering if you have been served a compressed version and it hasn't been uncompressed by the download tool. Maybe there is a Content-Encoding: gzip header in the response that is being ignored, although note that this wasn't present in the responses to either Firefox or wget.)
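If the "file" utility isn't available, roughly the same checks can be sketched in Python. This is only an illustration: the helper name describe_download is made up, and it assumes the file was saved as ivany-map.pdf. It compares the size against the 3216271 bytes quoted above and looks at the first few bytes, which are "%PDF-" for a PDF and 0x1f 0x8b for data that is still gzip-compressed.

```python
from pathlib import Path

EXPECTED_SIZE = 3216271  # bytes; the size quoted in this message


def describe_download(path, expected_size=EXPECTED_SIZE):
    """Hypothetical helper: report what a downloaded file looks like."""
    data = Path(path).read_bytes()
    if data[:2] == b"\x1f\x8b":
        kind = "gzip"     # gzip magic number: the body was never uncompressed
    elif data[:5] == b"%PDF-":
        kind = "pdf"      # PDF files start with "%PDF-"
    else:
        kind = "unknown"  # possibly a substitute document, e.g. an HTML page
    return {
        "kind": kind,
        "size": len(data),
        "size_matches": len(data) == expected_size,
    }
```

Calling describe_download("ivany-map.pdf") on a good copy should report kind "pdf" with size_matches true; kind "gzip" would point to the compressed-but-not-uncompressed case, and "unknown" to a substitute document.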
For reference, these are the HTTP headers I got (there is no more input from me after this):

HTTP/1.1 200 OK
Cache-Control: max-age=2592000
Content-Type: application/pdf
Last-Modified: Wed, 15 Oct 2025 00:48:42 GMT
Accept-Ranges: bytes
ETag: "19e2f4736d3ddc1:0"
Server: Microsoft-IIS/10.0
X-UA-Compatible: IE=edge
Permissions-Policy: camera=(), fullscreen=(self), geolocation=(*), microphone=()
Referrer-Policy: no-referrer-when-downgrade
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Content-Security-Policy-Report-Only: default-src 'self' *.nscc.ca; img-src 'self' *.nscc.ca *.gstatic.com *.fontawesome.com *.google.ca --- much more of the same --- ancestors 'self' *.nscc.ca:*;
Content-Security-Policy: frame-ancestors 'self' *.nscc.ca:*;
Date: Sat, 18 Oct 2025 23:24:06 GMT
Content-Length: 3216271
Strict-Transport-Security: max-age=157680000
Received on Sunday, 19 October 2025 00:29:38 UTC