Re: PDF - Text Extraction File

On 17/10/2025 18:21, Richter,Susan wrote:
> https://www.nscc.ca/docs/campuses/ivany/ivany-map.pdf
> 
There is plenty of text in there, as text, but I don't see how that 
could ever be made accessible, except with AI technologies.

I'm looking at it with Inkscape, so I can see the structure of the
graphics, but I wouldn't be able to see any tagged PDF markup. However,
what I do see makes me think it would be pretty much impossible to use
tagged PDF, except for a top-level description of the whole map.

It is all, or mainly, vector graphics and text, so it doesn't need
computer vision to get some structure from it, but the order of the
elements doesn't seem to reflect the physical layout of the site in any
real way.

I'd say it is a 2D model of the intended print image, not a 2D or 3D
model of the real site. It might be an interesting research project to
take something like the mapping data from OpenStreetMap and convert
that into a linear, human-language description, but even then I don't
think the coding styles are consistent enough, or the detail deep
enough, for that to work well.

The sort of problem you will get is that the key text is basically 
structured like this:

"Conferences and Events Wing - Floor 4
Trades and Applied Research
Shops and Labs - Floor 3
Tim Hortons - Floor 3
Centre for Built Environment Expo - Floor 3
Student Lounge - Floor 2
Cafeteria - Floor 2
Campus Boardroom - Floor 5
Academic Support Hub and Testing
Centre - Floor 2
Library and Learning Commons - Floor 2
Counselling and Wellness Hub - Floor 2,
Room 2110
Mawio'mi Child Care Centre - Floor 3
Presentation Theatre and Gallery - Floor 2
Student Association Office - Floor 5
Etl-mawieykw Indigenous
Student Centre - Floor 3, Room 3170
Bookstore - Floor 3
IT Support - Floor 4, Room 4190
Food Pantry - Floor 5
Business and Registrar Office - Floor 2
Student Advising Hub and Africentric
Student Space - Floor 3, Room 3130
Principal’s Office - Floor 5
Fitness Centre and Spiritual Room - Floor 5"

with the link to the key numbers being purely presentational, i.e. by
position on the page. The circles behind the key numbers are also
separate objects, whilst the whole block of text above is one object,
with newlines to separate the items.
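Because the key is one text object with newlines, about the most a
script can recover from the extracted text is a list of names and
floors, reassembled by guesswork where an entry wraps. A minimal
sketch, assuming the block has already been extracted as plain text and
that every entry ends with "- Floor N", optionally followed by
", Room NNNN" (possibly wrapped onto the next line). KeyEntry and
parse_key_block are hypothetical names. It cannot recover the key
numbers at all, and it cannot tell a heading such as "Trades and Applied
Research" from the first half of a wrapped name, so it simply glues such
a line onto the following entry.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyEntry:
    name: str
    floor: int
    room: Optional[str] = None


# Assumed entry shape: "<name> - Floor N", optionally ", Room NNNN".
# A trailing comma means the room number wrapped onto the next line.
FLOOR_RE = re.compile(
    r"(?P<name>.+?)\s+-\s+Floor\s+(?P<floor>\d+)(?:,\s*Room\s+(?P<room>\d+))?,?"
)
ROOM_RE = re.compile(r"Room\s+(\d+)")


def parse_key_block(text: str) -> List[KeyEntry]:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries: List[KeyEntry] = []
    pending: List[str] = []  # lines of an entry whose "Floor" part hasn't appeared yet
    i = 0
    while i < len(lines):
        pending.append(lines[i])
        joined = " ".join(pending)
        m = FLOOR_RE.fullmatch(joined)
        if m:
            room = m.group("room")
            if room is None and joined.endswith(",") and i + 1 < len(lines):
                wrapped = ROOM_RE.fullmatch(lines[i + 1])
                if wrapped:
                    room = wrapped.group(1)
                    i += 1  # consume the wrapped "Room NNNN" line
            entries.append(KeyEntry(m.group("name"), int(m.group("floor")), room))
            pending = []
        i += 1
    # Trailing lines that never reach a "Floor" part are silently dropped.
    return entries
```

The entries come out in the order the text appears in the block, which
need not be the order of the numbered circles on the map.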

I'm not seeing any hyperlinks, but I'm not really that familiar with 
doing forensics on PDF files with the tools that I have to hand.
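For a rough check without specialist tools, something like the
following sketch, assuming Python 3 and only its standard library, looks
for a structure tree (a tagged PDF has /StructTreeRoot in its catalog
and normally /MarkInfo << /Marked true >>) and counts link annotations
(/Subtype /Link). It is a byte-level heuristic, not a PDF parser: it
inflates only streams that zlib can decode, knows nothing about
encryption or other filters, and a /StructTreeRoot can be present but
nearly empty. The function names are made up for illustration.

```python
import re
import zlib
from typing import Dict, Union

# Body of every "stream ... endstream" section; the look-behind stops the
# "stream" inside "endstream" from being taken as a new stream start.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
LINK_RE = re.compile(rb"/Subtype\s*/Link\b")
MARKED_RE = re.compile(rb"/Marked\s+true\b")


def searchable_bytes(pdf: bytes) -> bytes:
    """Return the raw file plus every stream that inflates with zlib."""
    chunks = [pdf]
    for m in STREAM_RE.finditer(pdf):
        try:
            # A decompressobj ignores the end-of-line bytes left before "endstream".
            chunks.append(zlib.decompressobj().decompress(m.group(1)))
        except zlib.error:
            continue  # not FlateDecode (e.g. a JPEG image) or damaged: skip
    return b"\n".join(chunks)


def quick_check(pdf: bytes) -> Dict[str, Union[bool, int]]:
    """Heuristic: does the file look tagged, and how many link annotations?"""
    data = searchable_bytes(pdf)
    return {
        "struct_tree": b"/StructTreeRoot" in data,  # tagged PDF needs this in the catalog
        "marked": MARKED_RE.search(data) is not None,  # /MarkInfo << /Marked true >>
        "link_annotations": len(LINK_RE.findall(data)),
    }
```

For example, quick_check(open("ivany-map.pdf", "rb").read()) on a
locally saved copy (the file name is an assumption) would show whether a
structure tree or any link annotations turn up. Decompressing streams
matters because PDF 1.5+ files often keep the catalog inside a
compressed object stream, where a plain text search would miss it.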

It's been created using Adobe Illustrator, which I assume is basically a 
2D visual design tool.

There may be some confirmation bias in this, as I didn't do a deep 
search, but 
<https://community.adobe.com/t5/illustrator-discussions/best-pdf-settings-from-illustrator-for-accessibility/td-p/9385236> 
basically says Adobe Illustrator is not a good tool for accessible results.

Received on Friday, 17 October 2025 21:31:30 UTC