- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Sat, 10 Feb 2001 14:07:09 +0000 (GMT)
- To: w3c-wai-ig@w3.org
> Google provides a plain text version of the document. That this should be possible has always been an aim of PDF, although the document for which I found this feature didn't have a successful text extraction. Success depends on the document actually containing text (not a scan with no backing text), on its being composed in a sensible reading order, and on the text extractor being able to cope with the excesses of micro-spacing in the authoring tool. (Word/Windows tends to place each character separately, so the extractor has to guess the word boundaries from the spacing, whereas it is trivial to extract text from PDF written to the PDF authoring guidelines.) (To the extent that SVG is created with tools similar to those used to create PDF, text extraction will be similarly easy or difficult.)
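
To illustrate the word-boundary guessing described above, here is a minimal sketch in Python. It is not how any particular extractor works: the function name `group_glyphs_into_words`, the `(char, x, width)` glyph tuples, and the `gap_ratio` threshold are all assumptions made for the example. It starts a new word whenever the horizontal gap between consecutive glyphs exceeds a fraction of the font size.

```python
def group_glyphs_into_words(glyphs, font_size, gap_ratio=0.3):
    """Group individually placed glyphs on one line into words.

    glyphs: list of (char, x, width) tuples in reading order, where x is
            the left edge of the glyph (hypothetical input format).
    font_size: nominal font size, in the same units as x and width.
    gap_ratio: assumed threshold; a gap wider than gap_ratio * font_size
               is treated as a word break.
    """
    threshold = gap_ratio * font_size
    words = []
    current = ""
    prev_end = None  # right edge of the previous glyph

    for char, x, width in glyphs:
        # A large horizontal gap since the previous glyph ends the word.
        if prev_end is not None and (x - prev_end) > threshold and current:
            words.append(current)
            current = ""
        current += char
        prev_end = x + width

    if current:
        words.append(current)
    return words


# Glyphs placed one at a time, with slight micro-spacing inside words
# and a wide gap between "PDF" and "is".
sample = [("P", 0, 6), ("D", 6.2, 6), ("F", 12.1, 6), ("i", 24, 2), ("s", 26.1, 5)]
print(group_glyphs_into_words(sample, font_size=10))  # ['PDF', 'is']
```

A fixed threshold like this is fragile: justified text or heavy kerning can push gaps inside a word past it, or pull gaps between words below it, which is the kind of failure the micro-spacing remark refers to.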
Received on Saturday, 10 February 2001 09:07:16 UTC