Re: PDF accessibility and complex script languages. from Olaf Drümmer on 2016-01-06 (w3c-wai-ig@w3.org from January to March 2016)

From: Olaf Drümmer <olaflist@callassoftware.com>
Date: Wed, 6 Jan 2016 01:56:45 +0100
To: Andrew Kirkpatrick <akirkpat@adobe.com>
Cc: Olaf Drümmer <olaflist@callassoftware.com>, Andrew Cunningham <andj.cunningham@gmail.com>, w3c WAI List <w3c-wai-ig@w3.org>
Message-Id: <7F055F2E-F736-4A61-A202-177D5E6455E2@callassoftware.com>

Hi Andrew,

> On 05.01.2016, at 02:41, Andrew Kirkpatrick <akirkpat@adobe.com> wrote:
>
> ActualText is NEVER EVER EVER used to represent the results of OCR –
> that would be a violation of the standard.

I’d love to learn how you came to this statement. AFAICT it can’t be
derived from any of the PDF standards I know.

Just to make an extreme point: if each and every (recognized) character
in an OCR-ed document were represented by an ActualText attribute,
nothing would be in violation of the applicable standards on a formal
level (PDF per ISO 32000-1 or PDF/UA per ISO 14289-1). Whether such an
approach makes any sense is a completely different story.

Olaf



Received on Wednesday, 6 January 2016 00:59:10 UTC