- From: Olaf Drümmer <olaflist@callassoftware.com>
- Date: Wed, 6 Jan 2016 01:56:45 +0100
- To: Andrew Kirkpatrick <akirkpat@adobe.com>
- Cc: Olaf Drümmer <olaflist@callassoftware.com>, Andrew Cunningham <andj.cunningham@gmail.com>, w3c WAI List <w3c-wai-ig@w3.org>
Hi Andrew,

> On 05.01.2016, at 02:41, Andrew Kirkpatrick <akirkpat@adobe.com> wrote:
>
> ActualText is NEVER EVER EVER used to represent the results of OCR – that would be a violation of the standard.

I’d love to learn how you come to this statement. AFAICT it can’t be derived from any of the PDF standards I know.

Just to make an extreme point: if each and every (recognized) character in an OCR-ed document would be represented by an ActualText attribute, on the formal level of applicable standards (PDF per ISO 32000-1 or PDF/UA per ISO 14289-1), nothing would be in violation of the applicable standards. Whether using such an approach makes any sense is a completely different story.

Olaf
Received on Wednesday, 6 January 2016 00:59:10 UTC