Re: Summary of action items, resolutions, and open issues from the F2F

Action Katie/Loretta: Send PDF techniques to the list.

********************

Summary from the group discussing PDF issues at the Face-2-Face
meeting Friday morning, based on notes by Claus Thoegerson and Katie
Haritos-Shea.

Participants:
	Loretta Guarino Reid 
	Sally Hadland 
	Katie Haritos-Shea 
	William Loughborough 
	Tom Pereira 
	Claus Thoegerson

Although the morning's topic was to look at the mapping of techniques
for PDF and see how they mapped into the proposed Guidelines and
Checkpoints, most of our time was spent just discussing techniques for
PDF. PDF is a page description language, not a markup language. For
each technique, we attempted to identify the most likely checkpoint
that applies, and the version of PDF in which the language support is
first available. Some of these items refer to language features in PDF 1.4,
which has not yet been released.

All references to the PDF Reference Manual are to the PDF Reference
Second Edition, Version 1.3.


1. [Guideline 1?] Within a PDF page, there may be a sequence of show string
operations, each with a sequence of Character Codes with associated
fonts. Every such sequence of character codes must map unambiguously
into a sequence of Unicode code points. Mapping is done as follows:
	1a) If the Font contains a ToUnicode entry, convert the Character Code 
	    to Unicode via the ToUnicode CMap.
	1b) If the Font uses one of the PDF predefined encodings
	    MacRomanEncoding, MacExpertEncoding, or WinAnsiEncoding
	    (perhaps as modified by a Differences array in the font's
	    encoding resource), use the Differences array or Appendix D
	    to convert the Character Code to an Adobe glyph name. Then use
	    the Adobe glyph name to look up the corresponding Unicode value.
	1c) If the Font uses one of the predefined CMaps listed in Table 5.14 
            on page 320 of the PDF Reference Manual except Identity-H and 
	    Identity-V, convert the Character Code to a Unicode value via 
	    the following steps.
		1) Obtain the Registry and Ordering of the predefined CMap 
                   from the CIDSystemInfo of the appropriate CMap.
		2) Concatenate the Registry and the Ordering according to the 
                   format "<registry>-<ordering>-UCS2" to obtain a second 
	           CMap name, e.g. "Adobe-Japan1-UCS2". Obtain that CMap.
		3) Index into the predefined CMap, using the Character Code, 
                   and obtain an Intermediate Value.
		4) Index into the CMap obtained in step 2), using the 
                   Intermediate Value, and obtain a Unicode Value.
	    If any of these four steps fails, e.g. there is no CMap of that 
	    name or the indexing value is missing or undefined in the CMap, 
	    then there is no mapping of the character code to Unicode.
	1d) If the font is a Type 0 font whose descendant CIDFont uses
	    the Adobe-Japan1, Adobe-Korea1, Adobe-CNS1, or Adobe-GB1 character
            collection, as specified in the CIDSystemInfo dictionary, follow 
	    the same steps as in 1c) to obtain the character code mapping.
	1e) If the Font is a Type 1 font whose character names are
            taken from the Adobe standard Latin character set and the set 
	    of named characters in the Symbol font, documented in Appendix C, 
	    use the corresponding Unicode value found by looking up the glyph 
	    name.
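
The lookups in 1b) and 1c) can be sketched in Python as follows. This is
only an illustration under stated assumptions: the function names, the
dictionary-based tables, and every numeric value are hypothetical
stand-ins, not real encoding or CMap data, and a real implementation
would load the tables from the font and CMap resources.

```python
from typing import Dict, Optional

# Hypothetical, tiny stand-ins for the real tables (illustrative values only).
GLYPH_TO_UNICODE: Dict[str, int] = {"A": 0x0041, "eacute": 0x00E9, "space": 0x0020}
BASE_ENCODING: Dict[int, str] = {65: "A", 32: "space"}  # fragment of a predefined encoding


def map_simple_font(code: int, differences: Dict[int, str]) -> Optional[int]:
    """Step 1b: Differences array first, then the base encoding, then glyph name."""
    glyph_name = differences.get(code, BASE_ENCODING.get(code))
    if glyph_name is None:
        return None  # no glyph name, so no Unicode mapping
    return GLYPH_TO_UNICODE.get(glyph_name)


def map_via_cmaps(
    code: int,
    predefined_cmap: Dict[int, int],
    registry: str,
    ordering: str,
    cmap_store: Dict[str, Dict[int, int]],
) -> Optional[int]:
    """Step 1c: predefined CMap, then the "<registry>-<ordering>-UCS2" CMap."""
    ucs2_cmap = cmap_store.get(f"{registry}-{ordering}-UCS2")
    if ucs2_cmap is None:
        return None  # there is no CMap of that name
    intermediate = predefined_cmap.get(code)
    if intermediate is None:
        return None  # character code missing or undefined
    return ucs2_cmap.get(intermediate)  # None if the intermediate value is unmapped


# Made-up numbers, chosen only to show the two-stage lookup.
store = {"Adobe-Japan1-UCS2": {7: 0x3042}}
predefined = {0x82A0: 7}
print(map_via_cmaps(0x82A0, predefined, "Adobe", "Japan1", store))
print(map_simple_font(65, {65: "eacute"}))
```

Returning None at any failed step mirrors the rule that a failure in any
of the four steps means there is no mapping to Unicode.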

2. Separate words explicitly with spacing characters. Do not rely on
the location of the characters or the division of characters into
show string operations to indicate word breaks. Note that this implies
that lines of text for western languages usually end with a trailing
space character.
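
To illustrate why explicit spacing matters, here is a minimal Python
sketch of a text extractor that simply concatenates the decoded strings
in content order and ignores positioning. The function name and the
sample strings are hypothetical.

```python
def extract_text(show_strings):
    """Concatenate decoded show-string operands in content order, ignoring positions."""
    return "".join(show_strings)


# Explicit spaces, including a trailing space at the end of each line.
good = ["Separate words ", "explicitly with ", "spacing characters."]
# Same visual layout, but word breaks are implied only by positioning.
bad = ["Separate", "words", "explicitly", "with", "spacing", "characters."]

print(extract_text(good))  # words stay separated
print(extract_text(bad))   # words run together
```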
	 
3. [Guideline 1.1] If characters are not rendered using the show string
operation, they must be marked in the page as a Span element with an
ActualText value reflecting the desired Unicode value. (PDF 1.4)

4. [Guideline 1.1] All images and other non-text content must have an
Alt property to provide a textual equivalent. (PDF 1.3)

5. [Guideline 1.1] Multimedia annotations such as Sounds and
Movies must be accessible.

6. [Guideline 2.5] Provide logical structure (PDF Reference Manual
Section 8.4.3) for the document. Map structure types to the standard
structure types described in Adobe Technical Note #5401. (PDF 1.3)
 
7. Set the data access restrictions on the document to permit the
contents to be accessed. In PDF 1.3 and earlier, permit the text and
graphics in the document to be copied. In PDF 1.4, set accessibility
permission for the document.
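
A rough Python sketch of such a check follows. The bit positions are an
assumption taken from the permission-flags table of the published PDF
Reference (bit 5 for copying text and graphics, bit 10 for extraction in
support of accessibility, numbered from 1) and should be verified
against the specification. The function name and the version tuple are
hypothetical.

```python
# Assumed flag positions (1-based in the PDF Reference); verify before relying on them.
COPY_BIT = 1 << 4            # bit 5: copy or extract text and graphics
ACCESSIBILITY_BIT = 1 << 9   # bit 10: extract text and graphics for accessibility


def content_accessible(permissions: int, pdf_version: tuple) -> bool:
    """Return True if the permission flags allow the contents to be accessed."""
    if pdf_version >= (1, 4):
        return bool(permissions & ACCESSIBILITY_BIT)
    return bool(permissions & COPY_BIT)


print(content_accessible(COPY_BIT, (1, 3)))
print(content_accessible(ACCESSIBILITY_BIT, (1, 4)))
```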

8. [Guideline 4.1] Use bookmarks and links within a document to
provide navigation aids.
 
9. [Guideline 3.8] Mark tables appropriately with the structure types
described in Adobe Technical Note #5401. (PDF 1.3)

10. [Guideline 3.9] Provide expansion attributes for abbreviations and
acronyms. (PDF 1.4)

11. [Guideline 2.5] Use the language tagging facilities (Lang) to
specify the natural language of all text in the document.

12. Tag Artifacts in the page contents, so that users can control how
and whether they are included in the contents of the
document. (PDF 1.4) Artifacts fall into three categories:
	12a) Artifacts of the printing process, like crop-box markings
	     and the document file name printed outside the crop box.
	12b) Artifacts of the pagination of the document, that is,
	     elements that would be absent, or present in a much different
	     form, if the document were always one big page, like running
	     headers and page numbers.
	12c) Artifacts of the layout process and typographic style,
	     like a horizontal rule above a footnote.
 
13. Use a soft hyphen, identified by a character that maps to the
Unicode value U+00AD or 173 decimal, when a line-break hyphen is
introduced into the middle of a word.
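
As a small Python sketch of what this makes possible, a consumer can
rejoin words split across lines by dropping a trailing soft hyphen and
leaving ordinary hyphens alone. The function name and sample lines are
hypothetical, and the sketch assumes each line otherwise ends with a
trailing space, as technique 2 suggests.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, 173 decimal


def join_lines(lines):
    """Rejoin extracted lines, dropping soft hyphens introduced at line breaks."""
    parts = []
    for line in lines:
        if line.endswith(SOFT_HYPHEN):
            parts.append(line[:-1])  # the word continues on the next line
        else:
            parts.append(line)
    return "".join(parts)


lines = ["a line-break hy\u00ad", "phen in a well-", "known word "]
print(join_lines(lines))
```

The hard hyphen in "well-known" survives because only U+00AD is treated
as a line-break artifact.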

We also discussed whether search commands should search Alt text; this
is not a Contents question but a User Agent question.

 

Received on Wednesday, 18 October 2000 23:20:53 UTC