This document summarizes the text composition requirements in the Chinese writing system. One goal of the task force is to describe the issues in Chinese layout requirements; another is to provide satisfactory equivalents to the current standards (i.e. Unicode); a third is to prompt vendors to implement the relevant features correctly.
This document was created by the Chinese Layout Task Force within the W3C Internationalization Interest Group, in collaboration with the W3C HTML5 Chinese Interest Group. The Internationalization Working Group has been a great help during the writing of this document. The Chinese Layout Task Force will work with the Internationalization Working Group to publish Working Drafts of this document and to widen its exposure and review.
If you wish to make comments regarding this document, please raise a GitHub issue. If you are unable to use GitHub issues, you may also send email to the list mentioned below. Please include [clreq] at the start of your email's subject. To make it easier to track comments, please raise separate issues or send separate emails for each comment. All comments are welcome.
Introduction
Purpose of this document
Each cultural community has its own language, script and writing system. The transfer of each writing system into cyberspace is a task with very high importance for information and communication technology.
As one of the basic work items of this task force, this document summarizes the text composition requirements in the Chinese writing system. One goal of the task force is to describe the issues in Chinese layout requirements, another is to provide satisfactory equivalents to the current standards (i.e. Unicode), and a third is to prompt vendors to implement the relevant features correctly.
How This Document Was Created
Chinese composition exhibits several differences from other writing systems. The major features include:
Chinese is written in two scripts, using Traditional and Simplified characters. Apart from differences in glyphs and strokes, the composition rules can differ as well.
There are two writing modes: vertical and horizontal. The former is often seen in Traditional Chinese publications.
In principle, the characters used in Chinese composition, including Chinese (hanzi) characters and punctuation, are square, with an aspect ratio of 1:1, and are arranged seamlessly next to one another.
The remaining English text in this section (ie. what follows) comes from hackpad, but seems to no longer be in the Chinese version. Should it be deleted?
This document mainly adopts the following policies to explain the features of Chinese composition:
It does not fully cover all details of the Chinese composition system, but mainly describes the differences from Western composition systems.
It explains in detail the similarities and differences between Traditional and Simplified Chinese composition.
It describes presentational results and treats them as issues and requirements for Chinese text layout. It also offers principles or methods for handling these issues, without describing particular technological solutions.
It suggests solutions for, or explains, present-day issues that people face in Chinese composition.
It provides, as far as possible, typical examples of Chinese composition and their actual use cases.
In consideration of non-Chinese readers of this document, figures are used for explanation wherever possible.
It mainly covers modern Chinese publications. Publications from the era of movable type may differ in some respects, but they are still considered part of Chinese composition. The document does not yet fully cover ancient books; future editions may be revised with such features in mind.
For non-Chinese readers, the frequency of use is indicated for each requirement. These frequencies are not the outcome of any rigorous research but come from the authors' long experience, and they reflect what ordinary Chinese readers would find intuitive. They provide only rough information for prioritizing issues.
The main target of this document is common books. Other publications, such as magazines or newspapers, are also included.
Basics of Chinese Composition
Characters used for Chinese composition
The majority of the text used in Chinese composition consists of Han characters (hanzi).
Chinese characters exist in Traditional and Simplified forms. The former are commonly used in Taiwan, Hong Kong and Macao, while the latter are commonly used in Mainland China, Singapore and Malaysia. In this document, the number of strokes and the structure of a character are considered the major differences between Simplified and Traditional Chinese characters.
Different regions use different typefaces. How a typeface looks is unrelated to Unicode; it depends on the operating system and the font library. The focus of this document is Chinese composition, so typeface presentation is not discussed.
In addition to the Chinese (hanzi) characters, various punctuation marks, as well as Western characters such as European numerals, Latin letters and/or Greek letters, may be used in Chinese text.
This note appears to no longer be in the Chinese translation. Delete? One Simplified Chinese character may have more than one corresponding Traditional form. For example, the Simplified Chinese character 发 can be matched to either the Traditional Chinese character 發 or 髮, depending on the context.
Chinese Characters
Chinese characters have square character frames of equal size. Centered horizontally and vertically within the character frame is a smaller box, called the letter face, which contains the actual symbol; some space should be left between the letter face and the frame. Most Chinese punctuation marks are the same size as other characters, but some differ, such as the ellipsis, which is one character high and two characters wide.
Character size is measured by the size of the character frame. Character advance is the advance width of a character's frame. The frame of a Chinese character should be square.
Principles for Arranging Characters during Chinese Composition
In principle, when composing a line with Chinese characters, no extra space appears between their character frames. This is called solid setting.
Unlike in the letterpress era, when separate original patterns of a letter were needed to create matrices for each size, in today's digital era the same original pattern is used for any size simply by enlarging or reducing it. As a result, it may be necessary to adjust the inter-character space when composing lines at large character sizes. At small character sizes, hinting data is used to ensure that the stroke widths of each character look correct.
Depending on the context, in addition to solid setting several alternative setting methods can be used, as described below.
Increased inter-character spacing
It is common in books to increase the space between each character frame for the following cases:
To achieve a balance between running heads with different numbers of characters. Increased inter-character spacing is used for running heads with few characters.
For captions of illustrations and tables, which only have a few characters. Increased inter-character spacing is used to balance with the size of an illustration or table.
In some cases, increased inter-character spacing is used for poetry where one line has only a few characters, so as to maintain the balance of the layout.
For publications whose main audience is children, inter-character spacing is increased to make it easier for the children to read.
Even inter-character spacing
Text may be set with equal spacing between all characters on a line, so that every line aligns to the same line head and line end. Because Chinese characters and punctuation marks are all square and of nearly the same size, lines naturally align in this way. Even inter-character spacing is mainly used in the following cases:
To deal with the rules that forbid certain characters at the beginning or end of a line. When a punctuation mark that must not end a line falls at the end of a line, even inter-character spacing is used so that the character before the punctuation mark moves to the next line together with it. When a punctuation mark that must not start a line falls at the beginning of a line, the last character of the previous line must be moved to the beginning of the next line, which leaves one or two (or sometimes more) empty spaces in the previous line. Even inter-character spacing is then used to equalize the line lengths and justify the lines.
In tables, even inter-character spacing is used when entries, such as personal names, have a different number of characters from the table head, so that all entries are justified to the same width.
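The arithmetic behind even inter-character spacing can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed algorithm: every character is assumed to occupy a square frame of the same size, lengths are measured in character-frame units (1 = one character), and the leftover space is assumed to be shared equally among the gaps between characters, with none added at the line head or line end. The function names `even_gap` and `frame_positions` are hypothetical.

```python
from fractions import Fraction


def even_gap(line_length, char_count, char_size=1):
    """Extra space added to each inter-character gap so that char_count
    solid-set characters exactly fill line_length (hypothetical helper)."""
    if char_count < 2:
        raise ValueError("at least two characters are needed to spread space")
    slack = Fraction(line_length) - char_count * Fraction(char_size)
    if slack < 0:
        raise ValueError("the characters do not fit on the line")
    # The slack is divided equally among the char_count - 1 gaps.
    return slack / (char_count - 1)


def frame_positions(line_length, char_count, char_size=1):
    """Offset of each character frame from the line head."""
    step = Fraction(char_size) + even_gap(line_length, char_count, char_size)
    return [i * step for i in range(char_count)]


# A 10-character line holding only 9 characters: each of the 8 gaps grows by 1/8.
print(even_gap(10, 9))          # 1/8
# A two-character name spread across a four-character table column.
print(frame_positions(4, 2))    # [Fraction(0, 1), Fraction(3, 1)]
```

The second example corresponds to the table case above: the two characters land at the head and the end of the four-character column.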
Reduced inter-character spacing
When the inter-character spacing is reduced, adjacent character frames partially overlap. This method is mainly used in the following cases:
For characters in headings in magazines or advertisements, reduced inter-character spacing can be used to keep the characters on one line, or it can also be used to achieve a special effect for presentation.
Since Chinese characters are all square-shaped, this method does not apply to headings and content in books produced by letterpress printing.
Typefaces for Chinese Characters
Four Frequently-used Typefaces for Chinese Characters I'm inclined to think that a heading is superfluous here.
There are four main typefaces in use for Chinese characters:
Song (also called Ming or Mincho)
Hei
Kai
Fangsong (Imitation Song)
Each of these typefaces has established uses in book typesetting. The following sections describe common practice and contexts for each.
Song/Ming
Song, also known as Songti or Ming, is a category of typefaces for Chinese characters and the style most commonly used in Chinese composition and print. In Mainland China the most common name is Song, while in Hong Kong, Taiwan, Japan and Korea, Ming is prevalent.
Song is commonly used for body text, headings and annotations. When used in headings, it is usually set in a heavier weight to distinguish the headings from the body text.
Kai
Kai, also known as Kaiti or regular script, is another of the major typefaces. It has calligraphic forms and notable handwritten stroke features, and it is the most easily and widely recognized style.
Kai is mainly used in text that needs to be differentiated from the rest of the content, for example, headlines, references, quotations, and dialogs. It is rarely used for emphasis, because of its similarity with Song.
Since Kai retains calligraphic strokes, it is widely used for the body text of official documents and textbooks.
Hei
Hei, also known as Heiti or Gothic, is a type style characterized by strokes of even thickness, reduced curves and a lack of decoration. It is commonly used in headlines, signs, and personal names in dialogs. In certain types of text, characters in Hei style with thicker strokes typically indicate emphasis.
Traditionally, publications rarely use Hei for body text, but with the growing influence of the World Wide Web and digital publishing, some publications have started to experiment with Hei in this context.
Fangsong (Imitation Song)
The Fangsong (Imitation Song) style lies between Song and Kai. It is commonly used in isolated paragraphs such as quotations or highlighted sentences.
The Type Area (or Printing Area)
The type area, sometimes called the printing area, is designed in the following sequence. There is no actual definition of what the type area is. I suggest we change the foregoing to something like: The type area, sometimes called the printing area, is the rectangle in the middle of the page that contains the main body of the text. It is surrounded by space containing headers, footers, notes, etc.
First, prepare a template of the page format, which determines the basic appearance of document pages.
Then, specify the details of actual page elements based on the template.
Books usually use one basic template for page format, whereas magazines often use several templates.
Although books tend to use a single page-format template, pages such as the table of contents and indexes need some further design work based on the basic page format. Furthermore, many indexes use a page format different from the basic one: vertically set books often have indexes in horizontal writing mode, sometimes with multiple columns. However, in the actual page design, the type area should not exceed that of the basic page template. (have some doubts about the translation of 版面 here).
Magazines usually contain various kinds of content, which naturally leads to various designs of templates, different sizes of characters, and varying numbers of columns.
Basic Elements of Page Format
The following are the basic elements of a page format.
Trim size and binding side (vertically set Chinese documents are bound on the right-hand side, and horizontally set documents on the left-hand side).
Principal text direction (vertical writing mode or horizontal writing mode).
Appearance of the type area and its position relative to the trim size.
Appearance of running heads and page numbers, and their positions relative to the trim size and type area.
Establishing a type area may be seen as defining not only a rectangular area on a page but also, within that area, an underlying logical grid that guides the placement of characters, headings and illustrations. Once the grid is established according to the principles of composition, the characters must be set on the grid. If the content contains only Chinese characters, an important principle is that the first and last characters of each line align with the borders of the type area. When Chinese characters and Western text are mixed, or when the forbidden positions of punctuation marks must be taken into account, the content may not align with the grid.
Design of the Type Area
The type area defines the basic printing style of a book. The following are the basic elements of the type area.
Character size and typeface name
Text direction (vertical writing mode or horizontal writing mode)
Number of columns and column gap when using multi-column format
Length of a line
Number of lines per page (number of lines per column when using multi-column format)
Line gap
Type Area and Real Page Format
This section explains how to create a real page format based on the type area.
Size and position of headings: The space a heading occupies, and the size of its characters, should be based on a number of lines of the type area. The indent is usually specified as a number of characters of the type area.
Size of illustrations: In horizontal writing mode, the width of illustrations should, if at all possible, be the width of the type area; in horizontal writing mode with two columns, the width of illustrations should, if at all possible, be the width of one type area column. The illustrations are usually set at the head or the foot of the page. Likewise, in vertical writing mode, the height of illustrations should, if at all possible, be either the height of one type area column or the height of the type area. The illustrations are usually set at the right side or left side of the type area.
Pages for tables of contents, indexes and references: The size of the type area for the table of contents, indexes and references of a book is based on the type area for the main body. Tables of contents in vertical writing mode often have the same left-to-right size as the type area but a slightly smaller top-to-bottom size.
Procedure for Defining the Type Area
Specifying the dimensions of the type area
For a document with a single column per page, specify the character size, the line length (the number of characters per line), the number of lines per page, and the line gap.
For a document with multiple columns per page, specify the character size, the line length (the number of characters per line), the number of lines per column, the line-gap, the number of columns and the column gap.
Determining the position of the type area relative to the trim size. There are various alternative methods for specifying the position of the type area relative to the trim size:
Set the type area at the horizontal and vertical center of the trim size.
Position vertically by specifying the size of the space at the head (for horizontal writing mode) or the space at the foot (for vertical writing mode). Position horizontally by centering the type area.
Position vertically by centering the type area. Position horizontally by specifying the size of the space for the gutter.
Position vertically by specifying the space at the head (for horizontal writing mode) or the space at the foot (for vertical writing mode). Position horizontally by specifying the size of the space for the gutter.
In most cases the type area is first set at the horizontal and vertical center of the trim size and then adjusted according to its dimensions. This method is mainly inherited from letterpress printing. In desktop publishing, the dimensions of the type area are usually calculated from the space between the type area and the edges of the trim size.
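The dimensions specified in this procedure can be combined into a small calculation. The sketch below is illustrative only: the `TypeArea` class and `centered_origin` function are hypothetical, all lengths are assumed to be in points, every line is assumed to be exactly one character frame deep (no mixed sizes or ruby), and columns are assumed to be laid out along the line direction, separated by the column gap. `centered_origin` implements only the first positioning method listed above, centering the type area on the trim size.

```python
from dataclasses import dataclass


@dataclass
class TypeArea:
    """Parameters of a type area; all lengths in pt (hypothetical structure)."""
    char_size: float          # size of the square character frame
    chars_per_line: int       # line length, in characters
    lines: int                # lines per page, or per column
    line_gap: float           # space between adjacent lines
    columns: int = 1
    column_gap: float = 0.0   # space between adjacent columns
    vertical: bool = False    # True for vertical writing mode

    def size(self):
        """Return (width, height) of the type area."""
        line_length = self.chars_per_line * self.char_size
        # Along the lines: columns placed end to end, separated by column gaps.
        along = self.columns * line_length + (self.columns - 1) * self.column_gap
        # Across the lines: one frame per line, separated by line gaps.
        across = self.lines * self.char_size + (self.lines - 1) * self.line_gap
        # Horizontal lines run left to right; vertical lines run top to bottom.
        return (across, along) if self.vertical else (along, across)


def centered_origin(area, trim_width, trim_height):
    """Top-left corner that centers the type area on the trim size."""
    width, height = area.size()
    if width > trim_width or height > trim_height:
        raise ValueError("type area is larger than the trim size")
    return (trim_width - width) / 2, (trim_height - height) / 2


single = TypeArea(char_size=10.0, chars_per_line=30, lines=20, line_gap=5)
print(single.size())                      # (300.0, 295.0)
print(centered_origin(single, 400, 395))  # (50.0, 50.0)
```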
Considerations when Designing the Type Area
The following are considerations that need to be taken into account when designing the type area:
When deciding the dimensions of the type area, it is necessary to consider both the trim size and the margin. Generally speaking, the shape of the type area could be made similar to that of the trim size.
Character size. For the main target audience of publications, i.e. adults, the character size is most commonly 10.5pt (≒3.7mm) or 9pt (≒3.2mm). Except for specialized publications, the minimum acceptable size is 8pt (≒2.8mm).
There are two traditional size systems for Chinese characters, an old one and a new one. Their equivalents in the Western point system are as follows. In the old system, Size 0 = 42pt, Size 1 = 27.5pt, Size 2 = 21pt, Size 3 = 15.75pt, Size 4 = 13.75pt, Size 5 = 10.5pt, Size 6 = 7.875pt and Size 7 = 5.25pt. In the new system, 4 lines of New Size 5 is 36pt (don't understand the foregoing), New Size 1 = 24pt, New Size 2 = 18pt, New Size 3 = 16pt, New Size 4 = 12pt and New Size 5 = 9pt.
Size 5 is usually used for body text. Newspapers and magazines use both Size 5 and New Size 5. The minimum acceptable size for body text is Size 6 (7.875pt≒2.8mm); smaller sizes are difficult to read because of the complex structure of Chinese characters.
Line length should be a whole multiple of the character size, and lines should align with one another.
For Chinese composition without intermixed Western scripts, the characters all have a square-shaped frame, so all line lengths except that of the last line of the paragraph should, in principle, be the same.
The line gap should be the same throughout the book, except in special cases. It would probably help, at this point, to define what the line gap is, since it initially sounds like the same thing as line height. What are the reference points for measuring the line gap?
In Traditional Chinese composition, pronunciation marks (called 'ruby' in the Japanese Layout Requirements) are sometimes inserted between lines. In such cases the line gap is kept constant rather than changed, so if such elements are likely to occur in the text, the line gap set during type area design must be large enough to accommodate them.
The line gap for the type area is commonly set to a value between 50% and 100% of the height of the character frame used for the type area. A shorter line gap can be chosen in cases where the line length is short or the character size of the type area is relatively small. On the other hand, the line gap usually does not exceed the character size. Increasing the line gap beyond the character size does not improve the reading experience.
There is another method of specifying the type area that uses line height rather than line gaps. Line height is the distance between two adjacent lines measured from their reference points. The reference point differs from implementation to implementation. However, in vertical writing mode the horizontal center of the character frame is usually used, and in horizontal writing mode the vertical center of the character frame is used. When every character has the same size, line height is calculated as follows:
line height = character size / 2 + line gap + character size / 2 = character size + line gap
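The size tables and the line-height formula above translate directly into code. This sketch assumes the 1/72-inch point, which reproduces the millimetre values quoted above, and omits the unclear "4 lines of New Size 5" entry; the dictionary and function names are hypothetical. `line_gap_range` encodes the commonly used 50% to 100% range as guidance, not a hard limit.

```python
# Sizes listed above, in pt. The unclear "4 lines of New Size 5" entry is omitted.
OLD_SIZES = {0: 42, 1: 27.5, 2: 21, 3: 15.75, 4: 13.75, 5: 10.5, 6: 7.875, 7: 5.25}
NEW_SIZES = {1: 24, 2: 18, 3: 16, 4: 12, 5: 9}
MIN_BODY_SIZE = OLD_SIZES[6]  # smallest acceptable size for body text

MM_PER_PT = 25.4 / 72  # assumption: 1 pt = 1/72 inch


def pt_to_mm(pt):
    """Convert a point size to millimetres."""
    return pt * MM_PER_PT


def line_gap_range(char_size):
    """Commonly used line gaps: 50% to 100% of the character frame size."""
    return 0.5 * char_size, 1.0 * char_size


def line_height(char_size, line_gap):
    """Distance between the centers of two adjacent lines of equal size."""
    return char_size / 2 + line_gap + char_size / 2


body = OLD_SIZES[5]                   # Size 5, the usual size for body text
print(round(pt_to_mm(body), 1))       # 3.7
print(line_gap_range(body))           # (5.25, 10.5)
print(line_height(body, 7))           # 17.5
```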
The usage of Chinese punctuation marks differs across regions. One of the major differences lies in how they are positioned in composition. Punctuation marks in Traditional Chinese are usually centered in the character frame, in line with adjacent Han characters, while punctuation marks in Simplified Chinese are aligned toward the character they follow, and this alignment depends on whether the text is set in vertical or horizontal writing mode. The differences between Traditional and Simplified Chinese punctuation, and the correct way to position the marks, are described in more detail later.
Major typesetting differences between Traditional Chinese and Simplified Chinese include the positioning of punctuation and terminological variations. (See more at ).
Punctuation marks differ among China, Japan, Korea and Vietnam, but Unicode does not assign them separate code points. A number of punctuation marks are shared among Traditional Chinese, Simplified Chinese and Japanese. Usually the font used for the Han characters determines the style of the punctuation marks, or the composition engine adjusts them automatically.
The following section is mainly based on General Rules for Punctuation (GB/T 15834—2011), issued in Mainland China, and the Punctuation Guidance (2008 revised edition), issued by the Ministry of Education in Taiwan. The former is a recommended national standard, while the latter is not mandatory for general publications and mainly regulates educational materials such as textbooks.
Categories and Usage of Punctuation Marks
Pause or Stop Punctuation Marks.
Pause or stop punctuation marks are used to indicate pauses or the end of a sentence. Some of the pause or stop punctuation marks appear within a sentence, such as the slight-pause comma, comma, semicolon, colon, etc., while others appear at the end of a sentence, such as period, question mark and exclamation mark.
Ideographic full stop, fullwidth comma and ideographic comma.
U+3002 IDEOGRAPHIC FULL STOP [。] is the punctuation mark placed at the end of a sentence. U+FF0C FULLWIDTH COMMA [,] is mainly used for separating parts of a sentence such as clauses, and items in lists, particularly when there are three or more items listed. U+3001 IDEOGRAPHIC COMMA [、] (slight-pause comma) is usually used to separate items in lists, as a way to show sequence.
Many college textbooks, science and technology publications, and grammar books of Western languages, most of which are set in horizontal writing mode, contain a great deal of Western-language text. In such books, U+FF0E FULLWIDTH FULL STOP [.] can be used as the period, while U+002C COMMA [,] or U+FF0C FULLWIDTH COMMA [,] can be used as both the comma and the slight-pause comma.
Fullwidth colon and fullwidth semicolon.
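The substitution described above for text-heavy technical books can be expressed as a character mapping. The function name `to_technical_style` is hypothetical, and the sketch only swaps characters: it does not adjust spacing around the resulting marks, which a real composition engine would also need to handle.

```python
def to_technical_style(text, ascii_comma=False):
    """Swap Chinese period and commas for the marks used in technical books.

    U+FF0E FULLWIDTH FULL STOP replaces the ideographic full stop; either
    U+FF0C FULLWIDTH COMMA or U+002C COMMA serves as both comma and
    slight-pause comma.
    """
    comma = "," if ascii_comma else "\uFF0C"
    table = str.maketrans({
        "\u3002": "\uFF0E",  # IDEOGRAPHIC FULL STOP -> FULLWIDTH FULL STOP
        "\u3001": comma,     # IDEOGRAPHIC COMMA -> chosen comma
        "\uFF0C": comma,     # FULLWIDTH COMMA -> chosen comma
    })
    return text.translate(table)
```

For example, `to_technical_style("甲\u3001乙\u3002", ascii_comma=True)` yields `"甲,乙\uFF0E"`.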
U+FF1A FULLWIDTH COLON [:] consists of two equally sized dots centered on the same vertical line. It is used to explain or start an enumeration and it is also used with ratios, titles and subtitles of books, city and publisher in bibliographies, business letter salutation, hours and minutes, and formal letters. U+FF1B FULLWIDTH SEMICOLON [;] is a punctuation mark that separates major sentence elements. A semicolon can be used between two closely related independent clauses, provided they are not already joined by a coordinating conjunction. Semicolons can also be used in place of commas to separate items in a list, particularly when the elements of that list contain commas.
Fullwidth Exclamation Mark and Fullwidth Question Mark.
U+FF01 FULLWIDTH EXCLAMATION MARK [!] is a punctuation mark usually used after an interjection or exclamation to indicate strong feelings or high volume (shouting), and it often marks the end of a sentence. U+FF1F FULLWIDTH QUESTION MARK [?], casually known as the interrogation point, query or eroteme, is a punctuation mark that indicates an interrogative clause or phrase. The question mark is not used for indirect questions.
Indication Punctuation Marks. (Is this the right word? Should it be Indicator?)
In contrast with pause or stop punctuation marks, indication punctuation marks usually indicate a specific feature of the phrase or sentence. They include brackets, parentheses, em dashes, horizontal ellipsis, black circles or bullets, tildes, middle dots, angle brackets, low lines, and solidus.
Bracket/Quotation Mark.
Brackets, usually used in pairs, are commonly used to emphasize certain characters or words, or to mark the beginning and end of dialog or quoted content. When brackets are needed inside a pair of brackets, the inner brackets take a different shape from the outer ones.
For quotations, Traditional Chinese uses single quotation marks for the outer level and double quotation marks for the inner level. Single quotation marks are U+300C LEFT CORNER BRACKET [「] and U+300D RIGHT CORNER BRACKET [」]; double quotation marks are U+300E LEFT WHITE CORNER BRACKET [『] and U+300F RIGHT WHITE CORNER BRACKET [』].
Simplified Chinese, on the other hand, uses double quotation marks for the outer level and single quotation marks for the inner level. In Simplified Chinese, double quotation marks include U+201C LEFT DOUBLE QUOTATION MARK [“], U+201D RIGHT DOUBLE QUOTATION MARK [”], U+300E LEFT WHITE CORNER BRACKET [『] and U+300F RIGHT WHITE CORNER BRACKET [』]; single quotation marks include U+2018 LEFT SINGLE QUOTATION MARK [‘], U+2019 RIGHT SINGLE QUOTATION MARK [’], U+300C LEFT CORNER BRACKET [「] and U+300D RIGHT CORNER BRACKET [」].
Some publications in Traditional Chinese might also apply double quotation marks first and then single quotation marks.
Traditional Chinese may also use curved quotation marks (“” and ‘’), but they are hardly ever used in vertical writing mode.
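The nesting conventions above can be captured in a small lookup table. This is a sketch under assumptions: the function `quote` is hypothetical, only the two nesting levels described in the text are supported, and for Simplified Chinese it uses the curved quotation marks even though the corner brackets listed above are also permitted (for example in vertical writing mode).

```python
# Outer and inner quotation marks for each convention described above.
QUOTE_LEVELS = {
    "traditional": [("\u300C", "\u300D"), ("\u300E", "\u300F")],  # corner brackets first
    "simplified": [("\u201C", "\u201D"), ("\u2018", "\u2019")],   # double quotes first
}


def quote(text, convention="traditional", depth=0):
    """Wrap text in the marks for the given nesting depth (0 = outermost)."""
    levels = QUOTE_LEVELS[convention]
    if depth >= len(levels):
        raise ValueError("only two nesting levels are described")
    opening, closing = levels[depth]
    return opening + text + closing


print(quote("他說" + quote("走吧", depth=1) + "。"))
# 「他說『走吧』。」
print(quote("他说" + quote("走吧", "simplified", 1) + "。", "simplified"))
# “他说‘走吧’。”
```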
Parentheses.
Parentheses, also called round brackets or curved brackets, are used for inline notes and explanations: they contain material that clarifies the main text or is an aside to it. Parentheses used in Chinese include U+FF08 FULLWIDTH LEFT PARENTHESIS [(], U+FF09 FULLWIDTH RIGHT PARENTHESIS [)] and U+2014 EM DASH [—]. Either one em dash [—] or two consecutive em dashes can be used.
General Rules for Punctuation (GB/T 15834–2011), the national standard issued by the central government of China, lists the em dash as a kind of dash.
Other brackets and quotation marks include U+3010 LEFT BLACK LENTICULAR BRACKET [【], U+3011 RIGHT BLACK LENTICULAR BRACKET [】], U+3016 LEFT WHITE LENTICULAR BRACKET [〖], U+3017 RIGHT WHITE LENTICULAR BRACKET [〗], U+3014 LEFT TORTOISE SHELL BRACKET [〔], U+3015 RIGHT TORTOISE SHELL BRACKET [〕], U+FF3B FULLWIDTH LEFT SQUARE BRACKET [[], U+FF3D FULLWIDTH RIGHT SQUARE BRACKET []], U+FF5B FULLWIDTH LEFT CURLY BRACKET [{] and U+FF5D FULLWIDTH RIGHT CURLY BRACKET [}]. These are rarely used in Chinese publications.
Em Dash.
U+2014 EM DASH [—] can indicate a continuation of tone or sound, an abrupt change in thought, or the addition of new content. This punctuation mark is one character high and two characters wide; two em dashes are sometimes used together, i.e. "——".
Horizontal ellipsis.
In Chinese, the horizontal ellipsis usually consists of 3 dots, U+2026 HORIZONTAL ELLIPSIS […], or 6 dots (in two groups of three dots, occupying the same horizontal space as two characters, ie. "……"). They usually indicate an intentional omission of a word, sentence, or whole section from a text. Depending on their context and placement in a sentence, ellipses can also indicate an unfinished thought, a leading statement, a slight pause, a mysterious, echoing voice, or a nervous or awkward silence.
Emphasis Dots.
Emphasis dots are symbols placed next to characters to emphasize the text, strengthen the tone, or avoid ambiguity. In horizontal writing mode, emphasis dots are placed under the characters, whereas in vertical writing mode they are usually placed to the right of the characters. Either U+25CF BLACK CIRCLE [●] or U+2022 BULLET [•] can be used as emphasis dots.
Punctuation Guidance (revised edition) issued by The Ministry of Education in Taiwan does not include this mark but it is still seen in some publications.
Connector Symbols.
Connector symbols are used to indicate the start and end of a range of time or space; to indicate quantities; in the names of chemical compounds; in table and illustration labels; to connect the parts of a house number in an address or of a phone number; to separate the digits of the year, month and date; to connect compound nouns; and in romanized and foreign text within the content.
According to General Rules for Punctuation (GB/T 15834—2011), there are three types of connector symbols: the short connector symbol [–], the long connector symbol [—] and the tilde [~].
General Rules for Punctuation (GB/T 15834—2011) does not state the Unicode code points for the three types of connector symbols. However, we can deduce that the long connector symbol [—] is U+2014 EM DASH [—] and the tilde [~] is U+FF5E FULLWIDTH TILDE [~]. Since the short connector symbol should be half the width of the long connector symbol, it should be U+2013 EN DASH [–]. The actual length of these connector symbols may depend on the writing system as well as the typeface.
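The deduced mapping can be checked against the Unicode character database with Python's standard `unicodedata` module. The `CONNECTORS` table and `connect` helper are hypothetical; the text does not say which connector suits which use, so the caller chooses the kind explicitly.

```python
import unicodedata

# Deduced code points for the three connector symbols; the standard itself
# does not list code points.
CONNECTORS = {
    "short": "\u2013",  # short connector symbol
    "long": "\u2014",   # long connector symbol
    "tilde": "\uFF5E",  # tilde
}


def connect(left, right, kind):
    """Join two items with the chosen connector symbol."""
    return left + CONNECTORS[kind] + right


for kind, char in CONNECTORS.items():
    print(kind, f"U+{ord(char):04X}", unicodedata.name(char))
# short U+2013 EN DASH
# long U+2014 EM DASH
# tilde U+FF5E FULLWIDTH TILDE
```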
Middle Dot.
U+00B7 MIDDLE DOT [·], also known as interpunct, middot or centered dot, is a punctuation mark consisting of a vertically centered dot. It is used to separate the given name and family name in names translated from foreign languages or in the names of people from ethnic minorities. It is also used within book title marks to separate the title of a publication from the names of its chapters, articles or volumes.
The middle dot is used only between Chinese characters. When a translated foreign name contains a Latin initial, a full stop is used instead of the middle dot: for example, 「比尔·盖茨」 but 「B. 盖茨」.
The usage of the middle dot differs between Traditional Chinese and Simplified Chinese. In principle, the middle dot occupies the same space as one character in both vertical and horizontal writing modes; in Simplified Chinese, however, it is sometimes only half a character wide, for example when it separates the month and the day in 9·11.
Because the Big5 encoding does not clearly define the middle dot, U+FF0E FULLWIDTH FULL STOP [．], U+2027 HYPHENATION POINT [‧] and U+2022 BULLET [•] are sometimes used as substitutes for it. U+30FB KATAKANA MIDDLE DOT [・] is closely tied to the JIS encoding system, so its use is not recommended.
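Text converted from Big5 or JIS sources may therefore need its substitute dots normalized. The following is a heuristic Python sketch, not a prescribed procedure: it assumes that a substitute dot sitting between two Han characters is meant as a middle dot, and it recognizes Han characters only in the CJK Unified Ideographs block and Extension A. `normalize_middle_dots` is a hypothetical helper.

```python
# Characters sometimes used in place of U+00B7 MIDDLE DOT.
SUBSTITUTES = {"\uFF0E", "\u2027", "\u2022", "\u30FB"}
MIDDLE_DOT = "\u00B7"


def is_han(ch: str) -> bool:
    """Rough Han test: CJK Unified Ideographs and Extension A only."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def normalize_middle_dots(text: str) -> str:
    """Replace a substitute dot with U+00B7 only when it sits between two
    Han characters, so that, for example, a fullwidth full stop ending a
    sentence in technical text is left alone."""
    chars = list(text)
    for i in range(1, len(chars) - 1):
        if chars[i] in SUBSTITUTES and is_han(chars[i - 1]) and is_han(chars[i + 1]):
            chars[i] = MIDDLE_DOT
    return "".join(chars)
```

Restricting the replacement to Han-dot-Han contexts keeps the Western full stop in names such as 「B. 盖茨」 untouched, but a fullwidth full stop used as a sentence stop between two Han characters would still be converted, so the result needs review.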
Book Title Mark.
The book title mark is used to indicate the names of works, including books, articles, songs, films, documents, calligraphy and paintings. There are generally two types of book title marks: wavy low lines and angle brackets. U+FE4F WAVY LOW LINE [﹏] is positioned beneath the corresponding characters; when two works are listed next to each other, the wavy lines for each should be clearly separated. The angle brackets are U+300A LEFT DOUBLE ANGLE BRACKET [《], U+300B RIGHT DOUBLE ANGLE BRACKET [》], U+3008 LEFT ANGLE BRACKET [〈] and U+3009 RIGHT ANGLE BRACKET [〉]. The former pair is used for the names of books, while the latter pair is used for the names of articles. When two book title marks are positioned next to each other, there should be a clear separation between them to distinguish the different names.
According to the General Rules for Punctuation (GB/T 15834—2011), the names of books and of chapters should be enclosed in double angle brackets [《》]. When the name of another book must be indicated within the double angle brackets [《》], single angle brackets [〈〉] should be used.
Book title marks are a type of bracket.
The wavy low line is rarely used in modern publications, but can still be seen in some textbooks and ancient publications.
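The nesting rule for angle-bracket title marks can be sketched in Python. This assumes only the two levels described above; `book_title` is a hypothetical helper, and it does not handle a title that already contains single angle brackets.

```python
def book_title(name: str) -> str:
    """Wrap a title in double angle brackets, demoting any double angle
    brackets already inside it to single angle brackets."""
    inner = name.replace("\u300A", "\u3008").replace("\u300B", "\u3009")
    return f"\u300A{inner}\u300B"


print(book_title("读《红楼梦》札记"))  # 《读〈红楼梦〉札记》
```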
Fullwidth low line.
U+FF3F FULLWIDTH LOW LINE [_] is positioned underneath proper nouns such as a person's name, the name of a place, etc.
When two proper nouns are listed next to each other, their fullwidth low lines should be visibly separated so that the two nouns can be told apart.
As with WAVY LOW LINE, the FULLWIDTH LOW LINE is rarely used in modern publications, but it can still be seen in some textbooks and ancient publications.
Solidus.
Both U+002F SOLIDUS [/] and U+FF0F FULLWIDTH SOLIDUS [／] are used to mark line breaks in quoted poetry, to mark metrical beats, and to separate characters that need to be set apart.
The revised edition of the Punctuation Guidance issued by the Ministry of Education in Taiwan does not include the solidus, but it is frequently used in Traditional Chinese publications, including textbooks.
Sizes and positions of Punctuation Marks
See sections and for the shapes and usage of punctuation marks. There are no notable differences between the punctuation marks used in Traditional Chinese and Simplified Chinese; the major differences lie in their size and position.
Punctuation marks in Traditional Chinese are usually positioned at the vertical and horizontal center of their square space, although some marks change orientation between vertical and horizontal writing modes so that they mark the corresponding characters more accurately. In Simplified Chinese, punctuation marks are usually positioned close to the characters they follow, and some marks are likewise placed differently depending on the writing mode. Different writing modes may also require different punctuation marks for the same function: in Simplified Chinese, for example, horizontal writing mode uses curved quotation marks, while vertical writing mode uses corner brackets.
Pause or stop punctuation marks include the slight-pause comma, comma, semicolon, colon, period, question mark, exclamation mark, etc. In Traditional Chinese, they occupy the same space as a character, keep the same orientation, and are usually positioned at the vertical and horizontal center of that square space. In Simplified Chinese, they are placed toward one corner of their square space, next to the characters they follow: at the lower left corner in horizontal writing mode, and at the upper right corner in vertical writing mode.
Bracket marks include quotation marks, parentheses, book title marks, etc. They are placed in pairs on either side of the text they enclose, each occupying the same space as a character and keeping the same orientation as the characters. Quotation marks are nested differently in Traditional Chinese and Simplified Chinese: Traditional Chinese uses single quotation marks for the outer level and double quotation marks for the inner level, whereas Simplified Chinese uses double quotation marks for the outer level and single quotation marks for the inner level. The writing mode must also be taken into account: in Simplified Chinese, horizontal writing mode uses curved quotation marks, while vertical writing mode uses corner brackets.
The ellipsis and the long dash are centered in their space and occupy one character across the line and two characters along it. They must not be split across two lines, and they run in the same direction as the text.
Dashes occupy the same space as one character and are centered in it. The EN DASH should be noticeably short so that it is clearly distinguished from the Chinese character [一] ('one'). Dashes run in the same direction as the characters they connect.
The solidus occupies the same space and has the same orientation as the character it follows, and is centered in that space. To save space, or to set numbers or characters more tightly, the solidus is sometimes given half a character width.
Inline marks such as title marks, wavy low lines and emphasis dots are positioned beneath the marked characters in horizontal writing mode. In vertical writing mode, emphasis dots are positioned to the right of the marked characters so that they do not interfere with the characters above and below them.
The solidus is centered in its square space. In Simplified Chinese, it takes half a character width, whereas in Traditional Chinese there is no clear rule about its width, although most publications give it the same width as one character.
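Several of the size and position rules above can be collected into lookup functions. The following Python sketch is illustrative only: the function names are hypothetical, the quotation table covers horizontal writing mode only, the solidus width for Traditional Chinese follows the most common practice rather than a rule, and alternating back to the outer quotation marks beyond the second nesting level is an assumption.

```python
from fractions import Fraction


def pause_stop_position(variant: str, mode: str) -> str:
    """Placement of a pause or stop mark inside its one-character square."""
    if mode not in ("horizontal", "vertical"):
        raise ValueError(f"unknown writing mode: {mode!r}")
    if variant == "traditional":
        return "center"
    if variant == "simplified":
        return "lower-left" if mode == "horizontal" else "upper-right"
    raise ValueError(f"unknown variant: {variant!r}")


def solidus_width(variant: str) -> Fraction:
    """Solidus width in em: half in Simplified Chinese, usually full otherwise."""
    return Fraction(1, 2) if variant == "simplified" else Fraction(1)


# Quotation mark pairs for horizontal writing, outermost level first.
QUOTE_LEVELS = {
    "traditional": [("\u300C", "\u300D"), ("\u300E", "\u300F")],  # 「」 then 『』
    "simplified": [("\u201C", "\u201D"), ("\u2018", "\u2019")],   # “” then ‘’
}


def quote(text: str, variant: str, depth: int = 0) -> str:
    """Quote text at nesting level `depth` (0 = outermost); deeper levels
    alternate between the two pairs, an assumption of this sketch."""
    levels = QUOTE_LEVELS[variant]
    opening, closing = levels[depth % len(levels)]
    return f"{opening}{text}{closing}"


print(quote("他說" + quote("好", "traditional", 1), "traditional"))  # 「他說『好』」
```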
Atypical punctuation marks and their composition
Science and technology literature
Scientific and technical literature prefers U+FF0E FULLWIDTH FULL STOP [．] to U+3002 IDEOGRAPHIC FULL STOP [。], so that the full stop is clearly distinguished from the letter [o] and the digit [0].
Special cases in Traditional Chinese publications
In some Traditional Chinese publications, such as ancient books, scientific and technical literature, textbooks, and books that quote Western-language text, some pause or stop punctuation marks, including the slight-pause comma, colon and period, are positioned close to the characters they follow rather than centered. This matches the practice of Simplified Chinese and Japanese, and gives punctuation a consistent style across Chinese and Western text.
Prohibition Rules for Line Start and Line End
To maintain a smooth reading experience and a consistent style, there are constraints on where most punctuation marks may be placed. Depending on its function, a punctuation mark is usually prohibited from appearing at the line start or at the line end. These rules were already applied in the era of letterpress printing. In Mainland China, the national standard General Rules for Punctuation (GB/T 15834—2011) sets clear rules for the positioning of punctuation marks. In regions that use Traditional Chinese, there is not yet a standard for the usage and positioning of punctuation marks, but most publications apply the rules described in this document.
In Traditional Chinese, there is no strict rule indicating that a punctuation mark must not appear at the line start. In the time of letterpress printing, there were quite a few publications which ignored the prohibition rules for punctuation marks.
In Traditional Chinese publications like newspapers and magazines, columns are often used in the layout, which leads to fewer characters in each line, and prohibition rules of punctuation marks are sometimes ignored under these circumstances.
To prevent a punctuation mark from appearing at the line start, the last character of the previous line can be moved to the beginning of the next line, and the space freed on the previous line is then divided equally and inserted between the remaining characters of that line. However, when several punctuation marks appear together, for example [。』」], moving characters from the previous line may leave too much space between its characters. In this case, punctuation marks may be allowed to appear at the line start so that the spacing between characters stays reasonable on every line.
Pause or stop punctuation marks (the slight-pause comma, comma, semicolon, colon, period, question mark and exclamation mark), as well as right quotation marks, right parentheses, right angle brackets, ellipses, dashes, etc., should not appear at the line start.
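The arithmetic of moving one character down can be sketched as follows. This Python sketch assumes that every character, punctuation included, is exactly one em wide and that the original line was full; `push_down_last` is a hypothetical helper, and it does not decide when the resulting spacing becomes too loose.

```python
from fractions import Fraction


def push_down_last(line: str, next_line: str, line_length_em: int):
    """Move the last character of `line` to the start of `next_line`.

    Returns (new line, new next line, extra space in em to add in each
    gap between the characters remaining on `line`). Assumes 1 em per
    character and that `line` originally filled the line.
    """
    kept, moved = line[:-1], line[-1]
    if len(kept) < 2:
        raise ValueError("need at least two characters left to spread the space")
    slack = Fraction(line_length_em) - len(kept)  # width freed by the move
    gap = slack / (len(kept) - 1)                 # shared equally by all gaps
    return kept, moved + next_line, gap
```

For a full 10-em line, pushing one character down leaves 1 em to share among the 8 gaps of the remaining 9 characters, so each gap grows by 1/8 em.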
Left parentheses, left quotation marks, left angle brackets and left title marks should not appear at the line end.
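The two prohibition lists translate into a check on each potential break point. This is a minimal Python sketch: the character sets are an illustrative, incomplete selection of the marks named above, the helper names are hypothetical, and treating every gap between characters as a candidate break is only appropriate for Han text, not for Western words.

```python
# Illustrative subsets of the marks listed above.
NO_LINE_START = set("、，；：。？！」』）》〉…—\u201D\u2019")
NO_LINE_END = set("「『（《〈\u201C\u2018")


def can_break_between(before: str, after: str) -> bool:
    """True if a line break is allowed between `before` and `after`."""
    return before not in NO_LINE_END and after not in NO_LINE_START


def break_opportunities(text: str) -> list[int]:
    """Indices i such that a line may break before text[i]."""
    return [i for i in range(1, len(text)) if can_break_between(text[i - 1], text[i])]


print(break_opportunities("他说：「好。」"))  # [1, 3]
```

Because the em dash and the ellipsis are in the line-start set, this check also keeps the two halves of a long dash [——] or an ellipsis [……] together.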
Prohibition Rules for Unbreakable Marks
Punctuation Marks
The following punctuation marks are treated as single units that take two character widths, and must not be split across two lines. When several of these marks appear together, they may be separated into two lines as described in . Forcing them to remain on one line might leave too much space between the characters of the previous line and spoil the appearance of the whole composition.
In the digital era, these punctuation marks usually take the width of 2 characters but are still considered as one unit.
Em dash and long dash.
U+2014 EM DASH [—] occupies one character; the long dash [——] is made of two adjacent em dashes and occupies one character in height and two characters in width. Whether U+2E3A TWO-EM DASH [⸺] could be used instead remains an open question.
Horizontal ellipsis.
U+2026 HORIZONTAL ELLIPSIS […] occupies one character; the Chinese ellipsis [……] is made of two horizontal ellipses used together and occupies one character in height and two characters in width.
According to section 5.1.5 of General Rules for Punctuation (GB/T 15834—2011), when two ellipses are used together […………], they should be four characters wide and occupy a line of their own.
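A layout engine can enforce this unbreakability by grouping such pairs into single layout units before line breaking. This Python sketch assumes the long dash is stored as two U+2014 and the ellipsis as two U+2026; `layout_units` is a hypothetical helper, and the four-character ellipsis that occupies its own line is not handled.

```python
import re

# A long dash (two U+2014) or a Chinese ellipsis (two U+2026) forms one
# unbreakable unit; every other character is its own unit.
UNIT = re.compile(r"\u2014\u2014|\u2026\u2026|.", re.DOTALL)


def layout_units(text: str) -> list[str]:
    """Split text into units that a line breaker must keep whole."""
    return UNIT.findall(text)


print(layout_units("等等……好"))  # ['等', '等', '……', '好']
```

Two consecutive ellipses come out as two separate units, which leaves a break between them possible, as the rules above allow.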
Digits and their Prefix and Suffix
Annotation Marks
Compression Rules for Punctuation Marks
Chinese punctuation marks usually occupy the space of one character (or more), which clearly separates them from the surrounding characters and leaves some room for adjusting the composition. However, when punctuation marks are adjacent to other punctuation marks, or appear at the start or end of a line, the empty space within their square spaces can look excessive. In these cases, compressing the punctuation marks appropriately makes the composition tighter and more readable.
Usually there are two ways to compress punctuation mark(s). First, when multiple punctuation marks appear together, the space between the punctuation marks can be adjusted; second, when multiple punctuation marks appear at the line start or line end, the space at the line start or line end can be adjusted.
Compression of adjacent punctuation marks
When opening brackets, closing brackets, slight-pause commas, commas, periods or interpuncts appear together, the following space adjustments make the composition tighter and more readable.
In Simplified Chinese, when one or more closing brackets appear after a slight-pause comma, comma or period, a space of half a character width can be reduced. This rule does not apply to Traditional Chinese.
When a slight-pause comma, comma or period appears after a closing bracket, a space of half a character width can be reduced.
When an opening bracket appears after a slight-pause comma, comma or period, a space of half a character width can be reduced.
When an opening bracket appears after a closing bracket, a space of half a character width can be reduced.
When two or more opening brackets appear together, a space of half a character width can be reduced.
When two or more closing brackets appear together, a space of half a character width can be reduced.
When a solidus appears after a closing bracket, a space of a quarter of a character width can be reduced.
When a solidus appears before an opening bracket, a space of a quarter of a character width can be reduced.
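The adjacency rules above form a small table keyed on the kinds of two neighboring marks. In this Python sketch, the character sets are illustrative, the names are hypothetical, and applying the reduction once per adjacent pair (for example, once for each pair in a run of three closing brackets) is an assumption rather than a stated rule.

```python
from fractions import Fraction

OPENING = set("「『（《〈\u201C\u2018")
CLOSING = set("」』）》〉\u201D\u2019")
PAUSE = set("、，。")
SOLIDUS = set("/\uFF0F")

HALF, QUARTER, NONE = Fraction(1, 2), Fraction(1, 4), Fraction(0)

# (kind of first mark, kind of second mark) -> space that may be removed.
RULES = {
    ("close", "pause"): HALF,
    ("pause", "open"): HALF,
    ("close", "open"): HALF,
    ("open", "open"): HALF,
    ("close", "close"): HALF,
    ("close", "solidus"): QUARTER,
    ("solidus", "open"): QUARTER,
}


def kind(ch: str):
    """Classify a character, or return None if no rule concerns it."""
    if ch in OPENING:
        return "open"
    if ch in CLOSING:
        return "close"
    if ch in PAUSE:
        return "pause"
    if ch in SOLIDUS:
        return "solidus"
    return None


def reduction(first: str, second: str, variant: str) -> Fraction:
    """Space in em that may be removed between two adjacent marks."""
    pair = (kind(first), kind(second))
    if pair == ("pause", "close"):
        # Simplified Chinese only: the comma or period sits in one corner.
        return HALF if variant == "simplified" else NONE
    return RULES.get(pair, NONE)


def total_reduction(text: str, variant: str) -> Fraction:
    """Sum of the reductions over every adjacent pair in `text`."""
    return sum((reduction(a, b, variant) for a, b in zip(text, text[1:])), NONE)
```

For example, in 「》，《」 set in Simplified Chinese, both the closing-bracket/comma pair and the comma/opening-bracket pair lose half an em, one em in total.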
Compression of punctuation marks at line start and line end
When a punctuation mark appears at line start or line end, the following rules for space adjustment will make the composition more solid and readable.
When the first line of a paragraph is indented and begins with a bracket, half a character space can be reduced ahead of the bracket.
When an opening bracket appears at the beginning of a line, half a character space can be reduced ahead of the bracket.
When a closing bracket appears at the end of a line, half a character space can be reduced behind the bracket.
In Simplified Chinese, when a slight-pause comma, comma or period appears at the end of a line, half a character space can be reduced after the punctuation mark.
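The line-edge rules can be expressed as a single lookup. This is a Python sketch with illustrative character sets; `edge_reduction` is a hypothetical name, and the indented first line is treated the same way as any other line start.

```python
from fractions import Fraction

OPENING = set("「『（《〈\u201C\u2018")
CLOSING = set("」』）》〉\u201D\u2019")
PAUSE = set("、，。")


def edge_reduction(ch: str, edge: str, variant: str) -> Fraction:
    """Space in em that may be removed beside `ch` when it sits at
    `edge` ("start" or "end") of a line."""
    if edge == "start" and ch in OPENING:
        return Fraction(1, 2)
    if edge == "end" and ch in CLOSING:
        return Fraction(1, 2)
    if edge == "end" and ch in PAUSE and variant == "simplified":
        return Fraction(1, 2)
    return Fraction(0)
```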
Hanging Punctuation at Line End
Most Chinese publications do not use hanging punctuation at the line end. According to the Requirements for Japanese Text Layout, hanging punctuation at the line end is an extension of the prohibition rules for the line start. It avoids moving characters or punctuation marks between lines, and so avoids inconsistent character spacing from one line to another.
In general, the punctuation marks that can be hung at the line end are the slight-pause comma, comma and period. In Simplified Chinese, the other pause or stop punctuation marks can also be hung at the line end, since they are set to one side of their square space, next to the characters they follow.
However, in Traditional Chinese, where punctuation marks are set at the center of their square space, hanging punctuation may look abrupt. Therefore, Traditional Chinese applies hanging punctuation only in vertical writing mode, not in horizontal writing mode.
In the case of a succession of punctuation marks, punctuation hanging should not be applied.
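The conditions for hanging can be combined into one decision function. This Python sketch assumes that `overflow` is the run of punctuation that does not fit on the line; the name `may_hang` and the character sets are hypothetical illustrations of the rules above.

```python
HANGABLE = set("、，。")           # slight-pause comma, comma, period
OTHER_PAUSE_STOP = set("；：？！")  # hangable in Simplified Chinese only


def may_hang(overflow: str, variant: str, mode: str) -> bool:
    """Decide whether the punctuation overflowing a line may hang
    outside the type area."""
    if len(overflow) != 1:  # no hanging for a succession of marks
        return False
    if variant == "traditional" and mode == "horizontal":
        return False
    if overflow in HANGABLE:
        return True
    return variant == "simplified" and overflow in OTHER_PAUSE_STOP
```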
Composition of Chinese and Western Mixed Texts
Composition of Chinese and Western Mixed Text
There are many examples in Chinese text where Western characters, such as Latin letters, Greek letters, or European numerals, are found alongside Han characters. The following are just a few examples:
A single Western letter used as a symbol, such as 'A' or 'B'.
A Western word used in a Chinese context, such as 'editor'.
Acronyms, such as 'DTP' or 'GDP'.
Titles or authors' names in references to Western books, given in their original spelling.
European numerals used to express years or other numbers, such as '1999年'.
Western letters and numerals are also used in itemized lists and numbered headings, and as symbols for chemical elements or in mathematical formulae. As these examples show, Western characters mixed with Han characters are an everyday occurrence in Chinese composition.
Western numerals, sometimes called Arabic numerals, are referred to as European numerals in this document unless noted otherwise.
Formerly, fullwidth forms of ASCII characters were often used, either to make the text look orderly or simply because the computer technologies available for text layout were poorly developed. Nowadays, typesetting engines support proportional or monospace fonts as required, so users no longer need to resort to fullwidth Latin letters and European numerals.
When Western text is mixed with Han characters, Chinese punctuation and its usual conventions should in principle be used, since the main text is Chinese. However, in technical documents that contain many formulae, the full stop can be unified as the Western period, U+002E FULL STOP [.]. Western periods can also be used in textbooks on the grammar of Western languages and similar works, which contain many example sentences mixed with Chinese.
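Legacy text therefore often needs its fullwidth letters and digits converted back to ASCII. A minimal Python sketch, assuming only Latin letters and digits should change: Unicode NFKC normalization would also convert fullwidth Chinese punctuation such as 「，」 to ASCII, which is usually not wanted. `halfwidth_alnum` is a hypothetical helper.

```python
def halfwidth_alnum(text: str) -> str:
    """Convert fullwidth digits (U+FF10-U+FF19) and Latin letters
    (U+FF21-U+FF3A, U+FF41-U+FF5A) to ASCII, leaving fullwidth
    punctuation such as ， ： ％ untouched."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF10 <= cp <= 0xFF19 or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:
            out.append(chr(cp - 0xFEE0))  # fullwidth block offset from ASCII
        else:
            out.append(ch)
    return "".join(out)


print(halfwidth_alnum("ＧＤＰ增长１０％"))  # GDP增长10％
```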
Mixed Text Composition in Horizontal Writing Mode
In horizontal writing mode, the basic approach uses proportional fonts for Western text and proportional or monospace fonts for European numerals. In principle, spacing of up to one quarter of a Han character width is inserted between adjacent Han and Western characters, except at the line head or line end.
Another approach is to use a Western word space (U+0020 SPACE), in which case the width depends on the font in use.
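Where text must be prepared as plain text rather than spaced by a layout engine, the quarter-width spacing can be approximated with a fixed-width space character. This Python sketch is an assumption-laden illustration: it uses U+2005 FOUR-PER-EM SPACE as a stand-in for engine-applied spacing, recognizes only the CJK Unified Ideographs block and Extension A as Han, and treats only ASCII letters and digits as Western. `autospace` is a hypothetical name.

```python
import re

HAN = "[\u3400-\u4DBF\u4E00-\u9FFF]"  # CJK Unified Ideographs + Ext. A only
WESTERN = "[A-Za-z0-9]"
QUARTER_EM = "\u2005"                  # FOUR-PER-EM SPACE

_HAN_THEN_WESTERN = re.compile(f"({HAN})({WESTERN})")
_WESTERN_THEN_HAN = re.compile(f"({WESTERN})({HAN})")


def _join(match: re.Match) -> str:
    return match.group(1) + QUARTER_EM + match.group(2)


def autospace(text: str) -> str:
    """Insert a quarter-em space at every Han/Western boundary."""
    text = _HAN_THEN_WESTERN.sub(_join, text)
    return _WESTERN_THEN_HAN.sub(_join, text)
```

A layout engine can instead add flexible spacing at the same boundaries and drop it at the line head and end, which a stored space character cannot do.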
Mixed Text Composition in Vertical Writing Mode
For vertical writing mode, the following list describes methods of setting Western letters and European numerals:
Setting Western letters with Han character-width monospace fonts. Letters or European numerals follow each other, one at a time, in the same direction and rotation as the Han characters. This arrangement is usually adopted where the text contains a single letter or digit, or an acronym.
Setting Western letters with proportional fonts, rotated 90 degrees clockwise. This approach is usually adopted where Latin letters compose a word or sentence. There is tracking or spacing between a Han character and an adjacent Western letter or European numeral, up to a width of one quarter of a Han character, except at the line head or end.
Setting European numerals with proportional fonts in horizontal-in-vertical orientation. This style is usually adopted when dealing with a two to three digit number whose width is equal to the default line advance or slightly wider (within an acceptable range).
Han numerals are usually used in vertical writing mode, however in recent years it is becoming more common to see fullwidth European numerals and proportional numerals set as horizontal-in-vertical.
Handling Western Text in Chinese Text Using Proportional Western Fonts
The following provides composition rules for handling Western characters and European numerals in horizontal writing mode or in situations in vertical writing mode where the Western words/phrases or European numerals are rotated 90 degrees clockwise:
A Western word should not be broken across lines, except where hyphenation is allowed.
Tracking or spacing between a Han character and a Western letter or numeral is up to a quarter of the width of a Han character.
Justified text alignment is an important feature of Chinese composition, and it is harder to achieve when a line contains Western characters. Typically, spacing or tracking is distributed equally across the line, but only between Han characters or between Han characters and Western text; it is not distributed between the characters within Western words or European numerals.
Exceptions are made in the following cases:
No spacing is added before Western letters or European numerals at the line head, or after them at the line end.
Tracking or spacing of Western letters or European numerals is not adjusted before or after Chinese commas or full stops, nor after Chinese opening and before Chinese closing brackets.
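The distribution rule can be sketched over a line that has already been split into tokens. This Python sketch assumes the tokens are given as (text, kind) pairs, where a whole Western word or numeral run is one "western" token, so no spacing can fall inside it; `distribute` is a hypothetical helper. Gaps that touch punctuation receive nothing, and because only gaps between tokens are considered, no space is added at the line edges.

```python
from fractions import Fraction

STRETCHABLE = {"han", "western"}


def distribute(tokens, slack):
    """Share `slack` (in em) equally among the gaps where spacing may grow.

    `tokens` is a list of (text, kind) pairs with kind "han", "western"
    or "punct". Returns the extra space to add after each token except
    the last.
    """
    eligible = [
        a_kind in STRETCHABLE and b_kind in STRETCHABLE
        and not (a_kind == b_kind == "western")
        for (_, a_kind), (_, b_kind) in zip(tokens, tokens[1:])
    ]
    count = sum(eligible)
    if count == 0:
        return [Fraction(0)] * len(eligible)
    share = Fraction(slack) / count
    return [share if ok else Fraction(0) for ok in eligible]
```

For the line 使用DTP软件。 with 1 em of slack, the four gaps between 使, 用, DTP, 软 and 件 each receive 1/4 em, and the gap before 。 receives none.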
Handling of Grid Alignment in Chinese and Western Mixed Text Composition
Because every Han character has the same width, it is a requirement not only that the characters at the start and end of each line align, but also that the characters within blocks of Han text align both vertically and horizontally, in both vertical and horizontal writing modes. This principle is harder to achieve when Western text or European numerals intervene. Possible approaches are listed below:
Instead of a quarter Han-width tracking between Han and Western letters, it is possible to use flexible spacing of up to half a Han character width. This brings the space occupied by the Western characters to a multiple of the width of a Han character. In this way, the Han characters both before and after the Western span snap to the grid lines.
When a Western word at the line end needs to be broken, it may be broken exactly where the line ends, rather than at a syllable boundary as in Western convention, in order to preserve grid alignment.
When using grid alignment, it is recommended to deal with line end punctuation marks by hanging the first of them outside the type area as mentioned in section . In situations that involve consecutive punctuation marks, the second and following punctuation marks are allowed to appear at the line start.
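The flexible-spacing approach reduces to rounding the Western span up to a whole number of grid cells. This Python sketch assumes widths are measured in Han character widths (em) and that the extra space is split evenly between the two sides; `grid_padding` is a hypothetical helper. Since the total padding is less than one em, each side always stays under the half-width limit. A span whose width is already a whole number of ems receives no padding in this sketch.

```python
import math
from fractions import Fraction


def grid_padding(western_width_em):
    """Return (grid cells occupied, padding in em on each side) so that a
    Western span plus its padding covers a whole number of Han widths.
    Accepts an int, a Fraction or a decimal string such as "2.3"."""
    w = Fraction(western_width_em)
    cells = math.ceil(w)
    pad = (cells - w) / 2  # always below 1/2 em per side
    return cells, pad


print(grid_padding("2.3"))  # (3, Fraction(7, 20))
```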
Grid alignment is adopted more often in Traditional Chinese typesetting, whereas use in Simplified Chinese is rare.
Interlinear annotations
Usage of Interlinear Annotations
Chinese interlinear annotation, also known as ruby, is small, supplementary text attached to certain characters or words in the main text. Chinese interlinear annotation is usually set in the interlinear space and aligned to the corresponding base text which it annotates. In Chinese typesetting, Chinese interlinear annotation is mainly used to indicate pronunciation or meaning.
Indicating the Pronunciation for Chinese characters
In Chinese, interlinear annotation is most commonly used to indicate the pronunciation of Han characters. Presenting the pronunciation alongside the characters is a great help to beginners, especially native-speaking children and foreigners learning Chinese. As a result, isolated Han characters are rarely annotated; instead, phonetic annotations tend to cover the full text. Outside these educational contexts, interlinear annotation is not regularly used to show pronunciation in Chinese layout, even for rarely used characters, although the pronunciation is sometimes given inline, possibly inside brackets.
There are two major annotation systems for indicating Chinese pronunciation: Zhuyin and Romanization.
Zhuyin.
Mandarin Phonetic Symbols (國語注音符號) or Taiwanese Dialect Phonetic Symbols (台灣方言音符號), hereinafter referred to as ‘Zhuyin’, are systems for phonetic annotation mainly used in Taiwan, although other areas may also include Zhuyin in certain dictionaries or textbooks. In most cases, Zhuyin appears on the right side of its corresponding base text. Exceptions are very rare.
Romanization.
Hanyu Pinyin (汉语拼音), now the official standard in Mainland China, uses the Latin alphabet to transcribe the Modern Standard Chinese (Mandarin) pronunciation of Chinese characters. Its most common use in Mainland China is to indicate the pronunciation of every character in a full text. In Taiwan and Hong Kong, annotations using the Taiwanese Romanization System for Minnan (台灣閩南語羅馬字), the Romanization System of the Hong Kong Education and Manpower Bureau (教育學院拼音方案), or the romanization systems of other Chinese dialects are arranged in much the same way as Hanyu Pinyin.
Because they use the Latin alphabet, such annotations appear only in horizontal writing mode. Texts for native-speaking children usually annotate each individual character, while texts for learners of Chinese as a second language mainly annotate whole words; occasionally, the two are set almost identically. When whole words are annotated, spaces are inserted between the words of the base text, and the annotations have particular requirements, such as sentence case and punctuation marks corresponding to those in the base text. Early publications using Pinyin varied widely and lacked consistency, with both character-based and word-based annotation being common. Early Pinyin practice is not described further in this document.
Indicating Meaning or Other Additional Information
Bilingual Annotations.
Bilingual annotations aim to provide a Chinese translation of text in foreign languages or acronyms, or to offer the original text for words that have been translated into Chinese. This is mainly used for proper nouns, titles or those terms whose concepts are difficult to convey after translation. It is commonly found in translated works, mainly in light novels.
Interlinear Comments.
Interlinear comments annotate the meaning of text fragments or single words, and are named for their interlinear position. They usually sit in the interlinear space alongside the body text. Compared to other annotation methods, such as headnotes or footnotes, interlinear comments are more compact and stay closer to the text they annotate. They are often found in ancient books, such as Rouge Inkstone, an early commentary on the novel Dream of the Red Chamber.
Overview of Interlinear Annotation Positioning
In vertical writing mode, Zhuyin, Romanization or bilingual annotations are usually placed on the right side of the base text (Han characters), while interlinear comments are often placed on the left side.
In horizontal writing mode, Zhuyin can be placed above the base text, but in most cases it is still set to the right of the base text. Romanization and bilingual annotations, on the other hand, can appear either above or below the base text, and interlinear comments are usually placed below the base text.
In principle, Zhuyin symbols all have the same size, and one Han character never needs more than three of them, which is easy to manage. Romanization, however, uses proportional Latin letters, so its annotations vary in length and need spaces between words. These two kinds of phonetic annotation therefore differ greatly in positioning.
Annotating with both Romanization and Zhuyin is a practical way to indicate the reading to readers who know only one of these systems, as well as helping study of or enquiries about the other one. Normally, when Romanization and Zhuyin are both provided, the Zhuyin are placed on the right side of the Han character while Romanization is set at the bottom of the Han character in horizontal writing mode and to the left side in vertical writing mode.
Positioning of Zhuyin Interlinear Annotations
Positioning of Zhuyin Symbols
According to the Manual of Mandarin Phonetic Symbols (國語注音符號手冊) released by the Ministry of Education in Taiwan, there are two standard ways of positioning Zhuyin: above the corresponding Han character (horizontal Zhuyin), or on the right side of the corresponding Han character (vertical Zhuyin). Placing Zhuyin above the base characters is rare in today's textbooks and other publications, and the general public seldom uses it. It is therefore better practice to place Zhuyin on the right side of the corresponding Han character, in both horizontal and vertical writing modes.
Choice of Size and Ratio for Zhuyin Symbols
If a Han character is treated as a square of 30 × 30 units, its Zhuyin annotation occupies an area 15 units wide and 30 units high. The Zhuyin annotation should stay adjacent to its corresponding base character.
Measured in the same units, initials, medials and finals are 9 × 9, Mandarin non-neutral tones and dialectal non-checked tones are 5 × 5, Mandarin neutral tones are 9 × 2, and dialectal checked tones are 5 × 5. More details and figures can be found in Positioning of Different Composition for the Tones below.
When the font size of the body is relatively small, it's possible to provide a larger font size for the Zhuyin rather than using the default ratio listed above. Alternatively, other methods, such as bracketing Zhuyin inline, are acceptable.
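The 30-unit proportions scale linearly with the body font size. In this Python sketch, the table names are hypothetical, and reading each pair as width × height (in particular 9 × 2 for the neutral tone) is an assumption.

```python
from fractions import Fraction

# Sizes in units where the base Han character is 30 x 30, read as
# (width, height); the width/height order is an assumption.
ZHUYIN_UNITS = {
    "column": (15, 30),        # whole Zhuyin annotation beside one character
    "symbol": (9, 9),          # initials, medials, finals
    "tone": (5, 5),            # Mandarin non-neutral, dialectal non-checked
    "neutral_tone": (9, 2),
    "checked_tone": (5, 5),
}


def zhuyin_size(part: str, base_size):
    """Scale a Zhuyin part to a base font size (any unit, e.g. pt)."""
    w, h = ZHUYIN_UNITS[part]
    scale = Fraction(base_size) / 30
    return w * scale, h * scale


print(zhuyin_size("column", 24))  # (Fraction(12, 1), Fraction(24, 1))
```

As noted above, small body sizes may call for enlarging the Zhuyin beyond these proportions, which this linear scaling does not attempt.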
Positioning of the Tones in Zhuyin Symbols
Mandarin non-neutral tones and dialectal non-checked tones are placed at the upper right corner of the last phonetic symbol.
In Mandarin Chinese there are syllables that, when part of certain words or sentences, are intentionally read in a shorter and less-emphasized way, therefore losing their original tone. When any syllable is read in this way, we say that it has a neutral tone or “toneless.”
The Mandarin neutral tone mark is placed above the phonetic symbols.
The dialectal checked tones are set alongside the lower right corner of the phonetic symbols.
Positioning of Different Composition for the Tones
When there is only one Zhuyin phonetic symbol, whether an initial, a medial or a final, the tone mark should be set at the lower right corner of the character. Dialectal checked tones should be positioned as described above.
For a combination of two phonetic symbols, such as initial+medial, initial+final or medial+final, the tone mark should be set at the lower right corner of the character. Dialectal checked tones should be positioned as described above.
For a combination of three phonetic symbols, initial+medial+final, if the syllable is read in the neutral tone, the phonetic symbols are positioned along the lower right of the character, while the neutral tone mark is positioned at the upper corner of the character. For non-neutral tones, the tone mark should be set at the lower right corner of the phonetic symbols. Dialectal checked tones should be positioned as described above.
Line Prohibition Rules for Zhuyin
As with the line prohibition rules for punctuation, vertical Zhuyin annotations in horizontal writing mode must stay with their base characters: they must never be pushed to the start of the next line on their own, and must always be placed on the right side of the corresponding Han character.
Positioning of Romanized Interlinear Annotations
Basic Requirements
Romanization is used only in horizontal writing mode, and the annotations are usually placed above the base text. In general, the annotations are set directly against their base text, and the two are center-aligned with each other.
In special cases where Romanization is needed in vertical writing mode, the annotations are usually set to the right of their corresponding base text, although this is hard to read.
If a romanized annotation is longer than its base text and falls at the line head or line end, both the annotation and the base text can be aligned flush with that line edge.
The space between two adjacent annotations should be no smaller than a normal Western word space, about 1/4 em. Because of the limitations of typesetting technology, many printed publications leave no space between long phonetic annotations. This rarely leads to ambiguity, because each Han character represents one syllable and most Pinyin syllables are easy to tell apart. However, such annotations can sometimes mislead: character-based annotations may give the false impression of being word-based, and accidentally run-together annotations may obscure word boundaries and change the apparent meaning of the words.
Annotations are allowed to extend to the top of adjacent base text as long as the minimum spacing is ensured.
As most target readers are beginners to Chinese, the body text is usually in larger sizes and in the Kai typeface.
Because Latin letters are proportional and advance widths vary greatly between typefaces, the size relationship between annotations and their base text is not fixed. Under the influence of Japanese furigana typesetting, however, annotations are usually half the size of the base text.
Annotations usually use a sans-serif typeface that is light in weight but full in form. In publishing and education, it is generally held that Hanyu Pinyin must use typefaces in which 'a' and 'g' are single-story and the second-tone mark is thick at the bottom and thin at the top, like the handwritten stroke. In fact, no national standard has ever specified the typefaces or glyphs for Hanyu Pinyin.
The General Association of Chinese Culture in Taiwan once wrote to the Ministry of Education in Mainland China about the rules for the glyphs of Hanyu Pinyin, and received the response that the glyphs of the letters 'a' and 'g' are the same as those of the Latin alphabet, with no requirement for the handwritten forms.
What follows is a detailed description of the difference between two typical use cases.
Characters as the Basic Units for Annotating Pronunciation
The base text is a single Han character. Only Han characters are annotated: European numerals or punctuation marks are excluded.
The phonetic annotations are always on the top.
As the phonetic annotations are often wider than their base text, the tracking of the body text should be larger, to allow annotations to expand and to avoid irregular adjustments within the base text.
The phonetic annotations are all in lowercase. Sentence case is rare.
Words as the Basic Units for Annotating Pronunciation
The base text contains one or more Han characters. Rules for word segmentation can be found in GB/T 16159—2012 Basic Rules of Hanyu Pinyin Orthography.
Annotations sometimes appear below the Han characters.
Both the phonetic annotations and the base text are separated at word boundaries. The adjacent annotations are separated by a space approximately 1/2 em wide, while the tracking inside the base text is usually normal.
Many word-based annotations convey the structure of the whole sentence rather than merely its pronunciation. Such annotations use sentence case and include punctuation marks, each attached to the end of the preceding annotation.
Atypical Cases for Han Character Phonetic Annotations
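The following Python sketch illustrates word-based annotation. It assumes the text has already been segmented into words and that every syllable already carries its tone mark; the function names are hypothetical. Syllables within a word are written together and, as a simplification of the apostrophe rule in GB/T 16159, an apostrophe is inserted before every non-initial syllable that begins with a, o or e. Sentence case and the final punctuation mark are then applied as described above.

```python
# Syllable-initial vowels that trigger the apostrophe, with and without tone marks.
APOSTROPHE_INITIALS = set("aoeāáǎàōóǒòēéěè")


def join_word(syllables: list[str]) -> str:
    """Write the syllables of one word together, e.g. ["xī", "ān"] -> "xī'ān"."""
    parts = []
    for i, syllable in enumerate(syllables):
        if i > 0 and syllable[0].lower() in APOSTROPHE_INITIALS:
            parts.append("'")
        parts.append(syllable)
    return "".join(parts)


def annotate_sentence(words: list[list[str]], punctuation: str = "") -> list[str]:
    """Return one annotation per word, in sentence case, with the sentence-final
    punctuation mark attached to the last annotation. (Punctuation inside the
    sentence is not handled in this sketch.)"""
    annotations = [join_word(word) for word in words]
    if annotations:
        first = annotations[0]
        annotations[0] = first[:1].upper() + first[1:]
        annotations[-1] += punctuation
    return annotations


print(" ".join(annotate_sentence([["wǒ"], ["shì"], ["xué", "sheng"]], ".")))
# Wǒ shì xuésheng.
```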
Erhuayin
Erhuayin, also known as rhotacization of syllable finals, is a special phonetic phenomenon in Modern Standard Chinese (Mandarin). Because Zhuyin annotates each Han character individually, it cannot show the continuity of Erhuayin or the resulting change in the final, whereas Romanization represents Erhuayin effectively.
Ligatures
A ligature represents more than one syllable, so its interlinear annotation may be typeset incorrectly. The pronunciation of a ligature should instead be given in brackets inline or in notes. Ligatures are rare in modern Chinese writing.
Positioning of Bilingual Annotations
The typesetting of bilingual annotations is quite similar to that of Romanization. Annotations are usually placed to the right of the base text in vertical writing mode, or above it in horizontal writing mode.
Word Alignment
In order to maintain the integrity of annotations, when the lengths of annotations and their base text are different it is necessary to adjust the alignment between them to avoid misunderstandings.
When an annotation is shorter than its base text, the annotation can be center-aligned (in the case of Western script) or set with larger tracking (in the case of Han characters). The larger tracking can be applied in two ways: the space can be distributed evenly, or the characters can be justified to both ends.
When an annotation is longer than its base text, the base text can be center-aligned (in the case of Western script) or set with larger tracking (in the case of Han characters).
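The two ways of applying larger tracking can be sketched as follows. The sketch assumes equal-width characters, and it interprets distributing the space evenly as giving every character equal space on both sides, with half-size gaps at the two ends (as CSS ruby-align: space-around does), and justifying as placing the first and last characters flush with the ends (like space-between). These interpretations and the function name are assumptions, not part of the requirements above.

```python
def char_offsets(span: float, char_width: float, count: int, mode: str) -> list[float]:
    """Start offsets of `count` equal-width characters spread across `span`."""
    slack = span - count * char_width
    if count < 1 or slack < 0:
        raise ValueError("the characters do not fit in the span")
    if mode == "distribute":   # equal space around every character
        gap = slack / count
        lead = gap / 2
    elif mode == "justify":    # first and last characters flush with the ends
        gap = slack / (count - 1) if count > 1 else 0.0
        lead = 0.0 if count > 1 else slack / 2  # a lone character is centered
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [lead + i * (char_width + gap) for i in range(count)]


print(char_offsets(6, 1, 2, "distribute"), char_offsets(6, 1, 2, "justify"))
```

For two characters of width 1 spread over a span of 6, the distributed setting places them at offsets 1 and 4, while the justified setting places them at 0 and 5.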
Positioning of Interlinear Comments
Interlinear comments can vary greatly in layout and length. They are usually placed at the foot side of the annotated text: to the left of the base text in vertical writing mode, or below it in horizontal writing mode. Interlinear comments are sometimes set in a different color to help the reader distinguish them from the body text.
Interlinear comments are also used to explain the context and details of a longer passage. In such cases, because the extent of the base text is not precisely defined, the comment can be anchored at a suitable point and flow on from there. There is no strict limit on its length, and it may sometimes run longer than one line.
Other Parts of This Draft
Romanization
Among the other Chinese dialects used in Taiwan, Min Nan usually follows the official Taiwanese Romanization System, while Hakka follows the official Taiwanese Hakka Phonetic Transcription System and Popular Hakka Phonetic Transcription.
Hakka has a rich tone system; even a single character may correspond to different Hakka tones (xi ien/ian kiongˊ, hoi├ liukˋ, @@). Interlinear annotation is used less often for Hakka because it cannot fully represent Hakka pronunciation.
Qu Notation
Qu Notation encompasses many traditional methods of representing musical notes and lyrics in ancient China, e.g. Gongche notation (工尺譜). Similar methods, such as numbered musical notation, can also be found in modern printing.
Paragraph Adjustment Rules
Line Head Indent at the Beginning of Paragraphs
A paragraph, a section of a document consisting of one or more sentences that express a distinct idea, usually begins on a new line. The following methods are available for the line head indent at the beginning of a paragraph.
In Chinese publications, the line head indent at the beginning of a paragraph is usually two character widths. Publications such as magazines, which have multiple columns with less text in each column, may use a one-character indent instead.
Line head indent at the beginning of a paragraph is applied to all paragraphs. Nearly all books and magazines make use of this method.
Line head indent does not apply to the first paragraph but to the rest of the paragraphs. This method is mostly seen in Western publications.
No paragraph is given a line head indent. Instead, a certain amount of space is inserted between paragraphs to separate them. This method is used in some books and magazines.
In principle, when an unfinished paragraph is interrupted, the line on which it resumes is not given a line head indent. The interruption may be dialogue, a quotation or a subheading inserted by the editor before the paragraph continues.
A paragraph indent shifts the line head of a paragraph by a fixed amount, measured from the line head side of the type area (in the case of a single column) or of the column area (in the case of multiple columns). This method is usually applied to quotations, poetry or subtitles within or between paragraphs.
Generally speaking, the characters in an indented paragraph are the same as those in the body text. Sometimes, however, a different typeface is used and the character size differs from that of the body text. In this case, some space may be added before and after the indented paragraph to distinguish it clearly from the surrounding paragraphs. The added space is usually an integer multiple of the line height of the body text.
Line Alignment Processing
Line alignment determines how the characters of each line are positioned between the line head and the line end, so that the text sits at its intended position. The following methods are available.
Centering
In principle, adjacent characters are set solid, and any explicitly specified spaces are inserted. If characters are set with a fixed inter-character space rather than solid, that space is used. The space remaining at the line head and at the line end is made equal, so that the center of the character sequence coincides with the center of the line.
Line head alignment
In principle, adjacent characters are set solid, and any explicitly specified spaces are inserted. If characters are set with a fixed inter-character space rather than solid, that space is used. The start of the character sequence is aligned with the line head, and if the line is not full, the remaining space is left at the line end.
Line end alignment
In principle, adjacent characters are set solid, and any explicitly specified spaces are inserted. If characters are set with a fixed inter-character space rather than solid, that space is used. The end of the character sequence is aligned with the line end, and if the line is not full, the remaining space is left at the line head.
Even inter-character spacing
In principle, adjacent characters are set solid, and any explicitly specified spaces are inserted. In addition, the space left over during line adjustment is distributed equally between characters where possible, so that the start of the character sequence is aligned with the line head and its end with the line end.
Two use cases for even inter-character spacing:
A common case: after the prohibition rules for line start and line end are applied, some lines are left with unfilled space. To align their start and end with those of the other lines, even inter-character spacing is applied to these lines.
Even inter-character spacing is also often used for lists of names of people or objects. It can also be applied to the last line of a paragraph or to a paragraph consisting of a single line.
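The four line alignment methods can be summarized in a short Python sketch. It assumes the advance of each character already includes any fixed or explicitly inserted spacing and that the line does not overflow; the function and mode names are hypothetical. For even inter-character spacing, a line with a single character is simply set at the line head, since its start and end cannot both be aligned.

```python
def align_line(advances: list[float], line_length: float, mode: str) -> list[float]:
    """Start position of each character on a line for one alignment method.
    `advances` are character widths including any fixed or explicit spacing."""
    slack = line_length - sum(advances)
    if slack < 0:
        raise ValueError("the characters do not fit on the line")
    gap = 0.0  # extra space added after every character
    if mode == "start":        # line head alignment
        x = 0.0
    elif mode == "end":        # line end alignment
        x = slack
    elif mode == "center":     # centering
        x = slack / 2
    elif mode == "even":       # even inter-character spacing
        x = 0.0
        if len(advances) > 1:
            gap = slack / (len(advances) - 1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    positions = []
    for advance in advances:
        positions.append(x)
        x += advance + gap
    return positions


for mode in ("start", "end", "center", "even"):
    print(mode, align_line([1, 1, 1], 5, mode))
```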
Handling of Widows and Orphans
In the tradition of Chinese composition, an orphan does not make a line, nor does a widow make a page. The principles are as described below.
If only one character, or one character followed by a punctuation mark, is left in the last line of a paragraph, that character is called an orphan. An orphan can be eliminated by the following methods, so that the last line of the paragraph contains at least two characters.
As in the handling of the rule that certain punctuation marks must not appear at the line start, move the last character of the preceding line to the next line, and apply even inter-character spacing to the preceding line.
Delete some characters earlier in the paragraph so that there is enough space to move the orphan up to the preceding line.
Add more characters to the last line.
The definition of an orphan in Chinese typesetting is somewhat similar to that in Latin typesetting. See section 5.2 of Requirements for Latin Text Layout and Pagination.
If the first line on a page is the last line of a paragraph continued from the previous page, it is called a widow. A widow can be handled by the following methods.
Move the widow line back to the previous page, letting it extend beyond the type area.
Move the last line of the previous page forward to join the widow.
Delete some characters from the paragraph so that there is enough space to move the widow back to the previous page.
Add more characters to the widow so that the paragraph occupies at least two lines on the page.
The definition of a widow in Chinese typesetting is somewhat similar to that in Latin typesetting. See section 5.1 of Requirements for Latin Text Layout and Pagination.
Orphans and widows can occur in the following combinations:
(a) A page contains only one line, and that line consists of one character and a punctuation mark, so the text is an orphan and a widow at the same time.
(b) A page contains multiple paragraphs, and its first line is a widow consisting of one character and a punctuation mark, so an orphan and a widow appear together.
(c) A page contains only one line, so that line is a widow.
(d) A page contains multiple paragraphs, and its first line is a widow.
(e) A page contains multiple paragraphs, and one of its lines consists of only one character, so that character is an orphan.
Publishers hold different views on how orphans and widows should be handled in the cases above. Cases (a) and (b) have a greater effect on the typesetting, while case (c) has less effect. Cases (d) and (e) are seen less often.
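A layout engine might detect the situations above with a check like the following Python sketch. It assumes each laid-out line records whether it starts or ends a paragraph, and it treats every character in a Unicode punctuation category as a punctuation mark; the names are hypothetical. It only detects orphans and widows; choosing among the remedies listed above is left to the publisher's conventions.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    starts_paragraph: bool  # this line is the first line of its paragraph
    ends_paragraph: bool    # this line is the last line of its paragraph


def is_orphan_text(text: str) -> bool:
    """True if the line holds exactly one non-punctuation character."""
    chars = [c for c in text if not unicodedata.category(c).startswith("P")]
    return len(chars) == 1


def page_problems(lines: list[Line]) -> list[tuple[str, int]]:
    """List (kind, line index) for the widows and orphans on one page."""
    problems = []
    # Widow: the page opens with the last line of a paragraph from the previous page.
    if lines and not lines[0].starts_paragraph and lines[0].ends_paragraph:
        problems.append(("widow", 0))
    for i, line in enumerate(lines):
        # A one-line paragraph has no preceding line to borrow characters from.
        if line.ends_paragraph and not line.starts_paragraph and is_orphan_text(line.text):
            problems.append(("orphan", i))
    return problems


page = [Line("好。", False, True), Line("第二段開頭", True, False), Line("字。", False, True)]
print(page_problems(page))
```

The first line of this sample page is both a widow and an orphan, as in case (b), and the last line is a further orphan.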
Positioning of Headings, Notes, Illustrations, Tables and Expressions
Headings & Page Breaks
Types of Headings
In terms of text composition, there are three types of headings.
Whole-page headings
Block headings
Run-in headings
Magazines handle headings in a wide variety of ways to meet their composition requirements, while most books set up headings in a simpler way. Methods for handling headings in magazines are not described in this document.
Whole-page headings are used to separate sections of a book. They are usually set on a page of their own, with the following page left blank. Subheadings, selected sentences, author names or selected paragraphs sometimes appear together with the heading. The back of a whole-page heading page is not always blank: for example, after the Han-tobira in Japanese books, the following even page is not blank but is used for the main text.
A block heading is the heading occupying a whole, independent line. The main text is set from the very next line. Top level headings and medium level headings are of this type.
Headings are titles that divide a coherent body of content into sub-parts and identify them. Headings are usually classified into several levels, such as top level, medium level and low level headings.
The hierarchy of headings, from highest to lowest, is: book title, section heading, top level heading, medium level heading and low level heading.
In multi-column format, block headings sometimes span multiple columns. These are called cross-column headings.
A run-in heading is a heading immediately followed by main text without a line break, and is usually used as a low level heading. Note that a low level heading can also appear as a block heading.
Font Selection and Heading Font Size
Headings mainly serve to present the hierarchical structure of a document, so their level must be shown through distinctive styling. The styling of headings involves the following aspects:
Character size for the heading: The character size of each heading level should be chosen appropriately. For example, when the main text is set in 9 point, small headings are usually set in 10 point, medium headings in 12 point and large headings in 14 point. Headings are usually larger than the main text, and higher level headings are larger than lower level ones.
Alternatively, heading sizes can be obtained by enlarging the main text size proportionally; in that case, enlarging by about 10%–20% per level works best.
Typefaces for headings: Hei or bold Song is usually used. Other typefaces, such as Yuan and Kai, are sometimes used as well.
Alignment of headings (inline direction): In horizontal writing mode, large headings and medium headings are in most cases center-aligned. In vertical writing mode, headings are usually aligned to the line head with some indent.
The line head indent of a heading depends on its level: the higher the level, the fewer characters of indent; the lower the level, the more. The indent is measured in characters of the main text size of the type area, and successive levels usually differ by about two characters.
Decoration: whether to decorate the heading with solid lines or to place a symbol at its top.
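The numeric guidance above can be turned into a small Python sketch. It assumes that the 10%–20% enlargement compounds from level to level and that sizes are rounded to the nearest half point, and it uses a hypothetical top-level indent of two characters for vertical writing mode; none of these choices is prescribed by the text, and the function names are hypothetical.

```python
def heading_size(body_size: float, steps_above_body: int, ratio: float = 0.2) -> float:
    """Character size for a heading `steps_above_body` levels above the main text,
    enlarging by `ratio` per level and rounding to the nearest half point."""
    size = body_size * (1 + ratio) ** steps_above_body
    return round(size * 2) / 2


def heading_indent(level: int, top_indent: int = 2, step: int = 2) -> int:
    """Line head indent, in main-text characters, for a vertical-mode heading.
    Level 1 is the highest level; each lower level is indented `step` more."""
    return top_indent + step * (level - 1)


for steps in (1, 2, 3):
    print(steps, heading_size(9, steps))
# 1 11.0
# 2 13.0
# 3 15.5
```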
How to Handle Headings with New Recto and Page Break
A large heading sometimes starts on a new page after a page break, to make the separation between sections clear. In that case, the following rules apply:
New recto: the section always begins on an odd-numbered page.
Because page numbering starts at page 1, a new recto in a book set in vertical writing mode and bound on the right-hand side begins on a left-hand page, while a new recto in a book set in horizontal writing mode and bound on the left-hand side begins on a right-hand page.
Page break: the section always begins on a new page, whether odd- or even-numbered. This is used for large headings.
When a medium heading or small heading falls on the last line of a page, leaving no room for the following text, the heading should be moved to the next page so that it stays with its text.
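The page-level rules above can be sketched in Python. The sketch assumes pages are numbered from 1, so odd-numbered pages are rectos, and it treats a heading as stranded when the page cannot also hold a hypothetical minimum of one line of following text; the function names and that threshold are assumptions.

```python
def first_page(last_used_page: int, mode: str) -> int:
    """Page number on which a new section starts. Pages are numbered from 1,
    so odd-numbered pages are rectos."""
    next_page = last_used_page + 1
    if mode == "page_break":   # any new page, odd or even
        return next_page
    if mode == "new_recto":    # skip an even page, leaving it blank
        return next_page if next_page % 2 == 1 else next_page + 1
    raise ValueError(f"unknown mode: {mode}")


def must_move_heading(lines_left: int, heading_lines: int, min_text_lines: int = 1) -> bool:
    """True if a medium or small heading would be stranded at the page foot,
    i.e. the page cannot also hold `min_text_lines` lines of following text."""
    return lines_left < heading_lines + min_text_lines


print(first_page(11, "new_recto"), first_page(11, "page_break"))
```

If the previous section ends on page 11, a new recto starts on page 13 (page 12 is left blank), while a page break starts the section on page 12.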
Handling of Space just before a New Recto, Page Break or New Column
Space left just before a new recto, a page break or a new column is treated as follows (the same applies to the last page):
In single-column typesetting, the space just before a new recto or page break is left as it is.
In the case of multiple columns, the remaining space of preceding columns is left as it is.
In vertical writing mode, columns are filled with lines of text from upper right to lower left. The number of lines in the upper and lower columns need not match, and any remaining space is left as it is.
In horizontal writing mode with multiple columns, each column is given the same number of lines. However, when the total number of lines cannot be divided evenly by the number of columns in the type area, the last column may have fewer lines and be followed by blank space.
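A minimal Python sketch of this line distribution, assuming columns are filled in order and each column except the last takes the rounded-up share of lines (the function name is hypothetical):

```python
def lines_per_column(total_lines: int, columns: int) -> list[int]:
    """Fill columns in order, giving each the same number of lines; only the
    last column(s) may come up short when the division is not exact."""
    per_column = -(-total_lines // columns)  # ceiling division
    counts = []
    remaining = total_lines
    for _ in range(columns):
        take = min(per_column, remaining)
        counts.append(take)
        remaining -= take
    return counts


print(lines_per_column(10, 3))  # [4, 4, 2]
```

Note that with very few lines (for example, 4 lines in 3 columns) this simple rule leaves the last column empty; a real layout might rebalance such cases.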
Processing of Run-in Headings
Run-in headings are usually used for low level headings. Some common styles of run-in headings are listed below. The space between a run-in heading and the following main text is usually one character of the base character size of the type area. Note that a run-in heading may be set on the last line of a page, or of a column in multi-column layout.
Set the run-in heading in the same character size as the main text, in Hei or Song.
Set the run-in heading one size smaller than the main text, in Hei or Song.
When the width of the run-in heading is not an integer multiple of the body text character size, the space between the heading and the main text can be adjusted so that the main text stays aligned with the character grid and with the line start and line end.
Set the run-in heading in the same character size and typeface as the main text. In this style, heading numbers or Western characters are added in front of the heading.
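The gap adjustment for a run-in heading whose width is not a whole number of body characters can be sketched as follows. It assumes widths are measured in body-text character widths and that the nominal one-character gap is rounded up to the next grid position; the function name is hypothetical.

```python
import math


def run_in_gap(heading_width: float, nominal_gap: float = 1.0) -> float:
    """Space to place after a run-in heading so the main text starts on the
    character grid. Widths are measured in body-text characters."""
    # Round first to absorb floating-point error, then round the heading plus
    # its nominal gap up to the next whole character.
    return math.ceil(round(heading_width + nominal_gap, 6)) - heading_width


print(run_in_gap(4 * 0.9))  # four characters set one size smaller (90%)
```

For a heading of four characters set at 90% of the body size (3.6 characters wide), the gap grows from 1 to 1.4 characters so that the main text starts at the fifth grid position.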
Punctuation marks in Chinese
Name in Chinese
Character
Unicode
Name
Whether rotated 90° clockwise in vertical writing mode
Prohibited at line start
Prohibited at line end
Unbreakable
Notes
句號
。
3002
IDEOGRAPHIC FULL STOP
N
Y
N
.
FF0E
FULLWIDTH FULL STOP
Y
N
逗號
,
FF0C
FULLWIDTH COMMA
N
Y
N
頓號
、
3001
IDEOGRAPHIC COMMA
N
Y
N
冒號
:
FF1A
FULLWIDTH COLON
N
Y
N
分號
;
FF1B
FULLWIDTH SEMICOLON
N
Y
N
驚嘆號
!
FF01
FULLWIDTH EXCLAMATION MARK
N
Y*1
N
When three exclamation marks are used together, they occupy the width of two Han characters.
‼
203C
DOUBLE EXCLAMATION MARK
N
Y
N
Occupies the width of one Han character.
問號
?
FF1F
FULLWIDTH QUESTION MARK
N
Y*1
N
When three question marks are used together, they occupy the width of two Han characters.
⁇
2047
DOUBLE QUESTION MARK
N
Y*1
N
Occupies the width of one Han character.
引號
「
300C
LEFT CORNER BRACKET
Y
N
Y
Mainly used in Traditional Chinese.
」
300D
RIGHT CORNER BRACKET
Y
Y
N
『
300E
LEFT WHITE CORNER BRACKET
Y
N
Y
』
300F
RIGHT WHITE CORNER BRACKET
Y
Y
N
“
201C
LEFT DOUBLE QUOTATION MARK
Y*2
N
Y
Occupies the width of one Han character; mainly used in Simplified Chinese.
”
201D
RIGHT DOUBLE QUOTATION MARK
Y*2
Y
N
‘
2018
LEFT SINGLE QUOTATION MARK
Y*2
N
Y
’
2019
RIGHT SINGLE QUOTATION MARK
Y*2
Y
N
括號
(
FF08
FULLWIDTH LEFT PARENTHESIS
Y
N
Y
)
FF09
FULLWIDTH RIGHT PARENTHESIS
Y
Y
N
破折號
——
2014
EM DASH
Y
N
N
Y*3
Occupies the width of two Han characters, forming one continuous straight line without a break in the middle.
書名號
《
300A
LEFT DOUBLE ANGLE BRACKET
Y
N
Y
》
300B
RIGHT DOUBLE ANGLE BRACKET
Y
Y
N
〈
3008
LEFT ANGLE BRACKET
Y
N
Y
〉
3009
RIGHT ANGLE BRACKET
Y
Y
N
刪節號/省略號
……
2026
HORIZONTAL ELLIPSIS
Y
N
N
Y*3
Occupies the width of two Han characters; the dots should be centered in the character face.
連接號
~
FF5E
FULLWIDTH TILDE
Y
Y*1
N
-
002D
HYPHEN-MINUS
Y
Y*1
N
–
2013
EN DASH
Y
Y*1
N
—
2014
EM DASH
Y
Y*1
N
間隔號
·
00B7
MIDDLE DOT
N
Y*1
N
Occupies the width of one Han character; may be reduced to half a Han character width where appropriate.
‧
2027
HYPHENATION POINT*4
N
Y*1
N
分隔號
/
002F
SOLIDUS
Y
Y
Mainly used in Simplified Chinese.
/
FF0F
FULLWIDTH SOLIDUS
N
Y
N
Mainly used in Traditional Chinese.
括號類
【
3010
LEFT BLACK LENTICULAR BRACKET
Y
N
Y
】
3011
RIGHT BLACK LENTICULAR BRACKET
Y
Y
N
〖
3016
LEFT WHITE LENTICULAR BRACKET
Y
N
Y
〗
3017
RIGHT WHITE LENTICULAR BRACKET
Y
Y
N
〔
3014
LEFT TORTOISE SHELL BRACKET
Y
N
Y
〕
3015
RIGHT TORTOISE SHELL BRACKET
Y
Y
N
[
FF3B
FULLWIDTH LEFT SQUARE BRACKET
Y
N
Y
]
FF3D
FULLWIDTH RIGHT SQUARE BRACKET
Y
Y
N
{
FF5B
FULLWIDTH LEFT CURLY BRACKET
Y
N
Y
}
FF5D
FULLWIDTH RIGHT CURLY BRACKET
Y
Y
N
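*1 Depending on the context, some cases allow this mark at the line start, mainly in Traditional Chinese publications.
*2 In vertical writing mode, corner brackets (『』「」) are usually used instead.
*3 The mark must not be split across lines; only when this is unavoidable may it be broken over two lines, as the situation requires.
*4 From Big 5.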
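The line start and line end prohibitions in the table above can drive a simple line breaker, sketched below in Python. The sketch uses only a subset of the listed marks, ignores the exceptions that allow some marks at the line start, counts every character as one cell, and resolves a prohibited break only by pushing characters to the next line (hanging or compressing punctuation is not modeled). The names are hypothetical.

```python
# A subset of the table: marks prohibited at the line start and at the line end.
NO_LINE_START = set("。,、:;!?」』”’)》〉】〕")
NO_LINE_END = set("「『“‘(《〈【〔")


def can_break(before: str, after: str) -> bool:
    """True if a line may end with `before` and the next line start with `after`."""
    return before not in NO_LINE_END and after not in NO_LINE_START


def break_lines(text: str, width: int) -> list[str]:
    """Break text into lines of at most `width` characters (one character per
    cell). Where a break would violate a prohibition rule, characters are
    pushed to the next line; the shortened line would then need even
    inter-character spacing."""
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        if end < len(text):
            brk = end
            # Move the break point back until it falls at a permitted position.
            while brk > start + 1 and not can_break(text[brk - 1], text[brk]):
                brk -= 1
            if can_break(text[brk - 1], text[brk]):
                end = brk
            # Otherwise no permitted break exists in this line; break at `end` anyway.
        lines.append(text[start:end])
        start = end
    return lines


print(break_lines("他說:「你好。」我們走吧。", 4))
```

With four cells per line, the opening bracket is pushed off the end of the first line, and 「你 forms a short second line because neither 。 nor 」 may start a line.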
Glossary
詞彙
漢語拼音
英語
阿拉伯數字
Ālābó shùzì
European numerals
版心
bǎnxīn
type area
半形/半角
bànxíng/bànjiǎo
half-width
標點符號
biāodiǎn fúhào
punctuation marks
標點符號擠壓
biāodiǎn fúhào jǐyā
compression rules for consecutive punctuation marks
標號
biāohào
indication punctuation marks
標音
biāoyīn
phonetic annotation
比例字體
bǐlì zìtǐ
proportional fonts
出血
chūxiě
bleed
大五碼
Dàwǔmǎ
Big 5
地/地腳
dì/dìjiǎo
foot/bottom margin
點
diǎn
point (pt)
點號
diǎnhào
pause or stop punctuation marks
底端
dǐduān
foot side
頂端
dǐngduān
head side
逗號
dòuhào
comma
段落
duànluò
paragraph
對齊方式
duìqí fāngshì
alignment
頓號
dùnhào
slight-pause comma
兒化音
érhuàyīn
rhotacization of syllable finals
仿宋體
fǎngsòngtǐ
Imitation Song (Fangsong)
繁體中文
fántǐ Zhōngwén
Traditional Chinese
分詞連寫
fēncí liánxiě
words as the base units
分隔號
fēngéhào
solidus
分號
fēnhào
semicolon
分字連寫
fēnzì liánxiě
characters as the base units
符號分離禁則
fúhào fēnlí jìnzé
prohibition rules for unbreakable characters
括號
guāhào
parenthesis
括注符號
guāzhù fúhào
brackets
孤行
gūháng
widows
國標碼
guóbiāomǎ
GB 18030-2005
行
háng
line
行高
hánggāo
line-height
行間批語
hángjiān pīyǔ
interlinear comments
行間注
hángjiānzhù
interlinear annotations
行首行尾禁則
hángshǒu hángwěi jìnzé
prohibition rules for line start/end
行尾點號懸掛
hángwěi diǎnhào xuánguà
hanging punctuation marks for line end
漢語拼音
Hànyǔ pīnyīn
Hanyu Pinyin
漢字
Hànzì
Chinese characters (Han characters/Hanzi)
黑體
hēitǐ
Heiti
橫排
héngpái
horizontal writing mode
合文
héwén
ligature
活字排版
huózì páibǎn
letterpress printing
間隔號
jiàngéhào
interpunct
簡體中文
jiǎntǐ Zhōngwén
Simplified Chinese
結束括注符號
jiéshù guāzhù fúhào
closing bracket(s)
介音
jièyīn
medial
驚嘆號
jīngtànhào
exclamation marks
基文
jīwén
base text
句號
jùhào
period
句末點號
jùmò diǎnhào
post-sentence pause punctuation marks
句中點號
jùzhōng diǎnhào
inter-sentence pause punctuation marks
開始括注符號
kāishǐ guāzhù fúhào
opening bracket(s)
楷體
kǎitǐ
Kai
拉丁字母
lādīng zìmǔ
Latin letters
欄
lán
column
欄間距
lán jiānjù
column gap
連接號
liánjiēhào
dash
羅馬拼音
luómǎ pīnyīn
Romanization
冒號
màohào
colon
密排
mìpái
seamless arrangement
末端
mòduān
end point
破折號
pòzhéhào
long dash
輕聲
qīngshēng
neutral tone
齊頭尾對齊
qítóuwěi duìqí
justified alignment
全形/全角
quánxíng/quánjiǎo
full-width
入聲
rùshēng
checked tone
刪節號/省略號
shānjiéhào/shěnglüèhào
ellipsis
聲調
shēngdiào
tone
聲母
shēngmǔ
initial
始端
shǐduān
starting point
書名號
shūmínghào
title mark
疏排
shūpái
distributed arrangement
宋體(明體/明朝體)
Sòngtǐ (Míngtǐ/Míngcháotǐ)
Song
天/天頭
tiān/tiāntóu
head/top margin
跳格
tiàogé
tab
彎引號
wān-yǐnhào
curve quotation mark
萬國碼
wànguómǎ
Unicode
文本
wénběn
text
問號
wènhào
question mark
文字設計/字體排印
wénzìshèjì/zìtǐpáiyìn
typography
現代標準漢語(普通話、國語、華語)
xiàndài biāozhǔn Hànyǔ
Modern Standard Chinese (Mandarin)
希臘字母
xīlà zìmǔ
Greek letters
西文
xīwén
Western languages
西文字母
xīwén zìmǔ
Western alphabet
以連字符斷行
yǐ liánzìfú duànháng
hyphenation
引號
yǐnhào
quotation
意音文字
yìyīn wénzì
ideographs
韻母
yùnmǔ
final
直角引號
zhíjiǎo yǐnhào
corner quotation mark
直排(豎排)
zhípái (shùpái)
vertical writing mode
中外文對照
Zhōng-wàiwén duìzhào
bilingual annotations
中、西文混排處理
Zhōng-xīwén hùnpái chǔlǐ
Chinese and Western mixed text composition
中日韓越意音文字
Zhōng-Rì-Hán-Yuè yìyīn wénzì
CJKV Ideographs
中文
Zhōngwén
Chinese
專名號
zhuānmínghào
proper name mark
著重號
zhuózhònghào
emphasis dot
注文
zhùwén
annotation text
注音符號
zhùyīn fúhào
Zhuyin (Bopomofo)
字幅
zìfú
character advance
字距
zìjù
tracking
字面
zìmiàn
character face
字體
zìtǐ
typeface
字型
zìxíng
font
字形
zìxíng
glyph
縱橫對齊
zònghéng duìqí
grid alignment
縱中橫排
zòngzhōnghéngpái
horizontal-in-vertical setting
References
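《重訂標點符號手冊》(2008年修訂版) The Revised Handbook of Punctuation (2008 edition)
《标点符号用法》(GB/T 15834―2011) General Rules for Punctuation (GB/T 15834―2011)
《出版物上数字用法》(GB/T 15835—2011) General Rules for Writing Numerals in Publications (GB/T 15835—2011)
《國語注音符號手冊》 The Handbook of Mandarin Phonetic Symbols
《汉语拼音正词法基本规则》(GB/T 16159—2012) Basic Rules of Hanyu Pinyin Orthography (GB/T 16159—2012)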
Acknowledgements
Special thanks to the following people who contributed to this document (listed in alphabetical order).