[CSSWG] Public Review Issue #184 on UTS37 - Unicode Ideographic Variation Database

For the record, the following was sent on 25 July 2011 by the CSSWG to Unicode
as an official public review comment.

====== message below ======

This comment is being sent officially on behalf of the W3C CSS Working Group
with respect to Public Review Issue 184, on the topic of the proposed updates
to Unicode Technical Standard #37.

Overall, we are very happy with the direction the edits to UTS37 are taking.
However, we don't believe they go far enough. The draft states:

    # If there are sequences that correspond to the same glyphic subset, it
    # becomes a burden for implementers, which can make a collection less
    # likely to be implemented. As a result, in an effort to minimize the
    # number of sequences that correspond to the same glyphic subset,
    # registrants are strongly encouraged to share sequences where sequences
    # in a submission are similar to those in an existing collection. As part
    # of the registration process, the registrar will encourage the sharing
    # of sequences. The sharing of sequences across collections requires
    # mutual agreement of the registrants for the affected collections.

Having multiple representations for the exact same text is not just a burden for
implementations, but an obstacle to interoperability. Neither plain text nor fonts
can be reliably exchanged among systems if some of them implement one set of IVS
mappings for a particular glyph and others implement another. Such a closed-system
approach is counter to the goals of Unicode and breaks down with real negative
consequences for users on an open system such as the Web.

To mitigate this problem, we would like the draft to state that sequences *must* be
shared where the glyphic subsets are known to be identical. Specifically, if the
registrant cannot explain (in prose) how the new glyph being registered differs
from all existing variants in the database, it should not be possible to register
a new IVS. Note that we are not suggesting that any judgement be made as to the
significance of the differences, only that a difference can be objectively described.

Furthermore, this prose should be a required part of the variant's registration.
Requiring this explanation in the database will not only prevent duplicates but
also help font designers understand which variations among glyph outlines in
the database are significant and which are merely stylistic (due to the typeface
of the submitted representative glyph). Because many of the significant
differences are subtle, they can escape notice, and incidental differences can
be mistaken for significant ones. Only with explicit information can font
designers be expected to represent accurately the glyphic variations intended
by the registrants.

We also suggest that Unicode take responsibility for creating and maintaining a
mapping table for all existing codepoint representations of the same glyph.
Requiring each individual font vendor to come up with its own mapping table,
using its own interpretation of which glyphs should be identical, is a recipe
for non-interoperability. Such an equivalency table should be standardized, and
as such should be the responsibility of the Unicode Consortium to maintain.
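
As a rough illustration of what such an equivalency table could look like to a
consumer, here is a minimal Python sketch. The table entries, the choice of
base character, and the names EQUIVALENT_TO and canonical_sequence are all
hypothetical; they do not reflect actual registrations in the database.

```python
# Hypothetical equivalency table. Every entry is invented for illustration
# and does not correspond to a real registration in the database.
VS17 = "\U000E0100"  # VARIATION SELECTOR-17
VS18 = "\U000E0101"  # VARIATION SELECTOR-18

# Maps a duplicate sequence to the single sequence chosen as canonical
# for the same glyphic subset.
EQUIVALENT_TO = {
    "\u8C48" + VS18: "\u8C48" + VS17,
}


def canonical_sequence(sequence: str) -> str:
    """Return the canonical sequence for a glyphic subset.

    Sequences that have no entry in the table are returned unchanged.
    """
    return EQUIVALENT_TO.get(sequence, sequence)
```

With one shared table, two implementations that meet the same duplicate
sequence resolve it to the same glyph, rather than each relying on its own
interpretation.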

Lastly, we request that a single, canonical IVS registration be made available
for each glyphic subset represented in the CJK Compatibility Ideographs and
the appropriate mappings added to the duplicate-glyph mapping table. This will
allow migration from the normalization-sensitive compatibility ideographs to
the normalization-stable IVS solution and make the deprecation and eventual
obsolescence of the compatibility ideographs a practical reality.
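
The difference in normalization behavior can be observed with Python's
standard unicodedata module. This is a small sketch: U+F900 is used as a
sample compatibility ideograph, and the pairing of U+8C48 with VARIATION
SELECTOR-17 is chosen purely for illustration, not because that sequence is
known to be registered. The helper name survives_normalization is our own.

```python
import unicodedata

COMPAT = "\uF900"   # CJK COMPATIBILITY IDEOGRAPH-F900
UNIFIED = "\u8C48"  # the unified ideograph it canonically decomposes to
# Base character plus VARIATION SELECTOR-17. Illustrative only: this
# particular sequence is not claimed to be registered.
IVS = UNIFIED + "\U000E0100"


def survives_normalization(text: str, form: str = "NFC") -> bool:
    """Return True if normalizing text to the given form leaves it unchanged."""
    return unicodedata.normalize(form, text) == text


# The compatibility ideograph is replaced by its unified counterpart,
# so the distinction it carried is lost.
print(survives_normalization(COMPAT))
# The variation sequence passes through normalization intact.
print(survives_normalization(IVS))
```

The first check reports that the compatibility ideograph does not survive,
while the variation sequence does, which is the property that makes migration
to IVSes attractive.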

Thank you for your consideration,

Elika J. Etemad
Invited Expert
W3C CSS Working Group

Received on Thursday, 28 July 2011 00:47:44 UTC