Re: [LC Review] of WebCGM 2.0

I have thought a little more carefully about the question of the default 
character encoding ("character set")...

At 10:27 AM 7/10/2006 -0600, Lofton Henderson wrote:
[...]

>>Comment 3 (editorial): Why not Unicode as the default encoding?
>>In
>>http://www.w3.org/TR/2006/WD-webcgm20-20060623/WebCGM20-Concepts.html#webcgm_2_4
>>, (sec. 2.5.4), you describe isolatin1 as the default "character set".
>>We would propose to describe UTF-8 as the default character encoding,
>>and to use the term "character encoding" instead of "character set". See
>>also http://www.w3.org/TR/charmod/#C020 .
>
>PROPOSED REPLY (perhaps too wordy):
>Simple answer:  legacy.  WebCGM 1.0 (1999) uses the default of IsoLatin1, 
>as does the ISO CGM:1999 standard upon which the WebCGM 1.0 profile is 
>based.  (Ignoring some fine distinctions between graphical and 
>non-graphical text.)  Changing the default for WebCGM 2.0 would be pretty 
>disruptive, without apparent commensurate gain.  Particularly since it is 
>a simple matter for a metafile instance to reset its "character set" (more 
>about terminology below).
>
>In addition, there may be an issue about CGM:1999's "Rules for 
>Profiles".  At best, it is unclear whether a valid CGM profile can 
>redefine for profile-conforming metafile instances the defaults specified 
>in the base standard, CGM:1999.  If it doesn't violate the letter of the 
>rules in CGM:1999 clause 9 ("Profiles and conformance"), it appears to 
>violate the spirit.  Again, I would think that it would be a hardship on 
>implementations to have defaults that are profile sensitive and at 
>variance with the base standard.
>
>Aside... If the ISO CGM standard were being written today (instead of 
>descending from the venerable original Version 1 of ISO CGM:1987), Unicode 
>would certainly have been the chosen default.  Note that the same pertains 
>to CGM's terminology of "character set", instead of the correct 
>terminology, "character encoding".  We are aware that it is at variance 
>with CharMod.  However the incorrect terminology "character set" descends 
>from the original ISO CGM:1987, and indeed there are CGM element names 
>(CHARACTER SET LIST, CHARACTER SET INDEX, etc) that embed the incorrect 
>terminology.  If this is thought to be important, we could perhaps include 
>the correct terminology in an explanatory note?  And link occurrences of 
>"character set" to that note?
>
>QUESTION:  Is use of the correct terminology ("character encoding") 
>sufficiently important to change it throughout WebCGM 2.0 (except for 
>proper element names like CHARACTER SET LIST), at variance with ISO 
>CGM:1999 and WebCGM 1.0?  Or could an explanatory note suffice?

While I still think that the disruption to implementors of changing 7 years 
of WebCGM legacy and 20 years of ISO CGM legacy would have to be carefully 
weighed, and while the change is arguably problematic under ISO CGM's "Rules 
for Profiles", the situation is worse than that...

It may be technically impossible to change the default, at least for 
non-graphical text.  Recall the mechanism from ISO CGM (1987, 1992, 1999) 
for establishing the so-called "character set" (character encoding) for 
non-graphical text.  The CGM-defined implicit default is 8-bit ISO 
Latin1.  To change that for the entire metafile, the first 4 bytes of 
the string parameter of the first element of the metafile (BEGIN METAFILE) 
must contain the ISO 2022 sequence that changes it.  The rest of the 
string parameter, which is the 'metafile id', will then be in the new 
character encoding ("set"), as will all subsequent non-graphical text.
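To make the mechanism concrete, here is a rough sketch of how a reader 
might interpret the BEGIN METAFILE string parameter.  It is illustrative 
only, not taken from any implementation.  The function name 
decode_metafile_id is hypothetical, and the single 4-octet sequence in the 
table (ESC 2/5 2/15 4/9, taken here to select UTF-8) is my assumption for 
the example; check the exact sequences against ISO 2022 and the WebCGM 
profile.  It also assumes the string parameter has already been extracted 
from the element as raw bytes.

```python
# Assumed for illustration: one 4-octet ISO 2022 sequence and the
# encoding it is taken to select. A real reader would use the
# sequences the profile actually permits.
ASSUMED_SEQUENCES = {
    b"\x1b%/I": "utf-8",  # ESC 2/5 2/15 4/9
}


def decode_metafile_id(param: bytes) -> tuple[str, str]:
    """Return (encoding, metafile id) for a BEGIN METAFILE string parameter.

    If the first 4 bytes are a recognized ISO 2022 sequence, the rest of
    the parameter is decoded in the encoding it selects. Otherwise the
    implicit default, 8-bit ISO Latin1, applies to the whole parameter.
    """
    encoding = ASSUMED_SEQUENCES.get(param[:4])
    if encoding is None:
        # No switching sequence: nothing is stripped, Latin1 is assumed.
        return "latin-1", param.decode("latin-1")
    return encoding, param[4:].decode(encoding)
```

Note that the decision is made from the first bytes of the first element 
alone; nothing later in the metafile is consulted.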

In the absence of the 4-octet 2022 string, the character encoding is 
assumed to be 8-bit Latin1.  Clearly, it would not work for a profile to 
say, "in the absence of a 2022 string, the (implicit) default is 
...blah..." -- you don't know the profile until further down in the 
metafile, at the METAFILE DESCRIPTION element.

Furthermore, WebCGM (1.0, 2.0) prohibits use of ISO 2022 switching, except 
at the very start of the metafile to establish the character encoding 
("set") for non-graphical text for the whole metafile.

-Lofton.

Received on Monday, 10 July 2006 18:15:12 UTC