Re: controls for GEO discussion

From: Tex Texin <tex@i18nguy.com>
Date: Sat, 31 May 2003 03:13:28 -0400
Message-ID: <3ED85618.4747FCCB@I18nGuy.com>
To: Martin Duerst <duerst@w3.org>
CC: public-i18n-geo@w3.org

Hi, Good comments. Although I have some disagreements, I liked your analysis
and think the points are worth discussing.

0) Question wording. Yes, we agreed to change the wording to something close
to what you suggested.

1) Length. I think people may have a general idea of what controls are but may
not know the specifics, especially the ranges and the corresponding ranges in
Unicode. We could break the piece into multiple questions, but I wonder how
appropriate these backgrounders are for an i18n Q&A list, especially this
early on. We could move more of the background explanation below the question
and answer. I think you are right that the question and answer should be
succinct and at the top of the page, but I don't see a problem with
additional clarifying and supporting information being available on the page
after the main point is discussed.

If there is a strong objection to the background info, I would be happy to
move it to a page on my web site. GEO can have the short version and can
optionally link to my page for more info.

2) Relevance. I understand your questioning the topic; I would have done the
same. It came about because I was in fact asked the question last week.
Controls are not only used for manipulating devices. They have other uses.
An application development environment I am familiar with does a lot of
value-list processing. Depending on the nature of the data, the list separator
is changed, e.g. if it's a list of European decimals they would not want
commas as a separator. To avoid conflicts between the list values and
separators in general routines, they use 0x01, 0x02, etc. as separators. So
they have lots of data in databases using these values.
(Yes, they could have adopted escape mechanisms instead.)
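
A minimal sketch of that separator scheme, in Python. The function names and
the choice of 0x01 as the only separator are assumptions made for
illustration, not the actual routines of the environment described above.

```python
# Illustrative only: pack a list of values using U+0001 as the separator,
# so that commas inside the values (e.g. European decimals) need no escaping.
SEP = "\x01"  # assumed separator; the real system reportedly also uses 0x02, etc.


def join_values(values):
    """Join values with SEP, rejecting any value that already contains SEP."""
    for value in values:
        if SEP in value:
            raise ValueError("value contains the separator: %r" % value)
    return SEP.join(values)


def split_values(packed):
    """Split a packed string back into its list of values."""
    return packed.split(SEP)


packed = join_values(["3,14", "2,71"])
values = split_values(packed)
```

The packed string contains a raw U+0001, which is exactly the kind of
character that causes trouble once the data is written out as XML.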

They ran into problems writing the data to XML. Some software liked it, others
didn't. When they looked into the errors caused by control codes not being
allowed, they needed advice: which are the disallowed characters, and what are
the workarounds? Hence the article.
I believe there may be a lot of data using controls, and as with this group,
people may not have time to develop any better solution than writing the
data out as NCRs.
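
As a rough sketch of "which characters are disallowed, and what is the
workaround", the Python below checks text against the Char production of
XML 1.0 and writes the RestrictedChar range of XML 1.1 as NCRs. The function
names are hypothetical. Note the assumptions: XML 1.0 offers no NCR workaround
for these controls (a character reference must itself match Char), U+0000 is
not allowed in either version, and the sketch does not escape markup
characters such as "&" and "<".

```python
def is_xml10_char(cp):
    """True if the code point matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp):
    """True if the code point matches the XML 1.1 Char production."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp):
    """True if XML 1.1 allows the code point only as a character reference."""
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)


def disallowed_in_xml10(text):
    """Return (index, code point) pairs for characters XML 1.0 rejects."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if not is_xml10_char(ord(ch))]


def to_xml11_ncrs(text):
    """Rewrite restricted controls as hexadecimal NCRs for an XML 1.1 document."""
    out = []
    for ch in text:
        cp = ord(ch)
        if not is_xml11_char(cp):
            raise ValueError("U+%04X cannot appear in XML 1.1 at all" % cp)
        out.append("&#x%X;" % cp if is_xml11_restricted(cp) else ch)
    return "".join(out)
```

For example, disallowed_in_xml10("a\x01b") reports the control at index 1,
and to_xml11_ncrs("a\x01b") yields a string in which the control is written
as the reference for U+0001 while "a" and "b" pass through unchanged.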

3) So because of 2, I claim that if XML is for data interchange, support for
interchange of controls is needed. I can agree the needs are exotic.
You can argue that the data should instead be cleaned up, but that is
impractical in some cases.
In any event, it is worthwhile to let people know what is and is not doable in
*ML.
I don't mind giving more emphasis to cleaning up the data.
I also don't mind emphasizing that control codes are to be avoided, and are
bad for scalability and on the web.
I would disagree with saying never use controls in XML.
I presume support for controls as NCRs was added because some needs for
supporting controls were identified.


4) Separate rows for NEL. I agree.

5) Encoding. I believe what we said is that if the data is in fact binary,
encoding is an option.
Essentially, if it is binary, it is not an i18n issue.
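
For the binary case only, a minimal Python sketch of the encoding option
using the standard base64 module. The function names are hypothetical, and
this is not a suggestion to encode text that should remain readable as text.

```python
import base64


def encode_binary_for_xml(data):
    """Encode raw bytes as base64 text, which uses only XML-safe ASCII."""
    return base64.b64encode(data).decode("ascii")


def decode_binary_from_xml(text):
    """Recover the original bytes from base64 element content."""
    return base64.b64decode(text)


payload = bytes([0x01, 0x02, 0xFF])
encoded = encode_binary_for_xml(payload)
restored = decode_binary_from_xml(encoded)
```

The encoded form is opaque to XML tools, which is why it only makes sense
when the content is genuinely binary rather than text.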

Richard, if you want to finish the changes you were going to make, you can
address Martin's comments or pass it back to me and I'll address them.
tex


Martin Duerst wrote:
> 
> Hello Tex,
> 
> Some more comments on your Q&A.
> 
> Overall, I think that the answer is much too long. It not only
> answers 'How do ... support control codes', but also 'what are
> control codes', and so on. But the question assumes a basic
> knowledge of control codes. People who don't know these
> are not even interested in reading the answer.
> 
> Also, I guess the real question is not how HTML and XML
> support control codes, but "How can I represent control
> codes in HTML or XML".
> 
> The basic message also should be improved. (X)HTML is a
> textual format used to represent text. There is absolutely
> no need to use control codes in (X)HTML. If anybody thinks
> otherwise, they didn't understand (X)HTML. I don't remember
> having been asked about control codes in (X)HTML at all.
> This should be clearly reflected in the answer.
> 
> XML in general is used both for text and for data. So
> there may be some interesting use cases for control
> codes in XML. The typical example would be an XML
> format for control code sequences for terminals
> (i.e. an XML version of a unix termcap file).
> 
> Apart from such rather exotic examples, the main reason
> that there are control codes in data usually is one of
> the following (most probably in the following order):
> 
> - Pure garbage. The right thing is to clean up your data.
> 
> - Old ways of representing data (starting with using Backspace
>    to get accented versions of characters). The right thing
>    is to convert your data, i.e. by doing the correct transcoding
>    or by adding markup.
> 
> In the table, I suggest to have separate rows for
> CR/LF/TAB and for NEL (which is special in XML 1.1).
> 
> The page says: "An alternative is to encode the data. For example,
> encode the data as base64 or as hexadecimal values, to ensure only
> supported characters are used in the markup language text."
> 
> I'm very surprised to see this on an i18n-related page.
> What this will do is that it will throw out of the window
> any and all i18n features that XML has. So from an i18n
> viewpoint, we should not recommend it, we should indeed
> clearly recommend against it.
> 
> Hope this helps.
> 
> Regards,    Martin.
> 
> At 12:49 03/05/28 -0400, Tex Texin wrote:
> >I am not sure why, but the geo list isn't distributing (my?) mail since last
> >night.
> >
> >Here is the controls page for q&a today.
> >I may be a little late to the meeting.
> >
> >http://www.i18nguy.com/test/controls.htm
> >
> >sorry, I don't have everyone's email. (maybe that's a good thing. ;-) )
> >
> >tex
> >
> >--
> >-------------------------------------------------------------
> >Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> >Xen Master                          http://www.i18nGuy.com
> >
> >XenCraft                            http://www.XenCraft.com
> >Making e-Business Work Around the World
> >-------------------------------------------------------------

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Saturday, 31 May 2003 03:14:03 UTC