W3C home > Mailing lists > Public > public-i18n-geo@w3.org > June 2003

Re: controls for GEO discusssion

From: Martin Duerst <duerst@w3.org>
Date: Sun, 01 Jun 2003 17:41:20 -0400
Message-ID: <1057787653.IAA22192@phantom.w3.org>
To: Tex Texin <tex@i18nguy.com>
Cc: public-i18n-geo@w3.org

Hello Tex,

At 03:13 03/05/31 -0400, Tex Texin wrote:

>Hi, Good comments, although I have some disagreements, I liked your analysis
>and think the points worth discussing.
>0) Question wording. Yes, we agreed to change the wording to something close
>to what you suggested.

Very good.

>1) Length. I think people may have a general idea of what controls are but may
>not know the specifics, and especially the specifics of the ranges and the
>ranges in Unicode.

Well, those who don't know what they are are probably not interested
in using them. And we should avoid giving the impression that they
are something important to know.

>We could break the piece into multiple questions, but I
>wonder about how appropriate these backgrounders are for an i18n qa list...
>Especially this early on. We could move more of the background explanation
>below the question and answer.

Yes, I think this is the best thing to do.

>I think you are right the question and answer
>should be succint, and at the top of the page but I don't see a problem with
>additional clarifying and supporting information being available on the page,
>after the main point is discussed.
>If there is a strong objection to the background info, I would be happy to
>move it to a page on my web site, GEO can have the short version and GEO can
>optionally link to my page for more info.
>2) Relevance- I understand your questioning the topic, I would have done the
>same. It came about because in fact I was asked the question last week.
>Controls are not only used for manipulating devices. They have other uses.
>An application development environment I am familiar with does a lot of
>value-list processing. Depending on the nature of the data, the list separator
>is changed. e.g. if it's a list of european decimals they would not want
>commas as a separator. To avoid conflicts between the list values and
>separators, in general routines, they use 0x01, 0x02, etc. as separators. So
>they have lots of data in databases using these values.
>(Yes, they could have instead adopted escape mechanisms instead.)

I think this is a good, practical example. Whatever they do in
their database is not really our problem.

>They ran into problems writing the data to xml. Some software liked it, others

The software that tolerated it was faulty.

>When they looked into the errors due to control codes not being
>allowed they needed advice. Which are the disallowed characters, and what are
>the workarounds? Hence the article.

Very good. For the example above, the best thing to do is
to use XML for the separators. E.g.

That's what XML is for.

>I believe there may be a lot of data using controls, and as with this group,
>people may not have time to develop better solutions other than writing the
>data out as NCRs.

If they want to use XML, they should at least try to use it the
right way. And we should help them understand what the right
way is. They can always decide to do something else on their own.

>3) So because of 2, I claim if XML is for data interchange, support for
>interchange of controls is needed.

Your example doesn't show that. XML has a perfect way of exchanging
structured data.

Also, in some way, the use case above looks like they just needed
*any* character. Maybe converting it to a PUA character would
be another solution (but I don't like that, either).

>I can agree the needs are exotic.
>You can argue that the data should instead be cleaned up, but that is
>impractical in some cases.
>In any event, it is worthwhile to let people know what is and is not doable in
>I don't mind giving more emphasis to cleaning up the data.

Yes, I think we should do that.

>I also don't mind emphasizing that control codes are to be avoided, and are
>bad for scalability and on the web.

Yes, very good.

>I would disagree with saying never use controls in XML.

I haven't said that.

>I would presume the reason support for controls as NCRs was added, is because
>some needs were identified for supporting controls.
>4) separate rows for NL. I agree.
>5) encoding. I believe what we said, is that if the data is in fact binary,
>encoding is an option.
>Essentially, if it is binary, it is not an i18n issue.

The paragraph that mentions base64 does not say anything about
the data being binary, and the problems for i18n if the data
is textual. If you have discussed that, and it's going to be
updated, that's good.

Some more points:

- For XML 1.1, clearly say that it is not yet a Recommendation.
- If possible, don't use notes. In most cases, they can be
   replaced with a direct link.
- In note 4, change "For example, eacute is the Character Entity Reference"
   to "For example, &eacute; is the Character Entity Reference"

Regards,   Martin.

>Richard, if you want to finish the changes you were going to make, you can
>address Martin's comments or pass it back to me and I'll address them.
>Martin Duerst wrote:
> >
> > Hello Tex,
> >
> > Some more comments on your Q&A.
> >
> > Overall, I think that the answer is much too long. It not only
> > answers 'How do ... support control codes', but also 'what are
> > control codes', and so on. But the question assumes a basic
> > knowledge of control codes. Peolpe who don't know these
> > are not even interested in reading the answer.
> >
> > Also, I guess the real question is not how HTML and XML
> > support control codes, but "How can I represent control
> > codes in HTML or XML".
> >
> > The basic message also should be improved. (X)HTML is a
> > textual format used to represent text. There is absolutely
> > no need to use control codes in (X)HTML. If anybody thinks
> > otherwise, they didn't understand (X)HTML. I don't remember
> > having been asked about control codes in (X)HTML at all.
> > This should be clearly reflected in the answer.
> >
> > XML in general is used both for text and for data. So
> > there may be some interesting use cases for control
> > codes in XML. The typical example would be an XML
> > format for control code sequences for terminals
> > (i.e. an XML version of a unix termcap file).
> >
> > Apart from such rather exotic examples, the main reason
> > that there are control codes in data usually is one of
> > the following (most probably in the following order):
> >
> > - Pure garbage. The right thing is to clean up your data.
> >
> > - Old ways of representing data (starting with using Backspace
> >    to get accented versions of characters). The right thing
> >    is to convert your data, i.e. by doing the correct transcoding
> >    or by adding markup.
> >
> > In the table, I suggest to have separate rows for
> > CR/LF/TAB and for NEL (which is special in XML 1.1).
> >
> > The page says: "An alternative is to encode the data. For example,
> > encode the data as base64 or as hexadecimal values, to ensure only
> > supported characters are used in the markup language text."
> >
> > I'm very surprised to see this on an i18n-related page.
> > What this will do is that it will throw out of the window
> > any and all i18n features that XML has. So from an i18n
> > viewpoint, we should not recommend it, we should indeed
> > clearly recommend against it.
> >
> > Hope this helps.
> >
> > Regards,    Martin.
> >
> > At 12:49 03/05/28 -0400, Tex Texin wrote:
> > >I am not sure why, but the geo list isn't distributing (my?) mail 
> since lst
> > >night.
> > >
> > >Here is the controls page for q&a today.
> > >I may be a little late to the meeting.
> > >
> > >http://www.i18nguy.com/test/controls.htm
> > >
> > >sorry, I don't have everyone's email. (maybe that's a good thing. ;-) )
> > >
> > >tex
> > >
> > >--
> > >-------------------------------------------------------------
> > >Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> > >Xen Master                          http://www.i18nGuy.com
> > >
> > >XenCraft                            http://www.XenCraft.com
> > >Making e-Business Work Around the World
> > >-------------------------------------------------------------
>Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
>Xen Master                          http://www.i18nGuy.com
>XenCraft                            http://www.XenCraft.com
>Making e-Business Work Around the World
Received on Sunday, 1 June 2003 18:41:29 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:28:00 UTC