Re: controls for GEO discussion

Hello Tex,

At 19:04 03/06/01 -0400, Tex Texin wrote:

>Martin,
>good comments and I generally agree.
>We should add an example using xml separators to make that option clear.
>
>You didn't say never use controls in XML, but you came close..
>
>"There is absolutely no need to use control codes in (X)HTML. If anybody
>thinks otherwise, they didn't understand (X)HTML."
>
>But I made the leap to never.

Yes. There is a difference between XML (which is for text and data)
and (X)HTML, which is for text only. In the latter case, it's *never*.
In the former case, there may be some (rare!) exceptions.

Regards,    Martin.


>I think the one thing I disagree with (somewhat) is the links vs. notes. I
>think in some cases the notes are needed to clarify what is relevant in the
>links. In other cases, links alone are fine.
>
>
>Anyway, I think we can fix it to make everyone happy. Richard wanted to post
>it last Friday.
>Richard you can either post it with your final changes and then I will make
>further changes for Martin's comments and give it back to you for subsequent
>update, or you can turn it back to me and I will try to fix further.
>Or you can fix it altogether if you want.
>
>
>tex
>
>
>Martin Duerst wrote:
> >
> > Hello Tex,
> >
> > At 03:13 03/05/31 -0400, Tex Texin wrote:
> >
> > >Hi, Good comments, although I have some disagreements, I liked your
> > >analysis and think the points worth discussing.
> > >
> > >0) Question wording. Yes, we agreed to change the wording to something
> > >close to what you suggested.
> >
> > Very good.
> >
> > >1) Length. I think people may have a general idea of what controls are
> > >but may not know the specifics, and especially the specifics of the
> > >ranges and the ranges in Unicode.
> >
> > Well, those who don't know what they are are probably not interested
> > in using them. And we should avoid giving the impression that they
> > are something important to know.
> >
> > >We could break the piece into multiple questions, but I
> > >wonder about how appropriate these backgrounders are for an i18n qa
> > >list... Especially this early on. We could move more of the background
> > >explanation below the question and answer.
> >
> > Yes, I think this is the best thing to do.
> >
> > >I think you are right that the question and answer
> > >should be succinct, and at the top of the page, but I don't see a
> > >problem with additional clarifying and supporting information being
> > >available on the page, after the main point is discussed.
> > >
> > >If there is a strong objection to the background info, I would be happy to
> > >move it to a page on my web site, GEO can have the short version and
> > >GEO can optionally link to my page for more info.
> > >
> > >2) Relevance- I understand your questioning the topic, I would have
> > >done the same. It came about because in fact I was asked the question
> > >last week.
> > >Controls are not only used for manipulating devices. They have other uses.
> > >An application development environment I am familiar with does a lot of
> > >value-list processing. Depending on the nature of the data, the list
> > >separator is changed. e.g. if it's a list of European decimals they
> > >would not want commas as a separator. To avoid conflicts between the
> > >list values and separators, in general routines, they use 0x01, 0x02,
> > >etc. as separators. So they have lots of data in databases using these
> > >values.
> > >(Yes, they could have adopted escape mechanisms instead.)
> >
> > I think this is a good, practical example. Whatever they do in
> > their database is not really our problem.
> >
> > >They ran into problems writing the data to xml. Some software liked it,
> > >others didn't.
> >
> > The software that tolerated it was faulty.
> >
> > >When they looked into the errors due to control codes not being
> > >allowed they needed advice. Which are the disallowed characters, and
> > >what are the workarounds? Hence the article.
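
As an illustration of the "which are the disallowed characters" part, here is a minimal sketch that tests text against the Char production of XML 1.0. The function names are made up for this example, and the check covers XML 1.0 only; the rules in XML 1.1 differ.

```python
def is_xml10_char(ch):
    """True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)           # TAB, LF, CR are the only C0 controls allowed
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD       # excludes surrogates, U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def disallowed_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 does not allow."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ch)
    ]


# The separators 0x01 and 0x02 from the example are flagged; TAB is not.
print(disallowed_chars("a\x01b\tc\x02"))
```

In XML 1.0 a numeric character reference must also match Char, so writing such a character as `&#x1;` does not make the document well-formed.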
> >
> > Very good. For the example above, the best thing to do is
> > to use XML for the separators. E.g.
> > value1<sep/>value2<sep/>value3...
> >
> > That's what XML is for.
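
A minimal sketch of that conversion, assuming the stored values are joined with 0x01 as in the example above. The wrapper element name `list` and the function name are hypothetical; only the `<sep/>` element comes from the suggestion itself.

```python
import xml.etree.ElementTree as ET


def values_to_xml(raw, separator="\x01"):
    """Turn a control-separated string into mixed content with <sep/> elements."""
    values = raw.split(separator)
    root = ET.Element("list")        # hypothetical wrapper element
    root.text = values[0]            # text before the first <sep/>
    for value in values[1:]:
        sep = ET.SubElement(root, "sep")
        sep.tail = value             # text that follows this <sep/>
    return ET.tostring(root, encoding="unicode")


# European decimals can keep their commas, and no control code reaches the XML.
print(values_to_xml("1,5\x012,75\x013,0"))
```

The values stay ordinary text, so they remain subject to the normal XML escaping and encoding rules, and the control code never appears in the output.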
> >
> > >I believe there may be a lot of data using controls, and as with this
> > >group, people may not have time to develop better solutions other than
> > >writing the data out as NCRs.
> >
> > If they want to use XML, they should at least try to use it the
> > right way. And we should help them understand what the right
> > way is. They can always decide to do something else on their own.
> >
> > >3) So because of 2, I claim if XML is for data interchange, support for
> > >interchange of controls is needed.
> >
> > Your example doesn't show that. XML has a perfect way of exchanging
> > structured data.
> >
> > Also, in some way, the use case above looks like they just needed
> > *any* character. Maybe converting it to a PUA character would
> > be another solution (but I don't like that, either).
> >
> > >I can agree the needs are exotic.
> > >You can argue that the data should instead be cleaned up, but that is
> > >impractical in some cases.
> > >In any event, it is worthwhile to let people know what is and is not
> > >doable in *ML.
> > >I don't mind giving more emphasis to cleaning up the data.
> >
> > Yes, I think we should do that.
> >
> > >I also don't mind emphasizing that control codes are to be avoided,
> > >and are bad for scalability and on the web.
> >
> > Yes, very good.
> >
> > >I would disagree with saying never use controls in XML.
> >
> > I haven't said that.
> >
> > >I would presume the reason support for controls as NCRs was added, is
> > >because some needs were identified for supporting controls.
> > >
> > >
> > >4) separate rows for NL. I agree.
> > >
> > >5) encoding. I believe what we said, is that if the data is in fact
> > >binary, encoding is an option.
> > >Essentially, if it is binary, it is not an i18n issue.
> >
> > The paragraph that mentions base64 does not say anything about
> > the data being binary, and the problems for i18n if the data
> > is textual. If you have discussed that, and it's going to be
> > updated, that's good.
> >
> > Some more points:
> >
> > - For XML 1.1, clearly say that it is not yet a Recommendation.
> > - If possible, don't use notes. In most cases, they can be
> >    replaced with a direct link.
> > - In note 4, change "For example, eacute is the Character Entity Reference"
> >    to "For example, &eacute; is the Character Entity Reference"
> >
> > Regards,   Martin.
> >
> > >Richard, if you want to finish the changes you were going to make, you can
> > >address Martin's comments or pass it back to me and I'll address them.
> > >tex
> > >
> > >
> > >Martin Duerst wrote:
> > > >
> > > > Hello Tex,
> > > >
> > > > Some more comments on your Q&A.
> > > >
> > > > Overall, I think that the answer is much too long. It not only
> > > > answers 'How do ... support control codes', but also 'what are
> > > > control codes', and so on. But the question assumes a basic
> > > > knowledge of control codes. People who don't know these
> > > > are not even interested in reading the answer.
> > > >
> > > > Also, I guess the real question is not how HTML and XML
> > > > support control codes, but "How can I represent control
> > > > codes in HTML or XML".
> > > >
> > > > The basic message also should be improved. (X)HTML is a
> > > > textual format used to represent text. There is absolutely
> > > > no need to use control codes in (X)HTML. If anybody thinks
> > > > otherwise, they didn't understand (X)HTML. I don't remember
> > > > having been asked about control codes in (X)HTML at all.
> > > > This should be clearly reflected in the answer.
> > > >
> > > > XML in general is used both for text and for data. So
> > > > there may be some interesting use cases for control
> > > > codes in XML. The typical example would be an XML
> > > > format for control code sequences for terminals
> > > > (i.e. an XML version of a unix termcap file).
> > > >
> > > > Apart from such rather exotic examples, the main reason
> > > > that there are control codes in data usually is one of
> > > > the following (most probably in the following order):
> > > >
> > > > - Pure garbage. The right thing is to clean up your data.
> > > >
> > > > - Old ways of representing data (starting with using Backspace
> > > >    to get accented versions of characters). The right thing
> > > >    is to convert your data, i.e. by doing the correct transcoding
> > > >    or by adding markup.
> > > >
> > > > In the table, I suggest to have separate rows for
> > > > CR/LF/TAB and for NEL (which is special in XML 1.1).
> > > >
> > > > The page says: "An alternative is to encode the data. For example,
> > > > encode the data as base64 or as hexadecimal values, to ensure only
> > > > supported characters are used in the markup language text."
> > > >
> > > > I'm very surprised to see this on an i18n-related page.
> > > > What this will do is that it will throw out of the window
> > > > any and all i18n features that XML has. So from an i18n
> > > > viewpoint, we should not recommend it, we should indeed
> > > > clearly recommend against it.
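
A small sketch of why base64 is a problem for textual data, using an arbitrary sample word and a hypothetical `<p>` element: once the text is encoded, XML tools no longer see the characters at all.

```python
import base64
import xml.etree.ElementTree as ET

text = "café"  # arbitrary sample text
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")

plain = ET.fromstring(f"<p>{text}</p>")
opaque = ET.fromstring(f"<p>{encoded}</p>")

# A parser sees the real characters in the first case only.
print("é" in plain.text)
print("é" in opaque.text)

# Recovering the text needs an extra decoding step outside of XML.
print(base64.b64decode(opaque.text).decode("utf-8") == text)
```

Anything that works on the character content (searching, sorting, rendering) would have to decode the payload first, which is what makes this a poor fit for text as opposed to binary data.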
> > > >
> > > > Hope this helps.
> > > >
> > > > Regards,    Martin.
> > > >
> > > > At 12:49 03/05/28 -0400, Tex Texin wrote:
> > > > >I am not sure why, but the geo list isn't distributing (my?) mail
> > > > >since last night.
> > > > >
> > > > >Here is the controls page for q&a today.
> > > > >I may be a little late to the meeting.
> > > > >
> > > > >http://www.i18nguy.com/test/controls.htm
> > > > >
> > > > >sorry, I don't have everyone's email. (maybe that's a good thing. ;-) )
> > > > >
> > > > >tex
>
>
>--
>-------------------------------------------------------------
>Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
>Xen Master                          http://www.i18nGuy.com
>
>XenCraft                            http://www.XenCraft.com
>Making e-Business Work Around the World
>-------------------------------------------------------------

Received on Monday, 2 June 2003 10:11:58 UTC