- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 02 Jun 2003 10:08:56 -0400
- To: Tex Texin <tex@i18nguy.com>
- Cc: public-i18n-geo@w3.org
Hello Tex, At 19:04 03/06/01 -0400, Tex Texin wrote: >Martin, >good comments and I generally agree. >We should add an example using xml separators to make that option clear. > >You didn't say never use controls in XML, but you came close.. > >"There is absolutely no need to use control codes in (X)HTML. If anybody >thinks otherwise, they didn't understand (X)HTML." > >But I made the leap to never. Yes. There is a difference between XML (which is for text and data) and (X)HTML, which is for text only. In the later case, it's *never*. In the former case, there may be some (rare!) exceptions. Regards, Martin. >I think the one thing I disagree with (somewhat) is the links vs. notes. I >think in some cases the notes are needed to clarify what is relevant in the >links. In other cases, links alone are fine. > > >Anyway, I think we can fix it to make everyone happy. Richard wanted to post >it last Friday. >Richard you can either post it with your final changes and then I will make >further changes for Martin's comments and give it back to you for subsequent >update, or you can turn it back to me and I will try to fix further. >Or you can fix it altogether if you want. > > >tex > > >Martin Duerst wrote: > > > > Hello Tex, > > > > At 03:13 03/05/31 -0400, Tex Texin wrote: > > > > >Hi, Good comments, although I have some disagreements, I liked your > analysis > > >and think the points worth discussing. > > > > > >0) Question wording. Yes, we agreed to change the wording to something > close > > >to what you suggested. > > > > Very good. > > > > >1) Length. I think people may have a general idea of what controls are > but may > > >not know the specifics, and especially the specifics of the ranges and the > > >ranges in Unicode. > > > > Well, those who don't know what they are are probably not interested > > in using them. And we should avoid giving the impression that they > > are something important to know. > > > > >We could break the piece into multiple questions, but I > > >wonder about how appropriate these backgrounders are for an i18n qa > list... > > >Especially this early on. We could move more of the background explanation > > >below the question and answer. > > > > Yes, I think this is the best thing to do. > > > > >I think you are right the question and answer > > >should be succint, and at the top of the page but I don't see a > problem with > > >additional clarifying and supporting information being available on > the page, > > >after the main point is discussed. > > > > > >If there is a strong objection to the background info, I would be happy to > > >move it to a page on my web site, GEO can have the short version and > GEO can > > >optionally link to my page for more info. > > > > > >2) Relevance- I understand your questioning the topic, I would have > done the > > >same. It came about because in fact I was asked the question last week. > > >Controls are not only used for manipulating devices. They have other uses. > > >An application development environment I am familiar with does a lot of > > >value-list processing. Depending on the nature of the data, the list > separator > > >is changed. e.g. if it's a list of european decimals they would not want > > >commas as a separator. To avoid conflicts between the list values and > > >separators, in general routines, they use 0x01, 0x02, etc. as > separators. So > > >they have lots of data in databases using these values. > > >(Yes, they could have instead adopted escape mechanisms instead.) > > > > I think this is a good, practical example. Whatever they do in > > their database is not really our problem. > > > > >They ran into problems writing the data to xml. Some software liked > it, others > > >didn't. > > > > The software that tolerated it was faulty. > > > > >When they looked into the errors due to control codes not being > > >allowed they needed advice. Which are the disallowed characters, and > what are > > >the workarounds? Hence the article. > > > > Very good. For the example above, the best thing to do is > > to use XML for the separators. E.g. > > value1<sep/>value2<sep/>value3... > > > > That's what XML is for. > > > > >I believe there may be a lot of data using controls, and as with this > group, > > >people may not have time to develop better solutions other than > writing the > > >data out as NCRs. > > > > If they want to use XML, they should at least try to use it the > > right way. And we should help them understand what the right > > way is. They can always decide to do something else on their own. > > > > >3) So because of 2, I claim if XML is for data interchange, support for > > >interchange of controls is needed. > > > > Your example doesn't show that. XML has a perfect way of exchanging > > structured data. > > > > Also, in some way, the use case above looks like they just needed > > *any* character. Maybe converting it to a PUA character would > > be another solution (but I don't like that, either). > > > > >I can agree the needs are exotic. > > >You can argue that the data should instead be cleaned up, but that is > > >impractical in some cases. > > >In any event, it is worthwhile to let people know what is and is not > doable in > > >*ML. > > >I don't mind giving more emphasis to cleaning up the data. > > > > Yes, I think we should do that. > > > > >I also don't mind emphasizing that control codes are to be avoided, > and are > > >bad for scalability and on the web. > > > > Yes, very good. > > > > >I would disagree with saying never use controls in XML. > > > > I haven't said that. > > > > >I would presume the reason support for controls as NCRs was added, is > because > > >some needs were identified for supporting controls. > > > > > > > > >4) separate rows for NL. I agree. > > > > > >5) encoding. I believe what we said, is that if the data is in fact > binary, > > >encoding is an option. > > >Essentially, if it is binary, it is not an i18n issue. > > > > The paragraph that mentions base64 does not say anything about > > the data being binary, and the problems for i18n if the data > > is textual. If you have discussed that, and it's going to be > > updated, that's good. > > > > Some more points: > > > > - For XML 1.1, clearly say that it is not yet a Recommendation. > > - If possible, don't use notes. In most cases, they can be > > replaced with a direct link. > > - In note 4, change "For example, eacute is the Character Entity Reference" > > to "For example, é is the Character Entity Reference" > > > > Regards, Martin. > > > > >Richard, if you want to finish the changes you were going to make, you can > > >address Martin's comments or pass it back to me and I'll address them. > > >tex > > > > > > > > >Martin Duerst wrote: > > > > > > > > Hello Tex, > > > > > > > > Some more comments on your Q&A. > > > > > > > > Overall, I think that the answer is much too long. It not only > > > > answers 'How do ... support control codes', but also 'what are > > > > control codes', and so on. But the question assumes a basic > > > > knowledge of control codes. Peolpe who don't know these > > > > are not even interested in reading the answer. > > > > > > > > Also, I guess the real question is not how HTML and XML > > > > support control codes, but "How can I represent control > > > > codes in HTML or XML". > > > > > > > > The basic message also should be improved. (X)HTML is a > > > > textual format used to represent text. There is absolutely > > > > no need to use control codes in (X)HTML. If anybody thinks > > > > otherwise, they didn't understand (X)HTML. I don't remember > > > > having been asked about control codes in (X)HTML at all. > > > > This should be clearly reflected in the answer. > > > > > > > > XML in general is used both for text and for data. So > > > > there may be some interesting use cases for control > > > > codes in XML. The typical example would be an XML > > > > format for control code sequences for terminals > > > > (i.e. an XML version of a unix termcap file). > > > > > > > > Apart from such rather exotic examples, the main reason > > > > that there are control codes in data usually is one of > > > > the following (most probably in the following order): > > > > > > > > - Pure garbage. The right thing is to clean up your data. > > > > > > > > - Old ways of representing data (starting with using Backspace > > > > to get accented versions of characters). The right thing > > > > is to convert your data, i.e. by doing the correct transcoding > > > > or by adding markup. > > > > > > > > In the table, I suggest to have separate rows for > > > > CR/LF/TAB and for NEL (which is special in XML 1.1). > > > > > > > > The page says: "An alternative is to encode the data. For example, > > > > encode the data as base64 or as hexadecimal values, to ensure only > > > > supported characters are used in the markup language text." > > > > > > > > I'm very surprised to see this on an i18n-related page. > > > > What this will do is that it will throw out of the window > > > > any and all i18n features that XML has. So from an i18n > > > > viewpoint, we should not recommend it, we should indeed > > > > clearly recommend against it. > > > > > > > > Hope this helps. > > > > > > > > Regards, Martin. > > > > > > > > At 12:49 03/05/28 -0400, Tex Texin wrote: > > > > >I am not sure why, but the geo list isn't distributing (my?) mail > > > since lst > > > > >night. > > > > > > > > > >Here is the controls page for q&a today. > > > > >I may be a little late to the meeting. > > > > > > > > > >http://www.i18nguy.com/test/controls.htm > > > > > > > > > >sorry, I don't have everyone's email. (maybe that's a good thing. > ;-) ) > > > > > > > > > >tex > > >-- >------------------------------------------------------------- >Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com >Xen Master http://www.i18nGuy.com > >XenCraft http://www.XenCraft.com >Making e-Business Work Around the World >-------------------------------------------------------------
Received on Monday, 2 June 2003 10:11:58 UTC