
RE: controls for GEO discussion

From: Richard Ishida <ishida@w3.org>
Date: Mon, 2 Jun 2003 14:15:14 +0100
To: "'Tex Texin'" <tex@i18nguy.com>, "'Martin Duerst'" <duerst@w3.org>
Cc: <public-i18n-geo@w3.org>
Message-ID: <002001c32909$01d3ad40$ec01000a@w3c40upc3ma3j2>

Hi Tex,

I'd like to publish this today.  I just called you but no answer.

Can you give me a call, and we can quickly decide the way forward?  +44
1753 480 292

Cheers,
RI



============
Richard Ishida
W3C

tel: +44 1753 480 292
http://www.w3.org/International/
http://www.w3.org/People/Ishida/



> -----Original Message-----
> From: public-i18n-geo-request@w3.org 
> [mailto:public-i18n-geo-request@w3.org] On Behalf Of Tex Texin
> Sent: 02 June 2003 00:04
> To: Martin Duerst
> Cc: public-i18n-geo@w3.org
> Subject: Re: controls for GEO discussion
> 
> 
> 
> Martin, 
> good comments and I generally agree.
> We should add an example using xml separators to make that 
> option clear.
> 
> You didn't say never use controls in XML, but you came close..
> 
> "There is absolutely no need to use control codes in (X)HTML. 
> If anybody thinks otherwise, they didn't understand (X)HTML."
> 
> But I made the leap to never.
> 
> I think the one thing I disagree with (somewhat) is the links 
> vs. notes. I think in some cases the notes are needed to 
> clarify what is relevant in the links. In other cases, links 
> alone are fine.
> 
> 
> Anyway, I think we can fix it to make everyone happy. Richard 
> wanted to post it last Friday. Richard you can either post it 
> with your final changes and then I will make further changes 
> for Martin's comments and give it back to you for subsequent 
> update, or you can turn it back to me and I will try to fix 
> further. Or you can fix it altogether if you want.
> 
> 
> tex
> 
> 
> Martin Duerst wrote:
> > 
> > Hello Tex,
> > 
> > At 03:13 03/05/31 -0400, Tex Texin wrote:
> > 
> > >Hi, Good comments, although I have some disagreements, I liked your
> > >analysis and think the points worth discussing.
> > >
> > >0) Question wording. Yes, we agreed to change the wording to 
> > >something close to what you suggested.
> > 
> > Very good.
> > 
> > >1) Length. I think people may have a general idea of what controls
> > >are but may not know the specifics, and especially the specifics of
> > >the ranges and the ranges in Unicode.
> > 
> > Well, those who don't know what they are are probably not interested
> > in using them. And we should avoid giving the impression that they are
> > something important to know.
> > 
> > >We could break the piece into multiple questions, but I wonder about
> > >how appropriate these backgrounders are for an i18n qa list...
> > >Especially this early on. We could move more of the background
> > >explanation below the question and answer.
> > 
> > Yes, I think this is the best thing to do.
> > 
> > >I think you are right that the question and answer
> > >should be succinct, and at the top of the page but I don't see a
> > >problem with additional clarifying and supporting information being
> > >available on the page, after the main point is discussed.
> > >
> > >If there is a strong objection to the background info, I would be 
> > >happy to move it to a page on my web site, GEO can have the short 
> > >version and GEO can optionally link to my page for more info.
> > >
> > >2) Relevance- I understand your questioning the topic, I would have
> > >done the same. It came about because in fact I was asked the question
> > >last week. Controls are not only used for manipulating devices. They
> > >have other uses. An application development environment I am familiar
> > >with does a lot of value-list processing. Depending on the nature of
> > >the data, the list separator is changed. e.g. if it's a list of
> > >European decimals they would not want commas as a separator. To avoid
> > >conflicts between the list values and separators, in general
> > >routines, they use 0x01, 0x02, etc. as separators. So they have lots
> > >of data in databases using these values. (Yes, they could have
> > >adopted escape mechanisms instead.)
> > 
> > I think this is a good, practical example. Whatever they do in their
> > database is not really our problem.
> > 
> > >They ran into problems writing the data to xml. Some software liked
> > >it, others didn't.
> > 
> > The software that tolerated it was faulty.
> > 
> > >When they looked into the errors due to control codes not being
> > >allowed they needed advice. Which are the disallowed characters, and
> > >what are the workarounds? Hence the article.
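
On the question of which characters are disallowed: XML 1.0 limits
document content to the characters matched by its Char production (tab,
line feed, carriage return, and U+0020 upward, excluding surrogates,
U+FFFE and U+FFFF). A minimal Python sketch, assuming a hypothetical
value list that uses U+0001 as its separator, lists the offending
positions and checks whether Python's bundled parser accepts the raw
control or its numeric character reference. The helper names and the
sample record are illustrative only and do not come from the article
under discussion.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def disallowed_positions(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for every character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ch)
    ]


def parses_as_xml(document: str) -> bool:
    """Report whether Python's bundled XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical record: European decimals separated by U+0001.
record = "1,5\x012,75\x013,25"
print(disallowed_positions(record))
print(parses_as_xml(f"<list>{record}</list>"))    # raw control character
print(parses_as_xml("<list>1,5&#1;2,75</list>"))  # same control as an NCR
```

In XML 1.0 a numeric character reference must also refer to a legal
character, so writing the separator as &#1; does not make the document
well-formed.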
> > 
> > Very good. For the example above, the best thing to do is
> > to use XML for the separators. E.g. value1<sep/>value2<sep/>value3...
> > 
> > That's what XML is for.
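
The <sep/> suggestion can be sketched as follows, assuming records that
use U+0001 as the list separator. The element names "list" and "sep" and
both helper functions are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def list_to_xml(record: str, separator: str = "\x01") -> str:
    """Replace each control-code separator with an empty <sep/> element."""
    values = record.split(separator)
    root = ET.Element("list")
    root.text = values[0]
    for value in values[1:]:
        sep = ET.SubElement(root, "sep")
        sep.tail = value  # text that follows this separator
    return ET.tostring(root, encoding="unicode")


def xml_to_list(document: str) -> list[str]:
    """Recover the values from the mixed content written by list_to_xml."""
    root = ET.fromstring(document)
    return [root.text or ""] + [child.tail or "" for child in root]


document = list_to_xml("1,5\x012,75\x013,25")
print(document)
print(xml_to_list(document))
```

Because the separator is markup rather than character data, the values
may contain commas or any other legal character without an escaping
convention.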
> > 
> > >I believe there may be a lot of data using controls, and as with this
> > >group, people may not have time to develop better solutions other
> > >than writing the data out as NCRs.
> > 
> > If they want to use XML, they should at least try to use it the right
> > way. And we should help them understand what the right way is. They
> > can always decide to do something else on their own.
> > 
> > >3) So because of 2, I claim if XML is for data interchange, support
> > >for interchange of controls is needed.
> > 
> > Your example doesn't show that. XML has a perfect way of exchanging 
> > structured data.
> > 
> > Also, in some way, the use case above looks like they just needed
> > *any* character. Maybe converting it to a PUA character would be 
> > another solution (but I don't like that, either).
> > 
> > >I can agree the needs are exotic.
> > >You can argue that the data should instead be cleaned up, but that is
> > >impractical in some cases. In any event, it is worthwhile to let
> > >people know what is and is not doable in *ML.
> > >I don't mind giving more emphasis to cleaning up the data.
> > 
> > Yes, I think we should do that.
> > 
> > >I also don't mind emphasizing that control codes are to be avoided,
> > >and are bad for scalability and on the web.
> > 
> > Yes, very good.
> > 
> > >I would disagree with saying never use controls in XML.
> > 
> > I haven't said that.
> > 
> > >I would presume the reason support for controls as NCRs was added, is
> > >because some needs were identified for supporting controls.
> > >
> > >
> > >4) separate rows for NL. I agree.
> > >
> > >5) encoding. I believe what we said, is that if the data is in fact
> > >binary, encoding is an option. Essentially, if it is binary, it is
> > >not an i18n issue.
> > 
> > The paragraph that mentions base64 does not say anything about the
> > data being binary, and the problems for i18n if the data is textual.
> > If you have discussed that, and it's going to be updated, that's good.
> > 
> > Some more points:
> > 
> > - For XML 1.1, clearly say that it is not yet a Recommendation.
> > - If possible, don't use notes. In most cases, they can be
> >    replaced with a direct link.
> > - In note 4, change "For example, eacute is the Character Entity Reference"
> >    to "For example, &eacute; is the Character Entity Reference"
> > 
> > Regards,   Martin.
> > 
> > >Richard, if you want to finish the changes you were going to make, 
> > >you can address Martin's comments or pass it back to me and I'll 
> > >address them. tex
> > >
> > >
> > >Martin Duerst wrote:
> > > >
> > > > Hello Tex,
> > > >
> > > > Some more comments on your Q&A.
> > > >
> > > > Overall, I think that the answer is much too long. It not only 
> > > > answers 'How do ... support control codes', but also 'what are 
> > > > control codes', and so on. But the question assumes a basic 
> > > > knowledge of control codes. People who don't know these are not
> > > > even interested in reading the answer.
> > > >
> > > > Also, I guess the real question is not how HTML and XML support
> > > > control codes, but "How can I represent control codes in HTML or
> > > > XML".
> > > >
> > > > The basic message also should be improved. (X)HTML is a textual
> > > > format used to represent text. There is absolutely no need to use
> > > > control codes in (X)HTML. If anybody thinks otherwise, they didn't
> > > > understand (X)HTML. I don't remember having been asked about
> > > > control codes in (X)HTML at all. This should be clearly reflected
> > > > in the answer.
> > > >
> > > > XML in general is used both for text and for data. So there may be
> > > > some interesting use cases for control codes in XML. The typical
> > > > example would be an XML format for control code sequences for
> > > > terminals (i.e. an XML version of a unix termcap file).
> > > >
> > > > Apart from such rather exotic examples, the main reason that there
> > > > are control codes in data usually is one of the following (most
> > > > probably in the following order):
> > > >
> > > > - Pure garbage. The right thing is to clean up your data.
> > > >
> > > > - Old ways of representing data (starting with using Backspace
> > > >    to get accented versions of characters). The right thing
> > > >    is to convert your data, i.e. by doing the correct transcoding
> > > >    or by adding markup.
> > > >
> > > > In the table, I suggest to have separate rows for CR/LF/TAB and 
> > > > for NEL (which is special in XML 1.1).
> > > >
> > > > The page says: "An alternative is to encode the data. For example,
> > > > encode the data as base64 or as hexadecimal values, to ensure only
> > > > supported characters are used in the markup language text."
> > > >
> > > > I'm very surprised to see this on an i18n-related page. What this
> > > > will do is that it will throw out of the window any and all i18n
> > > > features that XML has. So from an i18n viewpoint, we should not
> > > > recommend it, we should indeed clearly recommend against it.
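
The objection to base64 for textual data can be illustrated with a short
Python sketch. The element name "value", the sample word, and the helper
functions are assumptions made for the example. Once encoded, the text is
no longer visible to anything that processes the XML as text, and the
reader must learn the original charset by some other means.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_plain(text: str) -> str:
    """Store the text directly as character data."""
    element = ET.Element("value")
    element.text = text
    return ET.tostring(element, encoding="unicode")


def wrap_base64(text: str, charset: str = "utf-8") -> str:
    """Store the text as base64 of its bytes in the given charset."""
    element = ET.Element("value")
    element.text = base64.b64encode(text.encode(charset)).decode("ascii")
    return ET.tostring(element, encoding="unicode")


def unwrap_base64(document: str, charset: str = "utf-8") -> str:
    """Decode the payload; the charset is not recorded in the XML itself."""
    payload = ET.fromstring(document).text or ""
    return base64.b64decode(payload).decode(charset)


word = "résumé"
print(word in wrap_plain(word))    # is the text searchable as written?
print(word in wrap_base64(word))   # is it still searchable after encoding?
print(unwrap_base64(wrap_base64(word)) == word)
```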
> > > >
> > > > Hope this helps.
> > > >
> > > > Regards,    Martin.
> > > >
> > > > At 12:49 03/05/28 -0400, Tex Texin wrote:
> > > > >I am not sure why, but the geo list isn't distributing (my?) mail
> > > > >since last night.
> > > > >
> > > > >Here is the controls page for q&a today.
> > > > >I may be a little late to the meeting.
> > > > >
> > > > >http://www.i18nguy.com/test/controls.htm
> > > > >
> > > > >sorry, I don't have everyone's email. (maybe that's a good thing.
> > > > >;-) )
> > > > >
> > > > >tex
> 
> 
> -- 
> -------------------------------------------------------------
> Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> Xen Master                          http://www.i18nGuy.com
>                          
> XenCraft		            http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
> 
Received on Monday, 2 June 2003 09:15:51 GMT
