- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 20 Sep 2004 15:41:10 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: public-qa-dev@w3.org
At 16:35 04/09/16 +0200, Bjoern Hoehrmann wrote:
>* Martin Duerst wrote:
> >I'm working on getting rid of the dependency of Text::Iconv anyway,
> >using perl unicode stuff. I should be able to check in the code next
> >week. So I wouldn't worry too much about Text::Iconv anymore.
>
>Do you mean you are working on a general purpose module for check,
>checklink, etc. that we can plug into the new Markup Validator or

As I said earlier, I'm doing some work that may eventually end up
in a module. It's much easier to wrap code up into a module once
the interfaces are clear than just starting with a module because
it looks good to have one (which I agree it would).

>do you mean you are working on a few changes to check? In case of
>the latter, what version exactly?

I'm still trying to figure out what's the right thing to do,
0.6.0 or HEAD.

>I was under the impression that
>we agreed that using Encode and proper Perl Unicode features were
>not planned for 0.7.0 which will be the next version of the Markup
>Validator.

Who agreed? You suggested to use proper Perl Unicode, didn't you?

>In that case, I would be concerned that such changes
>introduce a number of additional complexities that might be
>difficult to deal with without a test suite and such.

A lot of things would be better with a test suite. But I'm not
ready to wait for one.

>It is worth
>to point out that switching to proper Unicode internals is by no
>means trivial, for example
>
>  % perl -MEncode -e "print decode 'utf-16be', qq(\x00\xf6)"
>  Unknown encoding 'utf-16be' at -e line 1
>
>using the Encode.pm that ships with Perl 5.8.2 even though the
>encoding would be supported if written as "UTF-16BE".

Good to know. Does this apply to all encodings, or only to a few?

>Other things
>to consider would be semantic changes to various symbols e.g. in
>regular expressions,
>
>  #!perl -w
>  use strict;
>  use warnings;
>  use Text::Iconv;
>  use Encode;
>
>  my $t1 = qq(\x20\x28);
>  my $s1 = Text::Iconv->new("UTF-16BE" => "utf-8")->convert($t1);
>  my $s2 = Encode::decode("UTF-16BE", $t1);
>
>  print "ok1\n" if $s1 =~ /\s/;
>  print "ok2\n" if $s2 =~ /\s/;
>
>This would print "ok2" but not "ok1", we would have to go through
>all of these

Good point. [What this is about is that \s matches more than
a few ASCII characters in the case of Unicode.]

>and check which behavior we desire, and have tests so
>that later changes do not introduce bugs. Iconv and Encode also do
>not support the same set of character encodings, GB18030 for example
>is supported by the current Markup Validator but not by the Encode
>version that ships with Perl 5.8.2, we would first need to figure
>out for which encodings we would need to drop support or find other
>replacements.

Or we would just (temporarily) drop those that are not supported.

>Other problems might come from our dependencies, if we rely on data
>from these modules we would need to check carefully whether this
>data has the UTF-8 flag set and how they cope with data that has a
>UTF-8 flag set. They might have similar problems with \s and other
>symbols aswell and thus cause undesired side effects.
>
>I really don't think we should make such changes without a proper
>automated test suite in place and I am not sure whether switching
>to proper Unicode internals fits into 0.7.0, for 0.8.0 when we
>switch to a SGML::Parser::OpenSP infrastructure a number of these
>problems would already be solved and dealing with legacy workaround
>might turn out to be difficult.
>
>Also note that the current code works with Perl 5.6.x, Encode.pm
>would only work with Perl 5.7.x+, I am not sure whether we really
>agreed to shift the requirements for the 0.7.0 release. Users that
>have a problem with Text::Iconv might have even more problems with
>Perl 5.8.2+.

Not sure. Upgrading to a new perl version may be easier than
getting a specific module.

Regards, Martin.
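
[The questions raised above (which encoding labels a given Encode
installation accepts, and how \s and the UTF-8 flag behave for decoded
versus undecoded data) can be probed with a small script. The following
is only a sketch, not code from the validator: the list of labels is a
hypothetical sample, and the results for the labels depend on the Perl
and Encode versions installed. It uses only Encode::find_encoding,
Encode::decode, Encode::encode and utf8::is_utf8.]

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample of charset labels, in upper and lower case.
my @labels = qw(UTF-16BE utf-16be GB18030 gb18030 ISO-8859-1 iso-8859-1);

for my $label (@labels) {
    # find_encoding returns an encoding object, or undef if the
    # installed Encode does not know the label.
    my $enc = Encode::find_encoding($label);
    printf "%-12s %s\n", $label,
        defined $enc ? 'known as ' . $enc->name : 'unknown';
}

# U+2028 LINE SEPARATOR in UTF-16BE, as in the quoted example.
my $chars = Encode::decode('UTF-16BE', "\x20\x28");   # character string
my $bytes = Encode::encode('UTF-8', $chars);          # UTF-8 byte string

for my $pair ([characters => $chars], [bytes => $bytes]) {
    my ($kind, $str) = @$pair;
    printf "%-10s UTF-8 flag: %-3s  matches \\s: %s\n", $kind,
        (utf8::is_utf8($str) ? 'yes' : 'no'),
        ($str =~ /\s/        ? 'yes' : 'no');
}
```

[Run against each Perl version under consideration, the first loop
gives a rough answer to which labels would have to be dropped or
aliased; the second loop shows the difference between the decoded
string and the raw bytes that the ok1/ok2 example describes.]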
Received on Monday, 20 September 2004 22:35:49 UTC