- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 16 Sep 2004 16:35:04 +0200
- To: Martin Duerst <duerst@w3.org>
- Cc: public-qa-dev@w3.org
* Martin Duerst wrote:
>I'm working on getting rid of the dependency of Text::Iconv anyway,
>using perl unicode stuff. I should be able to check in the code next
>week. So I wouldn't worry too much about Text::Iconv anymore.

Do you mean you are working on a general-purpose module for check, checklink, etc. that we can plug into the new Markup Validator, or do you mean you are working on a few changes to check? In case of the latter, which version exactly? I was under the impression that we agreed that using Encode and proper Perl Unicode features was not planned for 0.7.0, which will be the next version of the Markup Validator. In that case, I would be concerned that such changes introduce a number of additional complexities that might be difficult to deal with without a test suite and such.

It is worth pointing out that switching to proper Unicode internals is by no means trivial. For example,

  % perl -MEncode -e "print decode 'utf-16be', qq(\x00\xf6)"
  Unknown encoding 'utf-16be' at -e line 1

using the Encode.pm that ships with Perl 5.8.2, even though the encoding would be supported if written as "UTF-16BE". Other things to consider would be semantic changes to various symbols, e.g. in regular expressions:

  #!perl -w
  use strict;
  use warnings;
  use Text::Iconv;
  use Encode;

  my $t1 = qq(\x20\x28);
  my $s1 = Text::Iconv->new("UTF-16BE" => "utf-8")->convert($t1);
  my $s2 = Encode::decode("UTF-16BE", $t1);

  print "ok1\n" if $s1 =~ /\s/;
  print "ok2\n" if $s2 =~ /\s/;

This would print "ok2" but not "ok1": $t1 is U+2028 LINE SEPARATOR in UTF-16BE, and \s matches it in the decoded character string but not in the UTF-8 octets returned by Text::Iconv. We would have to go through all of these, check which behavior we desire, and have tests so that later changes do not introduce bugs.

Iconv and Encode also do not support the same set of character encodings. GB18030, for example, is supported by the current Markup Validator but not by the Encode version that ships with Perl 5.8.2, so we would first need to figure out for which encodings we would need to drop support or find other replacements.
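Figuring out which encodings would need replacements could start by checking each name the validator currently supports against the installed Encode. The following is a minimal sketch assuming only core Encode: `find_encoding_ci` is a hypothetical helper that retries the lookup with an upper-cased name to sidestep the case-sensitivity problem with 'utf-16be', and `@wanted` is a hypothetical sample list, not the validator's actual configuration. No output is shown because it depends on the installed Encode version and extension modules.

```perl
#!perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: look up an encoding by name, retrying with an
# upper-cased name in case the installed Encode matches names case-sensitively.
# Returns an Encode::Encoding object, or undef if the encoding is unsupported.
sub find_encoding_ci {
    my ($name) = @_;
    return Encode::find_encoding($name) || Encode::find_encoding(uc $name);
}

# Hypothetical sample list; the validator's real list of encodings would go here.
my @wanted = qw(utf-8 utf-16be iso-8859-1 gb18030);

for my $name (@wanted) {
    my $enc = find_encoding_ci($name);
    printf "%-12s %s\n", $name,
        $enc ? "supported as " . $enc->name : "NOT supported by this Encode";
}
```

Any name reported as unsupported would be a candidate for dropping, or for an extension module or other replacement.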
Other problems might come from our dependencies: if we rely on data from these modules, we would need to check carefully whether this data has the UTF-8 flag set and how the modules cope with data that has the UTF-8 flag set. They might have similar problems with \s and other symbols as well, and thus cause undesired side effects.

I really don't think we should make such changes without a proper automated test suite in place, and I am not sure whether switching to proper Unicode internals fits into 0.7.0. For 0.8.0, when we switch to an SGML::Parser::OpenSP infrastructure, a number of these problems would already be solved, and dealing with legacy workarounds might turn out to be difficult.

Also note that the current code works with Perl 5.6.x, while Encode.pm would only work with Perl 5.7.x+; I am not sure whether we really agreed to shift the requirements for the 0.7.0 release. Users who have a problem with Text::Iconv might have even more problems with Perl 5.8.2+.

I thus hope you are working on an external module; in that case it would be good if you could share some details on your plan.
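Checking what a dependency hands back could start with the UTF-8 flag itself. The following is a minimal sketch assuming Perl 5.8.1+ (where `utf8::is_utf8` is available) and core Encode; `describe_string` is a hypothetical helper, and the two sample strings merely stand in for values that would really come from a dependency.

```perl
#!perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: report whether a value is stored as a character
# string (UTF-8 flag on) or as a plain byte string, and its length.
sub describe_string {
    my ($label, $value) = @_;
    my $kind = utf8::is_utf8($value) ? "UTF-8 flag set" : "no UTF-8 flag";
    print "$label: $kind, length ", length($value), "\n";
}

# Stand-ins for data returned by a dependency.
my $chars = Encode::decode('UTF-8', "\xC3\xB6");  # one character, U+00F6
my $bytes = Encode::encode('UTF-8', $chars);      # the same text as two octets

describe_string('decoded', $chars);
describe_string('encoded', $bytes);
```

The flag only describes how Perl stores the string internally; whether a module treats a value as characters or bytes still has to be checked against its documentation and behavior.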
Received on Thursday, 16 September 2004 14:35:56 UTC