Re: [WMVS] Some initial thoughts on code M12N... from Terje Bless on 2004-09-22 (public-qa-dev@w3.org from September 2004)

From: Terje Bless <link@pobox.com>
Date: Wed, 22 Sep 2004 09:25:29 +0200
To: Martin Duerst <duerst@w3.org>
cc: QA Dev <public-qa-dev@w3.org>
Message-ID: <r02010300-1035-95304DBE0C6811D98FFD0030657B83E8@[193.157.66.23]>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Duerst <duerst@w3.org> wrote:

>I understand that idea. But for the moment, all I can come up with are
>incremental improvements to WMVS anyway. Without sorting things out a
>bit better in the current code, it's very difficult to see what the final
>interface and functionality for the module should look like.

Right. Same thing; you're just doing the incremental improvements to the
existing code first, while figuring out what the module design should be like,
rather than designing the module first and bringing back the incremental
improvements to the inline code.


>The biggest downside, in my case, is that starting in a vacuum before
>having a chance to sort out a few issues in actual existing code will
>mean that I'm just programming out in the blue, which will not be very
>productive.

Right, and for some things this is likely to fall into the "necessary evil"
category; but in general I think there is much room for sorting out issues
while we're progressing towards a point where M12N can happen.

If the charset code isn't sufficiently mature for specific work on an external
module to start, then internal refactoring and incremental improvements are
_necessary_ to get it to that level of maturity.


I think probably the bone of contention is/will be the exact point at which
internal refactoring ("internal m12n") should end and the external module
("M12N") should begin.

Personally I draw the line at starting to use package namespaces inside
"check"; if you feel it's time to say "package Foo::Bar" and stuff your code
inside that, then it's probably high time to start work on the external
module.

But I fully expect diverging opinions on this. For example, while Björn has
developed SGML::Parser::OpenSP fully fledged outside "check", I know he's also
suggested starting off the M12N process by simply putting a "package"
declaration above the subroutines in "check" and working from there. I'm sure
there are other opinions on the choke-point as well.


>>that it forces discipline in determining what improvements go in
>>immediately and what will have to wait until we're ready to make the
>>switch, and finally, that it takes longer before we can take advantage
>>of the new features in WMVS.
>
>Just a thought: In some way, I see "doing branches without branches".

Sure. The objections have been over the specific technical detail of using CVS
branches, due to perceived complexity and overhead, not over the
"architectural" issues of clean separation of functional code units and
out-of-band development etc. At least that's my understanding of it.

To be fair, the state of HEAD (bitrotting for many moons) and the release
branch (much code churn between .1 revisions) may have been an inevitable
result of the overhead of CVS branches (given the available developer
resources). Personally I'd like to take the blame for that myself instead of
blaming the "tool" (CVS branches), but I've been forced to concede the
possibility. :-)

In any case, so long as a portion (in this case even a majority) of the
developers feel CVS branches would be an impediment, and have few if any
benefits to outweigh it, it makes eminent sense to work around the need for
CVS branches.

Surely this lowers the bar especially for occasional contributors like
yourself, who can avoid spending time on figuring out the current branch
situation instead of writing code?


[XML::Charset]
>I think it shouldn't be a module of its own. But it may well be exposed
>as one function of a module. Would that work for you?

Well, I'm sure your judgement is better than mine on that -- :-) -- but my
general point is probably that we shouldn't be afraid of making code units too
small to stand alone. It's my experience, and strongly held opinion, that any
piece of code that merits a subroutine has potential value as a standalone
module; irrespective of the amount of ground covered.

Text::Iconv, say, is just a few tens of lines of code; as was the 0.01 version
of SGML::Parser::OpenSP.

In the specific case of "XML::Charset", my opinion was based on the fact that
Appendix F of the XML 1.0 Recommendation describes the algorithm instead of
just saying "it's possible to do it". If there is value in specifying the
algorithm there is probably value in providing a canonical implementation of
it to avoid reinventing the wheel in every module that needs to deal with XML.

But as mentioned, I'll trust your judgement over mine on this one.
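
To make the Appendix F point concrete, here is a minimal sketch of that autodetection step. It is written in Python purely for illustration (the validator itself is Perl), and the function name sniff_xml_encoding and the returned labels are assumptions, not an existing API. It also simplifies the appendix: the unusual UCS-4 byte orders are left out, and anything unrecognized falls back to UTF-8.

```python
import re

# Byte-order marks. The 4-byte UCS-4 marks come first so that
# FF FE 00 00 is not mistaken for a UTF-16LE mark.
_BOMS = [
    (b"\x00\x00\xfe\xff", "UCS-4BE"),
    (b"\xff\xfe\x00\x00", "UCS-4LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# No BOM: the first bytes of "<?xml" (or "<") still reveal the encoding family.
_NO_BOM = [
    (b"\x00\x00\x00\x3c", "UCS-4BE"),
    (b"\x3c\x00\x00\x00", "UCS-4LE"),
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),
]

# Encoding declaration in an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its leading bytes."""
    for prefix, name in _BOMS + _NO_BOM:
        if data.startswith(prefix):
            return name
    match = _DECL.match(data)
    if match:
        # ASCII-compatible family: trust the declared name.
        return match.group(1).decode("ascii")
    # No BOM and no declaration: XML defaults to UTF-8.
    return "UTF-8"
```

A caller would pass the first few hundred bytes of the document and then still have to reconcile the result with any externally supplied charset, which this sketch does not attempt.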


>As Bjoern has already confirmed, this shouldn't have to do with HTTP.

Sure, the module names suggested were picked out of thin air.


[Charlint]
>Yes. I think we should put this off for a later stage.

Ok.


>>>BTW, I guess you mean 'transcoding' rather than 'transliteration'.
>>Remind me?? What's the definition of each of those again?
>
>Changing from one (character) encoding to another: transcoding.
>
>Writing something in a different script than the original one (e.g.
>Latin instead of Cyrillic): transliteration.

Ah. I hadn't forgotten the difference; I apparently never knew the
distinction. Thanks for the correction!


>Okay. Am I correct that moving as much of the message text out of
>'check' is part of the 0.7.0 release?

Well, for the most part, what I'd intended for 0.7.0 is done. There is a
whole bunch of stuff left in there that is better done with a l10n framework
(i.e. something gettext-ish for getting the strings, and templates for
formatting) than by trying to move this out with just templates. And the l10n
framework is still not in place; and unless something radical changes, won't
be in place during the 0.7.0 release.
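
As a rough picture of that split (a gettext-ish lookup for the strings, a template for the formatting), here is a sketch in Python for illustration only. The catalog contents, the message id and the lookup and render helpers are all hypothetical; nothing like this exists in "check" today.

```python
from string import Template

# Hypothetical message catalog: message id -> template string, per language.
CATALOG = {
    "en": {"unknown_encoding": "The encoding $declared is not supported."},
    "de": {"unknown_encoding": "Die Kodierung $declared wird nicht unterstützt."},
}


def lookup(msgid: str, lang: str) -> str:
    """Return the string for msgid in lang, falling back to English."""
    return CATALOG.get(lang, {}).get(msgid) or CATALOG["en"][msgid]


def render(msgid: str, lang: str, **values: str) -> str:
    """Fetch the localized string, then let the template fill in the values."""
    return Template(lookup(msgid, lang)).substitute(values)
```

With this shape the code only ever names a message id and its values, e.g. render("unknown_encoding", "de", declared="x-foo"), and the wording lives outside the program logic.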

There's some stuff left that can (and should) get moved out during 0.7.0, but
other stuff will have to stay for the time being. Without looking at the code
I'd guess most of what you're seeing in the charset parts of the code is of
the latter kind (e.g. any &abort_if_error_flagged() calls and such).

- -- 
I have to admit that I'm hoping the current situation with regard to XML
Namespaces and W3C XML Schemas is a giant practical joke,   but I see no
signs of pranksters coming forward with a gleeful smile to announce that
they were just kidding.                              -- Simon St.Laurent

-----BEGIN PGP SIGNATURE-----
Version: PGP SDK 3.0.3

iQA/AwUBQVEo6KPyPrIkdfXsEQIdKwCgkOSk6GETVD4OlW4Hc/beOhODLA4An1Z1
TlClIqP6R0HN0lrw13QzL77G
=uJJ2
-----END PGP SIGNATURE-----
Received on Wednesday, 22 September 2004 07:25:35 UTC