Re: [WMVS] Some initial thoughts on code M12N... from Bjoern Hoehrmann on 2004-09-21 (public-qa-dev@w3.org from September 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 21 Sep 2004 19:20:51 +0200
To: Terje Bless <link@pobox.com>
Cc: Martin Duerst <duerst@w3.org>, QA Dev <public-qa-dev@w3.org>
Message-ID: <415f4c08.172999039@smtp.bjoern.hoehrmann.de>

* Terje Bless wrote:
>The idea is that — and let me just use the charset stuff as an example here —
>you start to build an external, standalone, module that reimplements all the
>functionality we need for the Validator. This module is brought to about Beta
>quality and ideally is generic enough to be released standalone on CPAN.

I strongly agree here, especially from a management perspective. We are
all interested in M12N and have some ideas for it, but these tend to be
not too well expressed and coordinated, and might even conflict. If we
all try to do this inside check we would quickly run into problems.
Development independent of check would have the following benefits:

  * general purpose re-usable code, which makes it easier to create new
    services for similar tasks;
    http://qa-dev.w3.org/~bjoern/appendix-c/validator/ for example does
    not deal with 'charset' parameters in HTTP headers and supports only
    a limited set of encodings just because there is no module that
    makes that easy

  * proper documentation, which helps among other things with
    expectation management; you should know from the documentation what
    the code is supposed to do and probably how it achieves that
    (revealing bugs without looking at the code...)

  * broad review of code; it's easier to go through smaller packages
    looking for bugs, shortcomings, etc., so it is more likely to
    happen; also, it eases writing platform independent code as there
    are the CPAN testers who grab modules from CPAN and run their test
    suites on their systems, informing you of failures, etc.

  * easier for outsiders to contribute patches, etc. because they don't
    have to figure out all of check first (installing the Validator to
    test changes which would require installing a web server, etc)

  * easier for insiders to focus on their code, you would be generally
    responsible for your modules and can work on them as it suits you
    best without figuring out code from others in check, or need to
    discuss changes with others before implementing them, etc.

  * proper test suite close to the relevant code, for example, no need
    to test the doctype detection code through screen scraping the HTML
    results document of the Validator if the code is elsewhere and has
    its own test suite

  * avoids duplicating code across check and checklink, etc. if there
    is a bug you only need to fix it in one place rather than many

  * people installing the Validator locally get the latest bug fixes
    immediately if the bug is in some external code, they just need to
    update from CPAN (or get them on installation already)

  * hinders individual developers striving for perfection, they don't
    see much of the code so they don't care much about it

  * it does not destabilize the WMVS, it specifically avoids abusing
    release branches for development ("The code has some bugs but I'll
    commit the fixes later this week" which then never happens and
    other developers would first need to figure out what the bugs are)
    and such.

  * ...

The downsides are that it might take longer for changes to get applied
to the release version and that it requires more work (you'd have to
write documentation, test suites, ask for feedback on relevant mailing
lists, think about module names and interfaces, ...). In fact, it is a
lot more work than just hacking some bits of the code on some rainy
afternoon, but I think it is certainly worth doing. As far as I am
concerned it is much simpler to develop a standalone module than to
mess with `check`;
http://www.w3.org/mid/41573fd6.153598893@smtp.bjoern.hoehrmann.de for
example had complete code with 70+ tests and some documentation in about
an hour. Trying to figure out how these things work in check today along
with discovering bugs, reporting them, and trying to build on top of
that would have taken much longer.

I would go even further than Terje and say that we should avoid making
"improvements" inside check and rather make these improvements available
through new, external modules and stabilize these modules so that the
code in check could be replaced ASAP and only then benefit from these
changes.

>>HTTP::Charset is even smaller and more boring than XML::Charset: just
>>look at the content type.
>
>Don't be fooled by the off-the-cuff name of the module; our charset code does
>a _lot_ more than just look at the Content-Type. Maybe a better name would be
>«HTTP::Charset::Heuristic», which would do all the charset determination rules
>we use in «check» today, plus have options to allow, e.g., a <meta> element to
>override the Content-Type (which we don't currently do) etc.

HTTP:: would be a bad namespace then... I also disagree with Martin:
even just extracting the charset is not just one line of code; you
have to deal with cases such as

  Content-Type: text/html
  Content-Type: text/html;charset=iso-8859-1

or

  Content-Type: text/html;charset=iso-8859-1
  Content-Type: text/html;charset=utf-8

or

  Content-Type: text/html;note="charset='iso-8859-1'";charset=utf-8

or

  Content-Type: text/html
   ;charset=
   utf-8

or

  Content-Type: text/html;charset="utf-8'

or

  Content-Type: text/html;version="...";charset=iso-8859-1

and so on. That's certainly something that can go into its own .pm,
especially if you add additional complexity such as reporting flaws
in headers like those above back to the application so it can
report these to the user. So this is not just looking at one header.
It might make sense though to have this code (and its test suite,
etc.) as part of some larger distribution in the CPAN sense.
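
To make the point concrete, here is a minimal sketch of what handling
the cases above involves. It is written in Python purely for
illustration; it is not the code in check and not a proposed module
interface. The function name extract_charset, the wording of the
problem messages, and the policy of returning no charset when values
conflict are all assumptions of this sketch, and the token grammar is
assumed to be the one from RFC 2616.

```python
import re

# Token characters as in RFC 2616 section 2.2 (assumed grammar for this sketch).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# One ";name=value" parameter; the value is a token or a quoted-string.
PARAM = re.compile(
    r'\s*;\s*(?P<name>' + TOKEN + r')\s*=\s*'
    r'(?:(?P<token>' + TOKEN + r')|"(?P<quoted>(?:[^"\\]|\\.)*)")'
)


def extract_charset(header_values):
    """Hypothetical helper: return (charset or None, list of problems).

    header_values holds the value of every Content-Type header in the
    response, in order, without the "Content-Type:" field name.
    """
    problems = []
    if len(header_values) > 1:
        problems.append("multiple Content-Type headers")

    found = []
    for raw in header_values:
        # Unfold continuation lines (a line break followed by whitespace).
        value = re.sub(r"\r?\n[ \t]+", " ", raw)
        _media_type, sep, rest = value.partition(";")
        rest = sep + rest
        pos = 0
        while pos < len(rest):
            match = PARAM.match(rest, pos)
            if match is None:
                leftover = rest[pos:].strip()
                if leftover:
                    # E.g. a quoted-string that is never closed.
                    problems.append("malformed parameter: %r" % leftover)
                break
            pos = match.end()
            if match.group("name").lower() != "charset":
                # Skip other parameters, including quoted ones that
                # merely contain the text "charset=".
                continue
            if match.group("token") is not None:
                charset = match.group("token")
            else:
                # Undo backslash escapes inside the quoted-string.
                charset = re.sub(r"\\(.)", r"\1", match.group("quoted"))
            found.append(charset.lower())

    distinct = sorted(set(found))
    if len(distinct) > 1:
        problems.append("conflicting charset values: " + ", ".join(distinct))
        return None, problems  # Assumed policy: refuse to pick one.
    return (distinct[0] if distinct else None), problems
```

Calling extract_charset with the two conflicting headers from the
second example yields no charset plus two problem messages that the
application could pass on to the user, while the note="..." example
yields utf-8 with no problems. Even this sketch ignores things like
validating the media type itself, which is rather the point.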

>Any changes that are small and necessary for 0.7.0 should go in HEAD;
>anything else will have to happen outside CVS or in a dedicated branch
>(e.g. «validator-m12n-charset-branch»).

Agreed (ignoring that you said anything about branches... :-), any code
that needs a lot of testing or might in other ways put our release
schedule for 0.7.0 at risk should not go into HEAD.
Received on Tuesday, 21 September 2004 17:21:41 UTC