- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 21 Sep 2004 19:20:51 +0200
- To: Terje Bless <link@pobox.com>
- Cc: Martin Duerst <duerst@w3.org>, QA Dev <public-qa-dev@w3.org>
* Terje Bless wrote:
>The idea is that -- and let me just use the charset stuff as an example here --
>you start to build an external, standalone, module that reimplements all the
>functionality we need for the Validator. This module brought to about Beta
>quality and ideally is generic enough to be released standalone on CPAN.

I strongly agree here, especially from a management perspective. We are
all interested in M12N and have some ideas for it, but these tend to be
not too well expressed and coordinated, and might even conflict; if we
all try to do this inside check we would quickly run into problems.
Development independent of check would have the following benefits:

  * general purpose re-usable code, easier to create new services for
    similar tasks; http://qa-dev.w3.org/~bjoern/appendix-c/validator/
    for example does not deal with 'charset' parameters in HTTP headers
    and supports only a limited set of encodings just because there is
    no module that makes that easy

  * proper documentation, which helps among other things expectation
    management; you should know from the documentation what the code is
    supposed to do and probably how it achieves that (revealing bugs
    without looking at the code...)

  * broad review of code; it's easier to go through smaller packages
    looking for bugs, shortcomings, etc., so it is more likely to
    happen; it also makes platform-independent code easier, as there
    are the CPAN testers who grab modules from CPAN and run their test
    suites on their systems, informing you of failures, etc.

  * easier for outsiders to contribute patches, etc. because they don't
    have to figure out all of check first (installing the Validator to
    test changes, which would require installing a web server, etc.)

  * easier for insiders to focus on their code; you would be generally
    responsible for your modules and can work on them as it suits you
    best, without figuring out code from others in check or needing to
    discuss changes with others before implementing them, etc.
  * proper test suite close to the relevant code; for example, no need
    to test the doctype detection code through screen scraping the HTML
    results document of the Validator if the code is elsewhere and has
    its own test suite

  * avoids duplicating code across check and checklink, etc.; if there
    is a bug you only need to fix it in one place rather than many

  * people installing the Validator locally get the latest bug fixes
    immediately if the bug is in some external code; they just need to
    update from CPAN (or get them on installation already)

  * hinders individual developers striving for perfection, they don't
    see much of the code so they don't care much about it

  * it does not destabilize the WMVS; it specifically avoids abusing
    release branches for development ("The code has some bugs but I'll
    commit the fixes later this week", which then never happens, and
    other developers would first need to figure out what the bugs are)
    and such.

  * ...

The downsides are that it might take longer for changes to get applied
to the release version and that it requires more work (you'd have to
write documentation, test suites, ask for feedback on relevant mailing
lists, think about module names and interfaces, ...), in fact a lot more
work than just hacking some bits of the code on some rainy afternoon,
but I think it is certainly worth doing.

As far as I am concerned it is much simpler to develop a stand-alone
module than to mess with check;
http://www.w3.org/mid/41573fd6.153598893@smtp.bjoern.hoehrmann.de for
example had complete code with 70+ tests and some documentation in about
an hour. Trying to figure out how these things work in check today,
along with discovering bugs, reporting them, and trying to build on top
of that, would have taken much longer.
I would go even further than Terje and say that we should avoid making
"improvements" inside check and rather make these improvements available
through new, external modules, and stabilize these modules so that the
code in check can be replaced ASAP and only then benefit from these
changes.

>>HTTP::Charset is even smaller and more boring than XML::Charset: just
>>look at the content type.
>
>Don't be fooled by the off-the-cuff name of the module; our charset code does
>a _lot_ more than just look at the Content-Type. Maybe a better name would be
>«HTTP::Charset::Heuristic», which would do all the charset determination rules
>we use in «check» today, plus have options to allow, e.g., a <meta> element to
>override the Content-Type (which we don't currently do) etc.

HTTP:: would be a bad namespace then... I also disagree with Martin;
even just extracting the charset is not just one line of code, you have
to deal with cases such as

  Content-Type: text/html
  Content-Type: text/html;charset=iso-8859-1

or

  Content-Type: text/html;charset=iso-8859-1
  Content-Type: text/html;charset=utf-8

or

  Content-Type: text/html;note="charset='iso-8859-1'";charset=utf-8

or

  Content-Type: text/html ;charset= utf-8

or

  Content-Type: text/html;charset="utf-8'

or

  Content-Type: text/html;version="...";charset=iso-8859-1

and so on; that's certainly something that can go into its own .pm,
especially if you add additional complexity such as reporting flaws in
headers like those above back to the application so it can report them
to the user. So this is not just looking at one header. It might make
sense though to have this code (and its test suite, etc.) as part of
some larger distribution in the CPAN sense.

>Any changes that are small and necessary for 0.7.0 should go in HEAD;
>anything else will have to happen outside CVS or in a dedicated branch
>(e.g. «validator-m12n-charset-branch»).

Agreed (ignoring that you said anything about branches... :-); any code
that needs a lot of testing or might in other ways put our release
schedule for 0.7.0 at risk should not go into HEAD.
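
To illustrate why the Content-Type cases listed above are more than one
line of code, here is a minimal sketch in Python. It is not code from
check or from any CPAN module; the names split_params, charset_of and
extract_charset are hypothetical, and the policy choices (report
whitespace next to "=", return no charset when headers conflict) are
assumptions made for the example only.

```python
from typing import List, Optional, Tuple


def split_params(value: str) -> List[str]:
    """Split a header value on ';', ignoring ';' inside double quotes."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if in_quotes and ch == "\\" and i + 1 < len(value):
            current.append(value[i:i + 2])  # keep a backslash-escaped pair intact
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def charset_of(value: str) -> Tuple[Optional[str], List[str]]:
    """Return (charset, problems) for a single Content-Type value."""
    problems: List[str] = []
    found: List[str] = []
    for raw in split_params(value)[1:]:  # element 0 is the media type itself
        name, sep, val = raw.partition("=")
        if name.strip().lower() != "charset":
            continue  # e.g. note="charset='iso-8859-1'" is a 'note' parameter
        if not sep:
            problems.append("charset parameter without a value")
            continue
        # Assumption of this sketch: whitespace next to '=' is reported as a flaw.
        if name.rstrip() != name or val.lstrip() != val:
            problems.append("whitespace around '=' in charset parameter")
        val = val.strip()
        if val.startswith('"'):
            if len(val) < 2 or not val.endswith('"'):
                problems.append("unterminated quoted charset value")
                continue
            val = val[1:-1]  # no unescaping of quoted pairs in this sketch
        if not val:
            problems.append("empty charset value")
            continue
        found.append(val.lower())
    if len(set(found)) > 1:
        problems.append("conflicting charset parameters in one header")
        return None, problems
    return (found[0] if found else None), problems


def extract_charset(headers: List[str]) -> Tuple[Optional[str], List[str]]:
    """Combine all Content-Type values of one response into (charset, problems)."""
    problems: List[str] = []
    if len(headers) > 1:
        problems.append("multiple Content-Type headers")
    charsets: List[str] = []
    for value in headers:
        charset, found_problems = charset_of(value)
        problems.extend(found_problems)
        if charset is not None:
            charsets.append(charset)
    if len(set(charsets)) > 1:
        # Assumption: refuse to pick a winner and leave that to the caller.
        problems.append("conflicting charset values: " + ", ".join(sorted(set(charsets))))
        return None, problems
    return (charsets[0] if charsets else None), problems


if __name__ == "__main__":
    cases = [
        ["text/html", "text/html;charset=iso-8859-1"],
        ["text/html;charset=iso-8859-1", "text/html;charset=utf-8"],
        ["text/html;note=\"charset='iso-8859-1'\";charset=utf-8"],
        ["text/html ;charset= utf-8"],
        ["text/html;charset=\"utf-8'"],
    ]
    for headers in cases:
        print(headers, "->", extract_charset(headers))
```

Returning the list of problems alongside the charset is what lets the
calling application report header flaws to the user, which is the extra
complexity mentioned above.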
Received on Tuesday, 21 September 2004 17:21:41 UTC