RE: Summary of arguments FOR validity -- and another against -- and a third of alternatives

At 02:04 8/11/2005, Bruce Bailey wrote (in a reply to Becky Gibson):
<blockquote>
(...)
Okay, I will succinctly summarize:

   "Thus, I can live with a requirement at level 1 that my code is
   well-formed - that is good coding practice and can help
   accessibility.   I can not live with a requirement for completely
   VALID code at level 1."

Your clarification regarding xml (including xhtml) is all well and good.

What is your proposal for evaluating the "well-formedness" of HTML 3.2 and 
4.01 content?

I am not aware of a widely accepted freely available automated measure for 
this.
</blockquote>

The only technique I can think of is to run a validating SGML parser and 
discard the validation errors. Even though well-formedness is not defined 
in SGML, SGML parsers can catch errors that have nothing to do with 
incorrect attributes and content models.
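
To make the idea concrete, here is a minimal Python sketch of the "discard the validation errors" step. It assumes the parser prints one diagnostic per line in the form program:file:line:column:type: text, which is what I see from SP. The function name split_messages and the list VALIDITY_HINTS are hypothetical, and the phrases in that list are only illustrative samples of validity-type messages; they would have to be tuned against the output of the parser version actually used.

```python
import re

# Assumed shape of one SP/OpenSP diagnostic line:
#   program:file:line:column:type: text
# Assumes the file name contains no colon (so no Windows drive letters).
MESSAGE = re.compile(
    r"^(?P<prog>[^:]+):(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+):"
    r"(?:(?P<kind>[A-Z]):)?\s*(?P<text>.*)$"
)

# Hypothetical, incomplete list of phrases that mark a message as a
# validity error (attributes and content models). Tune it for your parser.
VALIDITY_HINTS = (
    "there is no attribute",
    "document type does not allow element",
    "required attribute",
)


def split_messages(lines):
    """Split diagnostics into (validity, other) lists of (line, col, text)."""
    validity, other = [], []
    for raw in lines:
        match = MESSAGE.match(raw.strip())
        if match is None:
            continue  # not a diagnostic line; ignore it
        text = match.group("text")
        is_validity = any(hint in text for hint in VALIDITY_HINTS)
        bucket = validity if is_validity else other
        bucket.append((int(match.group("line")), int(match.group("col")), text))
    return validity, other
```

Whatever ends up in the second list is the candidate set of "well-formedness-like" errors; anything the hint list fails to recognise lands there too, so the result still needs a human check.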

NSGMLS is part of the open-source SGML parser SP [1] (by James Clark), which 
is now also available as OpenSP [2] (part of OpenJade). SP is also the core of 
the W3C HTML validator. Going to an online validator is not always 
practical, but there are a few off-line alternatives.
Igor Podlubny has provided an off-line validator as a clipbook for NoteTab [3].
Matti Tukiainen has described how you can create an off-line validator with 
SP on Windows [4]. I think that people who are proficient in shell scripting 
could easily create a Linux equivalent.
There used to be a VB wrapper around SGMLS (SP Wizard, by Larry Robertson), 
but it has not been updated for ages [5].
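
As a rough idea of what a Linux equivalent could look like, here is a small Python sketch rather than a shell script. It assumes OpenSP is installed with its parser on the PATH as onsgmls, and that you have an SGML catalog for the HTML DTDs; the catalog path in the usage note is a placeholder. The function names build_command and run_parser are hypothetical. The -s option (suppress the normal output, report errors only) and the -c option (catalog file) are standard nsgmls/onsgmls options.

```python
import shutil
import subprocess


def build_command(html_path, catalog=None, program="onsgmls"):
    """Build the parser command line; 'program' is assumed to be on the PATH."""
    cmd = [program, "-s"]  # -s: no normal output, only error messages
    if catalog:
        cmd += ["-c", catalog]  # -c: SGML catalog that locates the HTML DTDs
    cmd.append(html_path)
    return cmd


def run_parser(html_path, catalog=None):
    """Run the parser and return its error messages as a list of lines."""
    cmd = build_command(html_path, catalog)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(
        cmd, capture_output=True, text=True, errors="replace"
    )
    # SP writes its diagnostics to standard error.
    return result.stderr.splitlines()
```

A call would look like run_parser("page.html", catalog="/path/to/html.soc"), where the catalog path depends entirely on the local installation.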
I am aware of the limitations of this "technique":
- you need to weed out the validation errors to find the other types of errors;
- it may not work with all SGML parsers because - to my knowledge - the 
SGML specification does not define the output of an SGML parser or even 
that an SGML parser should validate;
- SP seems very hard to localise, so users need to know English.


[1] http://www.jclark.com/sp/index.htm
[2] http://openjade.sourceforge.net/doc/index.htm
[3] http://www.tuke.sk/podlubny/ov.html
[4] http://ktmatu.com/info/do-it-yourself-offline-html-validator/
[5] http://www.eccnet.com/sgmlug/spwizard/


Regards,

Christophe Strobbe


-- 
Christophe Strobbe
K.U.Leuven - Departement of Electrical Engineering - Research Group on 
Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/ 



Received on Wednesday, 9 November 2005 10:06:03 UTC