- From: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Mon, 16 Feb 2009 07:30:29 -0800
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Bijan Parsia <bparsia@cs.man.ac.uk>, Bijan Parsia <bparsia@cs.manchester.ac.uk>, www-tag@w3.org
Julian Reschke wrote:

> In all cases though, *testing* the document using conforming software
> will highlight errors early on.

People hand-editing XML, even experts, will make well-formedness mistakes. Take that as a given. The same is true of people hand-editing Java, C++, Perl, Haskell, or SQL. The difference is that these languages are routinely passed to compilers or interpreters that rapidly reveal all syntax errors. Nowadays we even use editors that reveal syntax errors as we type. Consequently, syntax errors rarely make it into production (except among college students of questionable honesty).

Is it annoying that the compilers can't autocorrect syntax errors? Yes, it is; but we have learned from experience that when compilers try to autocorrect syntax errors, more often than not they get it wrong. Fixing syntax errors at the compiler level leads to far more serious, far more costly, and far harder to debug semantic errors down the line. Draconian error handling leads to fewer mistakes where the person sitting at the keyboard meant one thing but typed another. Syntax errors are one of the prices developers have to pay in order to produce reliable, maintainable software. Languages have been developed that attempt, to greater or lesser degrees, to avoid the possibility of syntax errors. They have uniformly failed.

Although HTML and XML are less complex than Turing-complete programming languages, I do not think they are sufficiently less complex to make the lessons learned in Java, C, Perl, etc. inapplicable. Attempts to auto-correct syntax errors will only cause bigger, costlier, harder-to-debug problems further down the road. We have already seen this with HTML. Today it is far easier to develop and debug complex JavaScript and CSS on web pages by starting with well-formed, valid XHTML. There's simply less to infer about what the browser is doing with the page.
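The early-failure behavior being argued for here can be sketched with Python's standard-library XML parser; the snippets below are illustrative examples, not documents from this thread:

```python
import xml.etree.ElementTree as ET

well_formed = "<p>A <em>correct</em> paragraph.</p>"
malformed = "<p>A <em>mismatched paragraph.</p>"  # <em> is never closed

# A conforming parser accepts the well-formed document...
ET.fromstring(well_formed)

# ...and rejects the malformed one immediately, naming the error,
# rather than silently guessing what the author meant.
try:
    ET.fromstring(malformed)
except ET.ParseError as err:
    print("rejected:", err)  # e.g. "mismatched tag: line 1, column ..."
```

This is the "compile before uploading" step in miniature: the mistake surfaces at the keyboard, at the moment it is made, instead of as a mysterious rendering bug later.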
Even if HTML 5 brings us to a realm where there are no cross-browser differences in object model--a state I doubt we'll see, though I'd be happy to be proved wrong--we'll still be faced with the reality that the code in front of the developer's face is not the code the browser is rendering. Debugging problems with web applications and web pages will require deep knowledge of HTML error-correction arcana. Tools will be developed to expose the actual object model, but these tools will not be universally available or used.

The simplest, least costly approach is to pay a small cost upfront to maintain well-formedness and reject malformed documents. Hand authors would quickly learn that you have to "compile" your document before uploading it and fix any syntax errors that appear. The cost savings for hand authors in future debugging and development would be phenomenal. Sadly, for various non-contingent reasons this hasn't happened with HTML on the Web and seems unlikely to.

However, I see no reason to back away from well-formedness in all the other domains where it achieves such colossal benefits. Error-correcting parsers would be a step backwards. Until computers become sufficiently smart to understand natural language (if they ever do), well-formedness and draconian error handling are the best tools we have for interfacing our language with theirs and avoiding costly misunderstandings at the translation boundary.

--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Refactoring HTML Just Published!
http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA
Received on Monday, 16 February 2009 15:31:06 UTC