
Re: HTML and XML

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Mon, 16 Feb 2009 15:11:06 -0800
Message-ID: <4999F28A.70300@metalab.unc.edu>
To: Bijan Parsia <bparsia@cs.man.ac.uk>
Cc: www-tag@w3.org
Bijan Parsia wrote:

> I'll also point out that programming in these languages is a specialist 
> activity with high rewards. Even then, it would be interesting to see 
> how broken failed projects are and how much time goes into syntax 
> management:

Not much, in the case of experienced programmers. I do think properly 
designed syntax is important. I do not think that makes non-draconian 
error handling a good idea. Nor do I find XML syntax to be particularly 
onerous compared to, for example, Haskell's, Ruby's, or C++'s. The basic 
syntax of angle bracketed tags, quoted attribute values, and 
start-tag-end-tag matches seems sound. I don't see anything to be gained 
by changing it at this point.
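[As a concrete aside, my illustration rather than anything from the thread: the three rules named above are exactly what a draconian parser such as Python's xml.etree enforces, and violating any of them is a fatal well-formedness error at parse time. All element and attribute names here are invented.]

```python
import xml.etree.ElementTree as ET

# The basic syntax: angle-bracketed tags, quoted attribute values,
# and matching start/end tags.
well_formed = '<book genre="essay"><title>XML</title></book>'
root = ET.fromstring(well_formed)
print(root.find("title").text)  # -> XML

# Each violation of those rules is rejected outright, not repaired:
for bad in (
    '<book genre=essay></book>',          # unquoted attribute value
    '<book><title>Oops</book></title>',   # mismatched start/end tags
    '<book>',                             # start-tag with no end-tag
):
    try:
        ET.fromstring(bad)
    except ET.ParseError as err:
        print("rejected:", err)
```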

> 
> The cognitive overhead of well formedness can be negligible or severe 
> when creating XML documents. Well formedness, of course, is rarely the 
> *point* or the *interesting* set of constraints. It seems quite possible 
> that it's more difficult than it needs to be.

I don't think so. We've tried some alternatives in the SGML 
space--omitted end-tags for example. They didn't carry their weight.

>> Consequently syntax errors rarely make it into production (except 
>> among college students of questionable honesty).
> 
> Blah. I'm not sure what the point of this comment was. However, in the 
> context, it's not very nice.

Somebody--I forget who--brought up the issue of college students who 
turned in syntactically incorrect programs that somehow magically 
compiled into .class files. As a former professor myself, I'm a little 
surprised anyone could be naive about exactly how this situation arises.

> From experience? I would love to see the data. I know Interlisp's DWIM 
> facility didn't "take off", but there could be many reasons. All I could 
> easily find on this was:
>     http://catless.ncl.ac.uk/Risks/7.13.html#subj3

AppleScript is the case I'm most familiar with, but there have been 
others. DWIM I'm not familiar with. But there's good reason the 
programming languages landscape is what it is. Functional vs. imperative 
we argue about. Compiled vs. interpreted we argue about. Garbage 
collected or manually managed we argue about. Precise vs. imprecise 
language we don't argue about any more. The answer's just too obvious. 
The only way you'd convince anyone otherwise would be by producing an 
imprecise, error-corrected language that worked. Short of that, don't 
expect anyone to waste much time on this any more.

>> Fixing syntax errors at the compiler level leads to far more serious, 
>> far more costly, and far harder to debug semantic errors down the line.
> 
> Really? I just don't know. Some interpreted language environments miss 
> lots of syntax errors until you hit that line of code during a run.

True. The earlier errors are exposed the better. Not all programming 
languages are equal in this regard.


>> Draconian error handling leads to fewer mistakes where the person 
>> sitting at the keyboard meant one thing but typed another.
> 
> I've no idea, really.
> 
>> Syntax errors are one of the prices developers have to pay in order to 
>> produce reliable, maintainable software. Languages have been developed 
>> that attempt, to greater or lesser degrees, to avoid the possibility of 
>> syntax error. They have uniformly failed.
> 
> Of course, we're not talking about avoiding the possibility of syntax 
> error, but of how to cope with error.


One way and one way only: fail fast, early, and hard. Reveal the error 
at the first opportunity. Do not allow the error to build up steam and 
cause bigger problems.
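[A small sketch of what fail-fast looks like in practice; mine, not part of the message. A strict parser such as Python's expat reports the first well-formedness error, with its exact position, and refuses to continue. The document below is invented.]

```python
import xml.parsers.expat

doc = '<log><entry>ok</entry><entry>broken</log>'  # missing </entry>

parser = xml.parsers.expat.ParserCreate()
try:
    parser.Parse(doc, True)  # True: this is the final chunk of input
except xml.parsers.expat.ExpatError as err:
    # The error carries the exact position of the first problem;
    # nothing after it is guessed at or silently "repaired".
    print(f"fatal: {err} (line {err.lineno}, column {err.offset})")
```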


> One key difference between programs and data is that I often need to 
> manipulate the data even if it has syntax errors. I usually end up doing 
> that with text tools. How is that better than dealing with a structure 
> that might be extracted? That's what ends up happening *anyway* a good 
> deal of the time as I patch the errors so I can just *see* and *query* 
> the thing.

Because human intelligence is far better at these problems than 
computers are. I don't know whether software might some day be good 
enough to handle this. It certainly isn't now.

> Isn't the question not which is easier to program against. I totally 
> prefer well formed XML etc. etc. I thought the issue was how best to 
> cope with problem data and the prevalence of that problem data. The 
> claim has been advanced that people (some people) can always, more or 
> less, with relative ease, produce well formed XML and transport it in 
> various ways to consumers over the Web.
> 
> This just doesn't seem to be true.

I disagree. A poor environment that has been excessively forgiving of 
bad syntax has provided no incentive for document producers to create 
well-formed documents. Change the environment, and the producers will 
change their output. Yes, this is something of a chicken-and-egg 
problem. I do not know if it will be resolved. I am certain it could be 
resolved given the will in any one of the right organizations.

> But given the reality of invalid HTML5 and non-well-formed XML...how do 
> we minimize the cost of the errors? How do we distribute the costs where 
> they can be effectively borne?

That's another permathread. See the archives of the whatwg group.

> I've not seen that quick learning, even within computer science, as my 
> first message showed.

I have. I've taught XML multiple times at a perhaps somewhat less-elite 
university than you mentioned (or perhaps not--I'm not intimately 
familiar with yours) and well-formedness was simply not an issue or a 
concern. It was a significantly lower hurdle to jump over than syntax 
issues when I taught Java to the same students (and these were students 
who already knew C++). There's no reason any computer science student at 
any level can't handle this.

> Also, if we expect XML to be used by broader populations in wider 
> contexts, then this seems unrealistic.

I'm not sure we do expect that, any more than we expect typical computer 
users to write VBA macros from scratch. However those power users who do 
write VBA macros can certainly handle XML syntax.

> I don't know why y'all ignore my DBLP example. It was real. I never 
> ended up using the data, alas. I don't recall if I reported it, but, 
> frankly, it was clearly a significant challenge to fix. Perhaps that's 
> just one price I must pay for people to have the colossal benefits.

I ignored it because I'm not familiar with it. There's a lot of XML data 
in the world.

> I'm not clear why one category of errors (well-formedness ones) is so 
> much worse than other levels (e.g., validity ones). They are all errors. 
> One nice thing about XML is separating these classes of errors so that 
> even if the document is not valid wrt the relevant schema, you can still 
> work with it (transform it, etc.). What's so much worse about well 
> formedness errors?

Another permathread: syntax is interoperable. Semantics are not. You and 
I can share syntax, and we can share it with 10,000 other disconnected 
individuals. We cannot, will not, and should not attempt to share 
semantics when we are working on different projects with different 
purposes and understandings.
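[That division is easy to demonstrate; this sketch is mine, with invented element names, not part of the message. Sharing only the syntax, a consumer can still walk and transform a well-formed document that is invalid against, or simply unknown to, its own schema.]

```python
import xml.etree.ElementTree as ET

# Well-formed, but presumably invalid against any schema the consumer
# holds: it contains an element the consumer has never heard of.
doc = ET.fromstring(
    '<inventory><gadget sku="g-1">widget</gadget>'
    '<unexpected-element/></inventory>'
)

# A generic transform works anyway: shared syntax is all it needs.
names = [el.tag for el in doc.iter()]
print(names)  # -> ['inventory', 'gadget', 'unexpected-element']
```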

> In a standards situation there are lots of different possible costs 
> including opportunity costs. Perhaps we'll have to live with XML as it 
> is. Perhaps we can do better. But surely it's better to investigate 
> carefully, rather than make rather unsupported claims with colossal 
> confidence :)

You may be new here. There have been demonstrably better syntaxes over 
the last 10+ years. They've gone nowhere, and attracted casual interest 
at best. None of them offered sufficient improvements to justify the 
cost of switching over. (Maybe that's too harsh: I can think of one that 
clearly has gone somewhere though it's certainly not a 1-1 mapping with 
XML or a full replacement for it.)

> To engage requires, at the very least, either acknowledging some common 
> standards of evidence, or proposing some alternative ones, or critiquing 
> the ones I've provided. That is, if we are interested in finding stuff 
> out.

You seem to be coming at this from an academic approach that this 
community has not historically bought into. Most folks here, even those 
of us who come from academia, don't put much stake in academic studies 
of programming productivity and the like. We put a great deal of stake 
in what we've learned from our own experience and those of our 
colleagues, as well as what the market has accepted.

The only way you're going to convince anyone of anything here is by 
producing something better. If you have a better alternative to XML as 
it exists today, then build it. If it really is better enough, then--all 
arguments to the contrary--people will switch to it, and quickly. We've 
seen this happen before, more than once. However, for every new format and 
language that succeeded, there have been a hundred failures or more. 
Some of the failures were interesting and we learned from them. Most 
weren't even that.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
Refactoring HTML Just Published!
http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA
Received on Monday, 16 February 2009 23:11:43 GMT
