Re: HTML and XML from Bijan Parsia on 2009-02-17 (www-tag@w3.org from February 2009)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Tue, 17 Feb 2009 10:07:35 +0000
To: elharo@metalab.unc.edu
Cc: www-tag@w3.org
Message-Id: <DAA5A6EF-1EE1-457D-9DFA-12A7E13E1F25@cs.man.ac.uk>
I suspect further discussion will be unproductive. But I will make a  
few final points.

On 16 Feb 2009, at 23:11, Elliotte Harold wrote:

> Bijan Parsia wrote:
[snip]
>>> Consequently syntax errors rarely make it into production (except  
>>> among college students of questionable honesty).
>> Blah. I'm not sure what the point of this comment was. However, in  
>> the context, it's not very nice.
>
> Somebody--I forget who

Me.

> --brought up the issue of college students who turned in  
> syntactically incorrect programs that somehow magically compiled  
> into .class files. As a former professor myself, I'm a little  
> surprised anyone could be naive about exactly how this situation  
> arises.

There is no evidence of cheating in this context. I'm a little  
surprised you, as a former professor, would casually make such a  
comment.

The programs submitted were only .class files. When *decompiled*, they  
yielded syntactically incorrect Java files. Obviously, there are  
lots of possibilities here.

At this point, I'm not surprised that you would leap to this  
conclusion, and express it publicly :(

>> From experience? I would love to see the data. I know Interlisp's  
>> DWIM facility didn't "take off", but there could be many reasons.  
>> All I could easily find on this was:
>>     http://catless.ncl.ac.uk/Risks/7.13.html#subj3
>
> AppleScript is the case I'm most familiar with,

You conflate automated error recovery/repair with a syntax with lots  
of forms. AppleScript is draconian in its error handling.

Of course, part of the issue is how much counts as an error.

[snip]
> Precise vs imprecise language we don't argue about any more.

And we aren't arguing it here. The question is not whether we accept  
imprecise input, but whether we make the (low level) handling of all  
byte streams precise. That's what HTML5 strives for. That's how I  
understand the XML5 effort.

This is, in fact, in the spirit of XML. By separating validity from  
well-formedness, XML changed the class of inputs that were handled  
draconianly. Validity against a schema isn't necessary to get a data  
model. Why should well-formedness be?

[snip]
>> One key difference between programs and data is that I often need  
>> to manipulate the data even if it has syntax errors. I usually end  
>> up doing that with text tools. How is that better than dealing  
>> with a structure that might be extracted? That's what ends up  
>> happening *anyway* a good deal of the time as I patch the errors  
>> so I can just *see* and *query* the thing.
>
> Because human intelligence is far better at these problems than  
> computers are. I don't know whether software might some day be good  
> enough to handle this. It certainly isn't now.

This is evidence that you are still confused about the matter under  
discussion. Just as an application that depends on its input  
conforming to a certain schema cannot meekly accept arbitrary  
well-formed input, so too an application which depends on the data  
being well-formed (but, perhaps, described in prose) cannot blindly accept  
arbitrary byte streams as input (or at least as input on par with  
well-formed input). But many classes of application (and of user)  
need to have more elaborate handling of various classes of error. One  
way to improve the situation of such applications when using XML is  
to define the data model that results from parsing, as XML, arbitrary  
input streams.

This is surely far from any AI issue.
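
To make the distinction concrete, here is a minimal Python sketch. It is not the XML5 or HTML5 algorithm, and the recovery rules (implicitly close any inner open elements when an outer end tag arrives, ignore stray end tags, close everything at end of input) are assumptions chosen purely for illustration. The names RecoveringBuilder, strict_parse and recovering_parse are hypothetical. It borrows the standard library's html.parser tokenizer, which lowercases names and treats a few HTML elements specially, so it is only a stand-in for a real XML tokenizer.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class RecoveringBuilder(HTMLParser):
    """Toy tree builder with fixed, deterministic recovery rules (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("#document")  # synthetic container node
        self.stack = [self.root]             # currently open elements

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        self.stack.append(ET.SubElement(self.stack[-1], tag, attributes))

    def handle_startendtag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        ET.SubElement(self.stack[-1], tag, attributes)

    def handle_endtag(self, tag):
        open_names = [element.tag for element in self.stack[1:]]
        if tag not in open_names:
            return  # assumed rule: a stray end tag is ignored
        # Assumed rule: implicitly close inner elements up to the match.
        while self.stack[-1].tag != tag:
            self.stack.pop()
        self.stack.pop()

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element goes in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def strict_parse(text):
    """Draconian handling: any well-formedness error yields no data model."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def recovering_parse(text):
    """Every input string maps to exactly one tree under the rules above."""
    builder = RecoveringBuilder()
    builder.feed(text)
    builder.close()  # elements still open at end of input are simply left closed
    return builder.root


if __name__ == "__main__":
    malformed = "<a><b>text</a>"  # <b> is never closed
    print(strict_parse(malformed))
    print(ET.tostring(recovering_parse(malformed)[0], encoding="unicode"))
```

On the malformed input the strict parser returns nothing, while the recovering one serializes the result as `<a><b>text</b></a>`. No guessing about intent is involved: the same bytes always give the same tree, and an application remains free to reject input that needed recovery.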

[snip]
>> I've not seen that quick learning, even within computer science, as  
>> my first message showed.
>
> I have. I've taught XML multiple times at a perhaps somewhat  
> less-elite university than you mentioned (or perhaps not--I'm not  
> intimately familiar with yours) and well-formedness was simply not  
> an issue or a concern

That's interesting, if weak, data. My data are also somewhat weak,  
but I'll point out that I went *into* my situation thinking that well- 
formedness wasn't an issue or concern.

> . It was a significantly lower hurdle to jump over than syntax  
> issues when I taught Java to the same students (and these were  
> students who already knew C++).

That seems unquestionable. But that doesn't mean that the XML hurdle is  
negligible. One hurdle being very high does not imply that something  
else isn't a hurdle.

> There's no reason any computer science student at any level can't  
> handle this.

I don't believe I said that. I just pointed out that there are, even  
in this context where we can expect much from our audience, training  
and usability issues.

>> Also, if we expect XML to be used by broader populations in wider  
>> contexts, then this seems unrealistic.
>
> I'm not sure we do expect that, any more than we expect typical  
> computer users to write VBA macros from scratch. However those  
> power users who do write VBA macros can certainly handle XML syntax.

If the audience is confined in this way, it certainly makes a  
difference to how we evaluate usability.

>> I don't know why y'all ignore my DBLP example. It was real. I  
>> never ended up using the data, alas. I don't recall if I reported  
>> it, but, frankly, it was clearly a significant challenge to fix.  
>> Perhaps that's just one price I must pay for people to have the  
>> colossal benefits.
>
> I ignored it because I'm not familiar with it. There's a lot of XML  
> data in the world.

It was one of my reported experiences. It seems to be a perfectly  
reasonable scenario where a well defined parse model for mal-formed  
XML would have benefited this consumer. As it was, I was left  
creating, essentially, an ad hoc parse model for that XML or giving  
up. (Or waiting for the producer to correct it.) These options all  
seem much less palatable than having a standard tool in my kit.

>> I'm not clear why one category of errors (well formedness ones)  
>> are so much worse than other levels (e.g., validity ones). They  
>> are all errors. One nice thing about XML is separating these  
>> classes of errors so that even if the document is not valid wrt  
>> the relevant schema, you can still work with it (transform it,  
>> etc.). What's so much worse about well formedness errors?
>
> Another permathread: syntax is interoperable. Semantics are not.

? Let's restrict ourselves to DTDs: There is no semantics added. Only  
further syntactic constraints. With a DTD the set of legal terms is  
restricted. With well formed XML, it is not. Where's the semantics?

> You and I can share syntax, and we can share it with 10,000 other  
> disconnected individuals. We cannot, will not, and should not  
> attempt to share semantics when we are working on different  
> projects with different purposes and understandings.

Red herring.

>> In a standards situation there are lots of different possible  
>> costs including opportunity costs. Perhaps we'll have to live with  
>> XML as it is. Perhaps we can do better. But surely it's better to  
>> investigate carefully, rather than make rather unsupported claims  
>> with colossal confidence :)
>
> You may be new here.

Not at all. Nor am I new to XML. I did spend time at UNC as a  
graduate student in philosophy. Perhaps you know Jane Greenberg? I  
did some work with her.

> There have been demonstrably better syntaxes over the last 10+  
> years. They've gone nowhere, and attracted casual interest at best.  
> None of them offered sufficient improvements to justify the cost of  
> switching over.

Sure, but adding a parse model for mal formed XML avoids a lot of  
those problems by working closely with XML rather than seeking to  
replace it.

[snip]
>> To engage requires, at the very least, either acknowledging some  
>> common standards of evidence, or proposing some alternative ones,  
>> or critiquing the ones I've provided. That is, if we are  
>> interested in finding stuff out.
>
> You seem to be coming at this from an academic approach that this  
> community has not historically bought into.

I prefer to have an evidence based approach. I also prefer that  
people make clear their methodology and their criteria for accepting  
evidence and arguments. I think it's a fruitful way to make progress.

> Most folks here, even those of us who come from academia, don't put  
> much stake in academic studies of programming productivity and the  
> like.

And surveys of the web? Are they equally valueless? Usability studies?

> We put a great deal of stake in what we've learned from our own  
> experience and those of our colleagues,

Well, if I accept your standards of evidence, how on earth do you  
expect to convince me of anything? We have disparate experience and  
we are not colleagues (e.g., you evidently don't treat me as such as  
you systematically dismiss my experience).

I would have thought that consensus building required something else.

> as well as what the market has accepted.
>
> The only way you're going to convince anyone of anything here is by  
> producing something better. If you have a better alternative to XML  
> as it exists today, then build it.

I don't understand why it's important to you to engage in argument if  
it matters naught to you the content or the fact of the argument. Why  
do you even bother to engage in discussion rather than just evaluating  
built things as they come along? It is a puzzle.

> If it really is better enough, then--all arguments to the  
> contrary,--people will switch to it, and quickly. We've seen this  
> happen before, more than once. However for every new format and  
> language that succeeded, there have been a hundred failures or  
> more. Some of the failures were interesting and we learned from  
> them. Most weren't even that.

This is undoubtedly true. But irrelevant. I understand that you think  
draconian error handling is a sine qua non of XML such that if you  
add a deterministic parse model for mal formed XML it is as if you  
designed a new format from scratch. But that's surely not an  
*obvious* principle. Given your misunderstandings, as I see it, of  
what's being explored, it's not, perhaps, too surprising that you  
would go into lash mode. Oh well.

Ok, these weren't a "few" :) But I shall strive to make them final.  
At least from me.

Cheers,
Bijan.
Received on Tuesday, 17 February 2009 10:07:28 UTC