Re: HTML and XML from Julian Reschke on 2009-02-16 (www-tag@w3.org from February 2009)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Mon, 16 Feb 2009 21:54:31 +0100
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: Bijan Parsia <bparsia@cs.manchester.ac.uk>, www-tag@w3.org
Message-ID: <4999D287.8010803@gmx.de>

Bijan Parsia wrote:
>> The procedure is to run the XML through an XML parser. How to invoke 
>> that parser is platform-specific.
> 
> I feel confident that if those are your instructions, that will not be 
> sufficient.

These are not instructions. And I'm not supposed to come up with them.

>> I only mentioned IE because that's something available to something 
>> like 90% of the users, out of the box.
> 
> I think you're off track. The question, as I understood it, was 
> whether XML is usable enough that it is warranted to require and 
> expect producers to produce only well-formed XML.
> 
> It's pretty clear, from our discussion alone, that it is not at all 
> obvious that XML is remotely usable for broad swaths of the 
> population. It's unclear, 
> of course, whether heroic parsing would help. But I've presented a real 
> case where it would have.

Could you please define "population"? The full population? Software 
developers?

I'll be the first one to agree that XML is not something for everybody, 
but I thought we were talking about CS students?

>>> using a browser in a way that many (most) users of browsers would not 
>>> expect to use it or a rather obscure tool. Furthermore, your 
>>> instructions are incomplete, as I'm pretty sure that a .txt suffix on 
>>> the file name for this content:
>>> """<test>
>>>    <foo>dfdf<b>fd</foo></b>
>>> </test ref="dfsdf>"""
>>> will load it without giving any errors. (Checked, so it did.) And if 
>>> I serve it with the right mime type, even the .xml won't help.
>>
>> Yes. So? Works as designed. Teach people how to do it right.
> 
> I see that you aren't interested in investigating the usability of XML. 
> Oh well.

Yes, "oh well".

The fact that feeding text/plain into IE causes it to process the 
content as text/plain is a feature.

If you think this is a problem, tell the students not to do that.

>>> I reiterate that it is, prima facie, non-trivial in many computing 
>>> environments to produce well formed XML.
>>
>> It may not be trivial to produce it, but it *is* trivial to test it.
> 
> My example above shows that that's false. Furthermore, testing doesn't 
> mean that producing it is easy. If correcting is too difficult, people 
> will give up and either publish what they have or not publish at all.

Testing is easy for anybody who really wants to. And testing will tell 
you whether it's well-formed.
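
As a rough illustration of such a test (a minimal sketch only, assuming 
Python and its standard-library parser; the function name 
check_well_formed is made up for this example), the sample quoted 
earlier in this thread can be run through a parser like this:

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Hypothetical helper: return (True, None) if text parses as XML,
    otherwise (False, parser_message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error it finds.
        return False, str(err)
    return True, None


# The sample quoted earlier in the thread.
sample = '<test>\n   <foo>dfdf<b>fd</foo></b>\n</test ref="dfsdf>'

ok, message = check_well_formed(sample)
print(ok, message)
```

The parser only sees the bytes it is handed, so the file suffix and the 
media type play no role in this check.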

Now interpreting well-formedness error messages may be tricky. In case 
of obscure messages, the problem usually can be managed by making the 
input smaller. Just as in any other computer language.
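
One way to start making the input smaller is to look only at the lines 
around the position the parser reports. The following is a sketch under 
assumptions (Python's standard-library parser; locate_error and its 
context parameter are invented for this example), not a recommendation 
of a particular tool:

```python
import xml.etree.ElementTree as ET


def locate_error(text, context=1):
    """Hypothetical helper: return (line, column, excerpt) for the first
    well-formedness error, or None if the text parses cleanly."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = err.position
        lines = text.splitlines()
        start = max(line - 1 - context, 0)
        # Keep only the reported line plus a little surrounding context.
        return line, column, lines[start:line + context]
    return None


sample = '<test>\n   <foo>dfdf<b>fd</foo></b>\n</test ref="dfsdf>'
print(locate_error(sample))
```

Since the parser stops at the first error, the check has to be repeated 
after each fix until it returns None.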

>>> ...
>>> In fact, the problems tended to occur in elements I didn't *care* 
>>> about. So, in order to extract some data, I have to fix all the 
>>> well-formedness errors *then* use my XQuery?
>>> ...
>>
>> Actually, the producer is supposed to fix the bug, not the consumer :-)
> 
> Thus, I should leave that data inaccessible to me until the producer 
> fixes it?

Depends on your priorities.

If you decide to fix the problem yourself, how can you be sure that your 
interpretation of the data is correct?

> ...

Best regards, Julian

Received on Monday, 16 February 2009 21:07:13 UTC