Re: HTML and XML from Bijan Parsia on 2009-02-16 (www-tag@w3.org from February 2009)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Mon, 16 Feb 2009 19:33:20 +0000
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Bijan Parsia <bparsia@cs.manchester.ac.uk>, www-tag@w3.org
Message-Id: <461F81B7-9D36-4C14-9ED7-29C65DAB643E@cs.man.ac.uk>
On 16 Feb 2009, at 15:57, Julian Reschke wrote:

> Bijan Parsia wrote:
>> On 16 Feb 2009, at 14:34, Julian Reschke wrote:
>>> Bijan Parsia wrote:
>>>> On 16 Feb 2009, at 12:06, Julian Reschke wrote:
>>>> ...
>>>> DTDs with errors in major coursework in the presence of oXygen  
>>>> and pretty extensive training is within the past few weeks.
>>>> ...
>>>
>>> Were the students told how to test their submissions?
>> Have you ever used oXygen? The testing is built into the editing.
>
> No, I haven't. So they didn't use it, apparently?

I don't know. Perhaps they used it and it got garbled in transport.
Perhaps they used it and didn't recognize or see the red squiggles.
oXygen doesn't prevent you from saving malformed XML (thank goodness!).

This just emphasizes the difficulty, of course.

>> I notice you didn't say whether you found it easier to believe
>> that my graduate students earlier had trouble producing well-formed
>> XML.
>
> I have no opinion on that specifically.
[snip]

OK.

>> Thanks for giving additional evidence in support of my point. You
>> did not give, in your reply, a reliable procedure for testing XML
>> well-formedness that would work for many people. I'll note that your
>> instructions involve
>
> The procedure is to run the XML through an XML parser. How to  
> invoke that parser is platform-specific.

I feel confident that if those are your instructions, that will not  
be sufficient.

> I only mentioned IE because that's something available to something  
> like 90% of the users, out of the box.

I think you're off track. The question, as I understood it, was one
of the basic usability of XML: whether it is warranted to require
and expect producers to produce only well-formed XML.

It's pretty clear, from our discussion alone, that it is not at all
obvious that XML is remotely usable for broad swaths of the population.
It's unclear, of course, whether heroic parsing would help. But I've
presented a real case where it would have.

>> using a browser in a way that many (most) users of browsers would
>> not expect to use it, or a rather obscure tool. Furthermore, your
>> instructions are incomplete, as I'm pretty sure that a .txt suffix
>> on the file name for this content:
>> """<test>
>>    <foo>dfdf<b>fd</foo></b>
>> </test ref="dfsdf>"""
>> will load it without giving any errors. (I checked; it did.) And
>> if I serve it with the right MIME type, even a .xml suffix won't help.
>
> Yes. So? Works as designed. Teach people how to do it right.

I see that you aren't interested in investigating the usability of  
XML. Oh well.

>> I reiterate that it is, prima facie, non-trivial in many computing
>> environments to produce well-formed XML.
>
> It may not be trivial to produce it, but it *is* trivial to test it.

My example above shows that that's false. Furthermore, being able to
test it doesn't make producing it easy. If correcting errors is too
difficult, people will give up and either publish what they have or
not publish at all.

>> ...
>>> Users authoring DocBook or XSLT do not seem to have trouble with it.
>> Those are pretty expert audiences, esp. the XSLT one.
>> ...
>
> I'd expect there to be more XSLT users than DocBook users. Anyway.

I would not be surprised if the XSLT-using population were well
placed to produce well-formed XSLT. It's still a pretty specialist
circumstance, given that the domain of XSLT is...XML. That reduces
the cognitive distance between the domain and the format (which is a
basic justification for XSLT having an XML syntax). It's tough to
generalize from, though.

>> ...
>> In fact, the problems tended to occur in elements I didn't *care*  
>> about. So, in order to extract some data, I have to fix all the  
>> well-formedness errors *then* use my XQuery?
>> ...
>
> Actually, the producer is supposed to fix the bug, not the  
> consumer :-)

Thus, I should leave that data inaccessible to me until the producer  
fixes it?

How does that make found XML more usable?
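The "heroic parsing" point can be illustrated with a minimal Python sketch (not something anyone in the thread wrote). A draconian XML parser rejects the whole record below because of an error in an element the consumer does not care about, while a lenient tokenizer, here the standard library's `html.parser` standing in for error-recovering parsing, still extracts the one field wanted. The record, the `name` element, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical record: the <name> we want is intact; the error is in <note>.
DOC = "<rec><name>Keep me</name><note><b>oops</note></b></rec>"


def strict_names(doc):
    """Draconian path: one well-formedness error loses the whole document."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        return None
    return [el.text for el in root.iter("name")]


class NameGrabber(HTMLParser):
    """Lenient path: collect text inside <name> tags, ignoring bad nesting."""

    def __init__(self):
        super().__init__()
        self.inside = 0   # how many <name> elements are currently open
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "name":
            self.inside += 1
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "name" and self.inside:
            self.inside -= 1

    def handle_data(self, data):
        if self.inside:
            self.names[-1] += data


def lenient_names(doc):
    grabber = NameGrabber()
    grabber.feed(doc)
    grabber.close()
    return grabber.names


print(strict_names(DOC))   # None
print(lenient_names(DOC))  # ['Keep me']
```

The trade-off is that the lenient path never reports the nesting error in `<note>` at all; it simply steps over it.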

In any case, nothing you've said, afaict, with the possible exception
of the XSLT/DocBook observation, purports to dispute my point about
difficulty, which is all I'm after at this juncture. It would be very
interesting to study those populations.

Cheers,
Bijan.
Received on Monday, 16 February 2009 19:29:48 UTC