Re: Suggested revised text for HTML/XML report intro

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Wed, 17 Aug 2011 10:52:37 -0400
Message-ID: <4E4BD5B5.3030502@arcanedomain.com>
To: John Cowan <cowan@mercury.ccil.org>
CC: Anne van Kesteren <annevk@opera.com>, "public-html-xml@w3.org" <public-html-xml@w3.org>, Larry Masinter <LMM@acm.org>


On 8/16/2011 2:26 PM, John Cowan wrote:
> Noah Mendelsohn scripsit:
>
>> I'm missing something. This seems to say: "We may be looking at two or
>> more approaches either of which would be valuable to the community; since
>> there's no obvious reason (yet) to pick one over the other, let's not
>> advocate either."
> I am saying that.  But I am also saying that there is no reason to suppose
> that a uniform answer is suitable for all the different XML document formats
> present and future.  I am also saying that a uniform answer would promote
> specification bloat.

Seems to me that those tradeoffs should be investigated a bit before we 
punt. If it turns out that either approach would indeed work well to solve 
a lot of problems for users, then there may well be value in picking one 
and running with it. That will get everyone interoperating with a common 
set of formats and processing rules. If we find instead that either 
solution would be a force fit for too many cases, then of course we should 
be hesitant about endorsing one or the other.

David Carlisle wrote:

> and non well formed input produces
> some document.

I'm sympathetic to the rest of your note, but I think this sets the bar a 
bit too low.  In principle, it allows an arbitrary output unrelated to the 
input, though I know you don't intend quite that.

The document produced by parsing the non-well-formed input will likely be 
used for something downstream. I think it's highly desirable that the 
mapping attempt to preserve, within reason, the parts of the document that 
appear well formed, and that some attention be given to mapping the other 
parts of the document in ways that allow synchronization to be 
re-established after passing the point of error, in at least some common 
cases.  I believe that's roughly what the HTML5 processing rules are 
designed to do, though perhaps with some bias toward details of that 
specific vocabulary (I haven't really studied the fixups in detail).
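
A rough illustration of the kind of recovery described above, as a Python 
sketch. It is not the HTML5 algorithm and not a proposal from this thread: 
RecoveringBuilder, parse_lenient, and the synthetic "root" wrapper are 
hypothetical names, and Python's html.parser is used only as a convenient 
tolerant tokenizer (it lowercases tag names, which real XML would not).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class RecoveringBuilder(HTMLParser):
    """Hypothetical lenient tree builder: never rejects input, repairs nesting."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # synthetic wrapper (an assumption)
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        self.stack.append(elem)

    def handle_endtag(self, tag):
        # Resynchronize: find the nearest open element with this name and
        # implicitly close everything that was left open inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return
        # No matching open element: a stray end tag, so ignore it.

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_lenient(text):
    """Return a tree for any input; elements still open at the end are closed."""
    builder = RecoveringBuilder()
    builder.feed(text)
    builder.close()
    return builder.root
```

Given "<doc><item>ok</item><note>unclosed<item>fine</item></doc>", a strict 
parser such as ET.fromstring raises ParseError, while parse_lenient keeps 
the well-formed "item" elements intact and closes "note" when it reaches 
"</doc>". Whether those particular fixups are the right ones is exactly the 
design question.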

FWIW: I agree that none of this should be defined in terms of some 
particular schema language like XSD; I'm less sure whether, in designing 
the general rules for XML, it might not be worth considering particular use 
cases (i.e., likely errors) from XHTML to motivate the design of the 
mapping from non-well-formed input to fixed-up output.

Noah
Received on Wednesday, 17 August 2011 14:53:06 GMT
