
Re: Draft

From: Innovimax W3C <innovimax+w3c@gmail.com>
Date: Thu, 23 Feb 2012 00:30:12 +0100
Message-ID: <CAAK2GfEnFCPVt6T1us7EU2D+AtRheoR-m-FEJ+H-FRKwsQpTyQ@mail.gmail.com>
To: Jeni Tennison <jeni@jenitennison.com>
Cc: Norman Walsh <ndw@nwalsh.com>, W3C XML-ER Community Group <public-xml-er@w3.org>
On Wed, Feb 22, 2012 at 2:51 PM, Jeni Tennison <jeni@jenitennison.com> wrote:
> On 21 Feb 2012, at 16:30, Norman Walsh wrote:
>> The things that are not XML are well defined. We get to decide what
>> things are not XML-ER.
>> I'm not sure what the right answer is. Some things seem clearly not to
>> be XML-ER. For example, if I feed a JPEG image to the XML-ER parser,
>> it's hard to imagine any value coming from any "document" produced by
>> parsing that "successfully".
>> OTOH, a plain text document is less clearly "not XML-ER" to me. This is
>> one place where a schema-agnostic parser is at a disadvantage. If you hand
>> The quick brown fox
>> to an HTML parser, it can manufacture a bunch of wrapper elements.
>> I was just thinking about this the other day. I wonder if XML-ER
>> "documents" that don't have a clear root element should get one:
>> <er:document xmlns:er="whateverwedecide">The quick brown fox</er:document>
> I'd suggest that in cases where the input really doesn't look anything like XML (ie whose first non-whitespace character isn't a <), an XML-ER parser does whatever it is that HTML does. HTML is as good a vocabulary as any for representing such content and the rules are already defined and implemented, particularly in the key places where we expect XML-ER to be used.
> That would effectively limit the scope of what we have to define for XML-ER parsing, which is a good thing. The side-effect of course is that something like:
> I forgot my document element but I'll still
> have a <table><p>containing a paragraph!</p>
> <tr><td>just because I can</td></tr></table>
> would lead to all sorts of strange HTML-specific fix-up taking place, but any documents that are that badly munged are almost bound to actually be HTML anyway :)

Really interesting idea...

But one nasty consequence is that an XML-ER parser will have to contain
an HTML5 parser...
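
To make the trade-off concrete, here is a minimal sketch of the dispatch
Jeni describes, next to Norman's wrapper-element alternative. Nothing here
is agreed XML-ER behaviour: the function names are hypothetical, the
"xml-er" and "html" labels stand in for real parsers, and the namespace
URI is the "whateverwedecide" placeholder from Norman's example.

```python
from xml.sax.saxutils import escape

# Placeholder namespace taken from the example earlier in the thread;
# no real URI has been chosen.
ER_NAMESPACE = "whateverwedecide"


def looks_like_xml(text: str) -> bool:
    """True when the first non-whitespace character is '<'."""
    return text.lstrip().startswith("<")


def choose_parser(text: str) -> str:
    """Return a label for the parser that would handle this input.

    Hypothetical dispatch: input that starts with '<' goes to the
    XML-ER rules, everything else is handed to an HTML5 parser.
    """
    return "xml-er" if looks_like_xml(text) else "html"


def wrap_in_document(text: str) -> str:
    """Alternative: wrap rootless text in an er:document element.

    Markup characters are escaped, so the input is kept as text only.
    """
    return '<er:document xmlns:er="%s">%s</er:document>' % (
        ER_NAMESPACE,
        escape(text),
    )
```

The first branch is cheap; it is the second branch of choose_parser that
drags a complete HTML5 parser into every XML-ER implementation, whereas
wrap_in_document needs nothing beyond escaping.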



Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 9 52 475787
Fax : +33 1 4356 1746
RCS Paris 488.018.631
SARL au capital de 10.000 
Received on Wednesday, 22 February 2012 23:30:40 UTC
