- From: Takuki Kamiya <tkamiya@us.fujitsu.com>
- Date: Mon, 30 Jan 2012 18:26:33 -0800
- To: Robin Berjon <robin@berjon.com>, "public-exi@w3.org" <public-exi@w3.org>
Hi Robin,

Thanks for the inquiry. That is indeed a good question!

In response to your question, the WG had a conversation to collect the experiences of its members. Below is a summary of what we noted in the conversation.

There is an issue in supporting HTML on the encoder (i.e. server) side that stems either from HTML's intrinsic differences from XML or, in some cases, from markup errors in HTML documents. For example, HTML allows certain void elements, attributes without values, and attribute values without quotation marks. These are all legitimate HTML, but nonetheless do not parse with XML parsers. In other cases, there are erroneous HTML documents, such as ones that contain elements that are not correctly nested.

EXI encoders that operate on documents (i.e. files) find it difficult to transform the input HTML document into an EXI stream when plain XML parsers are used to parse a document that exhibits one or more of the HTML quirks described above. Otherwise, when the document parses with XML parsers without errors, as is the case with valid polyglot [1] documents, EXI encoders are able to process the HTML document just as if it were an XML document. It does not appear to be unusual for HTML documents served to mobile devices to be consistently parsable by XML parsers. This may be related to what is suggested in the Mobile Web Best Practices document [2].

The WG noted that any HTML document, including one with certain errors, can be transformed into a model in a way that is consistent across browser implementations by employing the rules defined by HTML 5. The expectation is that EXI can be universally applied to HTML documents when EXI encoders use HTML parsers that adequately implement those rules.

EXI has its registered content-coding tag "exi" that is available for use. Conceptually, you may well be able to consider the combination of the transformation (i.e. tidying-up HTML) and the transmogrification (i.e. generating EXI) collectively as one operation that represents the content-coding "exi" over HTML documents.

[1] http://www.w3.org/TR/html-polyglot/
[2] http://www.w3.org/TR/mobile-bp/

Thanks!

-taki

-----Original Message-----
From: Robin Berjon [mailto:robin@berjon.com]
Sent: Monday, January 16, 2012 2:48 AM
To: public-exi@w3.org
Subject: EXI on HTML

Hi all!

I don't know if you're aware of this, but there is currently a W3C task force that's looking at HTML/XML reconciliation[0] and that is getting close to publishing a document[1] about some aspects of the problem.

One topic that has surfaced a few times already is the applicability of EXI to HTML, despite the X in its name. Arguing in the abstract that it's possible (at least for a class of documents) is both true and unconvincing, so I was wondering if there'd be anyone here willing to share experience with this there?

I don't think that the TF is looking for a full report on the issue, but something along the lines of "We tried it, it works (or not) except in this or that case, we had to apply this magic here to work around that problem, etc." would likely prove quite illuminating. Equally, "we looked into it and it turns out to be a daft idea" would be helpful if only in putting the matter to rest.

Thanks for any input!

[0] http://lists.w3.org/Archives/Public/public-html-xml/
[1] http://www.w3.org/2010/html-xml/snapshot/report.html

--
Robin Berjon - http://berjon.com/ - @robinberjon
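
As an illustration of the well-formedness issue described in the reply above, here is a minimal sketch that feeds a few HTML fragments to a plain XML parser. It uses only Python's standard xml.etree.ElementTree module; the sample fragments and the helper name parses_as_xml are hypothetical and do not come from the WG discussion, and the sketch does not involve any EXI encoder.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, each showing one HTML quirk named in the reply.
HTML_QUIRKS = {
    "void element": "<p>line one<br>line two</p>",
    "attribute without value": "<p><input checked/></p>",
    "unquoted attribute value": "<p><input type=checkbox /></p>",
    "incorrect nesting": "<p><b><i>text</b></i></p>",
}

# Hypothetical polyglot-style fragment: the same kind of content written
# so that it is also well-formed XML.
POLYGLOT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    '<body><p>line one<br/><input type="checkbox" checked="checked"/></p></body>'
    "</html>"
)


def parses_as_xml(text):
    """Return True if a plain XML parser accepts the text, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for name, fragment in HTML_QUIRKS.items():
        print(name, "->", parses_as_xml(fragment))
    print("polyglot ->", parses_as_xml(POLYGLOT))
```

An encoder built on a plain XML parser would have to reject or pre-process the first four fragments, whereas the polyglot-style fragment could be handed over as if it were XML.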
Received on Tuesday, 31 January 2012 02:27:24 UTC