- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Mon, 21 Jan 2013 18:36:18 -0500
- To: www-tag@w3.org
- Message-ID: <50FDD0F2.7070200@openlinksw.com>

On 1/21/13 5:16 PM, David Sheets wrote:
> On Mon, Jan 21, 2013 at 1:25 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>> On 1/21/13 4:15 PM, David Sheets wrote:
>>> On Mon, Jan 21, 2013 at 11:47 AM, Kingsley Idehen
>>> <kidehen@openlinksw.com> wrote:
>>>> On 1/21/13 2:19 PM, Melvin Carvalho wrote:
>>>>
>>>> On 21 January 2013 20:13, Anne van Kesteren <annevk@annevk.nl> wrote:
>>>>> On Mon, Jan 21, 2013 at 7:24 PM, Kingsley Idehen
>>>>> <kidehen@openlinksw.com> wrote:
>>>>>> Please correct me if my characterization is wrong, but it appears to me
>>>>>> that this entire affair is about content-type (mime type) squatting i.e.,
>>>>>> trying to squeeze (X)HTML into content-type: text/html. If this is true,
>>>>>> why on earth would such an endeavor be encouraged by the W3C or its TAG?
>>> How is the definition of *a valid subset of text/html* squatting?
>>
>> Is XHTML now a subset of HTML? Is (X)HTML a subset of HTML? As I stated, as
>> part of my open comments, what am I missing in my characterization?
> It's not clear to me that they have that relation. There does exist a
> subset of HTML that is also XHTML and vice versa.
>
>>>>> Maybe because XML is listed quite prominently under "What is Web
>>>>> architecture?" in http://www.w3.org/2004/10/27-tag-charter.html though
>>>>> I would consider that particular part of the charter misguided. (It's
>>>>> also not at all practiced these days.)
>>> This is plainly false. Existence of new XML vocabularies demonstrates
>>> practice. It cannot also be true that it is "not at all practiced
>>> these days".
>>>
>>>> This is a good point, imho. In 2004 it was perhaps reasonable to make a
>>>> 'bet' on XML. However, favouring any one particular serialization
>>>> potentially lacks future proofing. However, favouring the principles
>>>> behind XML, such as namespacing etc., makes more sense.
>>> Fragmentation is not future-proof.
>>>
>>>> Wikipedia has a reasonably nice write up on this topic:
>>>>
>>>> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
>>>>
>>>>>
>>>>> --
>>>>> http://annevankesteren.nl/
>>>>>
>>>> At this juncture though, my main question is about XHTML or (X)HTML (the
>>>> polyglot) being squeezed into content-type designation: text/html. In
>>>> reality we have two content types with distinct characteristics which
>>>> thereby entails two distinct content-types: text/html (for HTML) and
>>>> application/xhtml+xml (for XHTML).
>>>>
>>>> Put differently, there is no content-type for the (X)HTML polyglot. Thus,
>>>> we have the struggle right now which is all about trying to make text/html
>>>> the designated content-type for the aforementioned polyglot.
>>> I was under the impression that an explicit goal of standardizing the
>>> HTML5 parser was so that HTML consumers and producers could rely on a
>>> de jure interpretation of nonsensical markup. While many consider
>>> XML's restrictions nonsensical, it is prima facie absurd that
>>> champions of HTML5's apologetic parser refuse to consider the subset
>>> of HTML5 that is also valid XHTML5 as clearly important to a
>>> population of authors.
>> So this is the key point of contention i.e., XHTML5 (unlike other XHTML
>> incarnations) is a genuine subset of HTML.
> I don't believe they have this relation. There is a set of documents
> that satisfies both standards, however.
>
>>> From my perspective, anti-polyglot proponents advocate global
>>> text/html interpretation of nearly everything *except* XHTML.
>> Can you point me to an example? I ask primarily for clarity.
> I'm not sure which assertion you'd like an example for so I've made
> some guesses.
>
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing>
> describes a very lenient HTML parser which defines an interpretation
> for many strange text/html documents ("Hello, <I><b>world</i>!").
>
> Advocacy for this lenient parser has been prevalent for several years.
> The same standard provides guidance on writing "good" HTML which does
> not take advantage of all of the HTML parser's quirks. This is a
> subset of HTML.
>
> For evidence of resistance to the standard definition of the subset of
> HTML that is also XHTML with the same meaning, you need only look to
> this and previous threads on this topic.
>
>>> XHTML is stricter than HTML and polyglot serializations *should* exist
>>> for any DOM (at least one would hope, what with the complexity burden
>>> of a fully conformant HTML parser).
>>>
>>> Are there legitimate technical architecture objections to specifying
>>> the set intersection of XHTML and HTML expressions?
>>
>> Potentially, once you attempt to write parsers for HTML5 resources that
>> include Microdata and/or RDFa structured data islands.
> How does the definition of a mutually compatible subset complicate
> HTML5 parsers which include microdata/rdfa? By definition, the
> polyglot subset must work in existing HTML5 parsers. If HTML5 semantic
> markup cannot share syntax with XHTML, the document cannot be
> serialized in a polyglot fashion. What have I missed?
>
>>> I believe that there are many who would be interested in such
>>> guidelines who are typically underrepresented in these discussions.
>>>
>>> I am genuinely confused by arguments which appear to encourage liberal
>>> emission and deride conservative emission. Are web standards no longer
>>> concerned with robustness? HTML's new parser specification appears to
>>> disagree...
>> Once there's clarification on the issue of HTML and XHTML5 subset, the
>> problems will become clear. All you have to do is attempt to use or write a
>> parser for structured data (MicroData, Microformats, RDFa) embedded in an
>> HTML5 document.
> I began just this task and have since put it on hold due to the
> complexity of the HTML5 parsing algorithm and personal time
> constraints. If a document is both valid HTML and valid XHTML5, how is
> handling this content harder than just handling HTML5 content?
>
> There may certainly be important classes of documents for which no
> polyglot serialization is possible. Unique HTML features, unique XHTML
> features, and the HTML/XHTML overlap are all important for author
> education. Aesthetically, I would like as few special cases for each
> HTML and XHTML and greater syntax compatibility; but with a faction
> advocating extensive bugwards compatibility and disinterest in
> unification, I'm not holding my breath.
>
>> In my experience, undue burden is being pushed on the developers of parsers.
> Which parser developers does a polyglot spec burden? Polyglot should
> be parsable by both HTML5 and XML parsers without modification.
>

Yes, but HTML5 parser developers think in terms of HTML. They don't think
about XML in any shape or form. Net effect, XML related rules are kicked to
the curb.

--

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
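
As a rough illustration of the claim quoted above that polyglot markup "should be parsable by both HTML5 and XML parsers without modification", here is a minimal Python sketch. It rests on loud assumptions: Python's standard `html.parser` is only a lenient tokenizer standing in for a real HTML5 parser (it does not implement the WHATWG parsing algorithm), `xml.etree.ElementTree` stands in for "an XML parser", and the `POLYGLOT` string is a hypothetical hand-written sample, not a document checked against the polyglot markup guidelines. The `LENIENT_ONLY` string is the misnested example quoted in the thread.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Collects start-tag names as seen by Python's lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names lowercased.
        self.tags.append(tag)


def html_tags(markup):
    """Return the start tags the lenient tokenizer finds; it never rejects input."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The misnested example from the thread: tolerated as text/html, not XML.
LENIENT_ONLY = "Hello, <I><b>world</i>!"

# Hypothetical sample written to satisfy both syntaxes (an assumption,
# not validated against the polyglot guidelines).
POLYGLOT = (
    "<!DOCTYPE html>"
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    "<body><p>Hello, <i><b>world</b></i>!</p></body>"
    "</html>"
)

if __name__ == "__main__":
    for name, doc in (("lenient-only", LENIENT_ONLY), ("polyglot", POLYGLOT)):
        print(name, html_tags(doc), is_well_formed_xml(doc))
```

Both strings get through the lenient tokenizer, but only the second is accepted by the XML parser. This shows the asymmetry being argued over; it says nothing about whether the two parsers build the same DOM, which is the harder part of the polyglot question.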
Received on Monday, 21 January 2013 23:36:41 UTC