
[whatwg] The problems with namespaces in text/html

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 5 Nov 2006 21:24:14 +0200
Message-ID: <4F92196D-1868-4873-AD05-584FD9D716D4@iki.fi>

On Nov 5, 2006, at 16:39, Elliotte Harold wrote:

> Henri Sivonen wrote:
>
>>> Is there anything else that stops every HTML5 document from being  
>>> a well-formed XML document?
>> Case-insensitivity and empty elements for example.
>
> These would stop some documents from being well-formed, not all.
> I'm sure you're allowed to use all lower case. Can you use
> empty-element tags if you wish? Or must it be <br> and not <br /> or
> <br></br>?

It must be <br> to be conforming.

You are still fixated on the syntactic similarity of XML and HTML5.
They are nonetheless different syntaxes. You wouldn't use an XML
parser to parse RELAX NG Compact Syntax, would you?

Also, even if a subset of HTML5 documents happened to be parseable as  
XML, it doesn't help unless the authors whose documents you consume  
happen to only produce that subset. If your app insists on using an  
XML parser for text/html content, it isn't very useful for processing  
the stuff found on the Web.
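
To make this concrete, here is a minimal sketch that assumes Python's
standard xml.etree.ElementTree stands in for "an XML parser". The
helper name parses_as_xml and the sample markup are made up for
illustration. It only checks well-formedness, nothing more.

```python
import xml.etree.ElementTree as ET

# Sample text/html markup using <br> with no end tag (illustrative only).
HTML_SNIPPET = "<p>one<br>two</p>"

# The same content written with an XML empty-element tag.
XML_SNIPPET = "<p>one<br/>two</p>"


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(HTML_SNIPPET))  # the unclosed <br> is a well-formedness error
print(parses_as_xml(XML_SNIPPET))
```

An XML parser has no choice but to stop at the first well-formedness
error, so the first snippet never yields a tree at all.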

>> But in any case, even if an HTML5 byte stream happens to be  
>> parseable as XML 1.0, you get the wrong infoset if you use an XML  
>> parser instead of an HTML5 parser.
>
> Walter, we need you!
>
> There is no right infoset. There is no wrong infoset.

Given this HTML document:
<!DOCTYPE html><HTML><title>Foo</Title><p>bar</html>
a parser should convey to the application a tree that has the  
following features:
  * There is a root element node with the local name "html" in the
    "http://www.w3.org/1999/xhtml" namespace.
  * The root element node has two child nodes.
  * The root element node has an element node with the local name
    "head" in the "http://www.w3.org/1999/xhtml" namespace as its
    first child.
  * The root element node has an element node with the local name
    "body" in the "http://www.w3.org/1999/xhtml" namespace as its
    last child.
  * The first child of the root element has a single child node,
    which is an element node with the local name "title" in the
    "http://www.w3.org/1999/xhtml" namespace.
  * The last child of the root element has a single child node,
    which is an element node with the local name "p" in the
    "http://www.w3.org/1999/xhtml" namespace.
  * The element with the local name "title" in the
    "http://www.w3.org/1999/xhtml" namespace has a single child node,
    which is a text node with the value "Foo".
  * The element with the local name "p" in the
    "http://www.w3.org/1999/xhtml" namespace has a single child node,
    which is a text node with the value "bar".

If your parser reports something else, it is not suitable for parsing  
HTML5 and is *wrong* per spec.
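
As a rough illustration of the namespace point, the sketch below
assumes Python's standard xml.etree.ElementTree as the XML parser and
a hand-written, XML-compatible rewrite of the example document
(WELL_FORMED and the helper clark are hypothetical names). Even when
the bytes are well-formed XML, the reported root element is in no
namespace, which differs from the tree described above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XML-compatible rewrite of the example document.
# Note that it carries no xmlns declaration.
WELL_FORMED = (
    "<html><head><title>Foo</title></head>"
    "<body><p>bar</p></body></html>"
)


def clark(local_name: str) -> str:
    """Build ElementTree's '{namespace}local' name for an XHTML element."""
    return "{%s}%s" % (XHTML_NS, local_name)


root = ET.fromstring(WELL_FORMED)

# An XML parser reports the root as plain "html", in no namespace.
print(root.tag)

# The tree described above requires the XHTML namespace instead.
print(root.tag == clark("html"))
```

The second line printed is False: the XML parser built a tree, but not
the tree the list above calls for.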

> the infoset I derive from the document is my concern, not yours.

You want certain stuff to be in a particular namespace. From this  
thread, it seems that you want to make it my problem to produce  
particular namespace declaration syntax--instead of making it your  
concern to use an HTML5 parser.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 5 November 2006 11:24:14 UTC
