- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 10 Jul 2007 08:49:46 +0300
- To: Karl Dubost <karl@w3.org>
- Cc: HTMLWG WG <public-html@w3.org>

On Jul 10, 2007, at 04:47, Karl Dubost wrote:

> In a message in another thread, Henri said:
> http://www.w3.org/mid/05FFFAC3-F914-451A-B2A7-BBEAC81A2537@iki.fi
>
> On 9 July 2007, at 17:04, Henri Sivonen wrote:
>> An HTML5 parser is a piece of software that implements the section
>> of the spec titled "Parsing HTML documents".
>> http://www.w3.org/html/wg/html5/#parsing
>
> Then following links through the spec, it is not obvious where to
> find the right information.

What do you mean when you say "right information"? The section I linked to defines the parsing algorithm for HTML5 (i.e. text/html as opposed to XHTML5 / application/xhtml+xml) documents.

> "HTML Document" points to the following definition:
>
>    Document objects are assumed to be XML documents
>    unless they are flagged as being HTML documents when
>    they are created. Whether a document is an HTML
>    document or an XML document affects the behaviour of
>    certain APIs, as well as a few CSS rendering rules.
>    [CSS21]
>    -- http://www.w3.org/html/wg/html5/#html-
>    Thu, 28 Jun 2007 21:11:41 GMT
>
> The first thing which might lead to confusion is the "flagged as
> being HTML documents".

Note that the part you quoted talks about Document objects--that is, objects that implement the Document DOM interface. It is not talking about documents in general (e.g. as byte streams). Is this not clear from the styling of the <code> element and from the context?

> I have looked for what is an HTML document, and then I got an HTML
> document is an XML document except if it is an HTML document.

In this context, it only talks about telling apart two types of objects that implement the Document interface. Moreover, it is assumed that there are only two kinds of objects that implement the Document interface.

> Maybe, we should defined what we mean by flagged … when they are
> created.

The note following the paragraph you quoted makes it even clearer what flagged means:

| A Document object created by the createDocument() API on the
| DOMImplementation object is initially an XML document, but
| can be made into an HTML document by calling document.open() on it.

The flagging happens in the first sentence under
http://www.w3.org/html/wg/html5/#page-load

See also step #8 under
http://www.w3.org/html/wg/html5/#controlling

> 1. Created in the DOM?
> 2. Created on the filesystem?
> 3. Created in the Browser memory?
>
> I have the feeling that most people will read 2.

Which isn't what it means because you don't create objects that implement the Document interface in the file system. The spec makes it obvious that it is talking about objects in the memory of the browser and that a flag is a bit in the memory.

> But then there is an issue. What do we do with files which are
> accessed through the local filesystem. Usually ".html", ".htm"
> means for the browser, use the HTML parser. Though they are many
> cases where people might open a file with a PHP extension for example.

The draft is optimized for the Web and, hence, specifies things in detail for HTTP. If the bytes don't arrive via HTTP, there needs to be a mapping of some kind that tells the UA whether the document arrived as if with Content-Type text/html or as if with an XML Content-Type. On common systems, it is reasonable to map .html to text/html, .xhtml to application/xhtml+xml and .xml to application/xml when reading local files.

> The data can come from the local filesystem as well. There is
> something which is called the "content model flag" related to the
> input stream.

The content model flag is internal to the parsing algorithm for text/html. I'm curious: Why do you mention it?

> * When an input stream is actually flagged as being HTML?

The kind of flagging you quoted is about objects that implement the Document interface--not about streams.
However, an input stream is treated as HTML if it is labeled with the text/html Content-Type.

> * How do we flag an input stream as being an HTML document?
>    * HTTP text/html

Yes, but in that case "HTML Document" doesn't mean what it means in the part of the spec you quoted.

>    * local filesystem?

System-specific mapping to type information equivalent to HTTP.

> Related question:
> A document sent with application/xhtml+xml must be treated by an
> XML parser.
> What an HTML parser does when receiving such a document. ignores
> it? (in the case I have built an application which has only an HTML
> parser and not an XML Parser.)

It is the responsibility of the application not to pass a byte stream that is not labeled as text/html to an HTML parser. If the application doesn't have an XML parser (very unlikely), it should say it isn't prepared to process XML. (Just like it would do with a PDF or Word file if it didn't have components for reading those.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
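
As an illustration of the two points made in the message above (a system-specific mapping from local file names to type information, and the application's responsibility to hand a stream only to a parser that matches its label), here is a minimal Python sketch. It is not taken from the spec. The names local_content_type and choose_parser, the XML_TYPES set, the have_xml_parser parameter and the ".htm" entry are all assumptions made for the example; only the .html, .xhtml and .xml mappings come from the message.

```python
import os

# Extension-to-type table. The .html, .xhtml and .xml entries follow the
# mapping suggested in the message; .htm is an added assumption.
LOCAL_TYPE_BY_EXTENSION = {
    ".html": "text/html",
    ".htm": "text/html",  # assumption, not part of the suggested mapping
    ".xhtml": "application/xhtml+xml",
    ".xml": "application/xml",
}

# Content types this sketch treats as XML (an assumed, non-exhaustive set).
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def local_content_type(path):
    """Return the type a local file is treated as if it arrived with.

    Returns None when the extension has no mapping (for example ".php").
    """
    _, ext = os.path.splitext(path)
    return LOCAL_TYPE_BY_EXTENSION.get(ext.lower())


def choose_parser(content_type, have_xml_parser=True):
    """Pick "html" or "xml" from a Content-Type label.

    Raises ValueError when the application has no component for the type,
    instead of passing the stream to the wrong parser.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in XML_TYPES:
        if have_xml_parser:
            return "xml"
        raise ValueError("not prepared to process XML")
    raise ValueError("no component for " + media_type)
```

With these definitions, choose_parser(local_content_type("index.xhtml")) selects the XML parser, while a file named page.php gets no type at all from the table, so the application has to decide by some other means rather than guess.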
Received on Tuesday, 10 July 2007 05:49:58 UTC