Re: [HTML5] 2.8 Character encodings

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 27 Aug 2009 23:40:35 +0000 (UTC)
To: "Dr. Olaf Hoffmann" <Dr.O.Hoffmann@gmx.de>
Cc: public-html-comments@w3.org
Message-ID: <Pine.LNX.4.62.0908272034160.13789@hixie.dreamhostps.com>

On Sun, 16 Aug 2009, Dr. Olaf Hoffmann wrote:
> Ian Hickson:
> > On Wed, 12 Aug 2009, Dr. Olaf Hoffmann wrote:
> > > The meaning of some elements is different in 'HTML5' as well or is 
> > > defined in a more restrictive way, which excludes some use cases 
> > > possible in HTML4.
> >
> > Yes, but in practice that's not an issue since HTML5 describes how 
> > HTML4 UAs actually did things.
> User agents present the elements somehow, often this does not directly
> imply a meaning.

I agree, only the spec can imply a meaning.

> And if we take (again, I already discussed this with Anne) the sample
> of the element small, the presentation implies no specific meaning,
> which is ok for HTML4, because the definition does not imply a specific
> meaning either. The audience has to derive this from relations to the
> content around it. 'HTML5' defines a meaning for small.
> Therefore the 'HTML5' definition does not apply for all use cases
> in HTML4 documents, just to a subset. 

By that reasoning, the 'HTML4' definition does not apply for all instances 
of HTML4 documents either. After all, most HTML4 documents have some level 
of conformance error, and an overwhelming number of HTML4 documents use 
elements incorrectly. I don't think this is a useful line of reasoning.

> However, because user agents do not need to care about the meaning,
> the presentation may not differ. Authors have to care about the meaning
> and cannot use the element in 'HTML5' for some use cases. 
> This is not necessarily a problem for authors, if there are other elements
> intended in 'HTML5' for their use case and because HTML4 and 'HTML5'
> are different versions and looking at the version indication (doctype) one 
> can at least identify, when authors use the HTML4 definitions. As long as
> 'HTML5' has no version indication, there is no simple way to indicate,
> that the definition in 'HTML5' applies.

Just always assume the HTML5 definition applies. As far as I can tell, 
that won't cause any problems. Can you point to a page where doing this 
causes a practical problem with a software package?

> Or the meaning of cite is defined more precisely, which is ok for
> a new version, but not applicable for the usage in HTML4 documents.

HTML4's description was apparently vague enough that different people 
consider it to mean different things. My interpretation of HTML4's text is 
that HTML5's definition of <cite> is a superset of HTML4's.

> Ok - what really a proper content is, depends on several things,
> if simply a private communication or a dictum is quoted, it has no
> title and one has to note the name of the person. 
> If a work has specific authors, both title and authors and maybe
> a source or a unique identifier may belong to the citation information. 

Not really sure what you're saying here.

> acronym - non conforming feature in the current draft, well
> defined in HTML4. 
> I think, with the instead recommended element abbr there is a problem
> with other (legacy?) versions of MSIE.
> Obviously here the 'HTML5' draft does not include an explanation of
> the meaning of HTML4 documents and does not necessarily 
> do a better job concerning the description of the interpretation of legacy
> viewers. 

I don't understand what you're asking for here. HTML5 says that <acronym> 
should be handled as a synonym for <abbr>.
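
As a rough illustration of what "handled as a synonym" could look like in a 
consumer of markup, here is a minimal sketch using Python's standard 
html.parser module. The alias table and the names AliasingParser and 
canonical_tags are hypothetical, made up for this example; only the 
<acronym> to <abbr> mapping comes from the statement above.

```python
from html.parser import HTMLParser

# Hypothetical alias table. Only acronym -> abbr is taken from the message;
# a real consumer might carry more entries or none at all.
TAG_ALIASES = {"acronym": "abbr"}


class AliasingParser(HTMLParser):
    """Collects start-tag names, folding aliases onto a canonical name."""

    def __init__(self):
        super().__init__()
        self.seen = []  # canonical tag names in document order

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased.
        self.seen.append(TAG_ALIASES.get(tag, tag))


def canonical_tags(markup):
    """Return the canonical start-tag names found in `markup`."""
    parser = AliasingParser()
    parser.feed(markup)
    parser.close()
    return parser.seen
```

With this sketch, a document using <acronym> and one using <abbr> produce 
the same list of tag names, so downstream code only has to know about <abbr>.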

> The content model of dl is more restrictive in 'HTML5' - surely
> it cannot describe uses of the less restrictive model of HTML4.

<dl> hasn't changed as far as I can tell.

> And viewers have no problem to present such uses, therefore
> 'HTML5' may have a better definition of definition lists, excluding
> some not very nice use cases, but it does not describe several
> really existing HTML4 documents or how they are presented
> by current viewers.

I don't follow. Could you include some examples maybe?

> I think, for object some attributes are missing.
> Well, some authors used some of them wrong and 
> something like declare was not widely implemented.
> Both does not indicate directly a problem with the HTML4
> definition.

I think that's exactly what it indicates, actually.

> I think, there is still no declarative method in
> the draft to start some time dependent content of object,
> therefore declare is really missing in 'HTML5', not only
> for object. However, if an author uses it in a HTML4
> document, one cannot expect that the behaviour of
> a browser ignoring this attribute is that, what was
> intended by the author ;o)
> The implementation gap simply excludes some use
> cases of object in practice - maybe one of the reason,
> why there is currently a lot of strange content around,
> trying to simulate such functionality somehow to work 
> around the gap.

I'm not sure what you're asking for here.

> I think, there are several more samples, all of them show, that
> 'HTML5' does not describe all 'valid' HTML4 documents properly.

Could you list them? I should fix them, if so.

> I do not think, 'HTML5' has to do this, because it is a new version
> of the language.

I think HTML5 must do this, because it is a new version of the language.

> It is just pretty useless to disclaim such simple
> facts and incompatibilities.

Not sure what you mean.

> > > And as far as I have seen, those changes are not mentioned
> > > in the current draft (as well as maybe some missing attributes).
> > > If we take the sample of the version attribute itself, it does not
> > > define what it means, HTML4 for example does.
> >
> > HTML4's statements on the matter are inconsistent with actual
> > implementations and legacy content.
> I cannot see, what is inconsistent here:
> "version = cdata [CN]
> Deprecated. The value of this attribute specifies which HTML DTD version 
> governs the current document. This attribute has been deprecated because it 
> is redundant with version information provided by the document type 
> declaration.
> "

This is inconsistent, e.g., with the following text in HTML4:

# The document type declaration names the document type definition (DTD) 
# in use for the document [...]

Which is it? The DOCTYPE or the version="" attribute?
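
To make the ambiguity concrete, here is a small sketch, again with Python's 
standard html.parser, that reads both signals from a document. The class 
and function names (VersionSignals, version_signals) are hypothetical, and 
the sketch deliberately does not decide which signal wins, since neither 
quoted passage settles that.

```python
from html.parser import HTMLParser


class VersionSignals(HTMLParser):
    """Records the DOCTYPE declaration and the version attribute on <html>."""

    def __init__(self):
        super().__init__()
        self.doctype = None       # text of the <!DOCTYPE ...> declaration
        self.version_attr = None  # value of <html version="...">, if any

    def handle_decl(self, decl):
        # Called with the text between "<!" and ">".
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        # Only the first <html> start tag is considered.
        if tag == "html" and self.version_attr is None:
            self.version_attr = dict(attrs).get("version")


def version_signals(markup):
    """Return (doctype, version_attr); either may be None."""
    parser = VersionSignals()
    parser.feed(markup)
    parser.close()
    return parser.doctype, parser.version_attr
```

A document whose DOCTYPE names one DTD and whose version="" attribute names 
another yields two conflicting answers, which is the inconsistency in question.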

> This does not even suggest a specific use of the attribute or that
> the interpretation or presentation of a simple browser must depend 
> on such an information.

Indeed, the text you quoted is completely empty of normative conformance 
criteria. It doesn't define anything; the spec would lose nothing if that 
text was removed. This is typical of much of HTML4.

Anyway, I'm not interested in arguing about the flaws of HTML4. It's a 
decade too late for that.

> [...]

I don't really understand what you want me to do, at this point. If you 
could concisely state what problem exists in the HTML5 spec that you 
believe should be addressed, I can try to address it (if it really is a 
problem). However, this conversation at this point is meandering 
apparently aimlessly and I'm not sure that it will lead to a productive 

> > > A current draft cannot change the meaning of a previous 
> > > specification/recommendation and it does not change the meaning of 
> > > documents written in this previous language version.
> >
> > Actually, it can, when the older specification was incorrect.
> How can it be incorrect, if the semantical meaning of the content of an 
> element is defined?

The spec isn't the final word on the meaning of the language. The use of 
the language is the final word on the meaning of the language.

If everyone uses <embed> to embed a plugin, then that's what <embed> 
means. If everyone uses <object codebase=""> to specify the source of the 
plugin, then that's what that attribute means, even if HTML4 says that the 
attribute gives the base URL for the classid="" attribute.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 27 August 2009 23:39:21 UTC
