Re: [HTML5] 2.8 Character encodings

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 2 Sep 2009 22:24:34 +0000 (UTC)
To: "Dr. Olaf Hoffmann" <Dr.O.Hoffmann@gmx.de>
Cc: public-html-comments@w3.org
Message-ID: <Pine.LNX.4.62.0909022219320.6775@hixie.dreamhostps.com>

My apologies for top-posting, but I couldn't really work out how to reply 
in context to this e-mail (included in full below).

I think we have some sort of fundamental difference in understanding of 
the purpose of specifications, and I don't understand your view well 
enough to figure out a common ground from which to satisfy your comments 
in the spec.

From my point of view, a language spec's purpose is to ensure 
interoperable behaviour between software products. In this view, 
obsoleting a feature from an earlier version of the language leads to that 
feature not having meaning in the new language other than the 
backwards-compatible processing requirements. Authors of previous versions 
of the language are irrelevant, since (if they acknowledge the new 
language definition) they are now required to use the new version of the 
language, and thus the old language is irrelevant to them.


On Fri, 28 Aug 2009, Dr. Olaf Hoffmann wrote:
>
> Ian Hickson:
> > On Sun, 16 Aug 2009, Dr. Olaf Hoffmann wrote:
> > > Ian Hickson:
> > > > On Wed, 12 Aug 2009, Dr. Olaf Hoffmann wrote:
> > > > > The meaning of some elements is different in 'HTML5' as well or is
> > > > > defined in a more restrictive way, what excludes some use cases
> > > > > possible in HTML4.
> > > >
> > > > Yes, but in practice that's not an issue since HTML5 describes how
> > > > HTML4 UAs actually did things.
> > >
> > > User agents present the elements somehow, often this does not directly
> > > imply a meaning.
> >
> > I agree, only the spec can imply a meaning.
> >
> > > And if we take (again, I already discussed this with Anne) the sample
> > > of the element small, the presentation implicates no specific meaning,
> > > what is ok for HTML4, because the definition does not imply a specific
> > > meaning either. The audience has to derive this from relations to the
> > > content around it. 'HTML5' defines a meaning for small.
> > >
> > > Therefore the 'HTML5' definition does not apply for all use cases
> > > in HTML4 documents, just to a subset.
> >
> > By that reasoning, the 'HTML4' definition does not apply for all instances
> > of HTML4 documents either. After all, most HTML4 documents have some level
> > of conformance error, and an overwhelming number of HTML4 documents use
> > elements incorrectly. I don't think this is a useful line of reasoning.
> 
> 'Not really sure what you're saying here.'
> I never had a problem to understand the specified meaning of most
> HTML4 elements, therefore I cannot see errors in the definitions
> of the meaning of those elements, especially because for several
> of them the semantical meaning is vague - what corresponds often
> to vague use cases in many documents. In general HTML does not
> have many elements and therefore to cover every possible type
> of text the meaning of elements has to be very broad. 
> Another approach could be of course to have a larger collection
> of elements to markup text (what is in parts available in other
> formats for more or less specific use cases).
> That some authors use some elements for not intended purposes
> is not directly a problem of the specification, this can be a social
> problem, a problem of limited intellectual capabilities, of indifference 
> and ignorance. Such things cannot be changed with another version
> of a language. Maybe this cannot be changed at all.
> 
> >
> > > However, because user agents do not need to care about the meaning,
> > > the presentation may not differ. Authors have to care about the meaning
> > > and cannot use the element in 'HTML5' for some use cases.
> > > This is not necessarily a problem for authors, if there are other
> > > elements intended in 'HTML5' for their use case and because HTML4 and
> > > 'HTML5' are different versions and looking at the version indication
> > > (doctype) one can at least identify, when authors use the HTML4
> > > definitions. As long as 'HTML5' has no version indication, there is no
> > > simple way to indicate, that the definition in 'HTML5' applies.
> >
> > Just always assume the HTML5 definition applies. 
> 
> This would be surely wrong for several use cases, 'HTML5' excludes.
> It is no problem for me, that 'HTML5' excludes them for 'HTML5'
> documents. However, for HTML4 documents they are still possible and
> this is not time dependent and does not depend on other people
> specifying other versions of the language.
> To believe in 'HTML5' for such documents would implicate, that some
> constructions have no meaning at all, because the elements are
> used for the wrong purpose.
> Or if I define a HTML-version on my own, exchanging the
> definitions of the elements p and blockquote, and ensuring
> that this new HTML-version supersedes any other version,
> I think this does not really change the meaning of p and
> blockquote in HTML4 or in 'HTML5', it changes only the
> meaning for documents written in my own HTML-version.
> Or more related to the discussion - in 'my version' of
> 'HTML5' I could define a version attribute - how does
> this change the meaning of the W3C draft?
> It changes meaning and usability of my draft, but not
> that of the W3C draft. And the changes are not
> necessarily relevant for the presentations in a user
> agent.
> 
> 
> > As far as I can tell, 
> > that won't cause any problems. Can you point to a page where doing this
> > causes a practical problem with a software package?
> >
> 
> Well, authors are no software packages, therefore 
> 'Not really sure what you're saying here.'
> 
> > > Or the meaning of cite is defined more precisely, what is ok for
> > > a new version, but not applicable for the usage in HTML4 documents.
> >
> > HTML4's description was apparently vague enough that different people
> > consider it to mean different things. My interpretation of HTML4's text is
> > that HTML5's definition of <cite> is a superset of HTML4's.
> >
> > > Ok - what really a proper content is, depends on several things,
> > > if simply a private communication or a dictum is quoted, it has no
> > > title and one has to note the name of the person.
> > > If a work has specific authors, both title and authors and maybe
> > > a source or a unique identifier may belong to the citation information.
> >
> > Not really sure what you're saying here.
> >
> 
> I just compare what is often included in citations with what
> is currently noted about the content of cite in the current
> draft.
> I think you mean, that the draft definition is a subset of
> that what is possible as content in HTML4?
> It is no problem for me, that the draft defines in more
> detail, what to note in cite in 'HTML5' documents, however,
> HTML4 does not do it and therefore this element can
> contain other content in HTML4 documents.
> 
> For example:
> <blockquote>
> Not really sure what you're saying here.<br>
> <cite>Ian</cite>
> </blockquote>
> looks ok for me within HTML4, not in 'HTML5'.
> 
> Or
> <p>
> It was demonstrated, that the simultaneous
> optical excitation of atoms and molecules 
> within collisions can be observed with differential
> detection in a beam experiment. <br> 
> <cite id="ref1">V. A. Aleskseev, J. Grosser, O. Hoffmann, F. Rebentrost in 
> JCP <b>129</b> 201102 (2008)</cite>
> </p>
> 
> Alternatively the cite could contain an element a with
> a reference:
> <cite><a href="#ref1">[1]</a></cite>
> 
> There are different styles for citation and different
> information - whom, what and which resource, not
> only the title 'title of a work'.
> 
> For 'HTML5' one has to write something like this:
> <cite>Simultaneous optical excitation of Na electronic and
> CF<sub><small>4</small></sub> vibrational modes in
> Na+CF<sub><small>4</small></sub> collisions</cite>
> 
> What is a quite different information.
> This sample includes a problem with the element
> small of course, therefore this is again only ok in
> HTML4, but not in 'HTML5', there I think one has
> to use MathML to markup the molecule.
> 
> 
> 
> 
> > > acronym - non conforming feature in the current draft, well
> > > defined in HTML4.
> > > I think, with the instead recommended element abbr there is a problem
> > > with other (legacy?) versions of MSIE.
> > > Obviously here the 'HTML5' draft does not include an explanation of
> > > the meaning of HTML4 documents and does not necessarily
> > > do a better job concerning the description of the interpretation of
> > > legacy viewers.
> >
> > I don't understand what you're asking for here. HTML5 says that <acronym>
> > should be handled as a synonym for <abbr>.
> 
> It is noted under:
> '12.2 Non-conforming features'
> with:
> "acronym
> Use abbr instead.
> "
> 
> Because every acronym is an abbreviation too,
> there is no problem in doing this in 'HTML5' - one can 
> use microdata/RDFa to specify it in more detail, if
> required.
> However, in HTML4 it is not a 'non-conforming feature'
> and can have a slightly different and more specific meaning 
> as abbr.
> Therefore this is surely another example of something
> an author can use in HTML4 with a meaning, but should
> not in 'HTML5' because it is indicated as a 
> 'non-conforming feature'. Obviously, this definition does
> not apply to acronym within HTML4 documents.
> 
> 
> 
> >
> > > The content model of dl is more restrictive in 'HTML5' - surely
> > > it cannot describe uses of the less restrictive model of HTML4.
> >
> > <dl> hasn't changed as far as I can tell.
> >
> 
> HTML4:
> <!ELEMENT DL - - (DT|DD)+              -- definition list -->
> 
> 'HTML5':
> Content model:
> Zero or more groups each consisting of one or more dt elements 
> followed by one or more dd elements.
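To make the difference between the two quoted content models concrete, here is a minimal Python sketch. It assumes a dl's children are given as a flat list of tag names containing only dt and dd; the function names and the one-letter encoding are hypothetical and appear in neither spec.

```python
import re

# Each child element is encoded as one letter: "t" for dt, "d" for dd.
# HTML4 DTD as quoted above: (DT|DD)+  -> one or more dt/dd in any order.
HTML4_DL = re.compile(r"[td]+")
# 'HTML5' draft as quoted above: zero or more groups of (one or more dt,
# followed by one or more dd).
HTML5_DRAFT_DL = re.compile(r"(?:t+d+)*")


def encode(children):
    """Map a list of child tag names ('dt' or 'dd') to a compact string."""
    codes = {"dt": "t", "dd": "d"}
    # Any other tag name raises KeyError; this sketch only models dt and dd.
    return "".join(codes[name.lower()] for name in children)


def valid_html4(children):
    """True if the child sequence satisfies the quoted HTML4 model."""
    return HTML4_DL.fullmatch(encode(children)) is not None


def valid_html5_draft(children):
    """True if the child sequence satisfies the quoted 'HTML5' draft model."""
    return HTML5_DRAFT_DL.fullmatch(encode(children)) is not None
```

Under these assumptions, a dl with only dd children (as in the poetry workaround) passes valid_html4 but not valid_html5_draft, while an empty dl passes valid_html5_draft but not valid_html4, so neither model is a subset of the other.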
> 
> 
> 
> 
> > > And viewers have no problem to present such uses, therefore
> > > 'HTML5' may have a better definition of definition lists, excluding
> > > some not very nice use cases, but it does not describe several
> > > really existing HTML4 documents or how they are presented
> > > by current viewers.
> >
> > I don't follow. Could you include some examples maybe?
> 
> These are for example the nasty 'poetry' samples as discussed
> a longer time ago. Because HTML still has no elements for poetry,
> at least in HTML4 documents one has to work around this.
> In XHTML one can use related elements from other languages
> or in XHTML+RDFa one can indicate the meaning with RDFa,
> in 'HTML5' this may work with microdata/RDFa as well.
> However, for the last two variants one has to use proper
> elements with a sufficient structure model for strophes (stanzas)
> and strophe lines. For example dl/dd (excluded in 'HTML5') or 
> div/div or maybe section/div.
> Because 'HTML5' has other elements and dl/dd is not
> applicable, the best possible solution in 'HTML5' looks different 
> than in HTML4 or XHTML1.x. 
> 
> Maybe some use dl for recipes, bills or some structures
> in the bible for example (is already discussed in the related
> wiki). 'HTML5' does not describe this, what is no problem
> for documents of previous HTML versions and authors
> of 'HTML5' documents can find other (maybe better)
> solutions. This is why 'HTML5' is different from HTML4
> and why it does not define the meaning of some structures
> in HTML4 documents.
> This happens mainly, because there is no concept in
> 'HTML5' either just to simplify the element collection
> and to use something like RDFa to provide a semantical
> meaning or to define a more complete collection of
> semantical elements for specific use cases.
> In 'HTML5' it is more a matter of taste mixture
> of changes or improvements, therefore clearly
> different from HTML4 and none of them is a
> true subset of the other. They share many common
> or similar features.
> 
> >
> > > I think, for object some attributes are missing.
> > > Well, some authors used some of them wrong and
> > > something like declare was not widely implemented.
> > > Both does not indicate directly a problem with the HTML4
> > > definition.
> >
> > I think that's exactly what it indicates, actually.
> >
> > > I think, there is still no declarative method in
> > > the draft to start some time dependent content of object,
> > > therefore declare is really missing in 'HTML5', not only
> > > for object. However, if an author uses it in a HTML4
> > > document, one cannot expect that the behaviour of
> > > a browser ignoring this attribute is that, what was
> > > intended by the author ;o)
> > > The implementation gap simply excludes some use
> > > cases of object in practice - maybe one of the reason,
> > > why there is currently a lot of strange content around,
> > > trying to simulate such functionality somehow to work
> > > around the gap.
> >
> > I'm not sure what you're asking for here.
> 
> Well - not related to this discussion here, but 
> SMIL and SVG have declarative methods to begin
> and to end for example video and audio  in a
> declarative way. HTML4 had at least declare to
> begin such objects. 'HTML5' does not have it.
> Maybe the best approach for authors is still to
> embed SVG or flash to do the job. 'HTML5' 
> clearly fails to provide a simple declarative method
> to allow authors to specify buttons or a selection
> tools to begin and end such media.
> To be able to begin for example a video or
> audio after an interactivity of the user is often
> important to allow to select between different
> options.
> 
> >
> > > I think, there are several more samples, all of them show, that
> > > 'HTML5' does not describe all 'valid' HTML4 documents properly.
> >
> > Could you list them? I should fix them, if so.
> >
> 
> From my point of view 'HTML5' is a new version of HTML and
> can be therefore different. No need to fix something or to
> list it. If an author indicates that the version 'HTML5' is used,
> this new definitions and meanings apply.
> If HTML4 is indicated, the old meanings apply. No problem at
> all with version indication. And no need to spend time to
> compare and to list differences (which can have good 
> reasons of course).
> 
> 
> 
> > > I do not think, 'HTML5' has to do this, because it is a new version
> > > of the language.
> >
> > I think HTML5 must do this, because it is a new version of the language.
> >
> > > It is just pretty useless to disclaim such simple
> > > facts and incompatibilities.
> >
> > Not sure what you mean.
> 
> ;o)
> 
> >
> > > > > And as far as I have seen, those changes are not mentioned
> > > > > in the current draft (as well as maybe some missing attributes).
> > > > > If we take the sample of the version attribute itself, it does not
> > > > > define what it means, HTML4 for example does.
> > > >
> > > > HTML4's statements on the matter are inconsistent with actual
> > > > implementations and legacy content.
> > >
> > > I cannot see, what is inconsistent here:
> > >
> > > "version = cdata [CN]
> > > Deprecated. The value of this attribute specifies which HTML DTD version
> > > governs the current document. This attribute has been deprecated because
> > > it is redundant with version information provided by the document type
> > > declaration.
> > > "
> >
> > This is inconsistent, e.g., with the following text in HTML4:
> >
> > # The document type declaration names the document type definition (DTD)
> > # in use for the document [...]
> >
> > Which is it? The DOCTYPE or the version="" attribute?
> >
> 
> Not really sure what you're saying here.
> If there is no doctype (as in XHTML+RDFa), the indication in
> version applies. If there is a doctype, that applies.
> If doctype and version information are incompatible, the
> version seems to be undefined, because I think there is no
> information what takes precedence. Therefore authors have
> to avoid such conflicts as they should in general.
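The precedence rule described in the paragraph above can be sketched in Python as follows. This is only an illustration of that reading, not anything defined by HTML4 or XHTML+RDFa: the function name, the UNDEFINED marker, and the use of plain string equality as the "compatible" test are all assumptions.

```python
# Hypothetical marker for the "no information on what takes precedence" case.
UNDEFINED = "undefined"


def effective_version(doctype=None, version_attr=None):
    """Return the version that governs a document under the reading above.

    doctype      -- version named by the document type declaration, or None
    version_attr -- value of the version attribute on html, or None
    """
    if doctype is None:
        # No doctype (as in XHTML+RDFa): the version attribute applies.
        return version_attr
    if version_attr is None:
        # Only a doctype is present: it applies.
        return doctype
    # Both present: treat equal strings as compatible (an assumption);
    # anything else is the conflict case, which is left undefined.
    return doctype if doctype == version_attr else UNDEFINED
```

For example, effective_version(None, "XHTML+RDFa 1.0") yields "XHTML+RDFa 1.0", while two differing values yield the UNDEFINED marker, which is the conflict authors are advised to avoid.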
> 
> 
> > > This does not even suggest a specific use of the attribute or that
> > > the interpretation or presentation of a simple browser must depend
> > > on such an information.
> >
> > Indeed, the text you quoted is completely empty of normative conformance
> > criteria. It doesn't define anything; the spec would lose nothing if that
> > text was removed. This is typical of much of HTML4.
> >
> 
> 
> Another variant of (X)HTML is more specific about this.
> In XHTML+RDFa it is noted:
> 
> 'There SHOULD be a @version attribute on the html element with the 
> value "XHTML+RDFa 1.0"'
> 
> And this is more relevant, because HTML4 documents have the
> doctype to indicate the version, this XHTML variant has not
> necessarily a doctype.
> 'HTML5' has no other version indication currently, but the
> XHTML namespace has, therefore at least for the XHTML/XML
> variant of 'HTML5' one can indicate a version attribute belonging
> to the XHTML namespace, but because 'HTML5' still does not
> say, how to indicate the version, the value of the attribute is
> still a question - best choice could be the URI of the 
> recommendation maybe, because there are not two
> versions with the same URI if nothing went wrong.
> 
> 
> 
> > Anyway, I'm not interested in arguing about the flaws of HTML4. It's a
> > decade too late for that.
> >
> > > [...]
> >
> > I don't really understand what you want me to do, at this point. If you
> > could concisely state what problem exists in the HTML5 spec that you
> > believe should be addressed, I can try to address it (if it really is a
> > problem). 
> 
> Well that is simple. Allow authors to indicate 'HTML5' as a version,
> for example with
> <html version="http://www.w3.org/TR/html5/" ...>
> This would be already better than the XHTML+RDFa approach.
> This is mainly a meta information about the semantical meaning
> of the document content, not a requirement for user agents to
> do something specific. It is similar to those microdata information.
> For simple presentation you need not to care about it, but if
> there is someone trying to find out the relation between the
> current document and the meaning of the used language and
> its elements, this is an interesting information.
> HTML documents often have meta information not relevant
> for any user agent, for example meta elements containing
> descriptions or keywords or with encoding information, if the
> server already sent the encoding information. 
> However, it is not completely useless, just because it is not relevant 
> for some user agents or some situations.
> 
> 
> > However, this conversation at this point is meandering 
> > apparently aimlessly and I'm not sure that it will lead to a productive
> > conclusion.
> >
> 
> My personal impression is, that this happens quite often with 
> discussions in the 'HTML5 WG', if semantical issues or issues 
> interesting for authors are discussed.
> To change this, maybe one has to find out, what the collective
> problem of the WG with such issues is ;o)
> 
> > > > > A current draft cannot change the meaning of a previous
> > > > > specification/recommendation and it does not change the meaning of
> > > > > documents written in this previous language version.
> > > >
> > > > Actually, it can, when the older specification was incorrect.
> > >
> > > How can it be incorrect, if the semantical meaning of the content of an
> > > element is defined?
> >
> > The spec isn't the final word on the meaning of the language. The use of
> > the language is the final word on the meaning of the language.
> >
> 
> This applies more for spoken languages and dictionaries.
> The dictionaries only describe, how the words of a language are
> currently used.
> With a specified language this is different. It is a technical terminology
> with fixed meanings. This is one of the main advantages, why to
> use something like markup languages at all.
> 
> 
> > If everyone uses <embed> to embed a plugin, then that's what <embed>
> > means. If everyone uses <object codebase=""> to specify the source of the
> > plugin, then that's what that attribute means, even if HTML4 says that the
> > attribute gives the base URL for the classid="" attribute.
> 
> In HTML4 documents 'embed' means nothing. And at least on my
> Linux computers not even all browsers interpret this (for example
> I think, Opera still ignores it, at least in combination with SVG ;o)
> 
> And even if millions of people believe in 1+1=1, this does not
> mean, that this is the meaning or that this is true or that this
> implicates to change the convention, that '+' typically means
> addition and not multiplication. It mainly indicates,
> that millions of people are wrong. This is not surprising. And one
> of the advantages of well defined technical terminologies is, that
> a minority is still able to express and to share relevant information, 
> even if the majority is not able to understand or to use such information
> at all. And if they refer to a specification of their terminology, every
> one else can at least learn it and can understand, what was intended.
> On the other hand, it is quite simple to check, whether an assertion
> within this terminology is meaningful or not. 
> If the meaning would be adjusted to the majority, this would
> mainly result in more stupidity.
> 
> 
> Olaf
> 
> 
> 

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 2 September 2009 22:22:12 GMT
