- From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
- Date: Wed, 5 Aug 2009 18:16:27 +0200
- To: public-html-comments@w3.org
Anne van Kesteren:
> On Wed, 05 Aug 2009 10:58:43 +0200, Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de> wrote:
> > No, I think, both CSS and 'HTML5' share the same problem without
> > a version indication. However, CSS is just about styling, in doubt
> > one can switch it off. (X)HTML is about content and it is more important
> > to have a well defined and not plurivalent meaning of elements
> > (or encoding or other things related to content).
>
> If there is no versioning the latest version of the standard defines the
> meaning of elements. So that problem does not exist.
>

As already mentioned, this is exactly the source of the problem. If the document does not indicate which version its markup belongs to, the meaning is not stable, or the creation date may have to be used to guess what the meaning was when the author created the document. The meaning and content of a document do not change just because a new version of the specification of the markup language appears.

Especially because different interest groups can publish different versions of HTML at any time, authors cannot predict the meaning of their own documents if no relation to the specification relevant for the current document is provided.

Authors who want meaningful, time-independent markup for long-lived documents have to use a version with a version indication, and they cannot profit from the current efforts of the HTML5 WG. Previous (X)HTML versions with a version indication can still be used for this purpose, but every version without an indication method is excluded because of this gap: one cannot indicate the relation between the current document and the language version used (if there is more than one).

> >> This is the same point as above. Authors do not write against
> >> specifications.
> >
> > Well not all authors ...
> > As an author I started to write a specification, because (X)HTML did
> > not cover, what I wanted to markup ;o)
>
> Writing _against_ a specification and writing _a_ specification are very
> different things.

Of course. First someone has to write a specification; then one can refer to it and write documents using it. The main reason for such a specification is to enable people to write documents in the specified format. Without a relation between the individual document and the specification, the markup of the document remains meaningless, or its meaning is ambiguous or arbitrary. If things are undefined, one can write a specification in order to be able to write well-defined documents (which is the main goal; but if no one else has done it, one first has to create the tools before one can start the intended work).

You wrote 'Authors do not write against specifications.' I can simply disprove this, because I am an author and I do write documents as specified (by others or by myself). Perhaps we can agree that some authors write documents as specified, while many more do not care about specifications and rely on the interpretation/presentation of their preferred browser. In any case, a specification ultimately covers documents from both types of authors, especially 'HTML5' with its advanced error handling.

>
> > And those people with microformats and RDF(a) indicate, that not
> > all authors want to write things, already appearing somehow in
> > common browsers, but have no semantical well defined meaning ;o)
>
> I do not understand this sentence.

They create methods and structures to indicate the intended meaning by pointing directly to a schema or a definition. If this is done for all content, it can replace a version indication, and the language used no longer needs any meaningful elements of its own, because the meaning is defined by the directly referenced definitions.
Obviously it is much more convenient to refer to a complete language version for the whole document, and to refer to additional definitions only for those fragments for which the language itself has no meaningful elements.

>
> > Those authors, who care mainly about the appearance in current
> > browsers typically do not have long living documents in mind, even
> > if with some luck some of these documents remain several years.
>
> You make it sound like this is a fact, but this is not at all my
> experience.

Well, I have long experience discussing the real problems of authors in forums. Many of those authors have very little knowledge, but they often want everything immediately and without any effort ;o) And the majority of authors write for now or for this year (no matter how long those documents actually remain on the server later). Their planning horizon is close to zero. If they manage to learn some more details, some of them become more careful to write reusable, 'valid' content, so that they save time when they publish it again in a completely new structure or with another PHP script one or two years later. The authors of these PHP scripts often seem to know little about (X)HTML too, though not necessarily about PHP itself. Only a few of them write documents for 'eternity'; the majority believes that they have to adjust their documents from time to time anyway, to the current behaviour of browsers or at least their style to the current taste. Because some of them cannot really separate style from content (partly because of some restrictions of CSS), they have to touch the content each time they change the styling. Additionally, and unfortunately, electronic documents published on the internet still do not have the reputation of being as reliable and citable as paper books or journals; such authors may be one of the reasons for this poor reputation.
Another issue, of course, is that it is typically no problem to read a paper book 50 years after publication, whereas for an electronic format it is. One of the tasks of formats like (X)HTML can be to show that documents still have a well-defined meaning after 50 or more years. The chances are not so bad now, with proper encoding information and a version indication for the markup used. But because no electronic format has yet survived such a long time, the majority of authors do not have such a long planning horizon.

>
> > Or they do not have much experience and rely too much on the
> > behaviour of the current version of their preferred browser.
>
> Right, which is why better interoperability is important.

Here the ground is scorched by ten or more years of experience with inconsistent behaviour. Browsers behaved quite differently, and authors wasted a lot of time, especially on differences in CSS interpretation. Browsers still behave differently for HTML; for example, only some of them provide complete access to meta information, links, alternate stylesheets, the cite information of quotes, and the date information of del and ins. Even if those problems are considered widely solved for one or two formats, it will take many years to convince authors that this is really the case.

>
> > Within the last ten years such authors typically had a lot of work
> > updating a few documents to the current behaviour of the newest
> > browser version. I think, this is not really a promising perspective
> > for electronic formats (as for computer hardware this is no
> > standard, this permanent handicraft work).
>
> Do you have any evidence for this? Documents from 10 years ago typically
> still render fine.
>

OK, for HTML, authors in those days typically used only the subset that was already implemented.
But, for example, the switch from the Mozilla suite to Firefox (many do not use SeaMonkey) brought some restrictions, for instance for navigation menus built from link elements. Of course, there are many more issues caused by changes and bug fixes in CSS implementations or by redefinitions in CSS 2.1. I had to adjust the styles of my own projects from time to time; now either I write more stable styles or the behaviour of common browsers is stabilising.

> >> There is
> >>
> >> http://html5gallery.com/
> >>
> >> for instance, which collects sites made in HTML5.
> >
> > Indeed, within the content, the meta description or keywords it often
> > appears, that those projects are intended to be HTML5. For several
> > others it can be guessed, because elements like header, footer and
> > section are used.
>
> You mean "not used"? Those elements are part of HTML5.

No, one can guess that they use 'HTML5' because other versions do not have these elements. Currently 'HTML5' is the only version that has these elements (perhaps except for some of them, which already appeared in early XHTML2 drafts). However, apart from the doctype, one can write an 'HTML5' document using only elements already defined in HTML4 or earlier versions; therefore the set of elements used is no reliable indication. (You did not claim that; I only explain why the version cannot be derived from the set of elements, to emphasise that this is not the way to indicate or identify the use of 'HTML5' in a document.)

>
> > But that this information is spread along these meta element attributes
> > content or the content of the pages indicates even more, that a
> > version indication like version="HTML5" is missing.
>
> I don't see why.

This is only my interpretation of an observation: several of these pages claim to be 'HTML5' in their content or in meta elements, or somehow indicate a relation to the keyword 'HTML5'.
One reason could be that there is no other way, such as a version indication, to provide an unambiguous relation between the individual document and the version 'HTML5', yet those authors want to indicate that relation. They are convinced that 'HTML5' is useful and want to express somehow that they already use it. Perhaps more for psychological reasons, it seems to matter to them to indicate that they (already) use this new version, even if parts of the 'HTML5 WG' seem to assume that no one cares. Obviously these authors care. If indicating the relation had not been important to those authors, they would not have done it. Because they created it, it can be assumed that it serves a purpose for them.

>
> > Is it expected, that such information always appears within
> > the content or the description or keywords of a 'HTML5' document?
> > Or is it intended, that the used elements are analysed to identify
> > 'HTML5'?
>
> The version does not need to be identified. As I said before, HTML is
> versionless.

HTML is not versionless, and neither is XHTML; currently only 'HTML5' is, while other versions have a version indication. Whether there is a need or desire to identify a version depends on the audience of each individual document; this cannot be generalised with an arbitrary claim. If someone like me has use cases for identification, that creates a need or desire.

>
> > (I think, the gallery itself does not use elements specific for HTML5).
> > Is it defined in the current draft how to indicate 'HTML5' with meta
> > elements?
>
> That would introduce versioning, so no.
>
> > Looks currently like poor design of 'HTML5' - and that many of them
> > indicate the fact, that they use 'HTML5' with such workarounds looks
> > pretty much like a gap in the current draft ;o) Why else to note several
> > times within the document the used version in different ways,
> > surely because they try to assure, that their documents are really
> > identified somehow as 'HTML5' ;o)
>
> Why would they try to assure that their documents are identified as HTML5?

I have already explained why I want to indicate a version; why they indicate the relation explicitly, you would have to ask them ;o)

> Browsers process HTML in the same way regardless, so it does not matter.
> I've said all this before though and I'm feeling this discussion is just
> going in circles.

In part, but mainly because you still confuse the interpretation of content with the content itself (or its meaningful representation). If you do not see the difference, there will indeed be nothing for you to learn and no progress in the discussion.

>
> >> > How do they identify them as 'HTML5'? and distinguish from undefined
> >> > tag soup without a version indication?
> >>
> >> That is not needed.
> >
> > Well, this discussion and the samples you provided indicate, that there
> > is at least a desire for a version indication - maybe to get well defined
> > documents, maybe to show, that the author is a cool and funky designer
> > using already languages, which are still drafts, even if it is not known,
> > how to indicate, that they really use this cool new language version ;o)
>
> Could you maybe indicate what you mean? I do not get the same impression as
> you looking at the source code of those sites.
>

See above: several of the projects referenced in the gallery explicitly identify themselves somehow as 'HTML5' or indicate some relation to this keyword.

> >>> Not, how these documents are maybe interpreted today, what does the
> >>> author indicate, what they are?
> >>
> >> I'm not sure what you mean.
See above: the problem is the difference between the interpretation of content and the content itself, or the markup of content to indicate a semantic meaning or intention. I think such markup is one major step forward in the preservation of information, after people learned to memorise information using lyrics or poetry, to write text, to structure written text with paragraphs, headings etc., to print books with machines and to store information electronically. But information does not exist independently of the methods used to preserve it; there always has to be a cultural agreement, or alternatively a specification, of how the information is preserved, so that it can be extracted or interpreted again, if necessary independently of specific programs available today. And because today we have different methods of preserving information in electronic formats, one has to indicate somehow which specification the markup of the current document relates to.

> >
> > Because languages like XHTML+RDFa, HTML4, 'HTML5', SVG,
> > MathML, SMIL, RDF etc define somehow, what the meaning of the
> > content of an element is, and different versions define it (slightly)
> > different, the meaning can be only derived by knowing the version.
>
> No, the meaning can only be derived by asking the author. The version does
> not have much to do with it. E.g. lots of authors abuse <blockquote> for
> indenting and others abuse longdesc="" for search engine spam. That is not
> the meaning of those HTML features.
>

Here you confuse the errors, indifference or ignorance of authors with the content of a document. Once an author has decided to use a markup language with a specified meaning and publishes a document in it, the meaning of the document can be derived from the document itself. Indeed, this can differ from the intentions of the author. And of course, sometimes the only information you derive is that the author is indifferent or ignorant, or a cretin, or is pretending to be one.
But if an author marks something up as a blockquote, it is a blockquote. And if something is marked up as a longdesc, it is the equivalent of the related image. If any browser made this information available to everyone together with the image, many people could easily identify the author as a spammer by comparing image and text. It would surely also help authors to improve the content of such alternative descriptions, because they would become accessible to everyone. That alone would surely reduce the abuse, because spammers typically do not want to be exposed as spammers to the 'ordinary' human audience.

Another example would be an object with Flash content whose only alternative text is an advertising link to Adobe's Flash Player. The derived meaning is simply that the Flash document is only an advertisement for the Flash Player and contains no further information. If the audience is not interested in such a player, there is no need to install or activate it. I have used this method for a long time, simply to save time and traffic. Once you start to trust markup, it is an effective filter that already sorts out a lot of the nonsense out there.

> > Sometimes, if the functionality changes too, the intended behaviour
> > can only be derived by knowing the version.
> > To create more than currently cool and funky designer pages it is
> > therefore important for some authors to indicate, what they really
> > mean, not just, how things appear. And if elements are defined
> > in different language versions to have different meanings, a version
> > indication is required.
>
> Maybe in an ideal world this would be the case, but given that nobody wants
> to implement versioning, versioning makes things vastly more complex, and
> older specifications (e.g. HTML 4 and CSS 2.0) are very poorly written and
> ambiguous, this is unlikely to happen.
>

Implementation in current simple browsers is another question, as is interpretation; as we already mentioned, there is no way for authors to force a specific interpretation. Nevertheless, it is still relevant to indicate what was intended. For example, years later a successor may discover that current browser versions do something strange (this happened quite often in the last 10 years, especially with styling). With a version indication, one can read the old specification, work around the problem and republish the document in a new version that current browsers interpret better. Without a version indication one can only guess, and reconstructing the intended behaviour is less likely to succeed and takes more time. There are several use cases for an unambiguous relation between an individual document and a specification. And there is no need to know them all or to predict every future use case; it is information available for different purposes, not just for one specific tool.

> To do a step back, do you have an example of an HTML 4 page you once
> created that would get "weird meaning" in HTML 5?

I already mentioned the small element: in HTML4 (and already in HTML 3.2) it was often used in our field, frequently together with sup and sub, to indicate properties of atomic and molecular states, symmetries, indices, chemical formulas etc. 'HTML5' restricts its meaning and excludes such use cases. Of course, it now introduces parts of MathML, which one can use instead and which carry a more appropriate semantic meaning (that would currently be one of my favourite test cases for 'HTML5'). But this is only available if those old documents are converted to 'HTML5'. In HTML4 the meaning could only be derived from the surrounding content and cultural agreement, not from the specification, but it was a possible usage. And it was identifiable both from the presentation and from the markup itself.
Similar issues may arise for other elements that now have a more restricted meaning or content model than in HTML4. Fortunately, HTML4 has a version indication; therefore the more restrictive (not necessarily bad) definitions of 'HTML5' do not apply, and the documents do not have to be updated to preserve the intended meaning. Without a version indication, and with your idea that the latest version defines the meaning, one would have to update those old documents once the 'HTML5' draft becomes the specification for HTML.

>
> >> Currently <!doctype html> is required for something to be considered
> >> HTML 5. However, all HTML is consumed using the algorithm defined in the
> >> specification. (Implementations have always done this, though have
> >> differences between them because not everything was defined back in the
> >> days.)
> >
> > I think, this is not completely wrong for several HTML versions, and not
> > for XHTML or XHTML+RDFa and maybe the best choice too for
> > documents having an XHTML:html element as root element, but
> > elements from several other namespaces as well, maybe including
> > entity definitions within the doctype.
>
> I do not understand this sentence.

For example, XHTML+RDFa does not need a doctype; nevertheless, you can use <!doctype html ...> to declare entities, if you need them. And if you have a compound document, you often have no DTD, and you can use this doctype declaration too. Because it essentially says only that html is the root element, you can use it for any HTML version, not just for 'HTML5'. (Of course, for most of those versions there is no longer any version indication once the information in the doctype is lost. Indeed, from our current point of view, it was not very clever to combine the version indication with the DTD information in some previous (X)HTML versions: the classic problem of using one screw for two tasks.)
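To illustrate the doctype point: in HTML 4.01 and XHTML 1.x the version indication is the public identifier inside the doctype, whereas <!doctype html> (with or without an internal subset declaring entities) only names the root element. The Python sketch below is a hypothetical helper written for this illustration, not part of any specification or existing tool; its lookup table covers only a few well-known W3C public identifiers, and it returns None when no recognised version indication is present.

```python
import re
from typing import Optional

# Hypothetical helper for illustration only. The table lists a few well-known
# W3C public identifiers; it is deliberately not exhaustive.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD HTML 3.2 Final//EN": "HTML 3.2",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
    "-//W3C//DTD XHTML+RDFa 1.0//EN": "XHTML+RDFa 1.0",
}

# Matches <!DOCTYPE html PUBLIC "..."> (keywords case-insensitive, either quote style)
# and captures the public identifier between the matching quotes.
DOCTYPE_PUBLIC = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+(["\'])(?P<fpi>.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def doctype_version(markup: str) -> Optional[str]:
    """Return the version named by the doctype's public identifier,
    or None if the document carries no recognised version indication."""
    match = DOCTYPE_PUBLIC.search(markup)
    if match is None:
        # e.g. '<!doctype html>' or '<!doctype html [ ...entities... ]>':
        # these name the root element only, not a version.
        return None
    return KNOWN_PUBLIC_IDS.get(match.group("fpi"))


if __name__ == "__main__":
    html4 = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
             '"http://www.w3.org/TR/html4/strict.dtd">')
    print(doctype_version(html4))                                    # HTML 4.01 Strict
    print(doctype_version("<!doctype html>"))                        # None
    print(doctype_version('<!doctype html [<!ENTITY c "&#169;">]>'))  # None
```

A fuller version would need many more public identifiers; the sketch only shows that the information needed to identify the version exists in the first case and is absent in the other two.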
Received on Wednesday, 5 August 2009 16:18:59 UTC