- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Tue, 28 Mar 2006 08:24:10 +0100 (BST)
- To: www-html@w3.org
> > You can't use _any_ of HTML's semantics
> > to unambiguously get data out of the Web in the manner you describe.

That depends. It is often the case that those sources that are most
presentational have the least real content. There is an approximate
ordering, of increasingly correct usage:

    Vanity sites
    Commercial
    Governmental (because they outsource to commercial web designers)
    Academic PR departments (and alumni offices, etc.)
    Charities
    Personal sites with non-vanity content.
    Academics writing for themselves.

The last category tends to have the most real content and is also most
likely to use structural markup properly. There is also an invisible
category of documents on the intranets of research based companies.

> The potential has not been used, and there is little reason to think that
> XHTML 2.0 would change this. And even millions of pages would not help
> much if that means that only, say, one out of a hundred of defining
> occurrences of terms has been marked up with <dfn>.

That really depends on whether XHTML 2.0 becomes a must have on people's
CVs. If it does, it will be grossly abused. If it doesn't, it is likely
to have a quite high level of correct usage, although a small level of
total usage.

If, as I suspect, you don't really believe in structural HTML at all,
could I suggest that tagged PDF is a much better compromise.
Received on Tuesday, 28 March 2006 07:36:21 UTC