Re: Custom attributes in HTML elements

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 12 Feb 2010 06:17:49 +0200
Message-ID: <AF25D023B0B34E1C97D8F44ABDE365F3@JukanPC>
To: "Claudia Murialdo" <cmurialdo@gmail.com>, <www-validator@w3.org>

Claudia Murialdo wrote:

> I need to add a few custom attributes in some HTML elements, which is
> the best way to do it in order to keep validating with w3c validation
> service?.

Define a DTD that contains them. This isn't trivial, but it isn't rocket 
science either. See
http://www.cs.tut.fi/~jkorpela/html/own-dtd.html

> I want my page validates with HTML 4.01 Transitional.

You can't eat your cake and keep it. "Validates with HTML 4.01 
Transitional" is a common loose expression for having a document that 
declares _the_ HTML 4.01 Transitional DTD. If you use any attribute that is 
not declared in that DTD, the document is not valid. Any attempts at 
avoiding this simple conclusion are based on misunderstandings of what 
validity is (in the relevant technical sense, the SGML sense).

> I read that one
> way is to extend de HTML 4.0 Transitional DTD and put this in the
> DOCTYPE declaration of the page (as it says in
> http://htmlhelp.com/tools/validator/customdtd.html),

This is somewhat confusing since the good old htmlhelp.com refers to HTML 
4.0, not HTML 4.01, though the difference is small.

_The_ way to keep using the W3C validation service, or any SGML validator or 
close relative, is to modify the DTD to reflect the markup you want to use.

Of course, all this achieves is a check that your document's syntax isn't 
unintentionally malformed, i.e. that you use markup the way you have 
declared it. Well, some people might refer to the additional potential 
benefit of showing off that your page "validates", but there are so many 
more effective ways of deception. Most people couldn't care less whether 
someone else's page "validates".

> but I would like
> to know if I can do it in another way so that I can have the original
> and public w3c DTD in the DOCTYPE and add my customs attributes adding
> a new DTD or a namespace. Can I do that?.

Adding new DTD? No, an SGML document has only one DTD by definition. 
Namespace? No, that's not an SGML thing at all.

But by SGML rules, a document's DTD may appear in two parts: as an external 
subset (the normal case with HTML: the DTD is in a file and just referenced 
in the document type declaration, the <!DOCTYPE ...> stuff) and as an 
internal subset (a part of the DTD appearing directly inside the document 
type declaration).

I wonder why I didn't consider that possibility years ago. It would save 
copying and would make it easier to keep track of your modifications. And up 
to a point it works nicely. E.g., if you wanted to use attribute FOO, with 
any string as value, in P elements, you would just say

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd"
[
<!ATTLIST P
  foo         CDATA     #IMPLIED>
]
>

And your document would validate. The W3C validator would even say
"This document was successfully checked as HTML 4.01 Transitional!"
in the heading. But that's just misguided technobabble, in a misguided 
attempt at being understandable and helpful. Later, much less noticeably, 
the validator says what the babble means:
"This means that the resource in question identified itself as "HTML 4.01 
Transitional" and that we successfully performed a formal validation using 
an SGML, HTML5 and/or XML Parser(s) (depending on the markup language 
used)."
So it's just the _string_ "-//W3C//DTD HTML 4.01 Transitional//EN" in the 
DOCTYPE declaration that makes the validator characterize the document as 
HTML 4.01 Transitional.
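
To illustrate that the label rests on the public identifier string alone, 
here is a minimal Python sketch. The function name public_identifier and the 
regular expression are assumptions made for illustration; this is not how 
the W3C validator is implemented.

```python
import re

# Hypothetical helper: pull the quoted public identifier out of a
# document type declaration, ignoring everything that follows it.
_PUBLIC_ID = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def public_identifier(document):
    """Return the public identifier string, or None if there is none."""
    match = _PUBLIC_ID.search(document)
    return match.group(1) if match else None


plain = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
         '   "http://www.w3.org/TR/html4/loose.dtd">')
extended = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
            '   "http://www.w3.org/TR/html4/loose.dtd"\n'
            '[\n<!ATTLIST P\n  foo         CDATA     #IMPLIED>\n]\n>')

# Both declarations carry the same label, although only the first one
# uses the unmodified DTD.
print(public_identifier(plain) == public_identifier(extended))
```

A tool that reports a document type from this string alone cannot tell the 
two declarations apart.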

However (and now I vaguely remember why I haven't used this nicer approach), 
web browsers have never followed HTML specifications properly. In 
particular, they don't understand anything about document type declarations, 
except in the banal sense of recognizing some forms of them as special, by 
fairly simple string matching, in the infamous DOCTYPE sniffing behavior: 
they use simple string patterns to make a choice between rendering modes 
(like Quirks and "standard").
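
As a rough illustration of such string matching, here is a deliberately 
simplified Python sketch. The function name rendering_mode and its three 
rules are assumptions chosen for this example; the actual lists used by 
browsers are much longer and differ between browsers and versions.

```python
def rendering_mode(doctype):
    """Pick a rendering mode from a DOCTYPE string by substring tests.

    Hypothetical, simplified rules; nothing here parses the declaration.
    """
    if not doctype:
        return "quirks"  # no DOCTYPE at all
    text = doctype.lower()
    if "//dtd html 4.01 transitional//" in text:
        # Crude stand-in for "a system identifier (the DTD URL) is present".
        return "almost standards" if "loose.dtd" in text else "quirks"
    if "//dtd html 4.01//" in text:
        return "standards"  # HTML 4.01 Strict
    return "not covered by this sketch"


print(rendering_mode('<!DOCTYPE HTML PUBLIC '
                     '"-//W3C//DTD HTML 4.01 Transitional//EN">'))
```

Note that the function never looks at what the DTD declares; it only 
compares strings, which is the point made above.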

The problem here is that browsers don't even _parse_ DOCTYPE declarations 
properly: they'll take "] >" as document content and display them at the 
start of the page. This happens even in IE 8 and Firefox 3.5, so it won't 
ever change. (Well, unless SGML gets rehabilitated and its great merits as 
extended XML are recognized and put to use... :-))
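
The effect can be reproduced outside a browser. The sketch below uses 
Python's standard-library html.parser as a stand-in; it is not browser code, 
but it also ends the declaration at the first ">" it finds. The class name 
Collector and the test page are assumptions made for this example.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Record the DOCTYPE declaration and any text content seen."""

    def __init__(self):
        super().__init__()
        self.declaration = None
        self.text = []

    def handle_decl(self, decl):
        self.declaration = decl

    def handle_data(self, data):
        self.text.append(data)


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '   "http://www.w3.org/TR/html4/loose.dtd"\n'
    '[\n'
    '<!ATTLIST P\n'
    '  foo         CDATA     #IMPLIED>\n'
    ']\n'
    '>\n'
    '<p foo="bar">Hello</p>\n'
)

parser = Collector()
parser.feed(page)
parser.close()

# Whatever was treated as text before the paragraph content begins.
leaked = "".join(parser.text).split("Hello")[0].split()
print(leaked)
```

The declaration the parser reports stops at the ">" that closes the ATTLIST 
declaration, and the remaining "]" and ">" end up in the text content.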

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/ 
Received on Friday, 12 February 2010 04:19:20 GMT
