- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Mon, 10 Dec 2007 10:51:44 +0200
- To: "Peter Epps" <pgepps@gmail.com>
- Cc: <www-validator@w3.org>
David Dorward wrote:

> On 9 Dec 2007, at 05:59, Peter Epps wrote:
>> Validating http://www.buildersdiscountmart.com/buildernew/cabinets/
>> Error [63]: "character data is not allowed here"
>>
>> What's up? Did I miss something in the spec?
>
> You seem to have some non-printing characters just before the title
> element.

More exactly, there are two BOM characters there. BOM = Byte Order Mark = zero-width no-break space = U+FEFF, intended for use at the start of UTF-16 or UTF-32 data to make sure that the correct byte order is used in interpreting the data. It is allowed at the start of UTF-8 data too but has no use there. In the _midst_ of Unicode data, it is permitted though discouraged, and there it means what the name zero-width no-break space says.

The important thing here is that it is not a whitespace character in SGML terms. Since it's a data character, it's not permitted there, since we're in the midst of the <head> element, where no data characters as such may appear - only elements.

In UTF-8, BOM appears as the three-octet sequence EF BB BF. When viewed in a program that mistakenly treats the data as ISO-8859-1 encoded, it looks like the otherwise highly unlikely three-character sequence ï»¿. You can see this if you manually change the encoding in your browser (e.g. with the View/Encoding command) to ISO-8859-1.

> Try deleting everything from the < of the title element back
> to the > of the previous end tag. Then add those two characters, and
> any whitespace you want, back in.

Well, technically it is sufficient to remove the BOM characters. How you do this depends on the authoring software.

Beware that there are other occurrences of BOM in the document. Since they are in contexts where character data _is_ allowed, the validator does not catch them, but they can still cause problems. Who knows what they'll cause, e.g. in script code?
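As an illustration of the byte-level picture above, here is a minimal Python sketch that locates every UTF-8 encoded BOM in a byte string and removes them. The function names and the sample markup are hypothetical, made up for this example; they are not taken from the page under discussion. Note that removing every U+FEFF also removes any that were deliberately used as zero-width no-break spaces in the middle of the data.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three octets EF BB BF


def find_boms(data: bytes) -> list:
    """Return the byte offset of every UTF-8 encoded U+FEFF in data."""
    offsets = []
    pos = data.find(BOM)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(BOM, pos + len(BOM))
    return offsets


def strip_boms(data: bytes) -> bytes:
    """Remove every UTF-8 encoded U+FEFF, wherever it occurs."""
    return data.replace(BOM, b"")


# Hypothetical sample: two BOMs just before the title element.
sample = b"<head>\n" + BOM + BOM + b"<title>Cabinets</title>\n</head>"

print(find_boms(sample))          # byte offsets of the two BOMs
print(strip_boms(sample))         # the same markup without them
print(BOM.decode("iso-8859-1"))   # how the BOM looks when misread as ISO-8859-1
```

Searching at the byte level is safe for well-formed UTF-8, because the octet sequence EF BB BF cannot occur there except as an encoding of U+FEFF.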
The ultimate problem is probably that the document has been created by putting together some pieces generated by programs that insert BOM at the start of data, for some reason. When concatenating such pieces, the BOM characters should be removed.

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
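The advice about concatenating pieces could be sketched as follows in Python. This is only an illustration under the assumption that each piece is available as a UTF-8 byte string; the function name and the sample pieces are hypothetical, and how the pieces are actually produced depends on the authoring software.

```python
import codecs


def concat_without_boms(pieces):
    """Join UTF-8 byte strings, dropping one leading BOM from each piece."""
    cleaned = []
    for piece in pieces:
        if piece.startswith(codecs.BOM_UTF8):
            piece = piece[len(codecs.BOM_UTF8):]
        cleaned.append(piece)
    return b"".join(cleaned)


# Hypothetical pieces, each saved by a program that prepends a BOM.
header = codecs.BOM_UTF8 + b"<head><title>Cabinets</title></head>\n"
body = codecs.BOM_UTF8 + b"<body><p>Text</p></body>\n"

page = concat_without_boms([header, body])
print(codecs.BOM_UTF8 in page)  # False
```

Only a BOM at the very start of each piece is removed here, so a U+FEFF used on purpose inside a piece is left alone.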
Received on Monday, 10 December 2007 08:52:03 UTC