- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sun, 23 Feb 2014 14:33:43 +0200
- To: Ferren MacIntyre <fmaci@inbox.com>, www-validator@w3.org
2014-02-20 11:10, Ferren MacIntyre wrote:

> See, e.g.,
> http://www.electromontis.net/evoligion/_C/C10.shtml#C10
> at the extremities of the text in the green box.

I’m not sure I understand what you mean by that reference. Markup validation is really independent of the visual rendering, and the data error that prevents validation comes after the element that is rendered as a box with green background.

> For a year or so I have used left-and right-pointing solid equilateral
> triangles as navigation pointers to previous and following chapters.

You mean “◀” and “▶”, U+25C0 BLACK LEFT-POINTING TRIANGLE and U+25B6 BLACK RIGHT-POINTING TRIANGLE? I can’t see any attempt at using them on the page.

> These have not been coded, but just visible glyphs.

Well, as characters, you mean, I suppose. That’s fine if you use UTF-8.

> Suddenly the
> validator gives up, Saying 'Waiter, there's a bad byte in my code! I
> can't eat that!'. It says something like '\xD5 on line 20', but it
> won't show the source code, and the source code doesn't have an \xD5
> that I can find, so it leaves me guessing.

The error message is:

“Sorry, I am unable to validate this document because on line 23 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: utf8 "\xD5" does not map to Unicode”

It may look cryptic, but I think it is as good as we can get, given the technical nature of the problem, except that the last sentence is rather misleading. It is indeed a “bad byte” that is the issue, and it is a problem at the level of character data representation, not in markup. This is why the validator gives up: it does not even start validation. The data, when declared as UTF-8, would be invalid even when interpreted as plain text.
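Since the validator does not show the source, it can help to look for such bytes locally. The following is only a minimal sketch in Python, not what the validator itself does: the function name `find_invalid_utf8` and the sample data are my own assumptions, and it simply reports every byte at which strict UTF-8 decoding fails, with its line number.

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset_in_line, offending_byte) for each
    byte at which strict UTF-8 decoding fails. Hypothetical helper."""
    problems = []
    # Splitting on 0x0A is safe: that byte never occurs inside a
    # multi-byte UTF-8 sequence.
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")  # strict decoding by default
                break  # the rest of the line is valid
            except UnicodeDecodeError as err:
                bad = pos + err.start
                problems.append((line_number, bad, line[bad]))
                pos = bad + 1  # skip the bad byte and keep scanning
    return problems


if __name__ == "__main__":
    # Made-up sample: 0xD5 stands where an ASCII quotation mark was intended.
    sample = b'<p>ok</p>\n<p style=\xd5font-weight:bold;">x</p>\n'
    for line_number, offset, byte in find_invalid_utf8(sample):
        print(f"line {line_number}, byte offset {offset}: 0x{byte:02X}")
    # prints: line 2, byte offset 9: 0xD5
```

For a real page, one would read the file in binary mode, e.g. `open(path, "rb").read()`, and pass the result to the function; the line numbers should then correspond to those in the validator's message.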
More exactly, line 23 is

<p class="cen" style=[0xD5]font-weight:bold;">Chapter Navigation:<br />

where I have denoted the problem byte, D5 hexadecimal, as [0xD5]. It should apparently be the common ASCII quotation mark ("), starting a quoted attribute value and matching the closing quote later. There is a very similar error on line 848.

The sentence “utf8 "\xD5" does not map to Unicode” is misleading, but it is difficult to say this compactly: the data is being interpreted as UTF-8 encoded, but the byte 0xD5 was encountered in a context where it is not possible in UTF-8. The byte can appear in UTF-8 data, but only as the first byte of the two-byte encoded form of a character, so the second byte must be in the range 0x80…0xBF. And here the next byte is one that corresponds to the letter “f” (0x66).

A validator could deal with such errors by ignoring the offending byte, and perhaps that would be better than just quitting. But I’m afraid this might require fundamental changes to the code of the validator, in the low-level routines, where nobody is really working on it.

In any case, you need to fix the character-level errors, so why not start with them and then proceed to validation errors proper? After all, the error causes real trouble: you can see that browsers do not render “Chapter Navigation” in bold face, even though that is clearly the intention. Browsers apparently read the attribute value as beginning with an odd character that prevents them from seeing the CSS code as intended.

> I tried &#8882; and &$8883, which get past the validator, but they are
> puny little things of no merit whatsoever.

I think you mean &#8882; and &#8883;, which are character references for U+22B2 NORMAL SUBGROUP OF and U+22B3 CONTAINS AS NORMAL SUBGROUP, “⊲” and “⊳”. They might be used as special arrowhead-like symbols (though they are defined as mathematical symbols), but this does not seem to relate to the problem at hand.

> Per Jukkela's (sp?) suggestion, I persist in using % on image widths,
> and 'accessed on xxxx-xx-xx' inside URL anchors, which the validator
> fusses about and I need, but at least the validator will soldier on
> after bitching, which it won't do with the triangles.

These seem to be separate issues. They can be seen only after fixing the character-level error that blocks validation. And HTML5 just flags attributes like width=33% as errors because the authors of HTML5 think that such constructs are outdated, bad style, etc.

The error on line 769 is a real one: there’s the tag

A <a href="//mathildasanthropologyblog.wordpress.com/feed/" as of 2008-05-28>

where “as of 2008-05-28” is misplaced: it is parsed as attributes (and ignored by browsers, since no such attributes are recognized).

Yucca
Received on Sunday, 23 February 2014 12:34:17 UTC