Re: Less than helpful error message from Jukka K. Korpela on 2014-02-23 (www-validator@w3.org from February 2014)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sun, 23 Feb 2014 14:33:43 +0200
To: Ferren MacIntyre <fmaci@inbox.com>, www-validator@w3.org
Message-ID: <5309EAA7.7080200@cs.tut.fi>
2014-02-20 11:10, Ferren MacIntyre wrote:

> See, e.g.,
> http://www.electromontis.net/evoligion/_C/C10.shtml#C10
> at the extremities of the text in the green box.

I’m not sure I understand what you mean by that reference. Markup 
validation is really independent of the visual rendering, and the data 
error that prevents validation is after the element that is rendered as 
a box with green background.

> For a year or so I have used left-and right-pointing solid equilateral
> triangles as navigation pointers to previous and following chapters.

You mean “◀” and “▶”, U+25C0 BLACK LEFT-POINTING TRIANGLE and U+25B6 
BLACK RIGHT-POINTING TRIANGLE? I can’t see any attempt at using them on 
the page.

> These have not been coded, but just visible glyphs.

Well, as characters, you mean, I suppose. That’s fine if you use UTF-8.
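
As an illustration, here is a minimal Python sketch (standard library
only; the variable names are just for this example) that looks up the
Unicode names of the two triangle characters and shows the bytes they
become when encoded as UTF-8:

```python
import unicodedata

# The two navigation triangles discussed above.
triangles = ["\u25C0", "\u25B6"]

# Map each character to its Unicode name and its UTF-8 byte sequence.
info = {ch: (unicodedata.name(ch), ch.encode("utf-8")) for ch in triangles}

for ch, (name, raw) in info.items():
    print(f"U+{ord(ch):04X} {name}: {raw.hex(' ')}")
```

Each triangle takes three bytes in UTF-8 (E2 97 80 and E2 96 B6), and
neither sequence contains the byte 0xD5.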

> Suddenly the
> validator gives up, Saying 'Waiter, there's a bad byte in my code! I
> can't eat that!'. It says something like  '\xD5 on line 20', but it
> won't show the source code, and the source code doesn't have an \xD5
> that I can find, so it leaves me guessing.

The error message is:

“Sorry, I am unable to validate this document because on line 23 it 
contained one or more bytes that I cannot interpret as utf-8 (in other 
words, the bytes found are not valid values in the specified Character 
Encoding). Please check both the content of the file and the character 
encoding indication.

The error was: utf8 "\xD5" does not map to Unicode”

It may look cryptic, but I think it is as good as we can get, given the 
technical nature of the problem, except that the last sentence is rather 
misleading. It is indeed a “bad byte” that is the issue, and it is a 
problem at the level of character data representation, not in markup; 
this is why the validator gives up: it does not even start validation. 
The data, when declared as UTF-8, would be invalid even when 
interpreted as plain text.

More exactly, line 23 is

<p class="cen" style=[0xD5]font-weight:bold;">Chapter Navigation:<br />

where I have denoted the problem byte, D5 hexadecimal, as [0xD5]. It 
should apparently be the common ASCII quotation mark ("), starting a 
quoted attribute value and matching the closing quote later. There is a 
very similar error on line 848.
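
Since the validator does not show the offending source, one way to
locate such bytes yourself is a small script. The following is only a
sketch: the function name find_bad_utf8 and the two-line sample are
made up for illustration, and the sample merely imitates the kind of
line quoted above.

```python
def find_bad_utf8(data: bytes):
    """Return (line number, byte offset in line, byte value) for every
    byte that cannot be decoded as UTF-8."""
    hits = []
    # Splitting on 0x0A is safe: that byte never occurs inside a
    # multi-byte UTF-8 sequence.
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of the line is valid
            except UnicodeDecodeError as err:
                # err.start and err.end are relative to the slice.
                hits.append((line_no, pos + err.start, line[pos + err.start]))
                pos += err.end  # resume after the undecodable bytes
    return hits


# Hypothetical sample with a stray 0xD5 where a quotation mark belongs.
sample = b'<p>ok</p>\n<p style=\xd5font-weight:bold;">x</p>\n'
for line_no, offset, value in find_bad_utf8(sample):
    print(f"line {line_no}, offset {offset}: byte 0x{value:02X}")
```

To check a real page, read the file in binary mode (open(path, "rb"))
and pass the bytes to the function.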

The sentence “utf8 "\xD5" does not map to Unicode” is misleading, but it 
is difficult to say this compactly: the data is being interpreted as 
UTF-8 encoded, but the byte 0xD5 was encountered in a context where it 
is not possible in UTF-8. The byte can appear in UTF-8 data, but only as 
the first byte of the two-byte encoded form of a character, so the next 
byte must be in the range 0x80…0xBF. Here the next byte is 0x66, which 
corresponds to the letter “f”.
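
The rule can be checked with any strict UTF-8 decoder. A minimal Python
sketch follows; the helper name try_decode and the byte strings are
illustrative only, and Python's decoder is of course not the one the
validator uses.

```python
def try_decode(data: bytes):
    """Decode strictly as UTF-8; return (text, None) on success or
    (None, (offset, byte value, reason)) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, (err.start, data[err.start], err.reason)


# 0xD5 followed by "f" (0x66): 0x66 is outside 0x80..0xBF, so this fails.
bad = b'style=\xd5font-weight:bold;"'
# 0xD5 followed by 0x80: a valid two-byte sequence, here U+0540.
good = b"\xd5\x80"

print(try_decode(bad))
print(try_decode(good))
```

The same lead byte is accepted in the second case only because it is
followed by a byte in the continuation range.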

A validator could deal with such errors by ignoring the offending byte, 
and perhaps that would be better than just quitting. But I’m afraid this 
might require fundamental changes to the code of the validator, in the 
low-level routines, where nobody is really working on it. In any case, 
you need to fix the character-level errors, so why not start with them 
and then proceed to validation errors proper? After all, the error 
causes real trouble: you can see that browsers do not render “Chapter 
Navigation” in bold face, even though that is clearly the intention. 
Browsers apparently read the attribute value as beginning with an odd 
character that prevents them from seeing the CSS code as intended.
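
To show what “ignoring the offending byte” could look like, here is a
sketch using the lenient error handlers of Python's decoder. This is an
assumption about one possible approach, not a description of how the
validator works; the bytes are line 23 as quoted above.

```python
# Line 23 as quoted above, with the stray byte 0xD5 in place.
raw = b'<p class="cen" style=\xd5font-weight:bold;">Chapter Navigation:<br />'

# errors="ignore" silently drops bytes that cannot be decoded.
dropped = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER instead,
# which keeps the problem visible.
marked = raw.decode("utf-8", errors="replace")

print(dropped)
print(marked)
```

Either way the opening quotation mark of the style attribute is still
missing, so the markup remains wrong until the source itself is fixed.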

> I tried &#8882 and &$8883, which get past the validator, but they are
> puny little things of no merit whatsoever.

I think you mean &#8882; and &#8883;, which are character references for 
U+22B2 NORMAL SUBGROUP OF and U+22B3 CONTAINS AS NORMAL SUBGROUP, “⊲” 
and “⊳”. They might be used as special arrowhead-like symbols (though 
they are defined as mathematical symbols), but this does not seem to 
relate to the problem at hand.
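
The mapping from these character references to characters can be
confirmed with Python's standard library. This is just an illustrative
sketch; the variable names are arbitrary.

```python
import html
import unicodedata

refs = ["&#8882;", "&#8883;"]

# Resolve each decimal character reference to the character it denotes.
decoded = [html.unescape(ref) for ref in refs]

# Look up the formal Unicode name of each resulting character.
names = [unicodedata.name(ch) for ch in decoded]

for ref, ch, name in zip(refs, decoded, names):
    print(f"{ref} -> U+{ord(ch):04X} {name}")
```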

> Per Jukkela's (sp?) suggestion, I persist in using % on image widths,
> and 'accessed on xxxx-xx-xx' inside URL anchors, which the validator
> fusses about and I need, but at least the validator will soldier on
> after bitching, which it won't do with the triangles.

These seem to be separate issues. They can be seen only after fixing the 
character-level error that blocks validation. And HTML5 just flags 
attributes like width=33% as errors because the authors of HTML5 think 
that such constructs are outdated, bad style, etc. The error on line 769 
is a real one: there’s the tag

A <a href="//mathildasanthropologyblog.wordpress.com/feed/" as of 
2008-05-28>

where “as of 2008-05-28” is misplaced: it is parsed as attributes (and 
ignored by browsers, since no such attributes are recognized).
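
A rough way to see this is to feed the tag to a tolerant HTML parser.
The sketch below uses Python's html.parser, which is not a browser's
HTML5 parser but treats the stray words similarly; the class name
AttrCollector is made up for the example, and the tag is written on one
line here.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record every start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


tag_text = ('<a href="//mathildasanthropologyblog.wordpress.com/feed/" '
            'as of 2008-05-28>')

collector = AttrCollector()
collector.feed(tag_text)
collector.close()

# Names of all attributes the parser found on the <a> tag.
attr_names = [name for name, _value in collector.seen[0][1]]
print(attr_names)
```

The words after the quoted URL come out as separate, valueless
attribute names alongside href.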

Yucca
Received on Sunday, 23 February 2014 12:34:17 UTC