- From: Harold Baughan [RockSolidSite.com] <hbaughan@rocksolidsite.com>
- Date: Wed, 18 Feb 2004 09:41:26 -0500
- To: <html-tidy@w3.org>
Hello Bjoern,

Re-examination is now complete. I think I've isolated the problem...

Text was copied from an HTML 4.1 file via Notepad and pasted into an XHTML 1.1 file. Adjustments were made, Tidy was run, and the result was validated with the W3C on-line validator. The file validated properly as XHTML 1.1 under this declaration...

<?xml version="1.1" encoding="us-ascii"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

Then I ran the Beautify function using Tidy Beta. Now the on-line validator responded...

> Sorry, I am unable to validate this document because on lines 92,
> 99-101, 103, 106-107, 129-130, 138-139, 142, 146-147, 149, 154-157,
> 161, 163-165, 169, 175-176, 182, 188 it contained one or more bytes
> that I cannot interpret as us-ascii (in other words, the bytes found
> are not valid values in the specified Character Encoding). Please
> check both the content of the file and the character encoding indication.

The first occurrence appeared in this snippet.

89 </p>
90 </td>
91 </tr>
92 </table>
93 </div>

For this, the Frhed editor shows

\</p>
\</td>
\</tr>
\</table><bh:a0>
\</div>

So, there might be a *couple* of things going on, here. Last night I might have imported an "a0" when there were two spaces after punctuation, the first one being an &nbsp;. That would be my fault. However, *this* one is definitely coming from the Tidy beautify function.

Note that when the validator responds with more than one line number (99-101) it is indicating that the last character on the first line is <bh:a0>, then there are space characters <bh:20> up to the first character of the next line.

I hope this helps you to repair the problem.

Question... should I change from us-ascii to another character set in the meantime? Which one? Thanks.

A curious item... I looked at the file with Netscape 7.0. At every <bh:a0> a question mark showed up in a black diamond.
So, at least the problem points are now visible! Why is this curious? ...Because I seldom have anything nice to say about Netscape, and this time it was actually useful. :-)

I am not on the distribution list, so the only way I can receive a response is directly.

Good luck with this one.

Cordially,
Harold Baughan
^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^
Baughan & Company, email: hbaughan@rocksolidsite.com
- - - - - -

----- Original Message -----
From: "Bjoern Hoehrmann" <derhoermi@gmx.net>
To: "Harold Baughan [RockSolidSite.com]" <hbaughan@rocksolidsite.com>
Cc: <html-tidy@w3.org>
Sent: Tuesday, February 17, 2004 5:25 PM
Subject: Re: Non-us-ascii characters in Tidy Beta version 15-Jan-2004

> * Harold Baughan [RockSolidSite.com] wrote:
> >On Feb. 9 I asked several questions re: Tidy Beta version 15-Jan-2004, and
> >received some good help in operation. However, one error keeps getting
> >introduced. Note... the version in use is a plug-in to Chami's HTML-Kit
> >Version 1.0, Build 292, on Win-98.
>
> It'd be best if you try the command line application to reproduce the
> problem with an ideally simple test case. If you are able to reproduce
> it, send a message to the list or file a bug report on the sf.net site.
> I cannot fix bugs I cannot reproduce.
>
> >After a beautify function, a non-visible, non-us-ascii character is being
> >added somewhere in several strings. It seems to be happening on a line
> >which includes <br /> by itself, or a space after </a> at the end of a line,
> >or when there are two spaces between a period and the beginning of the next
> >sentence (such as in "...word. Word...")
>
> By default, Tidy does not generate non-ascii output unless there are
> non-ascii characters inside constructs where it cannot use character
> references (comments, for example); in this case the characters come
> out garbled. So there must be some configuration option active,
> -latin1 for example.
> If the character is not visible it is most likely
> U+00A0 (&nbsp;). Tidy would insert them e.g. if there is a <nobr>
> element in the source document.
>
> >Does anyone know how to make whatever this character visible so that it can
> >be edited out? And, of course, it should be looked into for the next build.
>
> A hex editor would probably work best, if you don't have one yet, there
> are lots freely available. Certain viewer applications might also help,
> e.g. the Total Commander file manager supports hex view. You could also
> put the file online and I'll have a look.
>
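
As an alternative to the hex editor suggested above, a short script can list every byte the validator would reject as non-us-ascii. The following is only a minimal sketch and is not part of Tidy or HTML-Kit. The function names find_non_ascii and replace_latin1_nbsp are hypothetical, and the replacement step assumes the file is Latin-1 (or Windows-1252), where byte 0xA0 is a no-break space. In a UTF-8 file 0xA0 can be part of a multi-byte sequence, so that step should not be used there.

```python
from typing import List, Tuple


def find_non_ascii(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (line_number, column, byte_value) for every byte above 0x7F.

    Line numbers and columns are 1-based, like the validator's line report.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, value in enumerate(line, start=1):
            if value > 0x7F:
                hits.append((line_no, col, value))
    return hits


def replace_latin1_nbsp(data: bytes) -> bytes:
    """Swap each 0xA0 byte for the pure-ASCII character reference &#160;.

    Assumption: the input is Latin-1 or Windows-1252, so 0xA0 is a
    no-break space and nothing else.
    """
    return data.replace(b"\xa0", b"&#160;")


# Hypothetical sample modelled on the snippet around line 92.
sample = b"</tr>\n</table>\xa0\n</div>\n"

for line_no, col, value in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: byte 0x{value:02x}")

cleaned = replace_latin1_nbsp(sample)
print(cleaned)
```

Once the stray bytes are replaced by a character reference, the file contains only us-ascii bytes again, so the existing encoding declaration could stay as it is.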
Received on Wednesday, 18 February 2004 09:45:00 UTC