Re: UTF8 /ASCII- us error when running Tiny URL in Linux

From: Lee <ler762@gmail.com>
Date: Fri, 8 Feb 2019 06:45:37 -0500
Message-ID: <CAD8GWsu8Pj6m0-FYkVauKF76Atw16jfLUw-Gc8F8SscefKh9gw@mail.gmail.com>
To: Jacob Renhald <jacobrenhald@outlook.com>
Cc: "html-tidy@w3.org" <html-tidy@w3.org>
On 2/7/19, Jacob Renhald <jacobrenhald@outlook.com> wrote:
> Been trying to tidy up the html strings on one of my websites, running the
> code through linux. For some reason I can't seem to make it work.
>
> I've run:
> sudo apt-get install tidy
>
> To "tidy" it up I go:
>
> curl localhost address | tidy -iq (please note I have all articles stored as
> a xhtml file).
>
> From my understanding the -q is for quiet input while the "i" is for indents
> and it fixes the main issue.
>
> I'm trying to tidy up all the htmls on this subpage:
> https://www.kredittkortinfo.no/artikler/, which is a big mess.
>
> The problem I'm running into is that the UTF-8 gets translated into US-ASCII
> and I can no longer read the text file. I must be doing something
> wrong.

It looks like 'tidy -iq -utf8' should work:
$ man tidy
   Character encodings
       -utf8  use UTF-8 for both input and output

but it didn't work for me with LANG=C.

Just out of curiosity - what output does 'locale' give you?

This does work for me:
export LANG=nb_NO.utf8   (or en_US.utf8 or even C.utf8)
tidy -iq test.html
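
In case it helps, here are the same steps as a small script. This is only a
sketch: the helper name is_utf8_charmap is made up for illustration, and it
assumes the en_US.utf8 locale is installed on your machine ('locale -a' lists
what you actually have). test.html is just the sample file name from above.

```shell
#!/bin/sh
# Sketch only: mirrors the steps in this message, not an official tidy recipe.

# Hypothetical helper: succeeds if the given charmap name denotes UTF-8.
is_utf8_charmap() {
  case "$1" in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# 'locale charmap' prints the character encoding of the current locale.
current=$(locale charmap 2>/dev/null)

if ! is_utf8_charmap "$current"; then
  # Assumption: this locale is installed; check 'locale -a' if unsure.
  export LANG=en_US.utf8
fi

# Run tidy only if it is installed and the sample file exists.
if command -v tidy >/dev/null 2>&1 && [ -f test.html ]; then
  tidy -iq test.html
fi
```

One thing to keep in mind: if LC_ALL is set in your environment, it takes
precedence over LANG, so changing LANG alone would have no effect.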

Regards,
Lee
Received on Friday, 8 February 2019 11:46:03 UTC