Re: UTF8 /ASCII- us error when running Tiny URL in Linux

On 2/7/19, Jacob Renhald <jacobrenhald@outlook.com> wrote:
> I've been trying to tidy up the HTML on one of my websites, running the
> code through Linux. For some reason I can't seem to make it work.
>
> I've run:
> sudo apt-get install tidy
>
> To "tidy" it up I go:
>
> curl localhost address | tidy -iq (please note I have all articles stored as
> XHTML files).
>
> From my understanding, -q is for quiet mode while -i is for indentation,
> and it fixes the main issue.
>
> I'm trying to tidy up all the htmls on this subpage:
> https://www.kredittkortinfo.no/artikler/, which is a big mess.
>
> The problem I'm running into is that the UTF-8 text gets converted to
> US-ASCII and I can no longer read the file. I must be doing something
> wrong.

It looks like 'tidy -iq -utf8' should work:
$ man tidy
   Character encodings
       -utf8  use UTF-8 for both input and output

but it didn't work for me with LANG=C

Just out of curiosity - what output does 'locale' give you?
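
In case it helps, here is a minimal sketch of how to check which character set the current locale implies. It assumes the `locale` utility supports the `charmap` keyword, as it does on glibc-based Linux systems; the variable name c_charmap is only for illustration.

```shell
# Show all locale settings currently in effect.
locale

# Print only the character set implied by the current locale.
locale charmap

# For comparison, ask what the plain C locale uses (typically ASCII only).
# c_charmap is an illustrative variable name, not something tidy needs.
c_charmap=$(LC_ALL=C locale charmap)
echo "C locale charmap: $c_charmap"
```

On glibc the C locale usually reports ANSI_X3.4-1968, which is plain ASCII, while a UTF-8 locale reports UTF-8. If LC_ALL or LC_CTYPE is set, it takes precedence over LANG for the character set.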

This does work for me:
export LANG=nb_NO.utf8   (or en_US.utf8 or even C.utf8)
tidy -iq test.html
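
A rough sketch of the same idea as a self-contained test, in case you want to try it on a small file first. The sample file contents and the output name tidied.html are made up for illustration, and it assumes the C.utf8 locale is installed (swap in nb_NO.utf8 or en_US.utf8 otherwise). Here LANG is set for the single tidy command only, instead of being exported for the whole shell.

```shell
# Hypothetical sample file with Norwegian characters (not from the real site).
printf '%s\n' '<html><head><title>Test</title></head><body><p>æøå ÆØÅ</p></body></html>' > test.html

# Only run tidy if it is installed.
if command -v tidy >/dev/null 2>&1; then
    # Set a UTF-8 locale for this one command and also pass -utf8 explicitly.
    # tidy exits non-zero when it reports warnings or errors, so the status
    # is printed rather than treated as a failure.
    LANG=C.utf8 tidy -iq -utf8 test.html > tidied.html || echo "tidy exit status: $?"

    # Inspect the result; the Norwegian characters should still be readable.
    cat tidied.html
else
    echo "tidy is not installed; try: sudo apt-get install tidy"
fi
```

If the characters survive in tidied.html, the same LANG setting can be put in front of the tidy command in the curl pipeline.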

Regards,
Lee

Received on Friday, 8 February 2019 11:46:03 UTC