How to convert things like ã to utf-8?

From: Peng Yu <pengyu.ut@gmail.com>
Date: Tue, 17 May 2016 08:11:34 -0500
Message-ID: <CABrM6w=dnDGktmaOL_si4hPcsYp7cK+w58xN5Z=nM_jV6aX6Ug@mail.gmail.com>
To: html-tidy@w3.org

For the XML fetched below, I want to convert character references like &#x00E3; to UTF-8.


But I still see things like &#x00E3; after running the following commands.
Does anybody know the correct command to do the conversion? Thanks.

~$ curl "http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?sortfield=py&hc=1000&sortorder=desc&an=6706948" > tmp1.xml
~$ tidy -q -xml --preserve-entities no --output-encoding utf8 tmp1.xml > tmp2.xml
~$ vim tmp1.xml
~$ grep Bilz tmp2.xml
<![CDATA[Bilz&#x00E3;  Ara&#x00FA; jo;  Liang Zhao]]>
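
A possible explanation, offered as an assumption and not something confirmed in this thread: the references in the output above sit inside a CDATA section, and XML treats CDATA content as literal characters, so `&#x00E3;` there is plain text and not a character reference for a parser to resolve. If that is the cause, one workaround is a post-processing step outside tidy. The following is a minimal Python sketch; the function name `decode_cdata_references` is hypothetical, and it assumes that decoding every reference inside CDATA (including named ones such as `&amp;`) is acceptable for this data.

```python
import html
import re

# Matches one CDATA section; DOTALL lets the body span several lines.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def decode_cdata_references(xml_text: str) -> str:
    """Decode character references found inside CDATA sections only.

    Text outside CDATA is left untouched, because there the XML
    parser (or tidy) is expected to resolve references itself.
    """
    def _decode(match: re.Match) -> str:
        # html.unescape handles numeric (&#x00E3;) and named references.
        return "<![CDATA[" + html.unescape(match.group(1)) + "]]>"

    return CDATA_RE.sub(_decode, xml_text)


# Sample line copied from the grep output in this message.
sample = "<![CDATA[Bilz&#x00E3;  Ara&#x00FA; jo;  Liang Zhao]]>"
print(decode_cdata_references(sample))
```

To apply this to the file from the transcript, one could read `tmp2.xml` as text, pass it through the function, and write the result back out with `encoding="utf-8"`.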

Received on Tuesday, 17 May 2016 13:12:01 UTC
