
Re: How to convert things like &#x00E3; to utf-8?

From: Geoff McLane <ubuntu@geoffair.info>
Date: Wed, 18 May 2016 17:56:44 +0200
To: html-tidy@w3.org
Message-ID: <573C90BC.2000905@geoffair.info>
Hi Peng,

Thank you for your inquiry...

Could you add it as an issue on -
  https://github.com/htacg/tidy-html5/issues
where I am sure it will get more attention...

Also add the version of tidy used, and the
expected output... thanks...

Regards,
Geoff.

On 17/05/16 15:11, Peng Yu wrote:
> Hi,
>
> For the following xml, I want to convert things like &#x00E3; to utf-8.
>
> http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?sortfield=py&hc=1000&sortorder=desc&an=6706948
>
> But I still see things like &#x00E3; after running the following commands.
> Does anybody know what the correct command is to do the conversion? Thanks.
>
> ~$ curl "http://ieeexplore.ieee.org/gateway/ipsSearch.jsp?sortfield=py&hc=1000&sortorder=desc&an=6706948" > tmp1.xml
> ~$ tidy -q -xml --preserve-entities no --output-encoding utf8 tmp1.xml > tmp2.xml
> ~$ vim tmp1.xml
> ~$ grep Bilz tmp2.xml
> <![CDATA[Bilz&#x00E3;  Ara&#x00FA; jo;  Liang Zhao]]>
>
> --
> Regards,
> Peng
>
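
A note on the grep output quoted above: the leftover &#x00E3; sits inside a
CDATA section. In XML, text inside CDATA is literal character data, so a
parser does not treat &#x00E3; there as a character reference. That may be
why the references survive the tidy run, although this has not been
confirmed against tidy itself. As a rough workaround, here is a minimal
Python sketch that decodes numeric character references as plain text.
The function name decode_numeric_refs is hypothetical and is not part of
tidy. References that decode to markup characters are left alone so that
text outside CDATA stays well-formed.

```python
import re

# Matches hexadecimal (&#x00E3;) and decimal (&#227;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped outside CDATA to keep XML well-formed.
_MARKUP = set("<>&\"'")


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they denote.

    Hypothetical helper for illustration; it treats the input as plain text
    and does not parse the XML.
    """
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            ch = chr(code)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the reference unchanged.
            return match.group(0)
        # Keep references to markup characters escaped.
        return match.group(0) if ch in _MARKUP else ch

    return _NUMERIC_REF.sub(repl, text)


if __name__ == "__main__":
    line = "<![CDATA[Bilz&#x00E3;  Ara&#x00FA; jo;  Liang Zhao]]>"
    print(decode_numeric_refs(line))
```

To process a whole file, read tmp2.xml as UTF-8, pass its contents through
the function, and write the result back out as UTF-8.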
Received on Wednesday, 18 May 2016 15:57:14 UTC