Re: "XML-encoded" as a misused term in Google sitemaps

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 15 Jun 2005 12:08:18 +0900
Message-ID: <42AF9BA2.8070302@w3.org>
To: Chris Lilley <chris@w3.org>
Cc: www-international@w3.org

Hi Chris,

Thanks for the mail, and sorry for the late reply! "XML-encoded" is
really a bad term, and the reference to HTML 4 is awful :( .
Do you know a way to contact them, or who to contact, except subscribing 
to [1]?

Again thanks a lot. Best, Felix.

[1] http://groups-beta.google.com/group/google-sitemaps

Chris Lilley wrote:

>Hello www-international,
>
>I noticed this curious term "XML-encoded"
>http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#faq_xml_encoding
>
>on the Google sitemaps page. The problem is that it encourages people to
>assume that XML requires an IRI to be escaped to a URI. It would be
>better if Google used the already defined terms, and made it clear that
>this escaping is a special requirement of their particular format and
>not of XML in general. The escaping should refer to RFC 3987 and not
>HTML 4 (which is not even an XML format, and is not the defining
>instance of the escape mechanism).
>
>It should also refer to RFC 3986 and not 2396, of course.
>
>http://www.google.com/webmasters/sitemaps/docs/en/protocol.html
>
>Q: How do I XML-encode a URL?
>
>To properly encode your URLs, follow the procedure recommended by the
>HTML 4.0 specification, section B.2.1. Convert the string to UTF-8 and
>then URL-escape the result. For details about Internationalized Resource
>Identifiers, also see RFC2396 (sections 2.3 and 2.4) and RFC3987.
>
>The following is an example python script for XML encoding a URL:
>
>    $ python
>    Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
>    >>> import xml.sax.saxutils
>    >>> xml.sax.saxutils.escape("http://www.test.org/view?widget=3&count>2")
>
>The encoded URL from the example above is:
>
>    http://www.test.org/view?widget=3&amp;count&gt;2
>
>Q: Does it matter which character encoding method I use to generate my Sitemap files?
>
>Yes. Your Sitemap files must use UTF-8 encoding.
>
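
To illustrate the distinction drawn in the quoted message, here is a
rough Python 3 sketch that keeps the two steps apart: mapping an IRI to
a URI (UTF-8, then percent-escaping) and escaping text for inclusion in
XML. The helper name iri_to_uri and the example.org address are made up
for this illustration, and the mapping is simplified: it only
percent-escapes characters outside the listed ASCII set and does not
treat internationalized host names specially, so it should not be read
as a complete RFC 3987 implementation.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# ASCII characters left untouched: URI delimiters, sub-delimiters,
# unreserved marks, and "%" so existing escapes are not escaped twice.
# (Hypothetical choice for this sketch.)
SAFE_ASCII = "/:?#[]@!$&'()*+,;=%~-._"


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI step: encode as UTF-8, then percent-escape."""
    return quote(iri, safe=SAFE_ASCII, encoding="utf-8")


# Hypothetical example address containing non-ASCII characters.
iri = "http://www.example.org/résumé?widget=3&count=2"

# Step 1 (what the sitemap format asks for): IRI -> URI.
uri = iri_to_uri(iri)

# Step 2 (what XML itself needs): escape markup characters such as "&".
xml_ready = escape(uri)

# XML escaping alone leaves the non-ASCII characters as they are.
xml_only = escape(iri)

print(uri)        # http://www.example.org/r%C3%A9sum%C3%A9?widget=3&count=2
print(xml_ready)  # http://www.example.org/r%C3%A9sum%C3%A9?widget=3&amp;count=2
print(xml_only)   # http://www.example.org/résumé?widget=3&amp;count=2
```

Only the second step is something XML requires; the first is a
requirement of the particular format, which is why calling the
combination "XML-encoding" is misleading.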
Received on Wednesday, 15 June 2005 03:08:27 GMT