- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 15 Jun 2005 12:08:18 +0900
- To: Chris Lilley <chris@w3.org>
- Cc: www-international@w3.org
Hi Chris,

Thanks for the mail, and sorry for the late reply! "XML-encoded" is really a bad term, and the reference to HTML 4 is awful :( . Do you know a way to contact them, or who to contact, except subscribing to [1]?

Again thanks a lot. Best, Felix.

[1] http://groups-beta.google.com/group/google-sitemaps

Chris Lilley wrote:

>Hello www-international,
>
>I noticed this curious term "XML-encoded"
>http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#faq_xml_encoding
>
>on the Google sitemaps page. The problem is that it encourages people to
>assume that XML required an IRI to be escaped to a URI. It would be
>better if Google used the already defined terms, and made it clear that
>this escaping is a special requirement of their particular format and
>not of XML in general. The escaping should refer to RFC 3987 and not
>HTML 4 (which is not even an XML format, and is not the defining
>instance of the escape mechanism).
>
>It should also refer to RFC 3986 and not 2396, of course.
>
>http://www.google.com/webmasters/sitemaps/docs/en/protocol.html
>
>Q: How do I XML-encode a URL?
>
>To properly encode your URLs, follow the procedure recommended by the
>HTML 4.0 specification, section B.2.1. Convert the string to UTF-8 and
>then URL-escape the result. For details about Internationalized Resource
>Identifiers, also see RFC2396 (sections 2.3 and 2.4) and RFC3987.
>
>The following is an example python script for XML encoding a URL:
>
>    $ python
>    Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
>    >>> import xml.sax.saxutils
>    >>> xml.sax.saxutils.escape("http://www.test.org/view?widget=3&count>2")
>
>The encoded URL from the example above is:
>
>    http://www.test.org/view?widget=3&amp;count&gt;2
>
>Q: Does it matter which character encoding method I use to generate my Sitemap files?
>
>Yes. Your Sitemap files must use UTF-8 encoding.
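
To make the distinction in the quoted message concrete, here is a rough sketch (not Google's procedure, and not taken from their page) that keeps the two steps apart: first the IRI-to-URI mapping in the spirit of RFC 3987 section 3.1, then the XML escaping of the result. The function names iri_to_uri and sitemap_loc are made up for illustration, and the sketch assumes the input is already normalized and ignores special handling of internationalized host names.

```python
from xml.sax.saxutils import escape


def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 bytes.

    Simplified sketch of the RFC 3987 mapping: ASCII characters,
    including existing %XX escapes, are passed through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def sitemap_loc(iri: str) -> str:
    """Hypothetical helper: map the IRI to a URI, then escape it for XML content."""
    return escape(iri_to_uri(iri))


print(iri_to_uri("http://www.example.org/café?a=1&b=2"))
print(sitemap_loc("http://www.example.org/café?a=1&b=2"))
```

Only the second step has anything to do with XML: escape() replaces &, < and > so the string can sit in element content, while the percent-encoding in the first step is a requirement of the particular format, not of XML in general.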
Received on Wednesday, 15 June 2005 03:08:27 UTC