Re: The term "XML-encoded" in Google sitemaps

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 22 Jun 2005 11:47:11 +0900
To: www-international@w3.org
Message-ID: <op.ssq3cxxrx1753t@ibm-60d333fc0ec.w3.mag.keio.ac.jp>

Here is some information about the term "XML-encode" in the Google
sitemaps FAQ.

Best, Felix.

On Wed, 22 Jun 2005 07:20:23 +0900, Adam M. Costello wrote:

> Felix Sasaki <fsasaki@w3.org> wrote:
>
>> Chris Lilley recently recognized some strange usage of the term
>> "XML-encoded" in the documentation of Google sitemaps.  Do you know
>> who we should contact to change the documentation?
>
> I've filed a bug report, which appears below.  Thanks for reporting
> this!
>
>  which has contact information.
>
> AMC, http://www.nicemice.net/amc/
>
> http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#Frequently_Asked_Questions
>
> The answer to the first question, "How do I XML-encode a URL?", contradicts
> itself.
>
> First, it says to follow the instructions in HTML 4.0 section B.2.1, which
> is all about representing non-ASCII characters using UTF-8 and URI
> percent-escaping.  That is necessary only because the URI syntax is
> ASCII-only; it has nothing to do with HTML/XML syntax.  Section B.2.1 says
> nothing about escaping ASCII characters (like less-than, greater-than,
> ampersand, and double-quote) using SGML character references to get around
> the restrictions imposed by HTML/XML syntax.
>
> But then the FAQ goes on to give an example that just calls the Python
> library function xml.sax.saxutils.escape(), which just escapes less-than,
> greater-than, and ampersand using SGML character references.  That function
> has nothing to do with non-ASCII characters and URI percent-escaping.
>
> In other words, this FAQ introduces a new non-standard term ("XML-encode"),
> and then gives two completely different definitions for it.
>
> Please figure out what kinds of escaping are really needed for URLs in
> sitemaps, and refer to the proper authoritative specs as necessary
> (RFC-3986 for URIs, RFC-3987 for IRIs and their conversion to URIs, and the
> XML 1.0 or 1.1 spec for XML (not an HTML spec)).  If non-ASCII IRIs are not
> allowed in sitemaps, please make it clear that that is a restriction
> imposed by the sitemap document type, not a restriction of XML in general.
>
> Thanks!
>
> Credit to Felix Sasaki and Chris Lilley at W3C for bringing this to my
> attention.
>
> ------- Additional Comments From amc  06/21/05 15:10 -------
> I forgot to mention:  It would probably be good to avoid introducing a new
> official-sounding-but-non-standard term ("XML-encode").  You could use
> standard terminology as found in the specs, or just plain English like
> "encode" or "represent".
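
To make the distinction in the bug report concrete, here is a minimal
Python sketch of the two unrelated kinds of escaping it describes.  The
helper names percent_encode and xml_escape, the example.com URL, and the
choice of "safe" characters are my own assumptions for illustration; this
is not what the sitemaps FAQ prescribes, and it is not a complete
IRI-to-URI conversion (host names, for instance, are not handled).

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def percent_encode(iri: str) -> str:
    """Percent-escape non-ASCII characters as UTF-8 bytes (hypothetical helper).

    ASCII characters that already have a meaning in URI syntax are listed in
    `safe` (an assumed set) so that they, and existing %XX escapes, are left alone.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%~", encoding="utf-8")


def xml_escape(text: str) -> str:
    """Escape &, < and > so the text can appear as XML character data."""
    return escape(text)


# "caf\u00e9" is "cafe" with an acute accent on the final e.
url = "http://example.com/caf\u00e9?a=1&b=2"

as_uri = percent_encode(url)   # addresses the ASCII-only URI syntax
in_xml = xml_escape(as_uri)    # addresses XML markup characters

print(as_uri)   # http://example.com/caf%C3%A9?a=1&b=2
print(in_xml)   # http://example.com/caf%C3%A9?a=1&amp;b=2
```

The first step changes only the non-ASCII character; the second changes
only the ampersand.  Neither function does the other's job, which is why
a single term covering both is confusing.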
Received on Wednesday, 22 June 2005 02:47:17 GMT