- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 22 Jun 2005 11:47:11 +0900
- To: www-international@w3.org
Here is some information about the term "XML-encode" in the Google
Sitemaps FAQ.
Best, Felix.
On Wed, 22 Jun 2005 07:20:23 +0900, Adam M. Costello wrote:
> Felix Sasaki <fsasaki@w3.org> wrote:
>
> >> Chris Lilley recently noticed some strange usage of the term
> >> "XML-encoded" in the documentation of Google Sitemaps. Do you know
> >> who we should contact to change the documentation?
>
> I've filed a bug report, which appears below. Thanks for reporting
> this!
>
> which has contact information.
>
> AMC, http://www.nicemice.net/amc/
>
> http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#Frequently_Asked_Questions
>
> The answer to the first question, "How do I XML-encode a URL?",
> contradicts itself.
>
> First, it says to follow the instructions in HTML 4.0 section B.2.1,
> which is all about representing non-ASCII characters using UTF-8 and URI
> percent-escaping. That is necessary only because the URI syntax is
> ASCII-only; it has nothing to do with HTML/XML syntax. Section B.2.1
> says nothing about escaping ASCII characters (like less-than,
> greater-than, ampersand, and double-quote) using SGML character
> references to get around the restrictions imposed by HTML/XML syntax.
>
> But then the FAQ goes on to give an example that just calls the Python
> library function xml.sax.saxutils.escape(), which just escapes less-than,
> greater-than, and ampersand using SGML character references. That
> function has nothing to do with non-ASCII characters and URI
> percent-escaping.
>
> In other words, this FAQ introduces a new non-standard term
> ("XML-encode"), and then gives two completely different definitions
> for it.
>
> Please figure out what kinds of escaping are really needed for URLs in
> sitemaps, and refer to the proper authoritative specs as necessary
> (RFC-3986 for URIs, RFC-3987 for IRIs and their conversion to URIs, and
> the XML 1.0 or 1.1 spec for XML (not an HTML spec)). If non-ASCII IRIs
> are not allowed in sitemaps, please make it clear that that is a
> restriction imposed by the sitemap document type, not a restriction of
> XML in general.
>
> Thanks!
>
> Credit to Felix Sasaki and Chris Lilley at W3C for bringing this to my
> attention.
>
> ------- Additional Comments From amc 06/21/05 15:10 -------
> I forgot to mention: It would probably be good to avoid introducing a
> new official-sounding-but-non-standard term ("XML-encode"). You could
> use standard terminology as found in the specs, or just plain English
> like "encode" or "represent".
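
As a rough illustration of the two separate layers described in the report
above, here is a minimal Python sketch. It is not taken from the FAQ. The
helper name iri_to_uri, the example.com address, and the choice of "safe"
characters are assumptions made for this example. It also assumes the
non-ASCII characters appear only in the path or query; an internationalized
host name would need separate handling under RFC 3987.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def iri_to_uri(iri: str) -> str:
    """Layer 1 (URI syntax): percent-escape non-ASCII characters.

    Hypothetical helper for illustration. Each non-ASCII character is
    encoded as UTF-8 and every resulting byte becomes %HH. ASCII URI
    delimiters and existing percent-escapes are left untouched.
    """
    return quote(iri, safe="/:?#[]@!$&'()*+,;=%~")


# Hypothetical address containing one non-ASCII character and an ampersand.
iri = "http://www.example.com/ümlat.html?q=name&lang=de"

# Layer 1: makes the address a valid ASCII-only URI.
uri = iri_to_uri(iri)

# Layer 2 (XML syntax): escapes &, < and > so that the URI can be placed
# in XML character data. This step does nothing about non-ASCII characters.
xml_text = escape(uri)

print(uri)
print(xml_text)
```

The two steps are independent: the first is required by URI syntax and the
second by XML syntax, which is why a single term covering both is confusing.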
Received on Wednesday, 22 June 2005 02:47:17 UTC