- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 22 Jun 2005 11:47:11 +0900
- To: www-international@w3.org
Here is some information about the term "XML-encode" in the Google Sitemaps FAQ.

Best,
Felix

On Wed, 22 Jun 2005 07:20:23 +0900, Adam M. Costello wrote:

> Felix Sasaki <fsasaki@w3.org> wrote:
>
>> Chris Lilley recently recognized some strange usage of the term
>> "XML-encoded" in the documentation of Google sitemaps. Do you know
>> who we should contact to change the documentation?
>
> I've filed a bug report, which appears below. Thanks for reporting
> this!
>
> which has contact information.
>
> AMC, http://www.nicemice.net/amc/
>
> http://www.google.com/webmasters/sitemaps/docs/en/protocol.html#Frequently_Asked_Questions
>
> The answer to the first question, "How do I XML-encode a URL?",
> contradicts itself.
>
> First, it says to follow the instructions in HTML 4.0 section B.2.1,
> which is all about representing non-ASCII characters using UTF-8 and
> URI percent-escaping. That is necessary only because the URI syntax is
> ASCII-only; it has nothing to do with HTML/XML syntax. Section B.2.1
> says nothing about escaping ASCII characters (like less-than,
> greater-than, ampersand, and double-quote) using SGML character
> references to get around the restrictions imposed by HTML/XML syntax.
>
> But then the FAQ goes on to give an example that just calls the Python
> library function xml.sax.saxutils.escape(), which only escapes
> less-than, greater-than, and ampersand using SGML character references.
> That function has nothing to do with non-ASCII characters and URI
> percent-escaping.
>
> In other words, this FAQ introduces a new non-standard term
> ("XML-encode"), and then gives two completely different definitions
> for it.
>
> Please figure out what kinds of escaping are really needed for URLs in
> sitemaps, and refer to the proper authoritative specs as necessary
> (RFC 3986 for URIs, RFC 3987 for IRIs and their conversion to URIs, and
> the XML 1.0 or 1.1 spec for XML, not an HTML spec). If non-ASCII IRIs
> are not allowed in sitemaps, please make it clear that that is a
> restriction imposed by the sitemap document type, not a restriction of
> XML in general.
>
> Thanks!
>
> Credit to Felix Sasaki and Chris Lilley at W3C for bringing this to my
> attention.
>
> ------- Additional Comments From amc 06/21/05 15:10 -------
> I forgot to mention: It would probably be good to avoid introducing a
> new official-sounding-but-non-standard term ("XML-encode"). You could
> use standard terminology as found in the specs, or just plain English
> like "encode" or "represent".
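To make the distinction in the bug report concrete, here is a minimal Python sketch that keeps the two operations apart. It is only an illustration under stated assumptions, not the procedure the sitemap protocol or Google specifies; whether sitemaps need the first step at all is exactly what the report asks Google to clarify. `iri_to_uri()` is a simplified, hypothetical approximation of the RFC 3987 mapping (UTF-8 encode each non-ASCII character, then percent-escape its bytes). `xml_text()` wraps the standard-library `xml.sax.saxutils.escape()`. `loc_element()` and the example URL are made up for illustration.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def iri_to_uri(iri: str) -> str:
    """Hypothetical, simplified IRI-to-URI mapping (after RFC 3987).

    Each non-ASCII character is encoded as UTF-8 and every resulting
    byte is percent-escaped. ASCII characters, including reserved ones
    such as '&' and '?', pass through unchanged. No validation is done.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)


def xml_text(value: str) -> str:
    """Escape a string for use as XML character data.

    xml.sax.saxutils.escape() replaces only '&', '<' and '>'; it leaves
    non-ASCII characters alone and does no percent-escaping.
    """
    return escape(value)


def loc_element(iri: str) -> str:
    # Hypothetical helper: first map the IRI to an ASCII-only URI,
    # then escape that URI so it is well-formed inside <loc>.
    return "<loc>" + xml_text(iri_to_uri(iri)) + "</loc>"


if __name__ == "__main__":
    iri = "http://example.com/café?a=1&b=2"  # made-up example URL
    print(iri_to_uri(iri))   # http://example.com/caf%C3%A9?a=1&b=2
    print(xml_text(iri))     # http://example.com/café?a=1&amp;b=2
    print(loc_element(iri))  # <loc>http://example.com/caf%C3%A9?a=1&amp;b=2</loc>
```

Neither function does the other's job: percent-escaping leaves the `&` that would break the XML, and XML escaping leaves the `é` that a URI cannot contain. That is why one term cannot sensibly name both.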
Received on Wednesday, 22 June 2005 02:47:17 UTC