- From: Daniel Veillard <daniel@veillard.com>
- Date: Thu, 14 Feb 2008 22:46:59 +0100
- To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
- Cc: public-xml-core-wg <public-xml-core-wg@w3.org>
On Thu, Feb 14, 2008 at 05:28:33PM +0000, Henry S. Thompson wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> So I did a little experiment: Using a moderately random collection of
> http: URIs created elsewhere for a web language corpus, I looked at
> 10404 web pages. Of these 1520 pages, from 758 distinct hosts,
> contained references (22408 in total) to URIs which included #
> followed by a digit. Of the references 2939 were local, i.e. of the
> form "#[0-9]...". Of the 13024 _unique_ fragments 11453 were actually
> integers, i.e. of the form #[0-9]+ and a further 66 were decimals,
> i.e. of the form #[0-9]+.[0-9]*
>
> I then refetched the same pages (139 didn't make it the second time,
> so the total was down to 10265), and found 992 pages, from 636
> distinct hosts, which contained anchors (<a href= or <a id=) which
> began with a digit.
>
> So, there are a _lot_ of ostensibly broken fragments and anchors out
> there.
>
> No, I did not check what percentage of the data was XML, I'll do that.

  Unfortunately, that matches the feedback I got. Basically, people expect
the XPath query id('123') to work, i.e. they expect the ID-ness of the
attribute to be asserted even if its value does not match the associated
validity constraint, and they expect that to work from XPath/XPointer.
  Things like VISA processing commonly use number-only IDs, and expect
that to work for manipulating and signing content:
  http://www.aleksey.com/pipermail/xmlsec/2003/001528.html

  I'm sure there are many other frameworks where this brokenness is expected.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
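For readers who want to repeat the fragment count from the quoted message, the classification it describes might be sketched roughly as follows. This is only an illustration, not the script actually used for the experiment: the function names are hypothetical, and it assumes the "." in the decimal pattern was meant as a literal dot.

```python
import re

# Patterns adapted from the quoted message. Assumption: the "." in
# "#[0-9]+.[0-9]*" is a literal dot, so it is escaped here.
INTEGER = re.compile(r"#[0-9]+")
DECIMAL = re.compile(r"#[0-9]+\.[0-9]*")
LEADING_DIGIT = re.compile(r"#[0-9]")


def classify_fragment(uri):
    """Classify the fragment of a URI reference (hypothetical helper).

    Returns 'integer', 'decimal', 'digit-initial', or None when there is
    no fragment or the fragment does not start with a digit.
    """
    hash_pos = uri.find("#")
    if hash_pos < 0:
        return None
    fragment = uri[hash_pos:]  # keeps the leading '#'
    if INTEGER.fullmatch(fragment):
        return "integer"
    if DECIMAL.fullmatch(fragment):
        return "decimal"
    if LEADING_DIGIT.match(fragment):
        return "digit-initial"
    return None


def is_local(uri):
    """A reference is 'local' in the message's sense if it is only a fragment."""
    return uri.startswith("#")
```

A fragment such as "#123" is exactly the case that fails the ID validity constraint, since an XML Name cannot begin with a digit, yet it is what callers of id('123') expect to resolve.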
Received on Thursday, 14 February 2008 21:45:17 UTC