- From: Gerald Oskoboiny <gerald@impressive.net>
- Date: Sun, 8 Jul 2001 13:58:03 -0400 (EDT)
- To: Elgoog <elgoog@google.com>, googlebot@google.com
- Cc: public message archive <www-archive@w3.org>
Hi,

A while ago I wrote to you pointing out a bug in Google's indexer,
that it does not index URIs that have '@'s in them.

You seem to have made some progress on that, since there are now
some URIs with '@'s in your index; sample:

    http://www.google.com/search?hl=en&safe=off&q=+site%3Aimpressive.net+oskoboiny+%2B%22www-talk%22+archive+%22new+search%22&btnG=Google+Search

but it still fails to index URIs that look like:

    http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net

it only indexes URIs that look like this:

    http://impressive.net/archives/www-talk/25020093151848@HUJIVMS

i.e. if there is a hostname after the '@', the pages are ignored.

(so ironically, the only messages in my archives that are being
indexed by google are those that have bogus message-id's :( )

Could you please fix this? Those URIs are valid, per sec. 2 of
RFC 2396.

Past correspondence follows:

On Mon, Feb 19, 2001 at 06:05:16AM -0500, Gerald Oskoboiny wrote:
> On Tue, Sep 19, 2000 at 04:01:23PM -0700, Elgoog wrote:
> > At present, Google does not support special characters like !, @, or %. We
> > are looking to add this functionality in the long run and appreciate your
> > constructive criticism.
>
> Hmm... 5 months later google still doesn't index URIs with '@' in them.
> 5 months qualifies as "long run" in Web years, no?
>
> Come on, you probably just need to tweak a single regexp somewhere!
>
> Pretty please? I have hundreds of thousands of pages on the web
> with '@'s in their URIs, and many more to come.
>
> Thanks :)
>
> > Thanks for your support!
> > The Google Team
> >
> > -----Original Message-----
> > From: Gerald Oskoboiny [mailto:gerald@impressive.net]
> > Sent: Sunday, September 03, 2000 9:44 PM
> > To: googlebot@google.com
> > Subject: googlebot skips URIs with '@'?
> >
> >
> > Hi,
> >
> > Google is fantastic, way better than everything else out there.
> >
> > But...
> >
> > Why doesn't it index URIs that have '@'s in them? Sample:
> >
> > http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net
> >
> > I'm pretty sure those are valid URIs, per sec. 2.2 of RFC 2396.
> >
> > alltheweb.com doesn't have any problem with them:
> >
> > http://www.ussc.alltheweb.com/cgi-bin/search?exec=FAST+Search&type=all&query=oskoboiny+fogo+mailing+list+archives
> >
> > and inktomi grabs them all, too.
> >
> > but google ignores them all:
> >
> > http://www.google.com/search?q=oskoboiny+fogo+mailing+list+archives&hl=en&safe=off&filter=0
> >
> > thanks,
> >
> > --
> > Gerald Oskoboiny <gerald@impressive.net>
> > http://impressive.net/people/gerald/
> >
> >
>
> --
> Gerald Oskoboiny <gerald@impressive.net>
> http://impressive.net/people/gerald/

--
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/
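The validity claim in the message can be checked mechanically. The following is only a minimal sketch, assuming the path grammar in sec. 3.3 of RFC 2396, where `pchar` explicitly includes '@'. The function name `path_is_valid_rfc2396` is hypothetical, it checks only the path component of a URI, and it says nothing about how Google's indexer actually parses URIs.

```python
import re
from urllib.parse import urlsplit

# RFC 2396 sec. 3.3:
#   pchar   = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
#   segment = *pchar *( ";" param ),  param = *pchar
# unreserved = alphanum | mark, with mark = - _ . ! ~ * ' ( )
_PCHAR = r"(?:[A-Za-z0-9\-_.!~*'():@&=+$,]|%[0-9A-Fa-f]{2})"
_SEGMENT = re.compile(rf"{_PCHAR}*(?:;{_PCHAR}*)*")


def path_is_valid_rfc2396(uri: str) -> bool:
    """Return True if the URI's path matches abs_path from RFC 2396.

    Hypothetical helper for illustration; scheme, authority, and query
    are not validated here.
    """
    path = urlsplit(uri).path
    if not path.startswith("/"):
        return False
    # abs_path = "/" segment *( "/" segment )
    return all(_SEGMENT.fullmatch(seg) is not None
               for seg in path[1:].split("/"))


if __name__ == "__main__":
    for uri in (
        "http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net",
        "http://impressive.net/archives/www-talk/25020093151848@HUJIVMS",
    ):
        print(uri, path_is_valid_rfc2396(uri))
```

Under this reading of the grammar, both URI shapes from the message are accepted, whether or not a hostname follows the '@'.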
Received on Sunday, 8 July 2001 14:00:12 UTC