- From: Gerald Oskoboiny <gerald@impressive.net>
- Date: Sun, 8 Jul 2001 13:58:03 -0400 (EDT)
- To: Elgoog <elgoog@google.com>, googlebot@google.com
- Cc: public message archive <www-archive@w3.org>
Hi,
A while ago I wrote to you pointing out a bug in Google's indexer:
it does not index URIs that have '@'s in them. You seem to
have made some progress on that, since there are now some URIs
with '@'s in your index; sample:
http://www.google.com/search?hl=en&safe=off&q=+site%3Aimpressive.net+oskoboiny+%2B%22www-talk%22+archive+%22new+search%22&btnG=Google+Search
but it still fails to index URIs that look like:
http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net
it only indexes URIs that look like this:
http://impressive.net/archives/www-talk/25020093151848@HUJIVMS
i.e. if there is a hostname after the '@', the pages are ignored.
(So, ironically, the only messages in my archives that are being
indexed by Google are those that have bogus message-ids :( )
Could you please fix this? Those URIs are valid, per sec. 2 of RFC 2396.
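
To illustrate the validity claim, here is a minimal sketch in Python that checks a URI's path against the path grammar as I read it in RFC 2396 sec. 3.3, where '@' is one of the characters allowed unescaped in a path segment. The function name path_is_valid_rfc2396 and the regular expressions are my own and purely illustrative; this says nothing about how Google's indexer actually works.

```python
import re
from urllib.parse import urlsplit

# pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
PCHAR = r"(?:[A-Za-z0-9\-_.!~*'()]|%[0-9A-Fa-f]{2}|[:@&=+$,])"
# segment = *pchar *( ";" param ), where param = *pchar
SEGMENT = rf"{PCHAR}*(?:;{PCHAR}*)*"
# abs_path = "/" segment *( "/" segment )
ABS_PATH = re.compile(rf"(?:/{SEGMENT})+")


def path_is_valid_rfc2396(uri: str) -> bool:
    """Return True if the URI's path matches the abs_path grammar above."""
    return ABS_PATH.fullmatch(urlsplit(uri).path) is not None


if __name__ == "__main__":
    samples = [
        "http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net",
        "http://impressive.net/archives/www-talk/25020093151848@HUJIVMS",
    ]
    for uri in samples:
        parts = urlsplit(uri)
        # The '@' sits in the path, so the host is still impressive.net.
        print(parts.hostname, parts.path, path_is_valid_rfc2396(uri))
```

Both sample URIs pass the check, whether or not the text after the '@' looks like a hostname, so a crawler has no grammatical reason to treat them differently.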
Past correspondence follows:
On Mon, Feb 19, 2001 at 06:05:16AM -0500, Gerald Oskoboiny wrote:
> On Tue, Sep 19, 2000 at 04:01:23PM -0700, Elgoog wrote:
> > At present, Google does not support special characters like !, @, or %. We
> > are looking to add this functionality in the long run and appreciate your
> > constructive criticism.
>
> Hmm... 5 months later google still doesn't index URIs with '@' in them.
> 5 months qualifies as "long run" in Web years, no?
>
> Come on, you probably just need to tweak a single regexp somewhere!
>
> Pretty please? I have hundreds of thousands of pages on the web
> with '@'s in their URIs, and many more to come.
>
> Thanks :)
>
> > Thanks for your support!
> > The Google Team
> >
> > -----Original Message-----
> > From: Gerald Oskoboiny [mailto:gerald@impressive.net]
> > Sent: Sunday, September 03, 2000 9:44 PM
> > To: googlebot@google.com
> > Subject: googlebot skips URIs with '@'?
> >
> >
> > Hi,
> >
> > Google is fantastic, way better than everything else out there.
> >
> > But...
> >
> > Why doesn't it index URIs that have '@'s in them? Sample:
> >
> > http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net
> >
> > I'm pretty sure those are valid URIs, per sec. 2.2 of RFC 2396.
> >
> > alltheweb.com doesn't have any problem with them:
> >
> > http://www.ussc.alltheweb.com/cgi-bin/search?exec=FAST+Search&type=all&query=oskoboiny+fogo+mailing+list+archives
> >
> > and inktomi grabs them all, too.
> >
> > but google ignores them all:
> >
> > http://www.google.com/search?q=oskoboiny+fogo+mailing+list+archives&hl=en&safe=off&filter=0
> >
> > thanks,
> >
> > --
> > Gerald Oskoboiny <gerald@impressive.net>
> > http://impressive.net/people/gerald/
> >
> >
>
> --
> Gerald Oskoboiny <gerald@impressive.net>
> http://impressive.net/people/gerald/
--
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/
Received on Sunday, 8 July 2001 14:00:12 UTC