W3C home > Mailing lists > Public > www-archive@w3.org > July 2001

Re: googlebot skips URIs with '@'?

From: Gerald Oskoboiny <gerald@impressive.net>
Date: Sun, 8 Jul 2001 13:58:03 -0400 (EDT)
To: Elgoog <elgoog@google.com>, googlebot@google.com
Cc: public message archive <www-archive@w3.org>
Message-ID: <20010708135721.A14680@impressive.net>
Hi,

A while ago I wrote to you pointing out a bug in Google's indexer,
that it does not index URIs that have '@'s in them. You seem to
have made some progress on that, since there are now some URIs
with '@'s in your index; sample:

    http://www.google.com/search?hl=en&safe=off&q=+site%3Aimpressive.net+oskoboiny+%2B%22www-talk%22+archive+%22new+search%22&btnG=Google+Search

but it still fails to index URIs that look like:

    http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net

it only indexes URIs that look like this:

    http://impressive.net/archives/www-talk/25020093151848@HUJIVMS

i.e. if there is a hostname after the '@', the pages are ignored.
(so ironically, the only messages in my archives that are being
indexed by google are those that have bogus message-id's :( )

Could you please fix this? Those URIs are valid, per sec. 2 of RFC 2396.

Past correspondence follows:

On Mon, Feb 19, 2001 at 06:05:16AM -0500, Gerald Oskoboiny wrote:
> On Tue, Sep 19, 2000 at 04:01:23PM -0700, Elgoog wrote:
> > At present, Google does not support special characters like !, @, or %.  We
> > are looking to add this functionality in the long run and appreciate your
> > constructive criticism.
> 
> Hmm... 5 months later google still doesn't index URIs with '@' in them.
> 5 months qualifies as "long run" in Web years, no?
> 
> Come on, you probably just need to tweak a single regexp somewhere!
> 
> Pretty please? I have hundreds of thousands of pages on the web
> with '@'s in their URIs, and many more to come.
> 
> Thanks :)
> 
> > Thanks for your support!
> > The Google Team
> > 
> > -----Original Message-----
> > From: Gerald Oskoboiny [mailto:gerald@impressive.net]
> > Sent: Sunday, September 03, 2000 9:44 PM
> > To: googlebot@google.com
> > Subject: googlebot skips URIs with '@'?
> > 
> > 
> > Hi,
> > 
> > Google is fantastic, way better than everything else out there.
> > 
> > But...
> > 
> > Why doesn't it index URIs that have '@'s in them? Sample:
> > 
> >     http://impressive.net/archives/fogo/20000809025411.N7692@impressive.net
> > 
> > I'm pretty sure those are valid URIs, per sec. 2.2 of RFC 2396.
> > 
> > alltheweb.com doesn't have any problem with them:
> > 
> > http://www.ussc.alltheweb.com/cgi-bin/search?exec=FAST+Search&type=all&query=oskoboiny+fogo+mailing+list+archives
> > 
> > and inktomi grabs them all, too.
> > 
> > but google ignores them all:
> > 
> > http://www.google.com/search?q=oskoboiny+fogo+mailing+list+archives&hl=en&safe=off&filter=0
> > 
> > thanks,
> > 
> > --
> > Gerald Oskoboiny <gerald@impressive.net>
> > http://impressive.net/people/gerald/
> > 
> > 
> 
> -- 
> Gerald Oskoboiny <gerald@impressive.net>
> http://impressive.net/people/gerald/

-- 
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/
Received on Sunday, 8 July 2001 14:00:12 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 14:42:01 UTC