Errors in B.4.1 Search Robots (fwd) from Gerald Oskoboiny on 1998-04-26 (www-html-editor@w3.org from April to June 1998)

From: Gerald Oskoboiny <gerald@w3.org>
Date: Sun, 26 Apr 1998 01:25:10 -0400 (EDT)
To: www-html-editor@w3.org
Message-ID: <Pine.SOL.3.96.980426011911.20223D-100000@anansi.w3.org>

Hi,

I note that the "robots" section wasn't updated in the recently
re-released HTML 4.0 spec. Could I ask why? Do you plan to issue
another update in the future and fix it then, or do you disagree
that the things I mentioned (attached below) are errors?

Also, I just noticed that in addition to the things mentioned
below, the text in B.4.1 still says:

    There must be exactly one "User-agent" field. 

which is wrong. (I also pointed this out before [1], on Dec 10.)

[1] http://lists.w3.org/Archives/Member/www-html-editor/1997OctDec/0095.html
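
To illustrate why "exactly one" is wrong, here is a minimal sketch
using Python's standard urllib.robotparser module. The robot names
and the path are made up for the example; as I understand the robots
exclusion convention, a record may name several robots, which then
share the same rules.

```python
from urllib.robotparser import RobotFileParser

# One record with two User-agent fields (hypothetical robot names).
lines = [
    "User-agent: alpha-bot",
    "User-agent: beta-bot",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(lines)

# Both named robots are covered by the same Disallow rule.
alpha_allowed = parser.can_fetch("alpha-bot", "/private/page.html")
beta_allowed = parser.can_fetch("beta-bot", "/private/page.html")

# A robot that no record mentions is not restricted by this file.
other_allowed = parser.can_fetch("gamma-bot", "/private/page.html")

print(alpha_allowed, beta_allowed, other_allowed)
```

If the parser treated the second User-agent line as an error, the
rule would apply to only one of the two robots.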

Thanks,

-- 
Gerald Oskoboiny              <gerald@w3.org>  +1 617 253 2920
System Administrator, W3C     http://www.w3.org/People/Gerald/
World Wide Web Consortium, MIT Laboratory for Computer Science
545 Technology Square,  Room NE43-353  Cambridge MA  02139 USA

---------- Forwarded message ----------
From: Gerald Oskoboiny <gerald@w3.org>
Date: Fri, 10 Apr 1998 17:22:14 -0400 (EDT)
To: www-html-editor@w3.org
Subject: Errors in B.4.1 Search Robots

Hi, at

    http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1.1

it says:

> Some tips: URI's are case-sensitive, and "/robots.txt" string must be
> all lower-case. Blank lines are not permitted.

This last bit ("Blank lines are not permitted.") is incorrect, or
at least quite misleading the way it is currently written.

Blank lines *are* permitted in the robots.txt file, just not within
a single "record" (though "record" doesn't seem to be defined
anywhere here).
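
As a rough sketch of what I mean by "record", the following Python
splits a robots.txt body on blank lines. The function name and the
sample file are my own inventions for illustration, and the handling
of comment-only lines (skipped, not treated as a record boundary) is
my reading of the robots exclusion convention rather than anything
stated in the HTML 4.0 text.

```python
def split_records(text):
    """Split robots.txt text into records separated by blank lines.

    Each record is a list of (field, value) pairs, with the field
    name lower-cased.
    """
    records, current = [], []
    for raw in text.splitlines():
        if not raw.strip():
            # A truly blank line closes the current record, if any.
            if current:
                records.append(current)
                current = []
            continue
        line = raw.split("#", 1)[0].strip()
        if not line:
            # Comment-only line: skipped, assumed not to end a record.
            continue
        field, _, value = line.partition(":")
        current.append((field.strip().lower(), value.strip()))
    if current:
        records.append(current)
    return records


EXAMPLE = """\
User-agent: webcrawler
Disallow:

User-agent: *
Disallow: /tmp
Disallow: /logs
"""

records = split_records(EXAMPLE)
print(len(records))
```

The blank line in the sample is legal: it separates the two records.
A blank line between the two Disallow lines would instead cut the
second record short.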

I still think it might be a good idea to cite some other source,
like one of:

    http://www.kollar.com/robots.html
    http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html
    http://info.webcrawler.com/mak/projects/robots/robots.html

I also think we should resist the urge to include stuff like this in
future specs; this section really doesn't seem to belong in an HTML
spec at all! I understand it was probably put there because there
aren't any other easily citable sources, but in that case I think
we should quickly publish whatever material we want to reference
as a NOTE and reference that, because at least that way it can be
updated more easily if there are problems.

Later in that same section, it says:

> Robots and the META element 
> 
> The META element allows HTML authors to tell visiting robots whether a
> document may be indexed, or used to harvest more links. No server
> administrator action is required.
> 
> In the following example a robot should neither index this document,
> nor analyze it for links.
> 
>   <META name="ROBOTS" content="NOINDEX, NOFOLLOW">
> 
>   The list of terms in the content is ALL, INDEX, NOFOLLOW, NOINDEX.
>   The name and the content attribute values are case-insensitive.

Where are these terms defined?
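
For what it's worth, here is a small sketch of how a robot might read
that element using Python's standard html.parser module. The class
name is hypothetical, and the only semantics assumed are the ones the
quoted text itself gives: NOINDEX means "do not index", NOFOLLOW
means "do not analyze for links", and matching is case-insensitive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the terms from <META name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Terms are comma-separated; compare them case-insensitively.
        self.directives.update(
            term.strip().upper() for term in content.split(",") if term.strip()
        )


parser = RobotsMetaParser()
parser.feed('<META name="ROBOTS" content="NoIndex, nofollow">')

may_index = "NOINDEX" not in parser.directives
may_follow = "NOFOLLOW" not in parser.directives
print(sorted(parser.directives), may_index, may_follow)
```

This says nothing about what a robot should do with a term it does
not recognize, which is part of why a real definition is needed.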

Thanks!

Gerald

Received on Sunday, 26 April 1998 01:25:11 UTC