Errors in B.4.1 Search Robots

Hi, at

    http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1.1

it says:

> Some tips: URI's are case-sensitive, and "/robots.txt" string must be
> all lower-case. Blank lines are not permitted.

This last bit ("Blank lines are not permitted.") is incorrect, or
at least quite misleading the way it is currently written.

Blank lines *are* permitted in the robots.txt file, just not within
a single "record" (though "record" doesn't seem to be defined
anywhere here).
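
To illustrate the distinction, here is a minimal Python sketch of
splitting a robots.txt file into records. It assumes, following my
reading of the original robots exclusion document, that records are
separated by one or more blank lines and that comment-only lines do
not end a record. The name split_records and the sample file are
made up for illustration; this is not how any particular robot does it.

```python
# Hypothetical sample file: two records separated by a blank line.
SAMPLE = """\
# comment-only lines are discarded
User-agent: webcrawler
# ...and do not end the record they appear in
Disallow:

User-agent: *
Disallow: /tmp/
Disallow: /logs/
"""


def split_records(text):
    """Split robots.txt text into records (lists of (field, value) pairs).

    Assumption: a truly blank line ends the current record; a line that
    holds only a comment is dropped without ending the record.
    """
    records, current = [], []
    for raw in text.splitlines():
        # Strip a trailing '#' comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            # Only a genuinely blank line marks a record boundary.
            if raw.strip() == "" and current:
                records.append(current)
                current = []
            continue
        field, _, value = line.partition(":")
        current.append((field.strip().lower(), value.strip()))
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    for number, record in enumerate(split_records(SAMPLE), start=1):
        print(number, record)
```

Under those assumptions, a blank line between the two "Disallow"
lines of the second record would split it in two, which is presumably
what the spec's tip was trying to warn about.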

I still think it might be a good idea to cite some other source,
like one of:

    http://www.kollar.com/robots.html
    http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html
    http://info.webcrawler.com/mak/projects/robots/robots.html

I also think we should resist the urge to include stuff like this in
future specs; this section really doesn't seem to belong in an HTML
spec at all! I understand it was probably put there because there
aren't any other easily citable sources, but in that case I think
we should quickly publish whatever material we want to reference
as a NOTE and reference that, because at least that way it can be
updated more easily if there are problems.

Later in that same section, it says:

> Robots and the META element 
> 
> The META element allows HTML authors to tell visiting robots whether a
> document may be indexed, or used to harvest more links. No server
> administrator action is required.
> 
> In the following example a robot should neither index this document,
> nor analyze it for links.
> 
>   <META name="ROBOTS" content="NOINDEX, NOFOLLOW">
> 
>   The list of terms in the content is ALL, INDEX, NOFOLLOW, NOINDEX.
>   The name and the content attribute values are case-insensitive.

Where are these terms defined?
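
For what it's worth, the most a robot can do with the text as written
is collect the terms, since their meaning isn't spelled out. A rough
Python sketch of that, assuming the content value is a comma-separated
list and relying only on the stated case-insensitivity of the name and
content values (the class and function names are hypothetical):

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the terms found in <META name="ROBOTS" content="...">."""

    def __init__(self):
        super().__init__()
        self.terms = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name value is case-insensitive, so compare in lower case.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Assumption: terms are comma-separated; normalise to upper case.
        for term in content.split(","):
            term = term.strip().upper()
            if term:
                self.terms.add(term)


def robots_terms(html):
    """Return the set of upper-cased ROBOTS terms found in the markup."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.terms


if __name__ == "__main__":
    print(robots_terms('<META name="ROBOTS" content="NOINDEX, NOFOLLOW">'))
```

What a robot should then do on seeing, say, ALL together with NOINDEX
is exactly the kind of thing the sketch cannot answer without a
definition of the terms.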

Thanks!

Gerald
-- 
Gerald Oskoboiny              <gerald@w3.org>  +1 617 253 2920
System Administrator, W3C     http://www.w3.org/People/Gerald/
World Wide Web Consortium, MIT Laboratory for Computer Science
545 Technology Square,  Room NE43-353  Cambridge MA  02139 USA

Received on Friday, 10 April 1998 17:22:16 UTC