- From: Gerald Oskoboiny <gerald@w3.org>
- Date: Sun, 26 Apr 1998 01:25:10 -0400 (EDT)
- To: www-html-editor@w3.org
Hi,

I note that the "robots" section wasn't updated in the recently re-released HTML 4.0 spec. Could I ask why? Do you plan to issue another update in the future and fix it then, or do you disagree that the things I mentioned (attached below) are errors?

Also, I just noticed that in addition to the things mentioned below, the text in B.4.1 still says:

    There must be exactly one "User-agent" field.

which is wrong. (I also pointed this out before [1], on Dec 10.)

[1] http://lists.w3.org/Archives/Member/www-html-editor/1997OctDec/0095.html

Thanks,

-- 
Gerald Oskoboiny <gerald@w3.org>  +1 617 253 2920
System Administrator, W3C  http://www.w3.org/People/Gerald/
World Wide Web Consortium, MIT Laboratory for Computer Science
545 Technology Square, Room NE43-353
Cambridge MA 02139 USA

---------- Forwarded message ----------
From: Gerald Oskoboiny <gerald@w3.org>
Date: Fri, 10 Apr 1998 17:22:14 -0400 (EDT)
To: www-html-editor@w3.org
Subject: Errors in B.4.1 Search Robots

Hi,

at http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1.1 it says:

> Some tips: URI's are case-sensitive, and "/robots.txt" string must be
> all lower-case. Blank lines are not permitted.

This last bit ("Blank lines are not permitted.") is incorrect, or at least quite misleading the way it is currently written. Blank lines *are* permitted in the robots.txt file, just not within a single "record". (though "record" doesn't seem to be defined anywhere here.)

I still think it might be a good idea to cite some other source, like one of:

    http://www.kollar.com/robots.html
    http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html
    http://info.webcrawler.com/mak/projects/robots/robots.html

I also think we should resist the urge to include stuff like this in future specs; this section really doesn't seem to belong in an HTML spec at all! I understand it was probably put there because there aren't any other easily citable sources, but in that case I think we should quickly publish whatever material we want to reference as a NOTE and reference that, because at least that way it can be updated more easily if there are problems.

Later in that same section, it says:

> Robots and the META element
>
> The META element allows HTML authors to tell visiting robots whether a
> document may be indexed, or used to harvest more links. No server
> administrator action is required.
>
> In the following example a robot should neither index this document,
> nor analyze it for links.
>
> <META name="ROBOTS" content="NOINDEX, NOFOLLOW">
>
> The list of terms in the content is ALL, INDEX, NOFOLLOW, NOINDEX.
> The name and the content attribute values are case-insensitive.

Where are these terms defined?

Thanks!

Gerald
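To make the point about blank lines and records concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is only an illustration of how one existing parser behaves, not a definition of the format. The robot names (`alphabot`, `betabot`, `otherbot`), the `/private/` path, and the helper `parse_robots` are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Two records separated by a blank line. The first record names two robots,
# i.e. it carries more than one "User-agent" field.
WELL_FORMED = """\
User-agent: alphabot
User-agent: betabot
Disallow: /private/

User-agent: *
Disallow:
"""

# A blank line *inside* what was meant to be one record: this parser treats
# it as the end of the record, so the Disallow line is no longer attached
# to any User-agent.
SPLIT_RECORD = """\
User-agent: alphabot

Disallow: /private/
"""

good = parse_robots(WELL_FORMED)
broken = parse_robots(SPLIT_RECORD)

if __name__ == "__main__":
    for bot in ("alphabot", "betabot", "otherbot"):
        print(bot, good.can_fetch(bot, "/private/page.html"))
    print("split record:", broken.can_fetch("alphabot", "/private/page.html"))
```

With this parser, the well-formed file keeps both named robots out of `/private/` while leaving other robots unrestricted, and the file with a blank line inside the record ends up restricting nobody. Other parsers may be more or less forgiving.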
Received on Sunday, 26 April 1998 01:25:11 UTC