[Bug 6609] negative keywords-not meta tags

From: <bugzilla@wiggum.w3.org>
Date: Sun, 28 Mar 2010 19:19:37 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1Nvy1J-0001Op-8N@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6609


Nick Levinson <Nick_Levinson@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|VERIFIED                    |REOPENED
           Keywords|                            |TrackerRequest
         Resolution|NEEDSINFO                   |




--- Comment #4 from Nick Levinson <Nick_Levinson@yahoo.com>  2010-03-28 19:19:36 ---
I've had difficulty getting UA makers to respond to feature requests that are
not yet part of HTML5. Some of them need the feature to be in HTML5 before they
will prioritize it. Thus, I'm requesting escalation.

Suggested title: negative keywords-not meta tags

Suggested text:

Rumor to the contrary notwithstanding, keyword meta elements do work, albeit
within limits. I did a test and also found confirmatory recent discussion
online about major search engines.

Insofar as they work, what's needed is a way to clarify relevance to one theme
by distinguishing it from another. Negative keywords would thus be helpful. For
example, a page about "virus" could be about computer viruses or biological
viruses but usually won't be about both. While major search engines may be
intelligent enough to distinguish in that well-known case, new subjects may not
be well known to search engine managers, and thus an author may prefer to
control how their theme is understood from the date of going live. A negative
keyword could quickly clarify the theme of the page.

Using body text may not be adequate. Consider a doctor writing a carefully
exhaustive article about aspirin's less-well-known uses. The article leaves out
headaches, since almost everyone already knows about that use. Being careful,
the doctor writes in the introduction that "the article will not discuss
headaches." Someone does a search for "aspirin NOT headache". They should get
that paper but they do not, presumably because the introduction itself contains
the word. A negative metatag may aid a search engine in understanding the
doctor's thematic intention and thus in supplying what a searcher is seeking.
Search engine designers would have to do some careful work to handle the
aspirin case as intended, but they could do that far more easily if we page
authors had an HTML facility that would give search engines something to work
with.
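
As a rough illustration of the aspirin case, the sketch below shows one way a
search engine might weigh a negative keyword against body text. It is only an
assumption about how an engine could behave: the Page class, the
matches_not_query function, and the simple substring test are all hypothetical
and are not taken from any real search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical index record for one page."""
    body: str
    keywords: set = field(default_factory=set)      # from name="keywords"
    keywords_not: set = field(default_factory=set)  # from name="keywords-not"


def matches_not_query(page, wanted, excluded):
    """Decide whether a page satisfies the query `wanted NOT excluded`.

    Assumed policy: an author's negative keyword overrides an incidental
    mention of the excluded word in the body text.
    """
    text = page.body.lower()
    wanted = wanted.lower()
    excluded = excluded.lower()

    # The page must be about the wanted term at all.
    if wanted not in page.keywords and wanted not in text:
        return False

    # The author has declared the page is not about the excluded term.
    if excluded in page.keywords_not:
        return True

    # Otherwise fall back to a plain body-text exclusion.
    return excluded not in text


doctor_page = Page(
    body="Aspirin has many uses. The article will not discuss headaches.",
    keywords={"aspirin", "heart", "blood"},
    keywords_not={"headache"},
)
print(matches_not_query(doctor_page, "aspirin", "headache"))
```

Without the negative keyword, the same page would be rejected under this
policy, because "headaches" in the introduction contains the excluded word.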

Antonyms are usually a waste of time in this area, so keywords-not need not be
invoked just to provide an antonym. Rather, this is for cases where the same
word serves very different meanings, such as _virus_, including words that
carry opposite meanings, such as _sanction_. Thus, writing keywords-not would
be infrequent, although the sheer scale of the Web and of HTML usage means it
would still be used enough to warrant recognition in a standard and adoption by
search engines.

Search engines give more weight to thematic words written directly into page
content. However, some thematic words may be difficult for authors to work into
text without going to some length to explain important complications, and that
might make the whole page too cumbersome, losing readers. If the main text is
to be short, leaving those secondary keywords out may be smarter writing of
content. This is often true when stating principles, which may be more easily
understood if stated in just a few words, leaving redundant particulars out.
But searchers may still use various common particulars to find this principle
via search engines. To support search, the keywords that represent the
particulars and are not in the visible text should be put into meta elements.
Some would go into a meta element named keywords. But, for some of them,
keywords-not may be the more relevant name. And that would keep the positive
keywords metatag from getting enormously long.

Keyword metatags long ago lost favor after their widespread abuse. However,
they are used by search engines, and I don't see how negative keywords are any
more susceptible to abuse than positive ones. Further, a page author could use
either positive or negative keywords without having to offer both, so there'd
be no unwanted increase in the designer's workload. Optimizers could use
essentially the same tools to generate either kind of keyword. The only risk, I
think, is putting a word in both, but I think that would only be an author's
error. Each search engine could prepare for that eventuality any way it sees
fit, and editing software and validators could choose to alert an author to the
apparent conflict without requiring the author to change an element. For
example, if a page author uses the same word in both but with differing case,
because one represents a common product and the other a brand name, the page
author would take the risk of being misunderstood by a search engine, while a
search engine might observe the case distinction and consider how to handle it.
The page author could also use longer phrases either positively or negatively
and thus ease distinguishing themes.
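
The conflict check that editing software or a validator might run could look
something like the sketch below. It assumes comma-separated content values, as
in the keywords convention, and it reports exact duplicates separately from
words that differ only by case, so that a tool can warn without forcing a
change. The function names split_keywords and find_conflicts are hypothetical.

```python
def split_keywords(content):
    """Split a comma-separated content value into trimmed, non-empty words."""
    return [word.strip() for word in content.split(",") if word.strip()]


def find_conflicts(positive, negative):
    """Compare the content of a keywords and a keywords-not meta element.

    Returns a pair (exact, case_only):
      exact     - sorted words that appear identically in both lists
      case_only - sorted (positive, negative) pairs that differ only by case
    """
    pos = set(split_keywords(positive))
    neg = set(split_keywords(negative))

    exact = sorted(pos & neg)

    # Index the negative words by their case-folded form.
    neg_by_fold = {}
    for word in neg:
        neg_by_fold.setdefault(word.casefold(), set()).add(word)

    case_only = sorted(
        (p, n)
        for p in pos
        for n in neg_by_fold.get(p.casefold(), ())
        if p != n
    )
    return exact, case_only


exact, case_only = find_conflicts("Windows,glass", "windows,glass")
print("same word in both:", exact)
print("differ only by case:", case_only)
```

A tool could treat the first list as a likely author error and the second as a
possible brand-name distinction that merits only a notice.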

I propose calling it "keywords-not" for three reasons: Boolean NOT searches
are relevant, the name is relatively brief, and it avoids an abbreviation that
may not be familiar to speakers of other languages. I'm preparing to include
keywords-not in a website I'm designing, but I don't know when the site will go
live. My method will probably be to use a separate meta element following the
one for keywords used positively, since they can't be combined into one
element, but I see no reason to require any position other than that both go
into the head, as one already must. E.g.,

<head>
. . . . .
<meta name="keywords" content="aspirin,heart,blood" />
<meta name="keywords-not" content="headache" />
. . . . .
</head>
<body>
<h1>Aspirin Except For Headaches</h1>
<p>. . . .</p>
</body>
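
To suggest that a consumer of these elements would have something simple to
work with, here is a minimal sketch that reads both meta elements from a page
using Python's standard html.parser module. Note that keywords-not is only the
name proposed here, not a registered metadata name, and the KeywordMetaParser
class and extract_keywords function are hypothetical.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collect the content of keywords and keywords-not meta elements."""

    def __init__(self):
        super().__init__()
        self.found = {"keywords": [], "keywords-not": []}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <meta ... /> via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.found:
            content = attributes.get("content") or ""
            self.found[name].extend(
                word.strip() for word in content.split(",") if word.strip()
            )


def extract_keywords(html):
    """Return a dict with the positive and negative keyword lists."""
    parser = KeywordMetaParser()
    parser.feed(html)
    parser.close()
    return parser.found


sample = """
<head>
<meta name="keywords" content="aspirin,heart,blood" />
<meta name="keywords-not" content="headache" />
</head>
<body>
<h1>Aspirin Except For Headaches</h1>
</body>
"""
print(extract_keywords(sample))
```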


Received on Sunday, 28 March 2010 19:19:40 GMT
