
[Bug 7681] link tag: rel: associate pages about the same person across many sites

From: <bugzilla@wiggum.w3.org>
Date: Sat, 26 Sep 2009 21:13:16 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1MreZw-0001JL-Re@wiggum.w3.org>

--- Comment #5 from Nick Levinson <Nick_Levinson@yahoo.com>  2009-09-26 21:13:16 ---
None of those are as useful for this use case. Each lacks most of the needed
properties or their ready equivalents and each requires learning yet another
technology when HTML5 already offers one we know.

The Google Social Graph API is limited to URLs and name
(http://code.google.com/apis/socialgraph/docs/attributes.html) (and I'm unclear
how you use the API for HTML markup). The person in whom you're interested has
to have URLs you consider authoritative. They may not exist. To use it to
describe specific data about a person other than URLs requires believing any
URLs you cite are stable. You'd often have to limit the URLs to those you
control. That makes otherme not very useful for many famous and semifamous
people, including those in history. Many Web pages are about historical figures
and many more are about modern people who are likely to be significant in
history, like heads of state.

FOAF is for XML and therefore is compatible with XHTML, but is a bit more
complicated to use with HTML, because some of its requirements don't apply
elsewhere in HTML. FOAF has many good features but, of 8 I proposed here, it
lacks 6: death date, when flourished, nationality, birth place, and a way to
refer to authoritative sources if they're not openly online (e.g., subscription
databases and Who's Who books) (http://xmlns.com/foaf/spec/). In addition,
despite having read probably dozens of books on Web matters (among hundreds on
computers generally), I don't recall FOAF. It deserves publicity, but HTML
already has that and already has a mechanism to do what I'm proposing, a
mechanism described in books on the language.

hCard and the closely related RDFa grammar Google supports are too limited,
because they don't have enough fields available. Parsers are to ignore anything
not understood. A proposal for a date-of-death field is pending for hCard, but
not for other fields, and accepting the one proposal may require abandoning the
nearly 1:1 relationship with the vCard RFC. Multiple birth dates are required
when we know, say, a person was born October 16 but not whether that was in
1919 or 1918, often the case with entertainers, but hCard allows only a single
date of birth or requires more vagueness than the known facts may justify.
Birth and death dates may come from different calendars for people whose lives
straddle a calendar change (one occurred about two and a half centuries ago in
the U.S.) and hCard doesn't accommodate those changes. While fn is flexible
enough, n isn't for some naming conventions found internationally, and n can be
implied from fn, so the inflexibility creates erroneous results not
attributable to the content author. An ident-scheme in this thread's proposal might refer to a
large collection of biographies that may be in book form or in an
access-limited website and thus not have a URL or a full URL, and hCard doesn't
offer compatible properties.
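
The point about multiple birth dates can be illustrated with a minimal Python
sketch. It assumes each page lists every birth date its author considers
possible, and it treats two pages as compatible when the lists share at least
one date. The function name and the sample dates are hypothetical and are not
part of hCard or of this proposal.

```python
from datetime import date


def may_be_same_birth(candidates_a, candidates_b):
    """Return True if two collections of candidate birth dates overlap."""
    return not set(candidates_a).isdisjoint(candidates_b)


# Hypothetical pages: the first author knows the day but not the year.
page_one = {date(1918, 10, 16), date(1919, 10, 16)}
page_two = {date(1919, 10, 16)}
page_three = {date(1920, 10, 16)}

compatible = may_be_same_birth(page_one, page_two)
incompatible = may_be_same_birth(page_one, page_three)
```

A single-valued birth date field would force the first author either to pick
one year or to drop the known day and month.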

I did find one problem with my proposal. Where I wrote "Other biographical
sources offer vague dates for when someone flourishide nationalities or birth
places.", I probably meant something like "Other biographical sources offer
vague dates for when someone flourish[ed and, to distinguish someone, commonly
prov]ide nationalities or birth places." I'm proposing we provide linkages for
that kind of data.

Let's say Prof. X writes about Attila the Hun. So does Prof. Y. The two
professors don't trust each other, but they agree on their subject and when he
flourished. They don't want to link to each other's pages because they don't
want to trust their rivals' work or stability. At the same time, search
engines' content analyses are more geared to popular writing. One way scholarly
writing may differ is by using key terms less often per thousand words of total
text, because it's presumed readers already know what they're reading about,
and that lowers ranking, which may increase the spread between their papers,
making results about the same person harder for searchers to find. And requiring
search engines to analyze free-form text like "he was brought by the stork as
the most beautiful baby you ever saw on April 16, 1963" to extract an
identifying birthdate is too much to ask of an algorithm, so any technology we
use for this general purpose is likely to need hand-coding, making the page
author's time a factor.

Solution: If both professors place rel canonical-human "Attila the Hun" and
what happen to be the same dates for flourishing or birth and death in their
pages, once per page head, search engines can recognize that Prof. X and Prof.
Y are almost certainly talking about the same person. The certainty will go up
when using standard biographical identifiers. This becomes even more important
when the name in question is coincidentally shared by multiple people, say, a
Panamanian judge and an Indian moviemaker, and searchers aren't sure which
nationality or occupation makes their subject important. The searchers want the
search engines to separate the results by subject person. And the rel being
essentially a line or two saves authoring time.
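
As a rough illustration of the matching described above, here is a minimal
Python sketch of how a search engine might group pages once it has extracted
the name and dates from each page head. The record layout, the field names
"name" and "dates", the URLs, and the function name are all assumptions made
for the example; this thread does not define an attribute syntax or a matching
algorithm.

```python
from collections import defaultdict

# Hypothetical records extracted from the head of each page.
# "dates" holds a (start, end) pair for flourishing or for birth and death.
pages = [
    {"url": "http://example.edu/prof-x/attila",
     "name": "Attila the Hun", "dates": (434, 453)},
    {"url": "http://example.org/prof-y/huns",
     "name": "Attila the Hun", "dates": (434, 453)},
    # Two different people who happen to share a name.
    {"url": "http://example.com/judge",
     "name": "Pat Example", "dates": (1950, 2010)},
    {"url": "http://example.net/moviemaker",
     "name": "Pat Example", "dates": (1962, None)},
]


def group_by_person(records):
    """Group page URLs by (case-insensitive name, dates)."""
    groups = defaultdict(list)
    for record in records:
        key = (record["name"].casefold(), record["dates"])
        groups[key].append(record["url"])
    return dict(groups)


groups = group_by_person(pages)
```

Under these assumptions the two professors' pages land in one group without
either linking to the other, while the two people sharing a name stay in
separate groups because their dates differ.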

This link rel would solve the problem.

Thank you.


Received on Saturday, 26 September 2009 21:13:27 UTC
