Re: change proposal for issue-86, was: ISSUE-86 - atom-id-stability - Chairs Solicit Proposals from Toby Inkster on 2010-04-16 (public-html@w3.org from April 2010)

From: Toby Inkster <tai@g5n.co.uk>
Date: Fri, 16 Apr 2010 10:53:56 +0100
To: HTMLWG WG <public-html@w3.org>
Message-ID: <1271411637.15832.42.camel@ophelia2.g5n.co.uk>
Hmmm... a lot of the talk around this issue raises the question of what
the nature of sameness is. Two Atom <entry> elements need the same
identifier if they're the same entry, but under what circumstances are
they the same entry?

I am not the same person I was 10 years ago. This is true in a physical
sense - most of my body cells are less than 10 years old - and in a less
physical sense too - the person I was 10 years ago is not necessarily a
person with whom I share the same beliefs, hopes and fears.

So in what sense am I the same person? Certainly there's a continuity of
existence between Y2K me and Y2.01K me; a continuity of consciousness.
If I'm the same person today as I was yesterday, and the same person
yesterday as I was the day before, then I can trace that sameness back
as far as I like until my conception.

This is a useful technique for evaluating sameness that can be applied
to real-world objects, but applying it to virtual, digital resources
does not always work out the way we'd like. Take an example from the
popular programmer's companion, "diff". Say I have the following code:

 if (x) {
   foo;
 }

And one day I change this to:

 if (x) {
   foo;
 }
 if (y) {
   bar;
 }

A diff tool, when asked to pick out which lines have been added to my
code, might select:

 }
 if (y) {
   bar;

Intuitively, we'd say it has selected the "wrong" closing brace. It
has decided that the closing brace at the end of the full block has
remained the same; whereas we'd think that the closing brace after
"foo;" has remained the same.
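To make the ambiguity concrete, here is a small Python sketch (my own illustration, not part of any proposal). It writes out two edit scripts by hand in the (tag, i1, i2, j1, j2) opcode format that the standard library's difflib.SequenceMatcher.get_opcodes() uses, and checks that both turn the old code into the new code. Alignment B is the one that picks the "wrong" brace; both keep three lines and add three. The helper names apply_opcodes and inserted_lines are hypothetical.

```python
import difflib

OLD = ["if (x) {", "  foo;", "}"]
NEW = ["if (x) {", "  foo;", "}", "if (y) {", "  bar;", "}"]

# Two hand-written edit scripts in difflib's (tag, i1, i2, j1, j2) opcode format.
# A: the surviving closing brace is the one after "foo;"; the new block is appended.
ALIGN_A = [("equal", 0, 3, 0, 3), ("insert", 3, 3, 3, 6)]
# B: the old closing brace is matched to the *last* line of NEW instead.
ALIGN_B = [("equal", 0, 2, 0, 2), ("insert", 2, 2, 2, 5), ("equal", 2, 3, 5, 6)]

def apply_opcodes(a, b, opcodes):
    """Rebuild b from a: copy 'equal' runs from a, take everything else from b."""
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            out.extend(a[i1:i2])
        else:  # 'insert'/'replace' take b[j1:j2]; for 'delete' that slice is empty
            out.extend(b[j1:j2])
    return out

def inserted_lines(b, opcodes):
    """The lines a diff tool would report as added under this alignment."""
    return [line for tag, _, _, j1, j2 in opcodes if tag == "insert" for line in b[j1:j2]]

assert apply_opcodes(OLD, NEW, ALIGN_A) == NEW
assert apply_opcodes(OLD, NEW, ALIGN_B) == NEW
print(inserted_lines(NEW, ALIGN_A))  # ['if (y) {', '  bar;', '}']
print(inserted_lines(NEW, ALIGN_B))  # ['}', 'if (y) {', '  bar;']
print(difflib.SequenceMatcher(None, OLD, NEW).get_opcodes() == ALIGN_A)  # True
```

On this input difflib happens to choose alignment A, but nothing in the two files says which closing brace "survived"; that answer comes from the algorithm, not from the code.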

Under what circumstances is one HTML page the same as another HTML page?
Suppose a popular company goes bankrupt. Its brand still has value,
though, and a small competitor buys from the administrator both the
right to use that brand (name and logo) and the company's domain name.
The new owner of the domain name sets up a new website reusing the old
company's domain name. They set up an "aboutus.html" page which, perhaps
by pure chance, or perhaps in an effort to capture traffic from the old
site, uses exactly the same URL as the old company's "aboutus.html"
page.

These two pages have different authors, perhaps different titles,
certainly different subjects (they refer to legally separate companies),
etc. Is it right to generate the same Atom identifiers for both? Are
they the same entries in an Atom sense?

Atom says that an <entry> must always retain the same identifier; it
must never change. But it doesn't say when two bits of content represent
the same <entry>.
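A minimal sketch of the dilemma, assuming a hypothetical converter that derives each entry's identifier from the page URL alone (the Page class, url_derived_id and the example.com URL are all made up for illustration). The two aboutus.html pages differ in everything except the URL, yet receive the same identifier; whether that is a bug or correct behaviour is exactly the question Atom leaves open.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Page:
    url: str
    author: str
    title: str

def url_derived_id(page: Page) -> str:
    # Hypothetical scheme: the entry id is a function of the URL alone.
    return page.url

# Hypothetical pages at the same placeholder URL, before and after the sale.
old_page = Page("http://example.com/aboutus.html", "Old Company", "About Old Company")
new_page = Page("http://example.com/aboutus.html", "Small Competitor", "About Us")

print(old_page == new_page)                                  # False
print(url_derived_id(old_page) == url_derived_id(new_page))  # True
```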

So what's a practical solution? One way out of the quagmire would be to
generate not Atom 1.0, but RDF using the RSS 1.0 vocabulary. (I say "RDF
using the RSS 1.0 vocabulary" to differentiate between that and "RSS
1.0" which, in addition to using the RSS 1.0 vocabulary, also uses a
limited subset of RDF/XML syntax and has certain other restrictions in
order to seem backwards-compatible-ish with earlier versions of RSS.
Converters from HTML5 should, when possible, aim to comply with
those limitations and restrictions in order to increase compatibility
with real world feed readers, but not slavishly adhere to them.)

With RDF and the open world assumption, it's always possible to leave
stuff out. For example, a bloodType property might be defined with a
rule stating that every Person has a bloodType. However, even if every
person *has* a bloodType, it doesn't mean that we know what their
bloodType is, or care to share it with the world. It's permissible to
provide information about the Person without providing their bloodType.
That's the open world assumption - we never have all the information.
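The difference between the two assumptions can be shown with a toy triple store in Python. This is only a sketch: the "ex:" terms, including ex:bloodType, are hypothetical, like the bloodType property above, and real RDF stores do not expose functions with these names. Under the closed world assumption a missing triple is treated as false; under the open world assumption it is merely unknown.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# We know Toby is a Person and know his name, but nothing about his blood type.
GRAPH: Set[Triple] = {
    ("ex:toby", "rdf:type", "ex:Person"),
    ("ex:toby", "ex:name", "Toby"),
}

def ask_closed_world(graph: Set[Triple], triple: Triple) -> bool:
    """Closed world: anything not stated is assumed false."""
    return triple in graph

def ask_open_world(graph: Set[Triple], triple: Triple) -> Optional[bool]:
    """Open world: a stated triple is true; a missing one is unknown (None)."""
    return True if triple in graph else None

query = ("ex:toby", "ex:bloodType", "ex:O")
print(ask_closed_world(GRAPH, query))  # False: "Toby's blood type is not O"
print(ask_open_world(GRAPH, query))    # None: "we don't know"
```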

So generating RDF using the RSS 1.0 vocabulary, we'd convert each HTML5
<article> element into an rss:item and then tack on any information that
we could find about it (title, author, date, etc) and not concern
ourselves too much with any particular bits of information which would
be impossible to find.
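A rough sketch of that conversion in Python, emitting N-Triples. It assumes the HTML parsing has already happened and each <article> has been reduced to a dict of whatever fields could be found; the field names and the FIELD_PREDICATES mapping are hypothetical, while the RDF, RSS 1.0 and Dublin Core namespace URIs are the real ones. Using blank nodes for the items is my own choice, one way to avoid minting a permanent identifier at all, and not necessarily what a real converter would do.

```python
from typing import Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from extracted <article> fields to RSS 1.0 / DC predicates.
FIELD_PREDICATES = {
    "title": RSS + "title",
    "link": RSS + "link",
    "author": DC + "creator",
    "date": DC + "date",
}

def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def article_to_ntriples(index: int, article: Dict[str, str]) -> List[str]:
    """Emit an rss:item for one article, plus only the fields we actually found."""
    subject = f"_:article{index}"  # blank node: no identity claim beyond this graph
    lines = [f"{subject} <{RDF_TYPE}> <{RSS}item> ."]
    for field, predicate in FIELD_PREDICATES.items():
        value = article.get(field)
        if value:  # open world: a missing field is simply left out
            lines.append(f"{subject} <{predicate}> {literal(value)} .")
    return lines

articles = [
    {"title": "Hello", "author": "Toby", "date": "2010-04-16"},
    {"title": "Untitled musings"},  # no author or date found: still a valid item
]
for i, art in enumerate(articles):
    print("\n".join(article_to_ntriples(i, art)))
```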

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Friday, 16 April 2010 09:54:59 UTC