- From: Toby Inkster <tai@g5n.co.uk>
- Date: Fri, 16 Apr 2010 10:53:56 +0100
- To: HTMLWG WG <public-html@w3.org>
Hmmm... a lot of the talk around this issue begs the question of what the nature of sameness is. Two Atom <entry> elements need the same identifier if they're the same entry, but under what circumstances are they the same entry?

I am not the same person I was 10 years ago. This is true in a physical sense - most of my body cells are less than 10 years old - and in a less physical sense too - the person I was 10 years ago is not necessarily a person with whom I share the same beliefs, hopes and fears. So in what sense am I the same person?

Certainly there's a continuity of existence between Y2K me and Y2.01K me; a continuity of consciousness. If I'm the same person today as I was yesterday, and the same person yesterday that I was the day before, then I can trace that sameness back as far as I like until my conception.

This is a useful technique for evaluating sameness that can be applied to real-world objects, but applying it to virtual, digital resources does not always work out the way we'd like. Take an example from popular programmer's companion "diff". Say I have the following code:

    if (x)
    {
        foo;
    }

And one day I change this to:

    if (x)
    {
        foo;
    }
    if (y)
    {
        bar;
    }

A diff tool, when asked to pick out which lines have been added to my code, might select:

    }
    if (y)
    {
        bar;

Logically, we consider it to have selected the "wrong" closing brace. It has decided that the closing brace at the end of the full block has remained the same; whereas we'd think that the closing brace after "foo;" has remained the same.

Under what circumstances is one HTML page the same as another HTML page? Consider a popular company goes bankrupt; their brand still has value though, and a small competitor buys the right to use that brand (name and logo), and the popular company's domain name from the administrator. The new owner of the domain name sets up a new website reusing the old company's domain name.
They set up an "aboutus.html" page which, perhaps by pure chance, or perhaps in an effort to capture traffic from the old site, uses exactly the same URL as the old company's "aboutus.html" page. These two pages have different authors, perhaps different titles, certainly different subjects (they refer to legally separate companies), etc. Is it right to generate the same Atom identifiers for both? Are they the same entries in an Atom sense?

Atom says that an <entry> must always retain the same identifier; it must never change. But it doesn't say when two bits of content represent the same <entry>.

So what's a practical solution? One way out of the quagmire would be to generate not Atom 1.0, but RDF using the RSS 1.0 vocabulary.

(I say "RDF using the RSS 1.0 vocabulary" to differentiate between that and "RSS 1.0" which, in addition to using the RSS 1.0 vocabulary, also uses a limited subset of RDF/XML syntax and has certain other restrictions in order to seem backwards-compatible-ish with earlier versions of RSS. Converters from HTML5 would want to, when possible, aim to comply with those limitations and restrictions in order to increase compatibility with real world feed readers, but not slavishly adhere to them.)

With RDF and the open world assumption, it's always possible to leave stuff out. For example, a bloodType property might be defined with a rule stating that every Person has a bloodType. However, even if every Person *has* a bloodType, it doesn't mean that we know what their bloodType is, or care to share it with the world. It's permissible to provide information about the Person without providing their bloodType. That's the open world assumption - we never have all the information.
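
To make the bloodType point concrete, here's a rough Python sketch of "leaving stuff out". It isn't taken from any real converter: the vocabulary URI, the subject URI and the describe() helper are all made up for illustration, and the output is only meant to look like N-Triples. The idea is just that an unknown property produces no statement at all, rather than an empty or placeholder value.

```python
# Hypothetical vocabulary namespace, chosen only for illustration.
EX = "http://example.org/vocab#"


def describe(subject, known):
    """Return one N-Triples-style line per property whose value is known.

    Properties mapped to None are skipped entirely: under the open world
    assumption, saying nothing is different from saying "there is none".
    """
    lines = []
    for prop, value in known.items():
        if value is None:
            continue  # unknown or unshared: emit no statement
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{EX}{prop}> "{escaped}" .')
    return lines


# A Person whose name we know but whose bloodType we don't (or won't share).
triples = describe(
    "http://example.org/people/alice",  # made-up subject URI
    {"name": "Alice", "bloodType": None},
)
print("\n".join(triples))
```

The description of the Person is still perfectly usable; a consumer simply learns nothing about bloodType from it.
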

So generating RDF using the RSS 1.0 vocabulary, we'd convert each HTML5 <article> element into an rss:item and then tack on any information that we could find about it (title, author, date, etc) and not concern ourselves too much about any particular bits of information which would be impossible to find.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

Received on Friday, 16 April 2010 09:54:59 UTC