Re: Justifying breaking backward compatibility based on existing content (was Re: RDF 1.1 Lite Issue # 2: property vs rel)

On Tue, Oct 25, 2011 at 7:47 PM, Stéphane Corlosquet
<scorlosquet@gmail.com> wrote:
> On Tue, Oct 25, 2011 at 4:00 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
>> Note that HTML5 does not try to be backwards-compatible with the HTML
>> 4.01 spec. It tries to be compatible with existing content. That is,
>> it tries to be compatible with content that's actually on the Web--not
>> with content that one could construct based on the HTML 4.01 spec.
>
> Thanks for raising this point, Henri. You bring an interesting perspective.
> I'm curious to know how similar decisions are made in the context of HTML5.
> How does a public working group such as the WHATWG (which afaik does not have the
> resources to index the whole web) go about deciding what feature or markup
> pattern can be dropped from a spec? Are there representative samples that
> you use? or is it merely based on what feedback you get from people who
> "show up" and give feedback to the working group? Do browser vendors such as
> Mozilla have any ability to help here? What do you do about deep pages
> hidden behind a password or a noindex courtesy? Extrapolate the findings from
> the public web? The RDFa WG is seeking ways to assess what patterns are used
> or not used in the wild (tangible numbers or % tend to carry a lot of
> weight) so any hint would help.

First of all, there aren't agreed-upon standards of evidence. The
first step really boils down to convincing the people who can effect
change in browser code to dare to make the change. The second step is
getting the change to stick around until release (i.e. no one who can
reverse the change freaks out too much over early reports of
breakage).

First, how to convince people who can effect change to try to make a change?

The first heuristic is using existing browser behavior as a proxy
indicator of what browsers need to be like in order to successfully
process existing content. If Gecko, WebKit, Trident and Presto all do
the same thing, we generally tend to accept this as evidence that the
behavior either was necessary in order to successfully render Web
content or, if it wasn't initially, is by now, because Web authors do
all kinds of things, let stuff that "works" stick, and this particular
thing now "works" in all four engines.

For example, the kind of craziness discussed at
https://plus.google.com/u/0/107429617152575897589/posts/TY8zGybVos4
arises from content depending on a bug in ancient Netscape versions.
OTOH, the standards mode behavior for <p><table> discussed at
http://hsivonen.iki.fi/last-html-quirk/ wasn't required by existing
content (if you exclude Acid2 from existing content) when it got
universally implemented but by now there's probably enough
standards-mode content that depends on it that it's no longer
worthwhile to rock the boat and try to make the standards mode behave
like the quirks mode.

So when the four engines behave the same, that's usually it and we
don't change stuff without a *really* good reason. But to give a
counter-example, even universal behavior has gotten changed on
security and efficiency grounds: before HTML5, HTML parsers in
browsers rewound the input and reparsed it in a different mode if they
hit the end of file inside a script element or inside a comment. (And
style, title and textarea, though less consistently, at least in the
case of title.) In HTML5, we changed this in order to implement a
defense-in-depth feature against attacks based on the attacker forcing
a premature end of file (which would make different parts of the page
source be interpreted as script) and in order to make the parser less
crazy. The
obvious de-crazying of the area failed. It broke JS Beautifier, for
example. We had a high-level idea of what an alternative design would
have to look like. In this case, ideas were first tested by running
regular expressions over data obtained by crawling pages linked to by
the Open Directory Project. A solution I suggested failed miserably.
The winning solution was suggested by Simon Pieters of Opera. From
testing with pages linked to from the ODP, we knew that the solution
would break a few pages out there. I was convinced that the breakage
was small enough that it was worthwhile to try it in Firefox. I was
also in a position where I was able to make the in-browser experiment
happen. During the Firefox 4 beta period, I received only *one* report
of in-the-wild breakage. It was on a bank site, which usually makes
people freak out. However, since a low level of breakage was expected,
we left the code in, and since then there haven't been any other
reports of the issue against Firefox. IIRC, when Chrome
implemented the same thing, they found one Google property triggering
a problem and had the Google property fixed.
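
To give a concrete idea of what "running regular expressions over
crawled data" means in practice, here's a rough Python sketch of that
kind of corpus scan. The directory name and the pattern are just
placeholders, not what was actually run back then; the point is only
the shape of the experiment: apply an approximation of the pattern of
interest to each saved page and report how common it is.

  # Rough sketch of a corpus scan: count locally saved pages that match
  # a regular expression approximating a parsing pattern of interest.
  # The corpus directory and the pattern below are placeholders.
  import pathlib
  import re

  CORPUS_DIR = pathlib.Path("crawl")  # directory of saved .html files

  # Placeholder pattern: a script element whose content opens an HTML
  # comment but never closes it before the end tag.
  PATTERN = re.compile(
      rb"<script[^>]*>(?:(?!</script).)*<!--(?:(?!-->)(?!</script).)*</script",
      re.IGNORECASE | re.DOTALL,
  )

  total = 0
  hits = 0
  for path in CORPUS_DIR.glob("*.html"):
      total += 1
      if PATTERN.search(path.read_bytes()):
          hits += 1

  if total:
      print(f"{hits} of {total} pages match ({100.0 * hits / total:.2f}%)")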

When all four engines don't already agree, Firefox doing something or
Safari doing something can be taken as evidence that the behavior is
safe and other browsers could adopt it, too. If Opera alone does
something, it's generally not convincing enough, because Opera has
rather low market share, so Opera behaving a particular way isn't
*alone* convincing evidence that the behavior is successful on the
Web. If IE alone does something, it's fairly convincing, but by now
there's so much code that browser-sniffs IE vs. everyone else that it
could be that IE's behavior is only successful on IE-specific code
paths.

However, we've used "IE and Opera do something" as convincing enough
evidence to change Gecko and WebKit.

We've used the "Safari does it and seems to get away with it"
heuristic multiple times. For example, unifying the namespace of HTML
elements in documents parsed from text/html and documents parsed from
application/xhtml+xml was something that I dared to try in Firefox,
because Safari did it and was getting away with it. Making the change
in Firefox broke Facebook, because Facebook browser-sniffed and served
different code to all four engines. However, Facebook was phenomenally
responsive and fixed their code quickly.
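
As an aside, you don't need a browser to see the unified namespace in
action. Here's a small Python illustration using the html5lib package
(assuming it's installed; this shows the observable result, not how
Gecko or WebKit implement it internally):

  # An HTML5 parser assigns elements parsed from text/html to the
  # XHTML namespace, the same namespace used for application/xhtml+xml.
  import html5lib

  doc = html5lib.parse("<!DOCTYPE html><p>Hello</p>")
  p = doc.find(".//{http://www.w3.org/1999/xhtml}p")
  print(p.tag)  # {http://www.w3.org/1999/xhtml}p

In a browser, the observable counterpart is that elements report the
XHTML namespace as their namespaceURI regardless of whether the
document came from text/html or application/xhtml+xml.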

And this brings us to the topic of what you can break (i.e. what makes
people freak out in the second step when a change is being tested). On
one hand, breaking major sites seems really scary and like an obvious
thing not to do. However, Facebook and Google in particular push the
limits of the platform much more than others and
also have engineers continuously working on stuff. So if a change only
breaks a particular line of code on Facebook or a particular line of
code on a Google property, it may be possible to change a browser
anyway and get Facebook or Google to change their code a little bit.

Breaking even a couple of major sites that aren't as big as Facebook
or Google and whose daily activities don't revolve around pushing
the limits of the Web platform is generally something that's not OK.
OTOH breaking a couple of long-tail sites might be OK. But breaking a
large number of long-tail sites is not OK. In particular, breaking
output from one authoring tool when the output has been spread all
over the Web is generally not OK even if none of it was on major
sites. (But it may happen without anyone noticing early enough and the
breakage sticks.)

Some vocabulary design decisions and early parsing algorithm design
decisions were based on experiments performed at Google using the
data Google has from crawling the Web. Also, some decisions (I gave
one example above) were informed by running experiments on dotbot
(http://www.dotnetdotcom.org/) data or on data downloaded by taking
URLs
from the Open Directory Project. Using dotbot or ODP data isn't
generally a good idea if you are investigating something that's so
rare on the Web at the time of doing the research that you see almost
none of it in small-scale general-purpose crawls. For example, the
decision not to support prefixed SVG elements in text/html was
informed by downloading and parsing all the SVG files in Wikimedia
Commons, because it seemed likely that if an SVG authoring tool was
popular, some content authored with it would have found its way into
Wikimedia Commons.

So how would this apply to RDFa? Most of the above doesn't apply,
except that it's not particularly productive to try to come up with
some rules of evidence with percentage occurrence cut-offs in advance.

To be realistic, RDFa has much less legacy than HTML (what an
understatement), so it might not be particularly worthwhile to put a
lot of effort into saving the RDFa legacy, because RDFa doesn't yet
have a vast body of interoperable content being consumed by different
major consumers. For example, OGP data is consumed mainly by
Facebook's code and the v vocabulary is consumed mainly by Google's
Rich Snippets. If the RDFa community doesn't want to throw away that
legacy, it might make sense to see how Facebook consumes OGP data
(hard-wired prefix; xmlns:foo ignored) and spec that for OGP
consumption, and to see how Rich Snippets consumes the v vocabulary
and spec that for processing v data. (Or if it feels wrong to
grandfather
corporation-specific rules for Facebook and Google stuff, stop
pretending that those are part of RDFa. Already neither Facebook nor
Google implements RDFa as specced, so that stuff never really was RDFa
anyway.) And then do a crawl analogous to the Wikimedia Commons crawl
for SVG to discover what RDFa not made for Facebook or Google looks
like, and generalize about that as if it were a separate format from
OGP and the v vocabulary. (I don't know what would be to long-tail
RDFa what Wikimedia Commons is to SVG, though, as a way of locating
stuff without having to do a Google-scale crawl.)
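
To make the "hard-wired prefix; xmlns:foo ignored" consumption model
concrete, here's a minimal Python sketch of what an OGP consumer along
those lines might look like. It's only an illustration of the idea,
not Facebook's actual code, and it ignores details like structured
properties and duplicate keys:

  # Minimal sketch of hard-wired-prefix OGP consumption: collect
  # <meta property="og:..." content="..."> pairs, treating "og:" as a
  # fixed string and ignoring any xmlns:og declaration in the markup.
  from html.parser import HTMLParser

  class OGPExtractor(HTMLParser):
      def __init__(self):
          super().__init__()
          self.properties = {}

      def handle_starttag(self, tag, attrs):
          if tag != "meta":
              return
          attrs = dict(attrs)
          prop = attrs.get("property") or ""
          if prop.startswith("og:") and attrs.get("content") is not None:
              self.properties[prop] = attrs["content"]

  # The xmlns:og value below is deliberately bogus to show it's ignored.
  html_src = """<html xmlns:og="http://example.org/not-consulted">
  <head>
  <meta property="og:title" content="Example Title">
  <meta property="og:type" content="article">
  </head><body></body></html>"""

  extractor = OGPExtractor()
  extractor.feed(html_src)
  print(extractor.properties)
  # {'og:title': 'Example Title', 'og:type': 'article'}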

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 27 October 2011 10:25:39 UTC