
Site type heuristics, usage inferences and other matters

From: Rotan Hanrahan <rotan.hanrahan@mobileaware.com>
Date: Thu, 8 Jan 2009 10:20:22 -0000
Message-ID: <D5306DC72D165F488F56A9E43F2045D301DA4DB1@FTO.mobileaware.com>
To: <public-bpwg-ct@w3.org>

According to Brian Wilson [1], MAMA identified itself as an Opera 9.10
browser when gathering data from 3.5 million Web sites. This
effectively excludes from the survey all mobile content on adaptive
sites, such as those that might use MobileAware technology or
equivalent.

Brian further reveals [2] that the URLs were initially derived from the
DMoz collection, a publicly edited collection of Web addresses
comprising predominantly desktop sites. He then added the URLs of the
W3C members (also desktop sites, though a few are also adaptive).
Finally, he added the top Alexa sites, a list generated by a toolbar
installed on desktop browsers. I therefore see this collection as
somewhat representative of the desktop Web, and completely ignorant of
anything that may be present in the mobile aspect of the Web.

Luca Passani's recent suggestion [3] to examine the almost 1000
"xml+xhtml" Web sites from the collection is interesting. If the
collection were a more representative sample of the whole Web
(including an appropriate proportion of mobile sites) and the analysis
had been conducted with a range of User Agents (weighted according to
prevalence), then perhaps Luca's examination of the "xml+xhtml" sites
would be revealing. And if a statistically supported correlation
between "xml+xhtml" and "mobile" were thereby demonstrated, then
"Luca's Heuristic" would have more solid grounds for adoption.

Unfortunately, the MAMA statistics and analysis are insufficient to
provide the justification that Luca's Heuristic requires. In fact, given
the way the URL set was created and pages retrieved, an analysis of the
data might reveal that they are not as mobile as Luca suggests (or
hopes).

Perhaps if someone were to provide Brian with additional resources, or
collaboration, then we might get what we need. He would need to create
a more representative URL set and a weighted sample of User Agents,
then cross the two. We should acknowledge that Opera have already been
very generous in providing Brian with resources to date, and in making
the results available to all, but we cannot expect Opera to carry the
whole burden, no matter how beneficial the results might be.
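
By "crossing the two" I mean fetching every URL in the set under each
User Agent in the sample and recording what is served. A minimal
sketch, assuming illustrative User-Agent strings and weights (a real
study would source both from market data):

    import urllib.request

    # Illustrative User-Agent strings and prevalence weights; these
    # are assumptions, not measurements. The weights apply when
    # aggregating the results, not when fetching.
    USER_AGENTS = {
        "Opera/9.10 (Windows NT 5.1; U; en)": 0.55,          # desktop
        "Nokia6230/2.0 (04.43) Profile/MIDP-2.0": 0.30,      # handset
        "SonyEricssonK750i/R1J Browser/SEMC-Browser/4.2": 0.15,
    }

    def fetch_content_type(url, user_agent):
        """Fetch a URL under a given UA; return the served Content-Type."""
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.headers.get("Content-Type", "")

    def cross(urls):
        """Record what each site serves to each User Agent."""
        results = {}
        for url in urls:
            for ua in USER_AGENTS:
                try:
                    results[(url, ua)] = fetch_content_type(url, ua)
                except Exception:
                    results[(url, ua)] = None  # unreachable or refused
        return results

An adaptive site would show up immediately in such a table: the same
URL yielding different Content-Types (and markup) for different
agents.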

It may be in the best interest of the providers of transcoding proxies
to conduct and publish this analysis. If they can demonstrate one or
more clear heuristics for identifying mobile sites (without requiring
any changes to be made by site administrators, page authors or Web
application developers) then they can use this as a fairly concrete
means of supporting their operating models. Whether or not they should
actually intervene in the traffic between site and client without the
approval of the site/author/user is a separate matter that is beyond
technology or statistics, but at least with some solid data and viable,
acceptable heuristics they would have a basis for executing a policy
that would be seen as "best practice".
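
For illustration, the kind of heuristic I have in mind inspects only
what the site already serves. A sketch, assuming two markers that
purpose-built mobile sites commonly exhibit (the XHTML Mobile Profile
DOCTYPE and the WAP-oriented media types); whether these markers are
actually predictive is exactly what the missing analysis would have to
establish:

    # Candidate markers of purpose-built mobile sites. Treating them
    # as predictive is an assumption, pending real analysis; note that
    # "application/xhtml+xml" alone is precisely the ambiguous case
    # discussed above.
    MOBILE_CONTENT_TYPES = (
        "application/vnd.wap.xhtml+xml",
        "application/xhtml+xml",
    )
    MOBILE_DOCTYPE_MARKERS = (
        "XHTML Mobile Profile",
        "WAPFORUM",
    )

    def looks_mobile(content_type, body):
        """Heuristically judge whether a response was authored for
        mobile, using only the served headers and markup (no changes
        required of site administrators or authors)."""
        if any(ct in content_type for ct in MOBILE_CONTENT_TYPES):
            return True
        head = body[:512]  # a DOCTYPE appears at the top of the page
        return any(marker in head for marker in MOBILE_DOCTYPE_MARKERS)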

Regarding the Web technology itself, we must accept that (absent any
security mechanism) any URL that is accessible via HTTP essentially
makes the corresponding resources fair game for public use. We have
higher layers of control in the form of copyright (not universally
accepted, but at least generally understood) that can be associated with
the resources to limit what polite society can do with the "property".
Sometimes copyright is unambiguous, and sometimes it is merely implied.
Detecting that the authored work was intended for desktop, mobile,
automobile, or whatever, has no bearing on the protection afforded by
copyright. So even if we do identify good site-type detection
mechanisms, they will not by themselves settle the argument about
whether or not a transcoding proxy should intervene. What we need are
more general guidelines about what is appropriate/permitted once a
representation of the authored work has been extracted from a machine
via HTTP.

As a Web author, I would have an expectation that my work would be
somewhat adapted by the browser used to view it. The user may change the
dimensions of the window, change the zoom, maybe even change the fonts
if the browser has some limitations. If I were a more knowledgeable Web
author, I would probably also be aware of adaptation of my text to
spoken form via synthesis, as a service to people with visual problems.
Would I see this as a derivative work? (If I published a written version
of my music and then someone synthesised it back to audio, would I see
this as a derivative work competing with my published music CD?) Things
get a little grey when we acknowledge that some adaptation is going to
happen when we publish via the Web. As Web authors, we should know that
this is the case, and should accept it as part of the normal process of
Web publication. Therefore there should be no concerns regarding
copyright for such basic forms of adaptation.

But what if the browser were to make major changes to my published work?
What if it removed major pieces that I initially considered essential to
the intellectual property in my published work? Would I still be as
tolerant, or would I feel that somehow my work is being violated? I
might, or I might not. But if I didn't anticipate this could happen,
then I might never have taken steps to make my feelings known. I might
never have thought it necessary to say "do not mangle this content". Yet
this is exactly the situation we find with some mobile browsers and, of
course, transcoding proxies. Content is being seriously adapted, often
without consultation with the authors or users. (The underlying
motivations for doing this are irrelevant to the argument, so I won't
pursue them.)

The heuristics we seek might enable us to detect that a site was
designed for use on mobile devices. This is essentially a signal from
the author to say "my content is this way because this is how I want it
delivered to mobile devices". The question now is whether we should use
this signal to prevent further adaptation. Given that my company
provides adaptive technology to deliver content to mobile devices, I
would certainly like this signal to be respected, fully. However, I am
going to briefly take the side of the transcoding proxy and present an
argument in their favour.

Until fairly recently, when an author placed content on a Web site, this
was a signal that "my content is this way because I expect it to be
viewed by Web browsers". Following an earlier argument, there is also
the inference that "I also expect that a few people will resize, zoom or
otherwise adapt it to their particular requirements". This is quite
normal, and basically how people expected the Web to work. Now (perhaps
via Luca's Heuristic) the authors can say "my content is this way
because I expect it to be viewed on mobile browsers," and perhaps it is
also reasonable to infer that "I also expect that a few people will
resize, zoom or otherwise adapt it to their particular requirements."
This is exactly the same inference we had on the legacy Web. Given that
adaptation of retrieved resources is an anticipated possibility, and
given that some users will not be able to perceive the content
acceptably without adaptation, the role of the transcoding proxy is
defensible.

So, perhaps what is missing is not just a signal that says "this is
mobile content" but also a signal to adjust the inference. An author
should be able to add "and I do not want any adaptation to be
performed" (or possibly some refinements of this).

I conclude therefore that we have two requirements:

1.	A reliable means of identifying the author's delivery
intentions (i.e. "this is mobile/desktop/non-visual/... content").
2.	A reliable means of identifying additional constraints (i.e. "do
not further adapt"; a sketch of one existing mechanism follows below).
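
For the second requirement, HTTP/1.1 already offers a blunt
instrument: the Cache-Control "no-transform" directive (RFC 2616,
section 14.9.5), which forbids intermediaries from changing the entity
body. A minimal sketch of an origin server emitting it, using Python's
standard http.server (the handler and port are illustrative):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class NoTransformHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"<html><body>Please do not adapt me.</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            # RFC 2616 14.9.5: a proxy MUST NOT change the body of a
            # response that carries Cache-Control: no-transform.
            self.send_header("Cache-Control", "no-transform")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), NoTransformHandler).serve_forever()

Of course, no-transform expresses only the blanket "do not adapt"; the
refinements suggested above would need something richer.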

To date, these requirements have been somewhat conflated, particularly
as we may have assumed that declaring content to be mobile implies that
we are also declaring that no further adaptation is allowed. Perhaps a
better inference is that further adaptation of mobile content should be
avoided if it is being delivered to a mobile device. That's possibly
something that I could live with, as both an author and a provider of
adaptive technology.
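
That inference reduces to a small decision rule. A sketch, reusing the
(assumed) looks_mobile() heuristic from earlier and the no-transform
signal; all names here are mine:

    def should_transcode(content_is_mobile, client_is_mobile,
                         no_transform):
        """Decision rule for a transcoding proxy under the inference
        above: honour explicit signals first, then avoid re-adapting
        mobile content bound for a mobile device."""
        if no_transform:
            return False  # the author explicitly forbade adaptation
        if content_is_mobile and client_is_mobile:
            return False  # leave the author's own adaptation alone
        return True       # otherwise adaptation may help the user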

Finally, as a Web user, I would still like to have a say in any
adaptation. On the desktop I have direct control: I can resize the
window, zoom the page, add a speech-synthesis plug-in, and so on. But
on mobile devices I have less control. Sometimes my browser gives me
optional features, such as "fit to page". I'm sure the page authors
won't mind me doing a little bit of fit-to-page with their sites. What I
don't like is someone in the middle doing adaptation without my
knowledge. If I have the option to permit/prevent this "feature", then
I'm happy. I would use it when necessary. And again, I'm sure the
content authors won't mind me doing this. I'm just not happy if some
uninvited gooseberry* decides to make it a threesome.

---Rotan

[1] http://dev.opera.com/articles/view/mama-methodology/
[2] http://dev.opera.com/articles/view/mama-the-url-set/
[3] http://lists.w3.org/Archives/Public/public-bpwg-ct/2009Jan/0022.html

* Possibly a particularly Irish phrase... :)