Re: Initial research into installable web apps from Ernesto Jiménez on 2013-11-12 (public-web-mobile@w3.org from November 2013)

From: Ernesto Jiménez <erjica@gmail.com>
Date: Tue, 12 Nov 2013 12:36:07 +0000
To: Marcos Caceres <w3c@marcosc.com>
Cc: public-web-mobile@w3.org
Message-ID: <CAMueN_LxitN+P_JAsdVwZ2y8R+L7F4Xbouph-bCLUH30aULaYg@mail.gmail.com>
On Mon, Nov 11, 2013 at 8:27 PM, Marcos Caceres <w3c@marcosc.com> wrote:

>
> On Saturday, November 9, 2013 at 11:50 PM, Ernesto Jiménez wrote:
> >
> > I did a quick & dirty extraction from the webdevdata.org
> > (http://webdevdata.org) to have a go at it. Just extracted the top meta
> > header names in order to see how popular are the app-specific ones such as
> > application-name or apple-mobile-web-app-capable. Given that the viewport
> > tag seems to be extended, I also extracted the top properties in viewport.
> >
> > https://gist.github.com/ernesto-jimenez/7390115
> >
> > It has been quick, so numbers are not accurate.
>
> This is great, but yeah… I’m getting different results on the same data
> set (you are using the Oct 30 set, right?). For
> apple-mobile-web-app-capable, I get 1163 sites. Results are here:
> https://gist.github.com/marcoscaceres/7419589
>
> Searches I’m doing are just grepping:
>
> find ./ -name "*ml.txt" | xargs grep -l "apple-mobile-web-app-capable"
>
> What method are you using to get your results?



I used a different method. Rather than just using grep, I wrote 60 lines of
Go: a command-line tool that reads a file and prints all the meta tag names
it finds.

Then I do:
$ find ./ -name "*ml.txt" | parallel "print_meta_tags {} >> meta_tags"

That results in a 316,535-line CSV containing the filename and meta tag
name.

$ cat meta_tags | grep apple-mobile-web-app-capable | wc -l
    1149

I've checked your method and the discrepancies seem to come mainly from:
  * Commented-out meta tags, e.g.
    16/nationchannel.com_167cb1ae269bf0c09ae5fd3496e26848.html.txt
  * Meta tags added in JS, e.g.
    d9/harristeeter.com_d977ea7bae0fa6a33de5fef5c5e1efd7.html.txt
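
To make the two cases concrete, the sketch below classifies where a string
occurs in a page. It is only an illustration: the function name (classify)
and its labels are made up for this example, and it uses regexes rather than
a real HTML parser. A plain "grep -l" would report a file in every case
except "none", which is one way the grep count can come out higher.

```go
package main

import (
	"regexp"
	"strings"
)

var (
	commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)
	scriptRe  = regexp.MustCompile(`(?is)<script\b.*?</script\s*>`)
)

// classify reports where needle occurs in html:
//
//	"markup"  - in live markup, outside comments and scripts
//	"script"  - survives comment removal, but only inside <script> blocks
//	"comment" - only inside HTML comments
//	"none"    - not present at all
func classify(html, needle string) string {
	if !strings.Contains(html, needle) {
		return "none"
	}
	noComments := commentRe.ReplaceAllString(html, "")
	live := scriptRe.ReplaceAllString(noComments, "")
	switch {
	case strings.Contains(live, needle):
		return "markup"
	case strings.Contains(noComments, needle):
		return "script"
	default:
		return "comment"
	}
}
```

Running something like this over the files that grep matches but that are
absent from the CSV would show how many fall into each bucket.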


BTW, what user-agent was used to create the dataset? Some sites might be
doing user-agent sniffing to switch between their mobile and desktop sites
and we might be missing some data.

> BTW, have you seen the following?
> “the .csv files with tag usage and attribute usage”:
> http://lists.w3.org/Archives/Public/public-webdevdata/2013Nov/0015.html
>
>
> Really useful!
>

I've been a bit disconnected recently and I'm catching up. I'll check the
threads :)



> > I'm happy to help with the draft, but I'm not doing a pull request yet.
> > I should probably fix my W3C account first, since it's still linked to my
> > previous company. I did send an application to join the group as invited
> > expert, in case I can help out.
>
> That would be awesome if you could. My idea is to now take a sample of
> about ~250 sites (for a confidence of 95% given the dataset size) and see
> if they are using the tag “properly”. That is, and I strongly suspect, that
> very few sites that claim to be “installable” actually function as
> installable web apps.
>
> I would really need help with this. I would like to split this task
> amongst 2-5 people, each of us looking at if these sites actually work as
> applications once installed. We would need to come up with some simple
> criteria for that… it’s pretty self evident, but with some caveats. For
> example:
>
> 1. forecast.io - yes, works as application, but it’s not useable as a
> Website on the iPhone!
> 2. variety.com - only “installed” page “works”, but clicking on *any
> link* (even same domain) breaks the “installed app” illusion.
> 3. squawka.com - declares to be capable, but presents the Desktop site.
> …
>
> WDYT?
>

That sounds good, but it's going to be time-consuming. I would rather do a
first pass on the data to review stats on what tags are in use and how.
After that, based on the data, we can get into more time-consuming research
based on what we observe.

In my opinion, the first pass on the data will already give us some info on
what developers intend to support, even if they don't implement it
properly. Then we can dig deeper into how they are actually implementing it.
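
On the ~250 figure: the thread does not say which margin of error or
population it assumes. One common way to get such a number is Cochran's
formula with a finite population correction, sketched below. The function
name (sampleSize) and its parameters are made up for this example, and
p = 0.5 is the usual worst-case assumption. With the 1163 sites from the
grep count and 95% confidence, a ±5% margin gives 289 sites, and about ±5.5%
gives 250.

```go
package main

import "math"

// sampleSize returns Cochran's sample size for estimating a proportion,
// with the finite population correction applied.
//
//	population: number of sites to sample from
//	z:          z-score for the confidence level (1.96 for 95%)
//	margin:     margin of error as a fraction (0.05 for +/-5%)
//
// It assumes the most conservative proportion, p = 0.5, so p*(1-p) = 0.25.
func sampleSize(population int, z, margin float64) int {
	n0 := z * z * 0.25 / (margin * margin)
	n := n0 / (1 + (n0-1)/float64(population))
	return int(math.Ceil(n))
}
```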
Received on Tuesday, 12 November 2013 12:36:56 UTC