Re: Javascript client rendered applications and SEO from Aaron Bradley on 2014-02-07 (public-vocabs@w3.org from February 2014)

From: Aaron Bradley <aaranged@gmail.com>
Date: Fri, 7 Feb 2014 13:08:57 -0800
To: Ruth Ellen Duerr <rduerr@colorado.edu>
Cc: "public-vocabs@w3.org" <public-vocabs@w3.org>
Message-ID: <CAMbipBuHXjfC8SownE4CqWCMYhgSdZVYg1FH4ePF6rQK9dyUbg@mail.gmail.com>
Thanks for this Ruth - very informative.

Regarding your last paragraph where you describe the lack of Webmaster
Tools support, have any of the items you've marked up using this method
since appeared in the Structured Data report of Google's Webmaster Tools
(it's under "Search Appearance")?

Also curious if you've considered employing JSON-LD rather than coding
inline (for all I know, that might be wholly inappropriate, and at this
point we have little information on whether or not the search engines
respect schema.org for webpages expressed in JSON-LD)?  And do you have any
URLs you can share with us?

As your observations and questions are Google-specific you might want to
consider reposting this, and/or posting follow-ups, on the appropriate
Google forum (where you may have a better chance of Googlers weighing in):
https://productforums.google.com/forum/#!categories/webmasters/structured-data



On Wed, Feb 5, 2014 at 9:47 AM, Ruth Ellen Duerr <rduerr@colorado.edu> wrote:

> We've recently created a JavaScript MVC (Backbone.js) application at NSIDC
> with schema.org tags that is indexable by Google. This is challenging
> because the search engines (Google, Yahoo, Bing...) don't execute JavaScript
> when they crawl a page. A site that is rendered on the client by JavaScript
> is therefore never rendered by the search indexer, so its schema.org tags
> are not present in what the crawler sees.
>
> The general problem can be understood by reading this document from
> Google: https://support.google.com/webmasters/answer/174992?hl=en.
> However, this article is aimed at sites that are server rendered but use
> AJAX to fetch and display sections of the site. We implemented this pattern
> by using PhantomJS to server side render our simple application when a
> request comes from a web crawler. This turned out to be relatively simple
> to implement, although it would be more complex with a rich application
> that has a lot of user interaction.
>
> Here are the steps for the full solution:
> 1) Add the meta tag to inform crawlers that this page needs to be crawled
> with the _escaped_fragment_ parameter. This tag is:
> <meta name="fragment" content="!">
> This is step 3 in the Google support link above.
>
> 2) Set up a server to server-side render requests with the parameter
> _escaped_fragment_. We did this using express (http://expressjs.com) and
> PhantomJS (http://phantomjs.org/). When a request is made to our server
> with the _escaped_fragment_ parameter, express makes that same request to a
> PhantomJS instance, scrapes the content, and then returns that content.
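
A minimal sketch of the URL handling behind step 2, not the code in the
NSIDC repository. Under Google's AJAX crawling scheme the crawler requests
?_escaped_fragment_=<state>, and the server has to work out which "#!" URL
the headless browser should render. The function name toHashBangUrl and the
example.org addresses are hypothetical; the only API used is the standard
URL class built into Node.js and browsers.

```javascript
// Hypothetical helper (not from the original project): map a crawler
// request URL back to the URL a headless browser should render.
function toHashBangUrl(requestUrl) {
  const url = new URL(requestUrl);

  // No _escaped_fragment_ parameter: an ordinary visitor, nothing to do.
  if (!url.searchParams.has('_escaped_fragment_')) {
    return null;
  }

  // searchParams.get() returns the percent-decoded application state.
  const fragment = url.searchParams.get('_escaped_fragment_');

  // Drop the crawler-only parameter but keep any other query parameters.
  url.searchParams.delete('_escaped_fragment_');

  // An empty value means the page opted in via the meta tag and has no
  // hash state, so the plain URL is rendered.
  url.hash = fragment ? '#!' + fragment : '';

  return url.toString();
}

// Example calls with made-up URLs.
console.log(toHashBangUrl('http://example.org/data?_escaped_fragment_='));
console.log(toHashBangUrl('http://example.org/data?_escaped_fragment_=id%3D42'));
```

In an express route, a non-null result would be the address handed to the
PhantomJS instance; a null result means the request should be served the
normal client-side application.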
>
> The code for this solution can be found in our bitbucket repository for
> this project (https://bitbucket.org/nsidc/dataset-landing-page). These
> are the most relevant files:
>
> index.html
>
> https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/src/index.html?at=master
> Includes the <meta name="fragment" content="!"> for crawlers.
>
> server.js
>
> https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/server.js?at=master
> The express server file that handles the routing and wraps the PhantomJS
> server side rendering.
>
> phantom-server.js
>
> https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/src/phantom-server.js?at=master
> The phantom script opens the page in PhantomJS and dumps the contents for
> express.
>
> We hope this helps others who are using the JavaScript MVC pattern and
> want their applications to be indexed.
>
> However, we note that testing whether this solution actually works is a
> totally frustrating experience as none of Google's Webmaster Tools work, so
> you quite literally won't know whether your solution is working until after
> it has been pushed to production and Google has re-crawled your site (which
> might take a while; it took a week in our case).  Moreover, even then, the
> only way to know for sure is to save up some queries that you think
> might be affected by the changes and compare rankings before and after
> the Google crawl.
>
Received on Friday, 7 February 2014 21:09:25 UTC