Re: Javascript client rendered applications and SEO from Ruth Ellen Duerr on 2014-02-07 (public-vocabs@w3.org from February 2014)

From: Ruth Ellen Duerr <rduerr@Colorado.EDU>
Date: Fri, 7 Feb 2014 15:28:35 -0700
To: Aaron Bradley <aaranged@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>
Message-ID: <A8A62906-2BC1-40DF-8B1D-4D18CC31921A@colorado.edu>

Hi Aaron,

OK, you can delete the last sentence of my original post. You may eventually be able to determine whether Google has detected the schema.org data, but it may take a really, really, really long time. As of today, the Structured Data report you mentioned shows detected Schema.org dataset information for only 6 of our nearly a thousand data sets, although a test site provided by RPI shows that all of our data sets have in fact been indexed.

No, we aren’t using JSON-LD for our catalog pages, but here are a few links (the first three show up in the Structured Data report; the last one, not so much…).

* http://nsidc.org/data/G10007
* http://nsidc.org/data/ggd323
* http://nsidc.org/data/nsidc-0064
* http://nsidc.org/data/mod10a1 (which, by the way, was indexed on Feb. 4, four days after the data sets above!)

As you might expect, these pages are all generated on the fly by a web service. The full list of data sets is available from this other web service: http://nsidc.org/data/search/#p=1/psize=25

I’ll repost to the other forum you mentioned. I posted here on the recommendation of our collaborators at RPI.

Ruth

On Feb 7, 2014, at 2:08 PM, Aaron Bradley <aaranged@gmail.com> wrote:

Thanks for this Ruth - very informative.

Regarding your last paragraph, where you describe the lack of Webmaster Tools support: have any of the items you've marked up using this method since appeared in the Structured Data report of Google's Webmaster Tools (it's under "Search Appearance")?

Also curious whether you've considered using JSON-LD rather than coding the markup inline (for all I know, that might be wholly inappropriate, and at this point we have little information on whether or not the search engines respect schema.org markup for web pages expressed in JSON-LD). And do you have any URLs you can share with us?

As your observations and questions are Google-specific you might want to consider reposting this, and/or posting follow-ups, on the appropriate Google forum (where you may have a better chance of Googlers weighing in):
https://productforums.google.com/forum/#!categories/webmasters/structured-data

On Wed, Feb 5, 2014 at 9:47 AM, Ruth Ellen Duerr <rduerr@colorado.edu> wrote:
We’ve recently created a JavaScript MVC (Backbone.js) application at NSIDC with schema.org tags that is indexable by Google. This is challenging because the search engines (Google, Yahoo, Bing…) don’t execute JavaScript when they crawl a page. For a site rendered on the client by JavaScript, the search indexer never renders the page, so the schema.org tags won’t be present in what it sees.

The general problem is described in this document from Google: https://support.google.com/webmasters/answer/174992?hl=en. However, that article is aimed at sites that are server-rendered but use AJAX to fetch and display sections of the site. We implemented this pattern by using PhantomJS to render our simple application on the server whenever a request comes from a web crawler. This turned out to be relatively simple to implement, although it would be more complex for a rich application with a lot of user interaction.
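For readers unfamiliar with the scheme in Google's document: a crawler that finds a `#!` URL, or a page carrying the fragment meta tag, requests an "ugly" URL with an `_escaped_fragment_` query parameter instead, and that is the request the server has to recognize. The sketch below shows that mapping as described in the 2014-era AJAX crawling scheme. The function names `escapeFragmentValue` and `toEscapedFragmentUrl` are hypothetical; this is an illustration, not part of NSIDC's code.

```javascript
// Sketch of the "pretty URL" -> "ugly URL" mapping from Google's (2014-era)
// AJAX crawling scheme. Function names are hypothetical.

// Percent-escape the fragment as the scheme specifies: encode as UTF-8,
// then escape bytes 0x00-0x20, 0x23 ('#'), 0x25-0x26 ('%', '&'),
// 0x2B ('+') and 0x7F-0xFF; all other bytes pass through unchanged.
function escapeFragmentValue(fragment) {
  let out = "";
  for (const byte of Buffer.from(fragment, "utf8")) {
    const mustEscape =
      byte <= 0x20 || byte === 0x23 || byte === 0x25 || byte === 0x26 ||
      byte === 0x2b || byte >= 0x7f;
    out += mustEscape
      ? "%" + byte.toString(16).toUpperCase().padStart(2, "0")
      : String.fromCharCode(byte);
  }
  return out;
}

// Map the URL a user sees to the URL a crawler following the scheme requests.
// hasFragmentMeta: whether the page carries <meta name="fragment" content="!">.
function toEscapedFragmentUrl(prettyUrl, hasFragmentMeta) {
  const bang = prettyUrl.indexOf("#!");
  if (bang !== -1) {
    // Hashbang URL: the fragment after "#!" becomes the parameter value.
    const base = prettyUrl.slice(0, bang);
    const sep = base.includes("?") ? "&" : "?";
    return base + sep + "_escaped_fragment_=" + escapeFragmentValue(prettyUrl.slice(bang + 2));
  }
  if (!hasFragmentMeta) {
    return prettyUrl; // ordinary page: crawled as-is
  }
  // No "#!" but the meta tag is present: request with an empty parameter.
  const base = prettyUrl.split("#")[0];
  const sep = base.includes("?") ? "&" : "?";
  return base + sep + "_escaped_fragment_=";
}
```

For example, `toEscapedFragmentUrl("http://www.example.com/ajax.html#!key1=value1&key2=value2", false)` returns `"http://www.example.com/ajax.html?_escaped_fragment_=key1=value1%26key2=value2"`, while a page like http://nsidc.org/data/G10007 with the meta tag would be requested as `http://nsidc.org/data/G10007?_escaped_fragment_=`.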

Here are the steps for the full solution:
1) Add a meta tag that tells crawlers this page must be crawled with the _escaped_fragment_ parameter. The tag is:
<meta name="fragment" content="!">
This is step 3 in the Google support link above.

2) Set up a server that renders requests carrying the _escaped_fragment_ parameter on the server side. We did this using Express (http://expressjs.com) and PhantomJS (http://phantomjs.org/). When a request with the _escaped_fragment_ parameter reaches our server, Express makes that same request to a PhantomJS instance, which loads the page and dumps the rendered content; Express then returns that content to the crawler.
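The routing decision in step 2 can be sketched as Express-style middleware with the usual `(req, res, next)` signature. This is a minimal illustration under assumptions, not the code in NSIDC's repository: `escapedFragmentMiddleware` is a hypothetical name, and `renderSnapshot(url, callback)` is a hypothetical stand-in for the call into PhantomJS (it receives a server-relative URL and is assumed to resolve it against the site's own origin). Other query parameters are dropped for simplicity.

```javascript
// Hypothetical sketch of step 2 as Express-style middleware.
// renderSnapshot(url, cb) is an assumed stand-in for the PhantomJS call;
// it should call cb(err, html) with the fully rendered page.
function escapedFragmentMiddleware(renderSnapshot) {
  return function (req, res, next) {
    // Express parses the query string into req.query; a crawler request
    // carries _escaped_fragment_, possibly with an empty value.
    if (!Object.prototype.hasOwnProperty.call(req.query, "_escaped_fragment_")) {
      return next(); // ordinary browser request: serve the client-side app
    }
    const fragment = req.query._escaped_fragment_;
    // Rebuild the URL the JavaScript app expects (restoring "#!" when a
    // fragment was present) so the headless browser renders the same view.
    const path = req.originalUrl.split("?")[0];
    const appUrl = fragment ? path + "#!" + fragment : path;
    renderSnapshot(appUrl, function (err, html) {
      if (err) return next(err); // let Express's error handling respond
      res.send(html); // return the rendered snapshot to the crawler
    });
  };
}
```

In an Express app this would be mounted with something like `app.use(escapedFragmentMiddleware(renderWithPhantom))` ahead of the handler that serves the client-side application, where `renderWithPhantom` is again a hypothetical function. Passing the renderer in as a parameter keeps the routing logic testable without launching PhantomJS.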

The code for this solution can be found in our Bitbucket repository for this project (https://bitbucket.org/nsidc/dataset-landing-page). These are the most relevant files:

index.html
https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/src/index.html?at=master
Includes the <meta name="fragment" content="!"> tag for crawlers.

server.js
https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/server.js?at=master
The Express server file that handles routing and wraps the PhantomJS server-side rendering.

phantom-server.js
https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/src/phantom-server.js?at=master
The PhantomJS script that opens the page and dumps its rendered contents for Express.

We hope this helps others who are using the JavaScript MVC pattern and want their applications to be indexed.

However, we note that testing whether this solution actually works is a totally frustrating experience, as none of Google’s Webmaster Tools work with this setup, so you quite literally won’t know whether your solution is working until after it has been pushed to production and Google has re-crawled your site (which might take a while; it took a week in our case). Moreover, even then, the only way to know for sure is to save up some queries that you think might be affected by the changes and compare rankings before and after the Google crawl.
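The before/after ranking check described above amounts to a simple comparison over saved queries. The sketch below is a hypothetical helper, not anything from NSIDC's repository: it assumes you recorded, by hand, the 1-based position of your page for each saved query before and after the re-crawl, with `null` meaning the page did not appear.

```javascript
// Hypothetical helper: compare a page's rank for saved queries before and
// after a re-crawl. Ranks are 1-based result positions; null = not present.
function compareRankings(before, after) {
  const report = [];
  for (const query of Object.keys(before)) {
    const was = before[query];
    // A query missing from the "after" record is treated as not ranked.
    const now = Object.prototype.hasOwnProperty.call(after, query) ? after[query] : null;
    let change;
    if (was === null && now === null) change = "still absent";
    else if (was === null) change = "newly ranked";
    else if (now === null) change = "dropped out";
    else if (now < was) change = "improved"; // smaller position = higher rank
    else if (now > was) change = "worsened";
    else change = "unchanged";
    report.push({ query, was, now, change });
  }
  return report;
}
```

Each entry in the returned array pairs a saved query with its old and new position and a one-word verdict, which is about as much as this indirect test can tell you.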

Received on Friday, 7 February 2014 22:29:01 UTC