- From: Ruth Ellen Duerr <rduerr@Colorado.EDU>
- Date: Wed, 5 Feb 2014 10:47:54 -0700
- To: "public-vocabs@w3.org" <public-vocabs@w3.org>
- Message-ID: <6BE474FC-D07F-4A1E-ABC2-410A1BD12793@colorado.edu>
We’ve recently created a JavaScript MVC (Backbone.js) application at NSIDC with schema.org tags that is indexable by Google. This is challenging because the search engines (Google, Yahoo, Bing…) don’t execute JavaScript when they crawl a page. For a site rendered on the client by JavaScript, the page won’t be rendered by the search indexer, so the schema.org tags won’t be present. The general problem can be understood by reading this document from Google: https://support.google.com/webmasters/answer/174992?hl=en. However, that article is aimed at sites that are server-rendered but use AJAX to fetch and display sections of the page. We implemented this pattern by using PhantomJS to server-side render our simple application when a request comes from a web crawler. This turned out to be relatively simple to implement, although it would be more complex for a rich application with a lot of user interaction.

Here are the steps for the full solution:

1) Add a meta tag to inform crawlers that this page needs to be crawled with the _escaped_fragment_= parameter. The tag is: <meta name="fragment" content="!"> This is step 3 in the Google support link above.

2) Set up a server to server-side render requests carrying the _escaped_fragment_ parameter. We did this using Express (http://expressjs.com) and PhantomJS (http://phantomjs.org/). When a request is made to our server with the _escaped_fragment_ parameter, Express makes that same request to a PhantomJS instance, scrapes the content, and then returns that content.

The code for this solution can be found in our Bitbucket repository for this project (https://bitbucket.org/nsidc/dataset-landing-page). These are the most relevant files:

- index.html
  https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/src/index.html?at=master
  Includes the <meta name="fragment" content="!"> tag for crawlers.
- server.js
  https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/server.js?at=master
  The Express server file that handles the routing and wraps the PhantomJS server-side rendering.

- phantom-server.js
  https://bitbucket.org/nsidc/dataset-landing-page/src/2f6d68107da4f3eed6f66af0545ea1398d2174f2/src/phantom-server.js?at=master
  The PhantomJS script that opens the page in PhantomJS and dumps the contents for Express.

We hope this helps others who are using the JavaScript MVC pattern and want their applications to be indexed. However, we note that testing whether this solution actually works is a totally frustrating experience, as none of Google’s Webmaster tools work for it. You quite literally won’t know whether your solution is working until after it has been pushed to production and Google has re-crawled your site (which might take a while; it took a week in our case). Moreover, even then, the only way to know for sure is to have saved up some queries that you think might be affected by the changes and to compare rankings before and after the Google crawl.
Received on Wednesday, 5 February 2014 22:32:24 UTC