Re: Make your data indexable by search engines

This is a very interesting discussion and if you don't mind I'd like to
throw in a practical example. We work with the Scottish Government to help
them publish statistical data as linked data.  They were asking me earlier
this week: how can we get our data to appear in search engine results?

I'd say we were following good practices for spatial and statistical data
publishing - though perhaps not yet 'best' practices, as there is always
more that you can do!

But it's not focused on search engine ranking, and the end result is that
the data is not yet prominent in the search results of major search
engines, even though all the big ones can and do crawl the site. Admittedly
it's a relatively new site with a lot of pages, so it will take a while to
be fully indexed, quite apart from any ranking issues.

For example, I tried a Google search for 'population of Dundee'. That
picks up a figure of 141,870, apparently from Wikipedia and dating from
2004. The current Wikipedia article https://en.wikipedia.org/wiki/Dundee
has a more recent figure of 148,260 from 2014, which matches the official
figure, but that doesn't make it into the headline Google result.

The official estimate for the population of Dundee can be found at this
page about Dundee (actually the Dundee City council area):
- http://statistics.gov.scot/doc/statistical-geography/S12000042
(see the data tab: http://statistics.gov.scot/doc/statistical-geography/S12000042?tab=data)

So the government answer to the question is that in 2014 the population was
148,260.  How do we help people find that?

There's also a page for the individual RDF Data Cube observation for the
most recently published data (2014):
http://statistics.gov.scot/data/population-estimates/year/2014/S12000042/gender/all/age/all/people/count

This dataset metadata summarises the methodology:
http://statistics.gov.scot/data/population-estimates?tab=about

In this case Google is presumably doing some special-case stuff with
Wikipedia, then after that it is returning pages that contain the words in
the search term, ranked according to however Google ranks stuff these days.

In this case, we've got some high-quality machine- and human-readable
spatial data on the web - what can our best practices advise me to do to
make this easier for people to find?
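As one illustration of the Schema.org option in the approach Byron proposes further down this thread, here is a minimal Python sketch. It assumes the catalogue record has already been read into a plain dict whose keys mirror ISO 19115 element names (title, abstract, and the EX_GeographicBoundingBox bounds); the function names and the sample title and description are hypothetical, and the sketch says nothing about how any particular search engine would rank the result.

```python
import json


def iso19115_to_schema_org(meta: dict) -> dict:
    """Map a few ISO 19115-style fields (hypothetical dict keys) to a schema.org Dataset."""
    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": meta["title"],
        "description": meta["abstract"],
        "url": meta["url"],
    }
    # Bounds named after the ISO 19115 EX_GeographicBoundingBox elements.
    bbox_keys = ("southBoundLatitude", "westBoundLongitude",
                 "northBoundLatitude", "eastBoundLongitude")
    if all(key in meta for key in bbox_keys):
        south, west, north, east = (meta[key] for key in bbox_keys)
        # schema.org GeoShape "box": lower corner then upper corner,
        # each corner written as "latitude longitude".
        dataset["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        }
    return dataset


def to_script_tag(dataset: dict) -> str:
    """Wrap the JSON-LD in a <script> element for embedding in an HTML page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(dataset, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    # Hypothetical record: only the URL is taken from this message.
    record = {
        "title": "Population Estimates",
        "abstract": "Placeholder abstract; the real text would come from the catalogue record.",
        "url": "http://statistics.gov.scot/data/population-estimates",
    }
    print(to_script_tag(iso19115_to_schema_org(record)))
```

The printed element would go in the HTML page for the dataset, and the same mapping could in principle be repeated per SpatialThing page (for example the Dundee page), in line with the proposal's "both the dataset and SpatialThing level".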


Cheers

Bill




On 24 August 2016 at 11:44, Byron Cochrane <bcochrane@linz.govt.nz> wrote:

> Hi Linda,
>
> My short response here is: less is more. Let's make these BPs more punchy
> and accessible wherever we can. I agree with the general intent here that
> making data crawlable by search engines is useful. (Although I would like
> to better understand the real use cases. They don't seem all that solid to
> me. I think this BP helps, but not in the way stated.) I feel that most of
> the arguments weaken the case rather than strengthen it, because they
> sound like hacks and not best practice. So why include them if the point
> can be made without them?
>
> Perhaps I also spent too many years, long ago, studying Christopher
> Alexander's "A Pattern Language". That is what the DWBP is styled after,
> either directly or indirectly. This document, it feels to me, is drifting
> from that by focusing too much on the arguments and not the practice. I
> could go into great detail about why I don't like many of the arguments in
> this particular section, but I fear it would be an unnecessary distraction
> from creating a good product.
>
> Yet if you think I should, then sure.
>
> Cheers,
> Byron
> ________________________________________
> From: Linda van den Brink [l.vandenbrink@geonovum.nl]
> Sent: Wednesday, August 24, 2016 7:06 PM
> To: Byron Cochrane; 'Dan Brickley'
> Cc: 'eparsons@google.com'; 'SDW WG (public-sdw-wg@w3.org)'
> Subject: RE: Make your data indexable by search engines
>
> DCAT is more for getting your dataset metadata into data portal catalogs
> like CKAN.
>
> Byron, I still have to look in detail at your proposal compared to the
> current BP text. From a brief glance, I gather you propose to remove
> quite a bit of text, and I’m not sure I’m happy about that. Once I know
> the gist of the issue you have with the current text, we can have a
> discussion. Meanwhile, what do others think?
>
> From: Byron Cochrane [mailto:bcochrane@linz.govt.nz]
> Sent: Wednesday, 24 August 2016 00:57
> To: 'Dan Brickley'
> CC: 'eparsons@google.com'; 'SDW WG (public-sdw-wg@w3.org)'
> Subject: RE: Make your data indexable by search engines
>
> Hi Dan,
>
> Thanks for the feedback. I wondered about DCAT. It was a last-minute
> addition: I thought, without being sure, that it might help because it
> would provide linkages that crawlers could follow. If it is not
> appropriate, let’s take it out. I by no means claim any expertise in SEO.
>
> Cheers,
> Byron
>
> From: Dan Brickley [mailto:danbri@google.com]
> Sent: Tuesday, 23 August 2016 6:18 p.m.
> To: Byron Cochrane
> Cc: eparsons@google.com; SDW WG (public-sdw-wg@w3.org)
> Subject: Re: Make your data indexable by search engines
>
>
>
> On 23 August 2016 at 03:55, Byron Cochrane <bcochrane@linz.govt.nz> wrote:
> Hi,
>
> I have been struggling with the “Make your data indexable by search
> engines” BP. While I agree that it is useful to make data discoverable by
> common search engines, there is much in the discussion included in this BP
> that I take issue with. It also seems too wordy to be easily understood.
> So I have created a “Starter for Ten” revision that I feel focuses on the
> main idea and removes the contentious arguments without (I hope)
> decreasing the impact.
>
> Here it is:
>
> Make your data indexable by search engines
>
> Search engines should be able to crawl and index metadata for spatial data
> on the web.
>
> Why?
>
> In SDIs, data are commonly managed and published through the provision of
> authoritative ISO 19115 metadata collated in web-based catalogs. These
> catalogs and metadata may be difficult for non-professionals to find and
> use, as they often do not support discovery through common search engines.
> These data therefore do not find their broadest audience. This is
> particularly true for SpatialThings that reside inside datasets.
>
> Intended Outcome
>
> Metadata for spatial data (datasets and SpatialThings) is indexable by
> search engine crawlers, thereby making these data discoverable through
> common search engines.
>
> Possible Approach to Implementation
>
> To make your data indexable by search engines, expose the appropriate
> elements of the spatial metadata (ISO 19115 and other) in formats that
> crawlers can use, such as DCAT, Schema.org, microdata and OpenSearch.
> Where possible, do this at both the dataset and SpatialThing level.
>
> Which search engines use DCAT?
>
> Dan
>
>
> I look forward to getting some feedback.
>
> Cheers,
>
> Byron Cochrane
> SDI Technical Leader
> New Zealand Geospatial Office
>
> E bcochrane@linz.govt.nz | DDI 04 460 0576 | M 021 794 501
>
> Wellington Office, Level 7, Radio New Zealand House, 155 The Terrace
> PO Box 5501, Wellington 6145, New Zealand | T 04 460 0110
> W www.linz.govt.nz | data.linz.govt.nz
>
>
>
>
>

Received on Wednesday, 24 August 2016 10:33:20 UTC