Re: Make your data indexable by search engines

From: Ghislain Atemezing-Pro <ghislain.atemezing@mondeca.com>
Date: Wed, 24 Aug 2016 12:43:06 +0000
Message-ID: <CAGKgTR=8ZOwzGOkjRH=B7iHSi7mWoBZ+KZBabRzzz2sMrSFwaA@mail.gmail.com>
To: Bill Roberts <bill@swirrl.com>
Cc: Linda van den Brink <l.vandenbrink@geonovum.nl>, "SDW WG (public-sdw-wg@w3.org)" <public-sdw-wg@w3.org>
Hi all,
@Bill: I was wondering how you use DCAT in the searching part of your
site...Let's say http://statistics.gov.scot/search?search=dundee. Maybe
your use case here might help others to reach that "best practice".
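
For concreteness, here is a minimal sketch (Python, standard library only) of
the kind of dataset-level Schema.org markup discussed in the thread below. It
is an illustration under assumptions, not a description of what
statistics.gov.scot actually publishes: the function names are made up for
this example, the dataset name and description are placeholder wording, and
nothing here implies that a given search engine will consume the result.

```python
import json


def dataset_jsonld(name, description, url, place_name, year):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; the property selection is an
    assumption, not a recommended profile.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "spatialCoverage": {"@type": "Place", "name": place_name},
        "temporalCoverage": str(year),
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dictionary in a script element for embedding in HTML."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# The URL, place and year come from Bill's example in the thread;
# the name and description strings are placeholders.
example = dataset_jsonld(
    name="Population estimates",
    description="Placeholder description of the population estimates dataset.",
    url="http://statistics.gov.scot/data/population-estimates",
    place_name="Dundee City",
    year=2014,
)

print(as_script_tag(example))
```

The printed element could be placed in the HTML of the dataset page, so that a
crawler reading the page also finds a structured description of it.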

Best,
Ghislain

On Wed, 24 Aug 2016 at 13:00, Ed Parsons <eparsons@google.com> wrote:

> This is one of the most important issues and I think reflects a chicken
> and egg situation presently.
>
> Dan can provide more detail here, but Google will try to answer a factual
> question like the population of Dundee from the Knowledge Graph, the
> internal repository of facts obtained from canonical sources, in many cases
> Wikipedia and Wikidata, supplemented by structured content from individual
> websites.
>
> The chicken and egg part comes when we want to promote the city website as
> the canonical source instead of Wikipedia/Wikidata.
>
> Of course there is also the valid question: should we recommend that the
> city instead just updates Wikipedia?
>
> And yes I have ignored the issue of refreshing data when new information
> is published..
>
> Ed
>
> On Wed, 24 Aug 2016, 12:32 Bill Roberts, <bill@swirrl.com> wrote:
>
>> This is a very interesting discussion and if you don't mind I'd like to
>> throw in a practical example. We work with the Scottish Government to help
>> them publish statistical data as linked data.  They were asking me earlier
>> this week: how can we get our data to appear in search engine results?
>>
>> I'd say we were following good practices for spatial and statistical data
>> publishing - though perhaps not yet 'best' practices, as there is always
>> more that you can do!
>>
>> But it's not focused on search engine ranking, and the end result is that
>> the data is not yet prominent in the search results of major search
>> engines, though all the big ones can and do crawl the site - admittedly
>> it's a relatively new site with a lot of pages, so will take a while before
>> it's fully indexed, aside from any ranking issues.
>>
>> For example, I tried a Google search for 'population of Dundee'.  That
>> picks up a figure of 141,870 from 2004, apparently from Wikipedia.  The
>> current Wikipedia page https://en.wikipedia.org/wiki/Dundee has
>> a more recent figure of 148,260 from 2014 - that matches the official
>> figure, but doesn't make it to the headline Google result.
>>
>> The official estimate for the population of Dundee can be found at this
>> page about Dundee (actually the Dundee City council area):
>> - http://statistics.gov.scot/doc/statistical-geography/S12000042
>> (see the data tab:
>> http://statistics.gov.scot/doc/statistical-geography/S12000042?tab=data)
>>
>> So the government answer to the question is that in 2014 the population
>> was 148,260.  How do we help people find that?
>>
>> There's also a page for the individual RDF Data Cube observation for the
>> most recently published data (2014)
>>
>> http://statistics.gov.scot/data/population-estimates/year/2014/S12000042/gender/all/age/all/people/count
>>
>> This dataset metadata summarises the methodology:
>> http://statistics.gov.scot/data/population-estimates?tab=about
>>
>> In this case Google is presumably doing some special-case stuff with
>> Wikipedia, then after that it is returning pages that contain the words in
>> the search term, ranked according to however Google ranks stuff these days.
>>
>> In this case, we've got some high quality machine- and human-readable
>> spatial data on the web - what can our best practices advise me to do to
>> make this easier for people to find?
>>
>>
>> Cheers
>>
>> Bill
>>
>>
>>
>>
>> On 24 August 2016 at 11:44, Byron Cochrane <bcochrane@linz.govt.nz>
>> wrote:
>>
>>> Hi Linda,
>>>
>>> My short response here is less is more. Let's make these BPs more punchy
>>> and accessible wherever we can.  I agree with the general intent here that
>>> making data crawlable by search engines is useful.  (Although I would like
>>> to better understand the real use cases. They don't seem all that solid to
>>> me.  I think this BP helps but not in the way stated.)  I feel that most of
>>> the arguments weaken the argument rather than strengthening it because they
>>> sound like hacks and not best practice.  So why include them if the point
>>> can be made without?
>>>
>>> Perhaps also I spent too many years studying Christopher Alexander's
>>> "Pattern Language" years ago. That is what the DWBP is styled after either
>>> directly or indirectly.  This document, it feels to me, is drifting from that
>>> by focusing too much on the arguments and not the practice. I could go into
>>> great detail about why I don't like many of the arguments in this
>>> particular section but I fear it would be an unnecessary distraction from
>>> creating a good product.
>>>
>>> Yet if you think I should then sure.
>>>
>>> Cheers,
>>> Byron
>>> ________________________________________
>>> From: Linda van den Brink [l.vandenbrink@geonovum.nl]
>>> Sent: Wednesday, August 24, 2016 7:06 PM
>>> To: Byron Cochrane; 'Dan Brickley'
>>> Cc: 'eparsons@google.com'; 'SDW WG (public-sdw-wg@w3.org)'
>>> Subject: RE: Make your data indexable by search engines
>>>
>>> DCAT is more for getting your dataset metadata into data portal
>>> catalogs like CKAN.
>>>
>>> Byron I still have to look in detail at your proposal compared to the
>>> current BP text. From briefly glancing at it, I gather you propose to
>>> remove quite a bit of text and I’m not sure I’m happy about that. If I know
>>> what the gist is of the issue you have with the current text, we can have a
>>> discussion. Meanwhile, what do others think?
>>>
>>> From: Byron Cochrane [mailto:bcochrane@linz.govt.nz]
>>> Sent: Wednesday 24 August 2016 00:57
>>> To: 'Dan Brickley'
>>> Cc: 'eparsons@google.com'; 'SDW WG (public-sdw-wg@w3.org)'
>>> Subject: RE: Make your data indexable by search engines
>>>
>>> Hi Dan,
>>>
>>> Thanks for the feedback.  I wondered about DCAT.  It was a last minute
>>> addition thinking it might help because it would provide linkages that
>>> crawlers could follow??  If it is not appropriate let’s take it out.  I by
>>> no means claim any expertise in SEO.
>>>
>>> Cheers,
>>> Byron
>>>
>>> From: Dan Brickley [mailto:danbri@google.com]
>>> Sent: Tuesday, 23 August 2016 6:18 p.m.
>>> To: Byron Cochrane
>>> Cc: eparsons@google.com; SDW WG (public-sdw-wg@w3.org)
>>> Subject: Re: Make your data indexable by search engines
>>>
>>>
>>>
>>> On 23 August 2016 at 03:55, Byron Cochrane <bcochrane@linz.govt.nz> wrote:
>>> Hi,
>>>
>>> I have been struggling with the “Make your data indexable by search
>>> engines” BP.   While I agree that it is useful to make data discoverable by
>>> common search engines, there is much in the discussion included in this BP
>>> that I take issue with.  It also seems too wordy to be easily
>>> understood.  So I have created a “Starter for Ten” revision that I feel
>>> focuses on the main idea and removes the contentious arguments without
>>> decreasing the impact (I hope).
>>>
>>> Here it is:
>>>
>>> Make your data indexable by search engines
>>>
>>> Search engines should be able to crawl and index metadata for spatial
>>> data on the web.
>>>
>>> Why?
>>>
>>> In SDIs, data are commonly managed and published through the provision
>>> of authoritative ISO 19115 metadata collated in web based catalogs.  These
>>> catalogs and metadata may be difficult for non-professionals to find and
>>> use as they often do not support discovery through common search engines.
>>> These data therefore do not find their broadest audience. This is
>>> particularly true for SpatialThings that reside inside datasets.
>>>
>>> Intended Outcome
>>>
>>> Metadata for spatial data, datasets and SpatialThings, are indexable by
>>> search engine crawlers thereby making these data discoverable through
>>> common search engines.
>>>
>>> Possible Approach to Implementation
>>>
>>> To make your data indexable by search engines expose the appropriate
>>> elements of the spatial metadata (ISO 19115 and other) in formats that
>>> crawlers can use, such as DCAT, Schema.org, microdata and Opensearch.
>>> Where possible, do this at both the dataset and SpatialThing level.
>>>
>>> Which search engines use DCAT?
>>>
>>> Dan
>>>
>>>
>>> Look forward to getting some feedback.
>>>
>>> Cheers,
>>>
>>> Byron Cochrane
>>> SDI Technical Leader
>>> New Zealand Geospatial Office
>>>
>>> E  bcochrane@linz.govt.nz | DDI 04 460 0576 | M 021 794 501
>>>
>>> Wellington Office, Level 7, Radio New Zealand House, 155 The Terrace
>>> PO Box 5501, Wellington 6145, New Zealand | T 04 460 0110
>>> W  www.linz.govt.nz | data.linz.govt.nz
>> --
>
> *Ed Parsons *FRGS
> Geospatial Technologist, Google
>
> Google Voice +44 (0)20 7881 4501
> www.edparsons.com @edparsons
>
-- 
--------------------------------------------
Ghislain A. Atemezing, Ph.D
R&D Engineer
@ Mondeca, Paris, France
Labs: http://labs.mondeca.com
Tel: +33 (0)1 4111 3034
Web: www.mondeca.com
Twitter: @gatemezing
About Me: http://atemezing.org
Received on Wednesday, 24 August 2016 12:43:46 UTC
