
Re: Make your data indexable by search engines

From: Bill Roberts <bill@swirrl.com>
Date: Wed, 24 Aug 2016 14:47:15 +0200
Message-ID: <CAMTVsunHCwQjefP0Rb=3ZiDzvfrGdV1sf7uEZdAHZ3sp37oWrQ@mail.gmail.com>
To: Ghislain Atemezing-Pro <ghislain.atemezing@mondeca.com>
Cc: Linda van den Brink <l.vandenbrink@geonovum.nl>, "SDW WG (public-sdw-wg@w3.org)" <public-sdw-wg@w3.org>
Hi Ghislain

The search feature currently indexes the name and description of datasets,
and names and codes of geographical areas.  So while we provide some basic
DCAT metadata for each dataset, the search feature doesn't currently look
at it.

We have a background activity to develop an 'advanced search' that will
probably include elements of DCAT metadata amongst other things.

Best regards

Bill



On 24 August 2016 at 14:43, Ghislain Atemezing-Pro <
ghislain.atemezing@mondeca.com> wrote:

> Hi all,
> @Bill: I was wondering how you use DCAT in the searching part of your
> site...Let's say http://statistics.gov.scot/search?search=dundee. Maybe
> your use case here might help others to reach that "best practice".
>
> Best,
> Ghislain
>
> On Wed, 24 Aug 2016 at 13:00, Ed Parsons <eparsons@google.com> wrote:
>
>> This is one of the most important issues and I think reflects a chicken
>> and egg situation presently.
>>
>> Dan can provide more detail here, but Google will try to answer a factual
>> question like the population of Dundee from the Knowledge Graph, the
>> internal repository of facts obtained from canonical sources (in many cases
>> Wikipedia and Wikidata), supplemented by structured content from individual
>> websites.
>>
>> The chicken and egg part comes when we want to promote the city website
>> as the canonical source instead of Wikipedia/Wikidata.
>>
>> Of course, there is also the valid question: should we recommend that the
>> city instead just update Wikipedia?
>>
>> And yes, I have ignored the issue of refreshing data when new information
>> is published.
>>
>> Ed
>>
>> On Wed, 24 Aug 2016, 12:32 Bill Roberts, <bill@swirrl.com> wrote:
>>
>>> This is a very interesting discussion and if you don't mind I'd like to
>>> throw in a practical example. We work with the Scottish Government to help
>>> them publish statistical data as linked data.  They were asking me earlier
>>> this week: how can we get our data to appear in search engine results?
>>>
>>> I'd say we were following good practices for spatial and statistical
>>> data publishing - though perhaps not yet 'best' practices, as there is
>>> always more that you can do!
>>>
>>> But it's not focused on search engine ranking, and the end result is
>>> that the data is not yet prominent in the search results of major search
>>> engines, though all the big ones can and do crawl the site - admittedly
>>> it's a relatively new site with a lot of pages, so will take a while before
>>> it's fully indexed, aside from any ranking issues.
>>>
>>> For example, I tried a Google search for 'population of Dundee'.  That
>>> returns a figure of 141,870 from 2004, apparently taken from Wikipedia.
>>> The current Wikipedia page https://en.wikipedia.org/wiki/Dundee has
>>> a more recent figure of 148,260 from 2014 - that matches the official
>>> figure, but doesn't make it to the headline Google result.
>>>
>>> The official estimate for the population of Dundee can be found at this
>>> page about Dundee (actually the Dundee City council area):
>>> - http://statistics.gov.scot/doc/statistical-geography/S12000042
>>> (see the data tab:
>>> http://statistics.gov.scot/doc/statistical-geography/S12000042?tab=data)
>>>
>>> So the government answer to the question is that in 2014 the population
>>> was 148,260.  How do we help people find that?
>>>
>>> There's also a page for the individual RDF Data Cube observation for the
>>> most recently published data (2014)
>>> http://statistics.gov.scot/data/population-estimates/year/2014/S12000042/gender/all/age/all/people/count
>>>
>>> This dataset metadata summarises the methodology:
>>> http://statistics.gov.scot/data/population-estimates?tab=about
>>>
>>> In this case Google is presumably doing some special-case stuff with
>>> Wikipedia, then after that it is returning pages that contain the words in
>>> the search term, ranked according to however Google ranks stuff these days.
>>>
>>> In this case, we've got some high quality machine- and human-readable
>>> spatial data on the web - what can our best practices advise me to do to
>>> make this easier for people to find?
>>>
>>>
>>> Cheers
>>>
>>> Bill
>>>
>>>
>>>
>>>
>>> On 24 August 2016 at 11:44, Byron Cochrane <bcochrane@linz.govt.nz>
>>> wrote:
>>>
>>>> Hi Linda,
>>>>
>>>> My short response here is less is more. Let's make these BPs more
>>>> punchy and accessible wherever we can.  I agree with the general intent
>>>> here that making data crawlable by search engines is useful.  (Although I
>>>> would like to better understand the real use cases. They don't seem all
>>>> that solid to me.  I think this BP helps but not in the way stated.)  I
>>>> feel that most of the arguments weaken the argument rather than
>>>> strengthening it because they sound like hacks and not best practice.  So
>>>> why include them if the point can be made without?
>>>>
>>>> Perhaps also I spent too many years studying Christopher Alexander's
>>>> "Pattern Language" years ago. That is what the DWBP is styled after, either
>>>> directly or indirectly.  It feels to me that this document is drifting from
>>>> that by focusing too much on the arguments and not the practice. I could go
>>>> into great detail about why I don't like many of the arguments in this
>>>> particular section, but I fear it would be an unnecessary distraction from
>>>> creating a good product.
>>>>
>>>> Yet if you think I should then sure.
>>>>
>>>> Cheers,
>>>> Byron
>>>> ________________________________________
>>>> From: Linda van den Brink [l.vandenbrink@geonovum.nl]
>>>> Sent: Wednesday, August 24, 2016 7:06 PM
>>>> To: Byron Cochrane; 'Dan Brickley'
>>>> Cc: 'eparsons@google.com'; 'SDW WG (public-sdw-wg@w3.org)'
>>>> Subject: RE: Make your data indexable by search engines
>>>>
>>>> DCAT is more for getting your dataset metadata into data portal
>>>> catalogs  like CKAN.
>>>>
>>>> Byron I still have to look in detail at your proposal compared to the
>>>> current BP text. From briefly glancing at it, I gather you propose to
>>>> remove quite a bit of text and I’m not sure I’m happy about that. If I know
>>>> what the gist is of the issue you have with the current text, we can have a
>>>> discussion. Meanwhile, what do others think?
>>>>
>>>> From: Byron Cochrane [mailto:bcochrane@linz.govt.nz]
>>>> Sent: Wednesday, 24 August 2016 00:57
>>>> To: 'Dan Brickley'
>>>> Cc: 'eparsons@google.com'; 'SDW WG (public-sdw-wg@w3.org)'
>>>> Subject: RE: Make your data indexable by search engines
>>>>
>>>> Hi Dan,
>>>>
>>>> Thanks for the feedback.  I wondered about DCAT.  It was a last minute
>>>> addition thinking it might help because it would provide linkages that
>>>> crawlers could follow??  If it is not appropriate let’s take it out.  I by
>>>> no means claim any expertise in SEO.
>>>>
>>>> Cheers,
>>>> Byron
>>>>
>>>> From: Dan Brickley [mailto:danbri@google.com]
>>>> Sent: Tuesday, 23 August 2016 6:18 p.m.
>>>> To: Byron Cochrane
>>>> Cc: eparsons@google.com; SDW WG (public-sdw-wg@w3.org)
>>>> Subject: Re: Make your data indexable by search engines
>>>>
>>>>
>>>>
>>>> On 23 August 2016 at 03:55, Byron Cochrane <bcochrane@linz.govt.nz> wrote:
>>>> Hi,
>>>>
>>>> I have been struggling with the “Make your data indexable by search
>>>> engines” BP.   While I agree that it is useful to make data discoverable by
>>>> common search engines, there is much in the discussion included in this BP
>>>> that I take issue with.  It also seems too wordy to be easily
>>>> understood.  So I have created a “Starter for Ten” revision that I feel
>>>> focuses on the main idea and removes the contentious arguments without
>>>> decreasing the impact (I hope).
>>>>
>>>> Here it is:
>>>>
>>>> Make your data indexable by search engines
>>>>
>>>> Search engines should be able to crawl and index metadata for spatial
>>>> data on the web.
>>>>
>>>> Why?
>>>>
>>>> In SDIs, data are commonly managed and published through the provision
>>>> of authoritative ISO 19115 metadata collated in web based catalogs.  These
>>>> catalogs and metadata may be difficult for non-professionals to find and
>>>> use as they often do not support discovery through common search engines.
>>>> These data therefore do not find their broadest audience. This is
>>>> particularly true for SpatialThings that reside inside datasets.
>>>>
>>>> Intended Outcome
>>>>
>>>> Metadata for spatial data, datasets and SpatialThings, are indexable by
>>>> search engine crawlers thereby making these data discoverable through
>>>> common search engines.
>>>>
>>>> Possible Approach to Implementation
>>>>
>>>> To make your data indexable by search engines, expose the appropriate
>>>> elements of the spatial metadata (ISO 19115 and other) in formats that
>>>> crawlers can use, such as DCAT, Schema.org, microdata and OpenSearch.
>>>> Where possible, do this at both the dataset and SpatialThing level.
>>>>
>>>> Which search engines use DCAT?
>>>>
>>>> Dan
>>>>
>>>>
>>>> Look forward to getting some feedback.
>>>>
>>>> Cheers,
>>>>
>>>> Byron Cochrane
>>>> SDI Technical Leader
>>>> New Zealand Geospatial Office
>>>>
>>>> E  bcochrane@linz.govt.nz | DDI 04 460 0576 | M 021 794 501
>>>>
>>>> Wellington Office, Level 7, Radio New Zealand House, 155 The Terrace
>>>> PO Box 5501, Wellington 6145, New Zealand | T 04 460 0110
>>>> W  www.linz.govt.nz | data.linz.govt.nz
>>>>
>>>>
>>>> ________________________________
>>>> This message contains information, which may be in confidence and may
>>>> be subject to legal privilege. If you are not the intended recipient, you
>>>> must not peruse, use, disseminate, distribute or copy this message. If you
>>>> have received this message in error, please notify us immediately (Phone
>>>> 0800 665 463 or info@linz.govt.nz) and
>>>> destroy the original message. LINZ accepts no responsibility for changes to
>>>> this email, or for any attachments, after its transmission from LINZ. Thank
>>>> You.
>>>>
>>>>
>>>>
>>> --
>>
>> *Ed Parsons *FRGS
>> Geospatial Technologist, Google
>>
>> Google Voice +44 (0)20 7881 4501
>> www.edparsons.com @edparsons
>>
> --
> --------------------------------------------
> Ghislain A. Atemezing, Ph.D
> R&D Engineer
> @ Mondeca, Paris, France
> Labs: http://labs.mondeca.com
> Tel: +33 (0)1 4111 3034
> Web: www.mondeca.com
> Twitter: @gatemezing
> About Me: http://atemezing.org
>
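
As an illustration of the approach proposed in the quoted text (exposing
dataset metadata in a format that crawlers can use, such as Schema.org), here
is a minimal Python sketch that builds a schema.org Dataset description as
JSON-LD for the population estimates dataset discussed above. The two URLs
come from the thread. The function names, the name and description strings,
and the choice of properties are assumptions made for illustration; this is
not what statistics.gov.scot actually publishes, and it does not guarantee
any particular treatment by a search engine.

```python
import json


def dataset_jsonld(name, description, url, place_name, place_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper: the property selection is an illustrative
    assumption, not a complete or required set.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # spatialCoverage links the dataset to the place it describes.
        "spatialCoverage": {
            "@type": "Place",
            "name": place_name,
            "url": place_url,
        },
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dictionary in a script element for an HTML page."""
    body = json.dumps(doc, indent=2, sort_keys=True)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# URLs are from the thread; name and description are assumed wording.
example = dataset_jsonld(
    name="Population estimates",
    description="Official population estimates for areas of Scotland.",
    url="http://statistics.gov.scot/data/population-estimates",
    place_name="Dundee City",
    place_url="http://statistics.gov.scot/doc/statistical-geography/S12000042",
)

print(as_script_tag(example))
```

The resulting script element could be placed in the HTML of the dataset page
so that the human-readable page and the machine-readable description share
one URL. The same pattern could be repeated at the SpatialThing level, for
example on the page for the Dundee City council area.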
Received on Wednesday, 24 August 2016 12:47:46 UTC
