
Re: Query schema.org data?

From: Nicolas Torzec <torzecn@yahoo-inc.com>
Date: Tue, 2 Jun 2015 21:46:07 +0000 (UTC)
To: Dan Brickley <danbri@google.com>, Barry Carter <carter.barry@gmail.com>
Cc: Phil Barker <phil.barker@hw.ac.uk>, "schema.org Mailing List" <public-schemaorg@w3.org>
Message-ID: <19792298.3833095.1433281567767.JavaMail.yahoo@mail.yahoo.com>
(message got stuck in my inbox yesterday)
I am not sure about the use case: e.g. how much do you care about freshness?
Best bet is probably Google.
- Yahoo doesn't have anything like this publicly available today.
- I am not sure about Bing.
Sindice used to have something like that but has since shut down, as far as I understand. See [1] for a recap.
Did you look at Web Data Commons [2]? The structured data is extracted from the Common Crawl, openly licensed, and stored on S3 for convenience. One could build a static index on top of it if you care about random access and not much about freshness.
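
To make the "index on top of it" idea concrete, here is a minimal sketch in Python. It assumes the extracted data arrives as N-Quads with the source page URL in the fourth (graph) position, and that the vocabulary IRIs use the http://schema.org/ prefix; both are assumptions to check against the actual dumps. The function name build_place_index and the regex are illustrative only, and the regex is a simplification, not a full N-Quads parser.

```python
import re
from collections import defaultdict

# Simplified pattern for one N-Quads statement (subject, predicate, object, graph).
# Handles IRIs, blank nodes, and plain/language-tagged/typed literals only.
NQUAD = re.compile(
    r'^(<[^>]*>|_:\S+)\s+'                                    # subject
    r'<([^>]*)>\s+'                                           # predicate IRI
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)\s+'  # object
    r'<([^>]*)>\s*\.\s*$'                                     # graph (assumed: page URL)
)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_PLACE = "<http://schema.org/Place>"   # assumed IRI form
SCHEMA_NAME = "http://schema.org/name"       # assumed IRI form


def literal_value(term):
    """Return the lexical form of a quoted literal, or None for IRIs/blank nodes."""
    m = re.match(r'^"((?:[^"\\]|\\.)*)"', term)
    return m.group(1) if m else None


def build_place_index(lines):
    """Map lowercased Place names to the set of page URLs that declare them."""
    places = set()   # (subject, page) pairs typed as schema.org Place
    names = []       # (subject, page, name) triples for any schema.org name
    for line in lines:
        m = NQUAD.match(line.strip())
        if not m:
            continue  # skip lines the simplified pattern cannot handle
        subj, pred, obj, page = m.groups()
        if pred == RDF_TYPE and obj == SCHEMA_PLACE:
            places.add((subj, page))
        elif pred == SCHEMA_NAME:
            value = literal_value(obj)
            if value is not None:
                names.append((subj, page, value))
    index = defaultdict(set)
    for subj, page, name in names:
        if (subj, page) in places:   # keep names only for Place-typed subjects
            index[name.lower()].add(page)
    return index
```

Looking up index["texas"] would then return the pages carrying a Place named Texas, as of whenever the crawl was taken. For the full dumps one would want to stream the files and store the index on disk rather than in memory.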

[1]: http://www.dataversity.net/end-support-sindice-com-search-engine-history-lessons-learned-legacy-guest-post/
[2]: http://webdatacommons.org/



     On Monday, June 1, 2015 5:11 PM, Dan Brickley <danbri@google.com> wrote:

 On 1 June 2015 at 20:38, Barry Carter <carter.barry@gmail.com> wrote:
> Phil, I was referring to google's public search engine at google.com

Currently Custom Search would be your best bet w.r.t Google. Not sure
what the other engines do.
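
For anyone trying the Custom Search route, a rough sketch of building such a query in Python follows. It only constructs a request URL for the Custom Search JSON API and sends nothing. The helper name build_cse_url and the placeholder key and engine ID are hypothetical, and the refinement string simply reuses the more:p:Type-property:value form suggested elsewhere in this thread; whether it returns results depends on how the custom engine is configured.

```python
from urllib.parse import urlencode

# Custom Search JSON API endpoint; requires an API key and a search engine ID (cx).
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, engine_id, schema_type, prop, value, terms=""):
    """Build a query URL restricted by a structured data refinement.

    The refinement follows the more:p:<type>-<property>:<value> form
    discussed in this thread, e.g. more:p:Place-name:Texas.
    """
    refinement = f"more:p:{schema_type}-{prop}:{value}"
    query = f"{terms} {refinement}".strip()   # optional free-text terms first
    params = {"key": api_key, "cx": engine_id, "q": query}
    return CSE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, for illustration only.
url = build_cse_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "Place", "name", "Texas")
```

The resulting URL can be fetched with any HTTP client once real credentials are substituted.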



> On Mon, 1 Jun 2015, Phil Barker wrote:
>> Date: Mon, 01 Jun 2015 19:35:15 +0100
>> From: Phil Barker <phil.barker@hw.ac.uk>
>> To: public-schemaorg@w3.org
>> Subject: Re: Query schema.org data?
>> Resent-Date: Mon, 01 Jun 2015 18:35:51 +0000
>> Resent-From: public-schemaorg@w3.org
>> With Google custom search engine, I think you should be able to choose
>> option to "Restrict Pages using Schema.org Types" to Place and to add a
>> refinement something like  more:p:Place-name:Texas
>> On 01/06/15 19:12, Barry Carter wrote:
>>      Is it possible to query the data Google, Microsoft, Yahoo, etc
>>      collect from pages marked with schema.org tags?
>>      For example, I tried googling "[more:Place:name:Texas]" (no
>>      quotes) as
>>      quasi-suggested by:
>>      https://developers.google.com/custom-search/docs/structured_data
>>      but got no results.
>>      Of course, that page is specific to per-site custom queries, so
>>      my syntax may
>>      be wrong.
>>      Is there any generic way (on any of the search engines using
>>      schema.org) to
>>      search for documents that have a schema.org Place named Texas in
>>      them?
>>      [I realize schema.org itself has no data, but it still seemed to
>>      make a good post title]
>> --
>> Phil Barker          @philbarker
>> LRMI, Cetis, ICBL    http://people.pjjk.net/phil
>> Heriot-Watt University
>> Ubuntu: http://xkcd.com/456/
>>  not so much an operating system as a learning opportunity.

Received on Tuesday, 2 June 2015 21:47:35 UTC
