Re: Query schema.org data?

From: Peter Mika <pmika@yahoo-inc.com>
Date: Wed, 3 Jun 2015 10:11:47 +0000 (UTC)
To: Nicolas Torzec <torzecn@yahoo-inc.com>, Dan Brickley <danbri@google.com>, Barry Carter <carter.barry@gmail.com>
Cc: Phil Barker <phil.barker@hw.ac.uk>, "schema.org Mailing List" <public-schemaorg@w3.org>
Message-ID: <339210772.4101781.1433326307636.JavaMail.yahoo@mail.yahoo.com>
It's somewhat outdated by now, but at Yahoo Labs we built an index over the WDC data which allows you to do structured queries:

The method is described in the following publication:
Roi Blanco, Peter Mika, Sebastiano Vigna: Effective and Efficient Entity Search in RDF Data. International Semantic Web Conference (1) 2011: 83-97

The code is open source at:


     On Tuesday, June 2, 2015 10:46 PM, Nicolas Torzec <torzecn@yahoo-inc.com> wrote:

 (message got stuck in my inbox yesterday)
I am not sure about your use case: e.g., how much do you care about freshness?
- Best bet is probably Google.
- Yahoo doesn't have anything like this publicly available today.
- I am not sure about Bing.
Sindice used to have something like that but has gone out of business as far as I understand. See [1] for a recap.
Did you look at Web Data Commons [2]? The structured data are extracted from the Common Crawl, openly licensed, and stored on S3 for convenience. One could build a stale index on top of it if you care about random access and not much about freshness.

[1]: http://www.dataversity.net/end-support-sindice-com-search-engine-history-lessons-learned-legacy-guest-post/
[2]: http://webdatacommons.org/
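
To make the "index on top of it" idea concrete, here is a minimal sketch of a lookup table built from N-Quads lines, the format Web Data Commons publishes its extractions in. Everything named here (build_index, lookup, the SAMPLE lines, the example.org URLs) is hypothetical and for illustration only; the regular expression is a simplification rather than a full N-Quads parser, and a real index over the whole corpus would need disk-backed storage instead of an in-memory dict.

```python
import re
from collections import defaultdict

# Simplified pattern for one N-Quads statement: subject, predicate, object,
# graph (the graph IRI is the page the data was extracted from).
# Assumption: subjects are IRIs or blank nodes, objects are IRIs, blank nodes,
# or literals with an optional language tag or datatype.
NQUAD = re.compile(
    r'^(?P<s><[^>]*>|_:\S+)\s+'
    r'(?P<p><[^>]*>)\s+'
    r'(?P<o><[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)\s+'
    r'(?P<g><[^>]*>)\s*\.\s*$'
)
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def build_index(lines):
    """Map (type IRI, property IRI, lowercased literal) to a set of page URLs."""
    types = defaultdict(set)      # (page, entity) -> type IRIs
    literals = defaultdict(list)  # (page, entity) -> [(property IRI, value)]
    for line in lines:
        m = NQUAD.match(line.strip())
        if not m:
            continue  # skip lines the simplified pattern cannot handle
        key = (m["g"], m["s"])
        if m["p"] == RDF_TYPE:
            types[key].add(m["o"].strip("<>"))
        elif m["o"].startswith('"'):
            # Keep only the text between the first and last double quote.
            value = m["o"][1:m["o"].rindex('"')]
            literals[key].append((m["p"].strip("<>"), value))
    index = defaultdict(set)
    for key, pairs in literals.items():
        page = key[0].strip("<>")
        for type_iri in types.get(key, ()):
            for prop_iri, value in pairs:
                index[(type_iri, prop_iri, value.lower())].add(page)
    return index


def lookup(index, type_iri, prop_iri, value):
    """Return the sorted page URLs holding an entity of this type and value."""
    return sorted(index.get((type_iri, prop_iri, value.lower()), ()))


# Hypothetical sample data, not taken from the real corpus.
SAMPLE = [
    '_:n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://schema.org/Place> <http://example.org/texas.html> .',
    '_:n1 <http://schema.org/name> "Texas"@en <http://example.org/texas.html> .',
    '_:n2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://schema.org/Person> <http://example.org/people.html> .',
    '_:n2 <http://schema.org/name> "Texas" <http://example.org/people.html> .',
]

if __name__ == "__main__":
    idx = build_index(SAMPLE)
    print(lookup(idx, "http://schema.org/Place", "http://schema.org/name", "Texas"))
```

The index is only as fresh as the crawl it was built from, which is the trade-off mentioned above.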



     On Monday, June 1, 2015 5:11 PM, Dan Brickley <danbri@google.com> wrote:

 On 1 June 2015 at 20:38, Barry Carter <carter.barry@gmail.com> wrote:
> Phil, I was referring to google's public search engine at google.com

Currently Custom Search would be your best bet w.r.t Google. Not sure
what the other engines do.
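
For illustration, the refinement syntax discussed in this thread (more:p:Place-name:Texas) could be assembled into a Custom Search JSON API request roughly as follows. This is only a sketch: the helper names are made up, the API key and engine ID are placeholders, no request is actually sent, and whether the operator matches anything depends on how the custom search engine is configured.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API; "key", "cx" and "q" are its
# query parameters for the API key, the engine ID and the search terms.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def refinement(schema_type, prop, value):
    """Build a structured-data operator in the form used in this thread."""
    return f"more:p:{schema_type}-{prop}:{value}"


def search_url(api_key, engine_id, terms, schema_type, prop, value):
    """Return a request URL combining free-text terms with the operator."""
    operator = refinement(schema_type, prop, value)
    query = " ".join(part for part in (terms, operator) if part)
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # YOUR_API_KEY and YOUR_ENGINE_ID are placeholders, not real credentials.
    print(search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "", "Place", "name", "Texas"))
```

The URL can then be fetched with any HTTP client; this only works against a custom search engine you have set up, not against google.com in general.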



> On Mon, 1 Jun 2015, Phil Barker wrote:
>> Date: Mon, 01 Jun 2015 19:35:15 +0100
>> From: Phil Barker <phil.barker@hw.ac.uk>
>> To: public-schemaorg@w3.org
>> Subject: Re: Query schema.org data?
>> Resent-Date: Mon, 01 Jun 2015 18:35:51 +0000
>> Resent-From: public-schemaorg@w3.org
>> With Google custom search engine, I think you should be able to choose the
>> option to "Restrict Pages using Schema.org Types" to Place and to add a
>> refinement something like  more:p:Place-name:Texas
>> On 01/06/15 19:12, Barry Carter wrote:
>>      Is it possible to query the data Google, Microsoft, Yahoo, etc
>>      collect from pages marked with schema.org tags?
>>      For example, I tried googling "[more:Place:name:Texas]" (no
>>      quotes) as
>>      quasi-suggested by:
>>      https://developers.google.com/custom-search/docs/structured_data
>>      but got no results.
>>      Of course, that page is specific to per-site custom queries, so
>>      my syntax may
>>      be wrong.
>>      Is there any generic way (on any of the search engines using
>>      schema.org) to
>>      search for documents that have a schema.org Place named Texas in
>>      them?
>>      [I realize schema.org itself has no data, but it still seemed to
>>      make a good post title]
>> --
>> Phil Barker          @philbarker
>> LRMI, Cetis, ICBL    http://people.pjjk.net/phil
>> Heriot-Watt University
>> Ubuntu: http://xkcd.com/456/
>>  not so much an operating system as a learning opportunity.


Received on Wednesday, 3 June 2015 10:13:14 UTC
