Re: prevalence of schema.org/Book

Niklas,

On Mon, Jan 28, 2013 at 6:49 PM, Niklas Lindström <lindstream@gmail.com> wrote:
>>> I suspect there is something strange with the property IRIs in that
>>> data. Jason, did those IRIs come from the source? Schema.org
>>> properties have IRIs of the form <http://schema.org/{term}>, i.e. not
>>> concatenated on a type. (As Jeff also mentioned; see [1] for details.)
>>
>> If you're referring to IRIs like http://schema.org/Book/name from my
>> post, then that comes directly from the Web Data Commons data. If this
>> is incorrect, which it appears to be, then it should be taken up with
>> them. I was just taking the data as I was given it and spitting it
>> back out. Here's a typical NQuad from the Web Data Commons corpus:
>> _:nodeca28f5cf7b05162b4036f77a176718 <http://schema.org/Book/isbn>
>> "978-3-902406-06-4"@en
>> <http://www.seifertverlag.at/en/programme/2003_autumn/detail_pharao.php>
>>   .
>
> That's interesting, and may be troublesome. I see that the source of
> that is using microdata (except for RDFa in the head for OGP). Since
> the interpretation of microdata as RDF used to be a moving target,
> with various options for constructing the property IRI (there still
> are [1]), I suspect that that has at least in part caused this. It
> should reasonably be investigated.

Outside of the N-Quads, they do in some cases give the property names
with the correct URIs: the RDFa statistics spreadsheet appears to use
the proper property URIs. I think someone like you, who understands the
Microdata to RDF conversion best practices, ought to consult with the
Web Data Commons folks on this. I'd like to see them do their
conversions using the current best practice for this sort of thing,
but I wouldn't know how to advise them.
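
In the meantime, anyone consuming the existing dump could fold the
type-scoped property IRIs back onto the schema.org namespace on their
own side. What follows is only a rough sketch of that cleanup, not the
Microdata to RDF conversion itself. It assumes the bad IRIs always look
like http://schema.org/{Type}/{property}, with a capitalized type name
and a lower-case property name, and the function names are made up for
illustration.

```python
import re

SCHEMA_NS = "http://schema.org/"

# Assumption: a type-scoped IRI is the namespace, a capitalized type
# name, a slash, and a property name starting with a lower-case letter,
# e.g. http://schema.org/Book/isbn.
_TYPE_SCOPED = re.compile(
    r"^http://schema\.org/[A-Z][A-Za-z0-9]*/([a-z][A-Za-z0-9]*)$"
)


def normalize_property_iri(iri):
    """Map http://schema.org/Book/isbn to http://schema.org/isbn.

    IRIs that do not match the type-scoped pattern are returned unchanged.
    """
    match = _TYPE_SCOPED.match(iri)
    if match is None:
        return iri
    return SCHEMA_NS + match.group(1)


def normalize_nquad_predicate(line):
    """Rewrite only the predicate of a single-line N-Quad.

    Subject and predicate never contain whitespace in N-Quads, so the
    first two whitespace-separated tokens are taken as subject and
    predicate; the object, graph, and final dot are left untouched.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 3:
        return line  # not a complete statement; leave it alone
    subject, predicate, rest = parts
    if predicate.startswith("<") and predicate.endswith(">"):
        predicate = "<" + normalize_property_iri(predicate[1:-1]) + ">"
    return " ".join((subject, predicate, rest))
```

Applied to the quad quoted above (joined onto one line), this would
turn the predicate <http://schema.org/Book/isbn> into
<http://schema.org/isbn> and leave everything else as it was. It does
nothing for properties that were minted under some other scheme.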

I looked at the RDFa statistics spreadsheet [1] from WDC, but did
not find any use of schema.org/Book or schema.org/ScholarlyArticle. So
far, at least, folks in the Common Crawl corpus who use RDFa aren't
using schema.org for these types. I did find a little use of
schema.org types, but nowhere near as much as there is in the Microdata
set. Some of this could be due to whatever bias the Common Crawl
robots have.

Jason

[1] http://webdatacommons.org/downloads/2012-08/vectors/rdfa_statistics.xlsx.gz

Received on Tuesday, 29 January 2013 00:36:15 UTC