- From: Dan Brickley <danbri@google.com>
- Date: Thu, 18 Oct 2018 14:26:14 -0700
- To: "R.V. Guha" <guha@google.com>
- Cc: rdf-dev@w3.org, "schema.org Mailing List" <public-schemaorg@w3.org>, Vicki Tardif Holland <vtardif@google.com>
- Message-ID: <CAK-qy=4+_KJ9PmNbE18XZRHFaZ4ncm8HBtBWsqCcL0t7QL6aEg@mail.gmail.com>
On Thu, 18 Oct 2018 at 14:20, Guha <guha@google.com> wrote:

> In May 2018, we introduced datacommons.org <http://datacommons.org/>, an
> initiative for the open sharing of data, and released the first fact check
> corpus to help academia and practitioners study misinformation.
>
> We are now taking the next step in the evolution of datacommons.org.

A few notes to follow up on Guha's dataCommons announcement, oriented towards the Schema.org community. (These are potential FAQs; nobody has actually asked them yet, so I'm trying to anticipate a few likely questions in advance.)

Firstly, folk in the Search Marketing community will be wondering what this means for SEO. At this point, I'd suggest this is "one to watch". The dataCommons effort is in large part about making it easier to use Schema.org data, re-exposing an integrated view of data that is represented in Schema.org markup. For those in the SEO world interested in engaging, it probably makes most sense to continue focusing on the vocabulary already used by search engines, e.g. for Google, https://developers.google.com/search/docs/guides/search-gallery

For those more in the standards world, you may be wondering "hey, what about W3C RDF Data Cube, SKOS, SPARQL 1.1, PROV, SHACL, ShEx, CSVW, JSON-LD contexts, Linked Data Platform, etc. etc.?"... or many other interesting and standardized approaches. At this stage dataCommons is focussed on taking a step back and concentrating on mechanisms (e.g. workflow, implementation, APIs) around the core graph data model. In a W3C setting, this roughly means RDF.

The dataCommons approach to Knowledge Graphs highlights a common issue with RDF that has also been encountered in many related efforts, from Freebase and Wikidata to Schema.org itself: we need to represent fine-grained provenance and qualifications alongside each piece of factual data, and historically this is difficult in standard RDF without (ab)using SPARQL named graphs as a representational mechanism (a small illustrative sketch follows at the end of this mail). W3C's upcoming workshop on bridging RDF, Property Graph and SQL standards for Graph Data is therefore highly relevant (https://www.w3.org/Data/events/data-ws-2019/cfp.html).

For data scientists, journalists and those working with public datasets, we have been exploring (see http://datacommons.org/colab) the use of Python/Jupyter notebooks (including a protocol-backed Python API) as a way to expose data for exploration via Pandas data frames (a rough sketch of this also follows below). There are a few directions this could take, and the Python wrapper is effectively another way to avoid prematurely fixing our approach to query language, provenance, etc. The current approach drafts some dedicated domain-specific schemas to reflect more explicitly what some public statistics datasets are telling us, and we are looking at ways of bridging this to more generic representations (like Data Cube), which offer weaker data integration but may scale better for the long tail of public data.

cheers,

Dan
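PS. For those curious about the named-graphs point above, here is a minimal, purely illustrative sketch (using the rdflib Python library; the graph name, source and values are made up for the example, not dataCommons data) of attaching per-fact provenance by putting a single assertion in its own named graph and then describing that graph:

    from rdflib import Dataset, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    ds = Dataset()

    # Put the single factual triple in its own named graph...
    assertion = URIRef("http://example.org/assertion/1")
    g = ds.graph(assertion)
    g.add((EX.California, EX.population, Literal(39557045)))

    # ...then describe that graph (source, date) in the default graph.
    ds.add((assertion, PROV.wasDerivedFrom, URIRef("https://www.census.gov/")))
    ds.add((assertion, EX.observationDate, Literal("2018", datatype=XSD.gYear)))

    # TriG serialization shows the quad structure explicitly.
    print(ds.serialize(format="trig"))

This is exactly the kind of per-statement qualification that plain triples make awkward, and that property-graph systems handle more directly.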
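PPS. And a rough sketch of the notebook-style exploration mentioned above. The endpoint, parameter names and response shape here are hypothetical placeholders rather than the actual dataCommons API; the point is only the shape of the interaction: fetch JSON over HTTP, load it into a Pandas data frame, explore.

    # Hypothetical sketch only: URL, parameters and response shape are placeholders.
    import pandas as pd
    import requests

    def fetch_observations(place, measure):
        """Fetch a (hypothetical) JSON list of {year, value} observations."""
        resp = requests.get(
            "https://example.org/datacommons-api/observations",  # placeholder URL
            params={"place": place, "measure": measure},
        )
        resp.raise_for_status()
        return resp.json()

    # Load the observations into a Pandas data frame and explore.
    df = pd.DataFrame(fetch_observations("California", "population"))
    print(df.describe())
    print(df.sort_values("year").tail())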
Received on Thursday, 18 October 2018 21:26:52 UTC