Re: dataCommons.org: Data Commons Knowledge Graph (DCKG)

From: Guha <guha@google.com>
Date: Tue, 20 Nov 2018 19:00:39 -0800
Message-ID: <CAPAGhv9nqtmsxnyHyLUEXmQyg=v7tOOkDANEcDYK5Z61zDCASQ@mail.gmail.com>
To: Elwin Huaman <elwinlhq@gmail.com>
Cc: public-schemaorg@w3.org, Dan Brickley <danbri@google.com>, support@datacommons.org
Elwin,

 We are not trying to 'improve' the size of the Data Commons KG. Merely its
utility. In that spirit, I have to respectfully keep our knowledge base
size out of the discussions you reference.

 guha

On Tue, Nov 20, 2018 at 5:11 AM Elwin Huaman <elwinlhq@gmail.com> wrote:

> I totally agree with you that "what really matters is the utility of the
> data". However, it is important to note that building high-quality Knowledge
> Graphs (KGs) depends on the structure and on identifying reliable data
> sources. "What gets measured, gets improved".
>
> I am just curious about these numbers for two reasons. First, some authors
> have estimated the cost of KG curation and noted that creating
> large-scale KGs can be hard and costly [1]. Moreover, whichever approach
> is taken to construct a KG, the resulting KG is not always correct [2].
> Second, KG cleaning is an important part of the KG creation lifecycle, where
> external sources (e.g., the Data Commons Knowledge Graph, DCKG) can be used as
> training data; a machine learning approach may require a
> sufficient quantity of training data.
>
> Thank you for your time!
>
> Elwin
>
> [1] Paulheim, H. (2018). How much is a triple? Estimating the cost of
> knowledge graph creation. In Proceedings of the ISWC 2018 Posters &
> Demonstrations, Industry and Blue Sky Ideas Tracks co-located with the 17th
> International Semantic Web Conference (ISWC 2018), Monterey, USA, October
> 8th to 12th, 2018.
> [2] Bordes, A. and Gabrilovich, E. (2014). Constructing and mining
> web-scale knowledge graphs: KDD 2014 tutorial. In Proceedings of the 20th
> ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
> KDD ’14, pages 1967–1967, New York, NY, USA. ACM.
>
> On Mon, 19 Nov 2018 at 22:35, Guha <guha@google.com> wrote:
>
>> Elwin,
>>
>>  These numbers are rapidly changing and as we have all learnt, what
>> matters really is the utility of the data, not the size of the graph.
>>
>>  So, could you give us some context?
>>
>> guha
>>
>> On Mon, Nov 19, 2018 at 11:47 AM Elwin Huaman <elwinlhq@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> I was challenged last week to provide info (in rough numbers) about the
>>> Data Commons Knowledge Graph (DCKG), which was constructed by synthesizing
>>> many different data sources into a single Knowledge Graph [1]. What I am
>>> especially looking for is:
>>>
>>>    - *How many entities or nodes does the DCKG have?* I understand that a
>>>    *dcid* (DataCommons identifier) is a unique identifier assigned to
>>>    each entity in the knowledge graph, and that entities are represented by
>>>    nodes[2].
>>>    - *How many data sources does the DCKG have?* It currently contains
>>>    data from Wikipedia, the US Census, NOAA, FBI, *etc.*[3].
>>>    - *How many nodes and relations does the DCKG have, and how many
>>>    statements does it have?*
>>>       - For example, the statement "Santa Clara County is contained in
>>>       the State of California" is represented in the graph as two nodes: "Santa
>>>       Clara County" and "California" with an edge labeled "containedInPlace"
>>>       pointing from Santa Clara to California.
>>>    - *What is the current size of the vocabulary used in the DCKG?*,
>>>    taking into account that dataCommons.org builds upon the vocabularies
>>>    defined by Schema.org[4].
>>>    - *These are potential FAQs* for future researchers (of course there
>>>    are more).
>>>
>>> Could you help me?
>>>
>>> cheers,
>>> Elwin Huaman
>>>
>>>
>>> [1] https://browser.datacommons.org/
>>> [2]
>>> https://colab.research.google.com/drive/1vffnWktZyffk7pNfpuXrTsCpp-od5W47
>>> [3] https://datacommons.org/
>>> [4] https://datacommons.org/faq
>>>
>>>
>
> --
> _____________________________
> *Elwin Huaman *
> *Web Engineer*
> *+34 671 529 151*
> *+43 0677 631 129 47*
>
Received on Wednesday, 21 November 2018 03:01:15 UTC
