Re: dataCommons.org: Data Commons Knowledge Graph (DCKG)

From: Paola Di Maio <paola.dimaio@gmail.com>
Date: Wed, 21 Nov 2018 11:36:25 +0800
Message-ID: <CAMXe=SrhzvXaciLa3J7pwLVYLAzUEJjwkno36MrHsj4pajXrUg@mail.gmail.com>
To: Guha <guha@google.com>
Cc: elwinlhq@gmail.com, "schema.org Mailing List" <public-schemaorg@w3.org>, Dan Brickley <danbri@google.com>, support@datacommons.org
So, dear Guha et al.,
are you excluding the possibility of any direct correlation between
the size of the graph and its utility?
I'd like to see the hypothesis evaluated either way.


Dr Paola Di Maio
Artificial Intelligence Knowledge Representation
Special Issue, Systems MDPI
*Cfp  accepting manuscripts

On Wed, Nov 21, 2018 at 11:04 AM Guha <guha@google.com> wrote:
>
> Elwin,
>
>  We are not trying to 'improve' the size of the data commons KG. Merely its utility. In that spirit, I have to respectfully keep our knowledge base size out of the discussions you reference.
>
>  guha
>
> On Tue, Nov 20, 2018 at 5:11 AM Elwin Huaman <elwinlhq@gmail.com> wrote:
>>
>> I totally agree with you: "what really matters is the utility of the data". However, it is important to note that building high-quality Knowledge Graphs (KGs) depends on their structure and on identifying reliable data sources. "What gets measured, gets improved".
>>
>> I am just curious about these numbers for two reasons. First, some authors have estimated the cost of KG curation and noted that creating large-scale KGs can be hard and costly [1]. Moreover, whichever approach is taken to construct a KG, the resulting KG is not always correct [2]. Second, KG cleaning is an important part of the KG creation lifecycle, where external sources (e.g., the Data Commons Knowledge Graph, DCKG) can be used as input for training data; a machine learning approach may require a sufficient quantity of training data.
>>
>> Thank you for your time!
>>
>> Elwin
>>
>> [1] Paulheim, H. (2018). How much is a triple? Estimating the cost of knowledge graph creation. In Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with the 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8th to 12th, 2018.
>> [2] Bordes, A. and Gabrilovich, E. (2014). Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, page 1967, New York, NY, USA. ACM.
>>
>> On Mon, 19 Nov 2018 at 22:35, Guha <guha@google.com> wrote:
>>>
>>> Elwin,
>>>
>>>  These numbers are rapidly changing and as we have all learnt, what matters really is the utility of the data, not the size of the graph.
>>>
>>>  So, could you give us some context?
>>>
>>> guha
>>>
>>> On Mon, Nov 19, 2018 at 11:47 AM Elwin Huaman <elwinlhq@gmail.com> wrote:
>>>>
>>>> Hey all,
>>>>
>>>> I was challenged last week to provide some information (in rough numbers) about the Data Commons Knowledge Graph (DCKG), which was constructed by synthesizing data from many different sources into a single Knowledge Graph [1]. In particular, I would like to know:
>>>>
>>>> How many entities or nodes does the DCKG have? I understand that a dcid (DataCommons identifier) is a unique identifier assigned to each entity in the knowledge graph, and that entities are represented by nodes [2].
>>>> How many data sources does the DCKG have? It currently contains data from Wikipedia, the US Census, NOAA, FBI, etc. [3].
>>>> How many nodes and relations does the DCKG have, and how many statements?
>>>>
>>>> For example, the statement "Santa Clara County is contained in the State of California" is represented in the graph as two nodes: "Santa Clara County" and "California" with an edge labeled "containedInPlace" pointing from Santa Clara to California.
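The statement in this example can be sketched as a (subject, predicate, object) triple, and the size questions in the list then reduce to counting over a set of such triples. This is a minimal sketch under stated assumptions: `Triple` and `graph_size` are hypothetical names, not part of any Data Commons API; "relations" is interpreted here as the number of distinct edge labels; and the second statement is an assumed extra, added only so the counts are non-trivial.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """One statement: an edge labeled `predicate` from `subject` to `obj`."""
    subject: str
    predicate: str
    obj: str


def graph_size(triples: Iterable[Triple]) -> dict:
    """Return rough size figures for a collection of triples."""
    unique = set(triples)  # duplicate statements are counted once
    nodes = {t.subject for t in unique} | {t.obj for t in unique}
    relations = {t.predicate for t in unique}  # distinct edge labels
    return {
        "nodes": len(nodes),
        "relations": len(relations),
        "statements": len(unique),
    }


statements = [
    # The statement given in the thread.
    Triple("Santa Clara County", "containedInPlace", "California"),
    # Hypothetical extra statement, added only for illustration.
    Triple("California", "containedInPlace", "United States"),
]

print(graph_size(statements))
```

Counting this way shows why a single "size" figure is ambiguous: nodes, distinct relations, and statements are three different numbers even for a two-statement graph.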
>>>>
>>>> What is the current size of the vocabulary used in the DCKG, taking into account that dataCommons.org builds upon the vocabularies defined by Schema.org [4]?
>>>> These are potential FAQs for future researchers (of course there are more)
>>>>
>>>> Could you help me?
>>>>
>>>> cheers,
>>>> Elwin Huaman
>>>>
>>>>
>>>> [1] https://browser.datacommons.org/
>>>> [2] https://colab.research.google.com/drive/1vffnWktZyffk7pNfpuXrTsCpp-od5W47
>>>> [3] https://datacommons.org/
>>>> [4] https://datacommons.org/faq
>>>>
>>
>>
>> --
>> _____________________________
>> Elwin Huaman
>> Web Engineer
>> +34 671 529 151
>> +43 0677 631 129 47
Received on Wednesday, 21 November 2018 03:37:33 UTC
