- From: Paola Di Maio <paola.dimaio@gmail.com>
- Date: Wed, 21 Nov 2018 11:36:25 +0800
- To: Guha <guha@google.com>
- Cc: elwinlhq@gmail.com, "schema.org Mailing List" <public-schemaorg@w3.org>, Dan Brickley <danbri@google.com>, support@datacommons.org
So, dear Guha et al.,

are you excluding the possibility of any direct correlation between the size of the graph and its utility?

I'd like to see the hypothesis evaluated either way.

Dr Paola Di Maio
Artificial Intelligence Knowledge Representation Special Issue, Systems MDPI
*CfP accepting manuscripts
A bit about me

On Wed, Nov 21, 2018 at 11:04 AM Guha <guha@google.com> wrote:
>
> Elwin,
>
> We are not trying to 'improve' the size of the data commons KG. Merely its utility. In that spirit, I have to respectfully keep our knowledge base size out of the discussions you reference.
>
> guha
>
> On Tue, Nov 20, 2018 at 5:11 AM Elwin Huaman <elwinlhq@gmail.com> wrote:
>>
>> I totally agree with you, "what really matters is the utility of the data". However, it is important to note that building high-quality Knowledge Graphs (KGs) depends on their structure and on identified, reliable data sources. "What gets measured, gets improved".
>>
>> I am just curious about these numbers for two reasons. First, some authors have estimated the cost of KG curation and noted that creating large-scale KGs can be hard and costly [1]; moreover, whichever approach is taken to construct a KG, the resulting KG is not always correct [2]. Second, KG cleaning is an important part of the KG creation lifecycle, where external sources (e.g., the Data Commons Knowledge Graph, DCKG) can be used as training data, and a machine learning approach may require a sufficient quantity of training data.
>>
>> Thank you for your time!
>>
>> Elwin
>>
>> [1] Paulheim, H. (2018). How much is a triple? Estimating the cost of knowledge graph creation. In Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8th to 12th, 2018.
>> [2] Bordes, A. and Gabrilovich, E. (2014). Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 1967–1967, New York, NY, USA. ACM.
>>
>> On Mon, 19 Nov 2018 at 22:35, Guha <guha@google.com> wrote:
>>>
>>> Elwin,
>>>
>>> These numbers are rapidly changing and as we have all learnt, what matters really is the utility of the data, not the size of the graph.
>>>
>>> So, could you give us some context?
>>>
>>> guha
>>>
>>> On Mon, Nov 19, 2018 at 11:47 AM Elwin Huaman <elwinlhq@gmail.com> wrote:
>>>>
>>>> Hey all,
>>>>
>>>> I was challenged last week to provide info (in rough numbers) about the Data Commons Knowledge Graph (DCKG), which was constructed by synthesizing many different data sources into a single Knowledge Graph [1]. What I am especially looking to know is:
>>>>
>>>> How many entities or nodes does the DCKG have? I understand that a dcid (DataCommons identifier) is a unique identifier assigned to each entity in the knowledge graph, and that entities are represented by nodes [2].
>>>> How many data sources does the DCKG have? It currently contains data from Wikipedia, the US Census, NOAA, FBI, etc. [3].
>>>> How many nodes and relations does the DCKG have, and how many statements?
>>>>
>>>> For example, the statement "Santa Clara County is contained in the State of California" is represented in the graph as two nodes, "Santa Clara County" and "California", with an edge labeled "containedInPlace" pointing from Santa Clara to California.
>>>>
>>>> What is the current size of the vocabulary used in the DCKG, taking into account that dataCommons.org builds upon the vocabularies defined by Schema.org [4]?
>>>> These are potential FAQs for future researchers (of course there are more).
>>>>
>>>> Could you help me?
>>>>
>>>> cheers,
>>>> Elwin Huaman
>>>>
>>>>
>>>> [1] https://browser.datacommons.org/
>>>> [2] https://colab.research.google.com/drive/1vffnWktZyffk7pNfpuXrTsCpp-od5W47
>>>> [3] https://datacommons.org/
>>>> [4] https://datacommons.org/faq
>>>>
>>
>>
>> --
>> _____________________________
>> Elwin Huaman
>> Web Engineer
>> +34 671 529 151
>> +43 0677 631 129 47
Received on Wednesday, 21 November 2018 03:37:33 UTC