Re: Are there any datasets about companies? (DBpedia Open Data Initiative)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Fri, 6 Nov 2015 07:58:53 +0100
To: Chris Taggart <countculture@gmail.com>, Rolf Kleef <rolf@openforchange.info>
Cc: Kay Müller <kay.mueller@informatik.uni-leipzig.de>, public-lod@w3.org
Message-ID: <563C4FAD.6090507@informatik.uni-leipzig.de>
Hi Chris,

> However, making sense of this data is very, very time consuming, not 
> to mention that writing and maintaining bots (we now have hundreds and 
> hundreds of them) to scrape jurisdictions that aren't open data (the 
> vast majority) takes significant resources, and we don't see any way 
> of sustaining this on a CC-BY licence. 
a) Is the code for these bots available somewhere?
b) We hope to find a way to maintain it this time. DBpedia has received 
funding via http://smartdataweb.de/ and also http://aligned-project.eu/
We are also currently working on a charter for a non-profit 
association that is committed to keeping all data open under CC-BY (we are 
accepting donations and membership fees, among other things).

> I could also write a book about corporate identifiers, and the issues 
> with those on the list (but don't have time).
We are writing such a book* in parallel, do you want to help?
Sebastian

*= well it's just a paper


On 05.11.2015 19:18, Chris Taggart wrote:
> Rolf etc
>
> Thanks for cc'ing me. We'd had contact from Sebastian and given him an 
> API key. The main issues here are sustainability and domain knowledge. 
> We'd love more people to be downloading the open datasets from the UK 
> and others, and using them in all sorts of innovative ways, and the 
> main reason we do the Open Company Data Index 
> <http://registries.opencorporates.com/> is to motivate company 
> registers to open up their data (I was speaking at the Open Govt 
> Partnership Summit in Mexico City last week on the same subject). 
> However, making sense of this data is very, very time consuming, not 
> to mention that writing and maintaining bots (we now have hundreds and 
> hundreds of them) to scrape jurisdictions that aren't open data (the 
> vast majority) takes significant resources, and we don't see any way 
> of sustaining this on a CC-BY licence.
>
> Finally, there are very few registers that are licensed under CC-BY 
> or anything less restrictive (for example, Denmark places restrictions 
> on use for marketing), even ignoring DPA issues (we are now spending a 
> considerable amount on legal fees on this issue). I could also write a 
> book about corporate identifiers, and the issues with those on the 
> list (but don't have time).
>
> So, we'd love to see more activity in the area, particularly in 
> Germany – where the Handelsregister and Bundesanzeiger are very 
> definitely not open data  ;-)
>
> Chris
>
> On 5 November 2015 at 12:49, Rolf Kleef <rolf@openforchange.info> wrote:
>
>     Hi Sebastian, Kay,
>
>     If you haven't done it yet, I suggest getting in touch with Chris
>     Taggart of Open Corporates (cc'd). He has years of experience doing
>     this, and is also involved in cross-standards work on "organisational
>     identifiers", crucial in the development of for instance the Open
>     Contracting Data Standard and the International Aid Transparency
>     Initiative:
>
>     http://www.open-contracting.org/
>     http://iatistandard.org/201/organisation-identifiers/
>
>     ~~Rolf.
>
>     On 03/11/15 16:17, Sebastian Hellmann wrote:
>     > [Apologies for cross-posting]
>     >
>     > Dear all,
>     > this message is part announcement of an open data initiative and part
>     > call for feedback and support.
>     >
>     > We are considering creating a free, open and interoperable
>     > dataset on companies and organisations, which we are planning to
>     > integrate into DBpedia+ and offer as a dump download. As we are in a very
>     > early phase of the endeavour, we would like to know whether there is
>     > existing work in this area.
>     >
>     > We are looking for any available datasets which have information about
>     > companies and other organizations in any language and any country.
>     > Ideally, the datasets are:
>     > 1. downloadable as a dump
>     > 2. openly licensed, e.g. CC-BY, following http://opendefinition.org/
>     > 3. in an easily parseable format, e.g. RDF or CSV and not PDF
>     >
>     > But hey! Send around anything you know, and we will look at it and see
>     > whether we can make use of it. You can reach us either by replying to
>     > this email or by sending feedback directly to me and Kay Müller
>     > <kay.mueller@informatik.uni-leipzig.de>.
>     > If you have any private/closed data, please contact us as well. We might
>     > make use of it to cross-reference and validate public/open data,
>     > or just learn from it to build a good scheme.
>     >
>     > We started a link collection here (and attached the current status at
>     > the end of this email):
>     > https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
>     > We also started to collect potential identifiers for linking here:
>     > https://docs.google.com/spreadsheets/d/1EMqemA1BlqvyOXGLzYbvY0IcBCAhaRd5XgYLMWIxGsA/edit#gid=0
>     >
>     > Regards and thank you for any support on this,
>     > Sebastian and Kay
>     >
>     > ##############################
>     >
>     >
>     https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
>     >
>     >
>     >   Open Company Data
>     >
>     >   * Open Company Data
>     >     <https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit#heading=h.buuo7dypfd9a>
>     >   * Identifiers for companies/organisation
>     >     <https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit#heading=h.qs150ivpio94>
>     >   * URIs (Linked Data/Semantic Web)
>     >     <https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit#heading=h.b9yeovqjeglz>
>     >   * Downloadable Datasets with Company info (confirmed)
>     >     <https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit#heading=h.7ihxrlrypp14>
>     >   * Portals with no bulk downloads
>     >     <https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit#heading=h.a95o85lqil72>
>     >   * Portals, we will still need to investigate
>     >     <https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit#heading=h.p50bjh96q3ok>
>     >
>     >
>     >
>     >     Identifiers for companies/organisation
>     >
>     > Table with identifiers:
>     > https://docs.google.com/spreadsheets/d/1EMqemA1BlqvyOXGLzYbvY0IcBCAhaRd5XgYLMWIxGsA/edit#gid=0
>     >
>     >
>     >     URIs (Linked Data/Semantic Web)
>     >
>     >   * DBpedia/Wikipedia/Wikidata URIs - http://dbpedia.org
>     >   * LinkedGeoData - http://linkedgeodata.org/
>     >
>     >
>     >     Downloadable Datasets with Company info (confirmed)
>     >
>     >   * VIAF - http://viaf.org/viaf/data/
>     >   * DBpedia - http://downloads.dbpedia.org/current/core/
>     >   * Wikidata - http://downloads.dbpedia.org/current/ext/wikidata/
>     >   * LinkedGeoData - http://downloads.linkedgeodata.org/releases/
>     >   * Company Data Index: http://index.okfn.org/dataset/companies/
>     >       o e.g. UK company data:
>     >         http://download.companieshouse.gov.uk/en_output.html
>     >
>     >
>     >     Portals with no bulk downloads
>     >
>     >   * https://opencorporates.com/
>     >   * http://registries.opencorporates.com/
>     >
>     >
>     >     Portals, we will still need to investigate
>     >
>     >   * https://www.wlw.de/
>     >   * https://www.crunchbase.com
>     >   * http://data.crunchbase.com/v3/page/crunchbase-open-data-map-odm
>     >   * http://www.industrystock.de
>     >   * http://www.ebr.org/
>     >   * https://simfin.com/data/browse/companies
>     >   * http://c-lei.org/
>     >   * http://data.imf.org/
>     >   * http://worldbank.270a.info/.html
>     >   * http://datacatalog.worldbank.org/
>     >   * http://www.europages.com/
>     >   * http://www.sec.gov/data
>     >   * http://faculty.philau.edu/russowl/industry.html
>     >   * USA: http://www.corporationwiki.com/
>     >   * India: http://www.companywiki.in/
>     >   * Handelsregister: http://www.Handelsregister.de
>     >   * Creditreform: http://www.creditsafetrial.com/de/?country=DE
>     >   * Bürgel: https://www.buergel.de/en
>     >   * Factiva:
>     >     https://global.factiva.com/factivalogin/login.asp?productname=global
>     >
>     > Interesting Links:
>     >
>     >   * German:
>     >     http://get.torial.com/blog/2014/02/die-besten-quellen-fuer-wirtschaftsjournalisten-teil-1/
>     >   * http://get.torial.com/blog/2014/02/die-besten-quellen-fuer-wirtschaftsjournalisten-teil-2/
>     >
>     > --
>     > Sebastian Hellmann
>     > AKSW/KILT research group
>     > Institute for Applied Informatics (InfAI) at Leipzig University
>     > DBpedia Association
>     > Events:
>     > * *Nov 20th, 2015* Extended Deadline for Quality Management of Semantic
>     > Web Assets (Data, Services and Systems)
>     > <http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-management-semantic-web-assets-data-services-and-systems>
>     > Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
>     > Projects: http://dbpedia.org, http://nlp2rdf.org,
>     > http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
>     > Homepage: http://aksw.org/SebastianHellmann
>     > Research Group: http://aksw.org
>     > Thesis:
>     > http://tinyurl.com/sh-thesis-summary
>     > http://tinyurl.com/sh-thesis
>
>     --
>     Rolf Kleef                Open for Change, network for open development
>     rolf@openforchange.info
>     +31617232772 @rolfkleef
>     www.openforchange.info
>
>     Internet trailblazer. Weaving the web to help humanity. Implementing
>     open data, open organisations and online collaboration in civil
>     society.
>
>
>
>
> -- 
> -------------------------------------------------------
> OpenCorporates :: The Open Database of the Corporate World 
> http://opencorporates.com
> OpenlyLocal :: Making Local Government More Transparent 
> http://openlylocal.com
> Blog: http://countculture.wordpress.com
> Twitter: http://twitter.com/CountCulture


-- 
Sebastian Hellmann
AKSW/KILT research group
Institute for Applied Informatics (InfAI) at Leipzig University
DBpedia Association
Events:
* *Nov 20th, 2015* Extended Deadline for Quality Management of Semantic 
Web Assets (Data, Services and Systems) 
<http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-management-semantic-web-assets-data-services-and-systems>
Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, 
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
Received on Friday, 6 November 2015 06:59:31 UTC
