- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Fri, 6 Nov 2015 07:58:53 +0100
- To: Chris Taggart <countculture@gmail.com>, Rolf Kleef <rolf@openforchange.info>
- Cc: Kay Müller <kay.mueller@informatik.uni-leipzig.de>, public-lod@w3.org
- Message-ID: <563C4FAD.6090507@informatik.uni-leipzig.de>
Hi Chris,

> However, making sense of this data is very, very time consuming, not
> to mention that writing and maintaining bots (we now have hundreds and
> hundreds of them) to scrape jurisdictions that aren't open data (the
> vast majority) takes significant resources, and we don't see any way
> of sustaining this on a CC-BY licence.

a) Is the code for these bots available somewhere?
b) We hope to find a way to maintain it this time. DBpedia has received
funding via http://smartdataweb.de/ and also http://aligned-project.eu/

We are also currently working on a charter for a non-profit association
that is committed to keeping all data open under CC-BY (we are accepting
donations and membership fees, among other things).

> I could also write a book about corporate identifiers, and the issues
> with those on the list (but don't have time).

We are writing such a book* in parallel, do you want to help?

Sebastian

*= well, it's just a paper

On 05.11.2015 19:18, Chris Taggart wrote:
> Rolf etc
>
> Thanks for cc'ing me. We'd had contact from Sebastian and given him an
> API key. The main issues here are sustainability and domain knowledge.
> We'd love more people to be downloading the open datasets from the UK
> and others, and using them in all sorts of innovative ways, and the
> main reason we do the Open Company Data Index
> <http://registries.opencorporates.com/> is to motivate company
> registers to open up their data (I was speaking at the Open Govt
> Partnership Summit in Mexico City last week on the same subject).
> However, making sense of this data is very, very time consuming, not
> to mention that writing and maintaining bots (we now have hundreds and
> hundreds of them) to scrape jurisdictions that aren't open data (the
> vast majority) takes significant resources, and we don't see any way
> of sustaining this on a CC-BY licence.
>
> Finally, there are very few registers that are licensed CC-BY or less
> restrictively (for example Denmark places restrictions on use for
> marketing), even ignoring DPA issues (we are now spending a considerable
> amount on legal fees on this issue). I could also write a book about
> corporate identifiers, and the issues with those on the list (but don't
> have time).
>
> So, we'd love to see more activity in the area, particularly in
> Germany – where the Handelsregister and Bundesanzeiger are very
> definitely not open data ;-)
>
> Chris
>
> On 5 November 2015 at 12:49, Rolf Kleef <rolf@openforchange.info> wrote:
>
> > Hi Sebastian, Kay,
> >
> > If you haven't done it yet, I suggest getting in touch with Chris
> > Taggart of Open Corporates (cc'd). He has years of experience doing
> > this, and is also involved in cross-standards work on "organisational
> > identifiers", crucial in the development of for instance the Open
> > Contracting Data Standard and the International Aid Transparency
> > Initiative:
> >
> > http://www.open-contracting.org/
> > http://iatistandard.org/201/organisation-identifiers/
> >
> > ~~Rolf.
> >
> > On 03/11/15 16:17, Sebastian Hellmann wrote:
> > > [Apologies for cross-posting]
> > >
> > > Dear all,
> > > this message is part announcement of an open data initiative and part
> > > call for feedback and support.
> > >
> > > We are considering creating a free, open and interoperable dataset on
> > > companies and organisations, which we are planning to integrate into
> > > DBpedia+ and offer as a dump download. As we are in a very early phase
> > > of the endeavour, we would like to know whether there is existing work
> > > in this area.
> > >
> > > We are looking for any available datasets which have information about
> > > companies and other organizations in any language and any country.
> > > Ideally, the datasets are:
> > > 1. downloadable as a dump
> > > 2. openly licensed, e.g. CC-BY, following http://opendefinition.org/
> > > 3. in an easily parseable format, e.g. RDF or CSV and not PDF
> > >
> > > But hey! Send around anything you know, and we will look at it and see
> > > whether we can make use of it. You can reach us either by replying to
> > > this email or by sending feedback directly to me and Kay Müller
> > > <kay.mueller@informatik.uni-leipzig.de>.
> > > If you have any private/closed data, please contact us as well. We might
> > > make use of it to cross-reference and validate public/open data with it.
> > > Or just learn from it to build a good scheme.
> > >
> > > We started a link collection here (and attached the current status at
> > > the end of this email):
> > > https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
> > > We also started to collect potential identifiers for linking here:
> > > https://docs.google.com/spreadsheets/d/1EMqemA1BlqvyOXGLzYbvY0IcBCAhaRd5XgYLMWIxGsA/edit#gid=0
> > >
> > > Regards and thank you for any support on this,
> > > Sebastian and Kay
> > >
> > > ##############################
> > >
> > > https://docs.google.com/document/d/1IaWSSt4_SZVhypvB1QzBlCtBuMQHv-q5Ti0n8xoZFIQ/edit
> > >
> > > Open Company Data
> > >
> > > * Identifiers for companies/organisation
> > > * URIs (Linked Data/Semantic Web)
> > > * Downloadable Datasets with Company info (confirmed)
> > > * Portals with no bulk downloads
> > > * Portals, we will still need to investigate
> > >
> > > Identifiers for companies/organisation
> > >
> > > Table with identifiers:
> > > https://docs.google.com/spreadsheets/d/1EMqemA1BlqvyOXGLzYbvY0IcBCAhaRd5XgYLMWIxGsA/edit#gid=0
> > >
> > > URIs (Linked Data/Semantic Web)
> > >
> > > * DBpedia/Wikipedia/Wikidata URIs - http://dbpedia.org
> > > * LinkedGeoData - http://linkedgeodata.org/
> > >
> > > Downloadable Datasets with Company info (confirmed)
> > >
> > > * VIAF - http://viaf.org/viaf/data/
> > > * DBpedia - http://downloads.dbpedia.org/current/core/
> > > * Wikidata - http://downloads.dbpedia.org/current/ext/wikidata/
> > > * LinkedGeoData - http://downloads.linkedgeodata.org/releases/
> > > * Company Data Index: http://index.okfn.org/dataset/companies/
> > >   o e.g. UK company data:
> > >     http://download.companieshouse.gov.uk/en_output.html
> > >
> > > Portals with no bulk downloads
> > >
> > > * https://opencorporates.com/
> > > * http://registries.opencorporates.com/
> > >
> > > Portals, we will still need to investigate
> > >
> > > * https://www.wlw.de/
> > > * https://www.crunchbase.com
> > > * http://data.crunchbase.com/v3/page/crunchbase-open-data-map-odm
> > > * http://www.industrystock.de
> > > * http://www.ebr.org/
> > > * https://simfin.com/data/browse/companies
> > > * http://c-lei.org/
> > > * http://data.imf.org/
> > > * http://worldbank.270a.info/.html
> > > * http://datacatalog.worldbank.org/
> > > * http://www.europages.com/
> > > * http://www.sec.gov/data
> > > * http://faculty.philau.edu/russowl/industry.html
> > > * USA: http://www.corporationwiki.com/
> > > * India: http://www.companywiki.in/
> > > * Handelsregister: http://www.Handelsregister.de
> > > * Creditreform: http://www.creditsafetrial.com/de/?country=DE
> > > * Bürgel: https://www.buergel.de/en
> > > * Factiva:
> > >   https://global.factiva.com/factivalogin/login.asp?productname=global
> > >
> > > Interesting Links:
> > >
> > > * German:
> > >   http://get.torial.com/blog/2014/02/die-besten-quellen-fuer-wirtschaftsjournalisten-teil-1/
> > > * http://get.torial.com/blog/2014/02/die-besten-quellen-fuer-wirtschaftsjournalisten-teil-2/
> > >
> > > --
> > > Sebastian Hellmann
> > > AKSW/KILT research group
> > > Institute for Applied Informatics (InfAI) at Leipzig University
> > > DBpedia Association
> > > Events:
> > > * *Nov 20th, 2015* Extended Deadline for Quality Management of Semantic
> > >   Web Assets (Data, Services and Systems)
> > >   <http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-management-semantic-web-assets-data-services-and-systems>
> > > Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
> > > Projects: http://dbpedia.org, http://nlp2rdf.org,
> > > http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
> > > Homepage: http://aksw.org/SebastianHellmann
> > > Research Group: http://aksw.org
> > > Thesis:
> > > http://tinyurl.com/sh-thesis-summary
> > > http://tinyurl.com/sh-thesis
> >
> > --
> > Rolf Kleef, Open for Change, network for open development
> > rolf@openforchange.info, +31617232772, @rolfkleef
> > www.openforchange.info
> >
> > Internet trailblazer. Weaving the web to help humanity. Implementing
> > open data, open organisations and online collaboration in civil
> > society.
>
> --
> -------------------------------------------------------
> OpenCorporates :: The Open Database of the Corporate World
> http://opencorporates.com
> OpenlyLocal :: Making Local Government More Transparent
> http://openlylocal.com
> Blog: http://countculture.wordpress.com
> Twitter: http://twitter.com/CountCulture

--
Sebastian Hellmann
AKSW/KILT research group
Institute for Applied Informatics (InfAI) at Leipzig University
DBpedia Association
Events:
* *Nov 20th, 2015* Extended Deadline for Quality Management of Semantic
  Web Assets (Data, Services and Systems)
  <http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-management-semantic-web-assets-data-services-and-systems>
Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org,
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis:
http://tinyurl.com/sh-thesis-summary
http://tinyurl.com/sh-thesis
Received on Friday, 6 November 2015 06:59:31 UTC