Re: DBPedia (was Re: Use Case: BetaNYC 3/5)

I was working on the CSV Working Group use cases and am trying to catch up :-)

>> - how can we describe the veracity and reliability of crowd-sourced data in the DCAT extension?

Phil - very interesting. I have to admit I haven't looked into the
background of the creators of DBpedia.
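
On the DCAT question itself, here is a very rough sketch (Python with rdflib,
purely illustrative - the actual extension is exactly what the WG still has to
define, and the dataset URI and the "ex:" quality properties below are made up)
of how DCAT plus PROV-O could carry that kind of provenance and veracity
information for a crowd-sourced dataset:

# Illustrative sketch only: the DCAT quality/provenance extension does not
# exist yet; the dataset URI and the ex: properties are hypothetical.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT  = Namespace("http://purl.org/dc/terms/")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX   = Namespace("http://example.org/quality#")  # stand-in for the future extension

g = Graph()
for prefix, ns in [("dcat", DCAT), ("dct", DCT), ("prov", PROV), ("ex", EX)]:
    g.bind(prefix, ns)

ds = URIRef("http://example.org/datasets/dbpedia-2014")  # hypothetical dataset URI

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCT.title, Literal("DBpedia extraction of Wikipedia", lang="en")))
# Lineage: where the data came from and how it was produced.
g.add((ds, PROV.wasDerivedFrom, URIRef("http://wikipedia.org/")))
g.add((ds, PROV.wasGeneratedBy, URIRef("http://example.org/dbpedia-extraction")))
# Made-up quality/veracity annotations of the sort the extension could standardise.
g.add((ds, EX.crowdSourced, Literal(True)))
g.add((ds, EX.editorialControl, Literal("community review", lang="en")))

print(g.serialize(format="turtle"))

Nothing normative there, of course - just the shape of the thing, so we have
something concrete to point at when the question comes up.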

>> I will take the opportunity to seek their views on making use of additional vocabs in the ongoing work.

It would be fascinating to hear their perspective.

Eric


On Wed, Mar 12, 2014 at 4:56 AM, Ivan Herman <ivan@w3.org> wrote:
>
> On 12 Mar 2014, at 09:34 , Phil Archer <phila@w3.org> wrote:
>
>> Picking up on Antoine's much appreciated comment that it is important to be disciplined on mailing lists so that the subject matter is clear, I've made the subject line here more explicit.
>>
>> No one knows more about the Linked Data vocabulary landscape than the creators of DBpedia*. The project is evolving in various ways (I understand that there's now a new legal entity supporting it, for example) and I'll see several of the 'DBPedians' next week in Athens. I will take the opportunity to seek their views on making use of additional vocabs in the ongoing work.
>>
>> The issues as I see them would be:
>> - given that DBpedia is auto-generated from Wikipedia, how realistic is it to make use of other vocabs?
>
> Without getting into the other issues below: the DBpedia mapping is not just a blind dump. They map to other vocabularies, and they actually generate their own vocabulary for many things (which they reuse in the data), etc. I.e., if somebody convinces them of the advantages of a particular vocabulary, it is certainly technically possible...
>
> (They actually have a mapping language that was documented in a paper somewhere, some sort of a precursor of R2RML, that can be easily modified if needed.)
>
> Ivan
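
A quick way to see that mix of vocabularies for oneself - a sketch only,
assuming the SPARQLWrapper library and the public endpoint at
http://dbpedia.org/sparql; the choice of resource is arbitrary:

# Sketch: list the distinct predicates used on one DBpedia resource, which
# shows the output already mixes the DBpedia ontology with other vocabularies.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT DISTINCT ?p WHERE {
        <http://dbpedia.org/resource/Berlin> ?p ?o .
    } LIMIT 100
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["p"]["value"])

This should return predicates from several namespaces side by side, which is
Ivan's point: adding one more vocabulary is an editorial decision rather than
a technical obstacle.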
>
>> - would their use be semantically accurate?
>> - if so, would the benefit of adding the extra triples outweigh the disadvantage of increasing the number of triples without actually increasing the informational content?
>>
>> These challenges/questions would apply to any existing large-scale dataset. One that comes specifically from DBpedia:
>>
>> - how can we describe the veracity and reliability of crowd-sourced data in the DCAT extension?
>>
>> WDYT?
>>
>> Phil
>>
>> * I assume everyone is familiar with DBpedia. If not, please do some background reading - it is *the* seminal work in Data on the Web and is closely associated with TimBL's original principles of Linked Data and the 5 stars of Linked Open Data. What may be less well known, especially to non-European members, is that its creators are extremely well known in the Linked Data community (people like Sören Auer, Chris Bizer, Sebastian Hellmann, etc.). DBpedia is at the centre of the LOD cloud diagram, which Richard Cyganiak created, along with Anja Jentsch, while working on DBpedia (Richard named it and owns the domain name). He was also one of the originators of DCAT and has been an active member of many W3C WGs for many years, including the one that standardised the DCAT, ORG and QB vocabularies.
>>
>>
>> On 11/03/2014 11:51, Bernadette Farias Lóscio wrote:
>>> Hi Ig and Steve,
>>>
>>> I also think that this is a good idea! I also agree that the most important
>>> task is to use a vocab to foster trust and to describe the metadata (schema).
>>>
>>> This is not an easy task, but I think it is a plausible one! However, it
>>> is important to keep in mind what kind of description would be interesting,
>>> considering that descriptions can relate to the whole dataset but
>>> also to some specific concepts. Does that make sense to you?
>>>
>>> Cheers,
>>> Bernadette
>>>
>>>
>>> 2014-03-10 19:34 GMT-03:00 Ig Ibert Bittencourt <ig.ibert@gmail.com>:
>>>
>>>> Hi Bernadette,
>>>>
>>>> Thanks.
>>>>
>>>> Yes. I know DBpedia provides an ontology, but as far as I know, it reuses
>>>> some vocabs (e.g. FOAF, Schema.org and BIBO), and only a few annotations,
>>>> such as rdfs:label and rdfs:comment, are provided for the classes. However,
>>>> there is nothing related to metadata describing where the data came from or
>>>> how it was derived, and so on (see my first e-mail).
>>>>
>>>> So, I am talking about vocabs like DC, ORG (perhaps aligning with schema.org)
>>>> and BIBO (extending their use). But I think the most important thing is to use a
>>>> vocab to foster trust. This is directly connected to the Quality and
>>>> Granularity Description Vocabulary (again, see the charter). That's why I
>>>> think a use case describing it could be interesting.
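
To make the missing piece concrete - a toy sketch only, and these are not
triples DBpedia actually publishes in this form - resource-level provenance
of the kind Ig describes (as opposed to a description of the whole dataset,
which is Bernadette's distinction) could look roughly like this with dct:
and PROV-O, assuming rdflib:

# Toy sketch: attaching "where it came from / how it was derived" metadata
# to a single resource. Property choices are illustrative, not prescriptive.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

DCT  = Namespace("http://purl.org/dc/terms/")
PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
g.bind("dct", DCT)
g.bind("prov", PROV)

berlin = URIRef("http://dbpedia.org/resource/Berlin")
article = URIRef("http://en.wikipedia.org/wiki/Berlin")

g.add((berlin, DCT.source, article))           # where it came from
g.add((berlin, PROV.wasDerivedFrom, article))  # lineage
g.add((berlin, DCT.modified, Literal("2014-03-01", datatype=XSD.date)))  # when

print(g.serialize(format="turtle"))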
>>>>
>>>> Please let me know whether this is plausible or not.
>>>>
>>>> All the best,
>>>> Ig
>>>>
>>>>
>>>>
>>>> 2014-03-10 17:35 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
>>>>
>>>> Hi Ig,
>>>>>
>>>>> DBpedia already uses a cross-domain ontology [1] to describe the concepts
>>>>> and relationships available in the DBpedia dataset. In this case, what kind
>>>>> of vocabs do you think could be useful together with DBpedia?
>>>>> Could you please give some examples?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers,
>>>>> Bernadette
>>>>>
>>>>> [1] http://wiki.dbpedia.org/Ontology
>>>>>
>>>>>
>>>>>
>>>>> 2014-03-10 14:21 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
>>>>>
>>>>> So let's talk to DBpedia about that.  They already use RDF ...
>>>>>>
>>>>>> http://wiki.dbpedia.org/Datasets
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> Motto: "Do First, Think, Do it Again"
>>>>>>
>>>>>>
>>>>>> From: Ig Ibert Bittencourt <ig.ibert@gmail.com>
>>>>>> To: Christophe Guéret <christophe.gueret@dans.knaw.nl>
>>>>>> Cc: Steven Adler/Somers/IBM@IBMUS, Public DWBP WG <public-dwbp-wg@w3.org>
>>>>>> Date: 03/10/2014 10:42 AM
>>>>>> Subject: Re: Use Case: BetaNYC 3/5
>>>>>> ------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Christophe,
>>>>>>
>>>>>> Thank you for your answer.
>>>>>>
>>>>>> You are right, and I think that's Steve's proposal: to get DBpedia to
>>>>>> use the vocabs and build a use case on that. For example, one discussion
>>>>>> along these lines is happening on the Public GLD list [1].
>>>>>>
>>>>>> Well, perhaps it is still early, but one reason for suggesting the
>>>>>> use of the vocabs is that we are going to propose an extension of DCAT
>>>>>> [2] (according to the charter [3]), the Quality and Granularity Description
>>>>>> Vocabulary. Maybe this is not the best way, but I believe we need to deeply
>>>>>> understand such vocabs.
>>>>>>
>>>>>> All the Best,
>>>>>> Ig
>>>>>>
>>>>>> [1] http://lists.w3.org/Archives/Public/public-gld-comments/2014Mar/
>>>>>> [2] http://www.w3.org/TR/vocab-dcat/
>>>>>> [3] http://www.w3.org/2013/05/odbp-charter
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-03-10 6:54 GMT-03:00 Christophe Guéret <christophe.gueret@dans.knaw.nl>:
>>>>>> Hoi,
>>>>>>
>>>>>> Don't you think we should create some use cases focused on the usage of
>>>>>> PROV-O, QB, DCAT, ORG... ?
>>>>>> This sounds a bit awkward to me. I would have expected the usage of
>>>>>> the vocabularies to be derived from the use-cases, and not the inverse.
>>>>>> If we make up use-cases with the aim of illustrating some best practices,
>>>>>> these BPs may be disconnected from concrete practice...
>>>>>> Rather, if we would like an existing use-case to adopt some vocabulary
>>>>>> instead of something of its own, we can suggest this change and try to get
>>>>>> it implemented, and/or understand why the current situation exists.
>>>>>>
>>>>>> Cheers,
>>>>>> Christophe
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Ig
>>>>>>
>>>>>>
>>>>>> 2014-03-06 12:51 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
>>>>>>
>>>>>> Last night, I attended another BetaNYC Hackathon in Brooklyn, where I
>>>>>> met another group of passionate citizens developing, and learning to
>>>>>> develop, fascinating apps for Smarter Cities.  This week we were about 15
>>>>>> people in the room, and we started with a lightning round of "what are you
>>>>>> working on" descriptions from project leads.  There were only three people
>>>>>> in the room who had participated in the hackathon the week prior, and this
>>>>>> is pretty normal.  BetaNYC has 1600 developers registered in their network
>>>>>> and every week coders rotate in and out of meetups and projects in an
>>>>>> endless and unplanned cycle that continuously inspires creativity and
>>>>>> motivation by showcasing new projects.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The first project we heard about came from a local nonprofit called Tomorrow
>>>>>> Lab <http://tomorrow-lab.com/>, who have designed hardware that
>>>>>> measures how many bikes travel on the streets it monitors.  It uses simple
>>>>>> hardware and open source software: two sensors connected by a pneumatic tube
>>>>>> measure impressions for weight and axle distance, which differentiates
>>>>>> between bikes and cars.  It's called WayCount.  The text
>>>>>> below is from their website.  In the room we discussed how WayCount data
>>>>>> could be combined with NYPD crash reports to more accurately identify the
>>>>>> spots in NYC with the highest rate of bike accidents per bike and identify
>>>>>> ways to remediate them.
>>>>>>
>>>>>> WayCount is a platform for crowd-sourcing massive amounts of near
>>>>>> real-time automobile and bicycle traffic data from a nodal network of
>>>>>> inexpensive hardware devices.   For the first time ever, you can gather
>>>>>> accurate volume, rate, and speed measurements of automobiles and bicycles,
>>>>>> then easily upload and map the information to a central online database.
>>>>>>  The WayCount device works like other traffic counters, but has two key
>>>>>> differences: lower cost and open data. At 1/5th the price of the least
>>>>>> expensive comparable product, WayCount is affordable. The WayCount Data
>>>>>> Uploader allows you to seamlessly upload and map your latest traffic count
>>>>>> data, making it instantly available to anyone online.
>>>>>> Collectively, the WayCount user community has the potential to build a
>>>>>> rich repository of traffic count data for bike paths, city alleyways,
>>>>>> neighborhood streets, and busy boulevards from around the world. With a
>>>>>> better understanding of automobile and bicycle ridership patterns, we can
>>>>>> inform the design of better cities and towns.
>>>>>>
>>>>>> The WayCount platform is an important addition to the process of
>>>>>> measuring the impact of transportation design, and creating livable streets
>>>>>> by adding bicycle lanes, public spaces, and developing smart transportation
>>>>>> management systems. By creating open data, we can increase governmental
>>>>>> transparency, and provide constituencies with the essential data they need
>>>>>> to advocate for rational and necessary improvements to the design,
>>>>>> maintenance, and policy of transportation systems.
>>>>>>
>>>>>> The hardware and software of the WayCount device and website were
>>>>>> designed and engineered by Tomorrow Lab.
>>>>>>
>>>>>> WayCount devices are currently for sale on the website, WayCount.com <http://waycount.com/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> We also discussed some ideas to provide policy makers with better
>>>>>> sources of Open Data to guide policy discussions, and then broke up into
>>>>>> four groups focusing on different projects.  One group discussed how to
>>>>>> save the New York Public Library on 42nd Street from the imminent transformation
>>>>>> of its main reading room and function as a lending library.  Another group
>>>>>> scraped web pages for NYPD crash data for an app comparing accident rates
>>>>>> across the 5 boroughs.  Some people just spent time talking about who they
>>>>>> are and what they want to work on, what they want to learn, and how to get
>>>>>> more involved.
>>>>>>
>>>>>> I spent an hour with a young programmer who had worked on the NYC
>>>>>> Property Tax Map I shared with you last week.  He showed me a Chrome Plugin
>>>>>> he is working on that provides data about leading politicians whenever
>>>>>> their names are mentioned on a webpage.  It is called Data Explorer for US
>>>>>> Politics and it provides some nifty data on things like campaign
>>>>>> contributions compared to committee assignments.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I asked him where he got his data and he showed me DBpedia <http://dbpedia.org/About>,
>>>>>> which "is a crowd-sourced community effort to extract structured
>>>>>> information from Wikipedia <http://wikipedia.org/> and make this
>>>>>> information available on the Web. DBpedia allows you to ask sophisticated
>>>>>> queries against Wikipedia, and to link the different data sets on the
>>>>>> Web to Wikipedia data. We hope that this work will make it easier for the
>>>>>> huge amount of information in Wikipedia to be used in some new interesting
>>>>>> ways. Furthermore, it might inspire new mechanisms for navigating, linking,
>>>>>> and improving the encyclopedia itself."
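
As a concrete taste of what "sophisticated queries" means here - a sketch
only, assuming the SPARQLWrapper library against the public endpoint at
http://dbpedia.org/sparql, with class and property names from the DBpedia
ontology as I understand it (adjust if they have changed) - the kind of query
behind a politicians plugin might be:

# Sketch: fetch some politicians and their parties from DBpedia.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?politician ?name ?party WHERE {
        ?politician a dbo:Politician ;
                    rdfs:label ?name ;
                    dbo:party ?party .
        FILTER (lang(?name) = "en")
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"], "-", row["party"]["value"])

None of which, of course, tells you whether any of those triples are right -
which is exactly the gap Steve describes next.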
>>>>>>
>>>>>> Then I asked him how he knows that DBpedia data is accurate and reliable,
>>>>>> and he just looked at me.  "It's on the internet..."  Yeah, and so were
>>>>>> weapons of mass destruction in Iraq.  But they were only on the internet
>>>>>> and never in Iraq.  And herein lies a huge problem with Open Data on the
>>>>>> Web: there is no corroboration of fact, no metadata describing where it
>>>>>> came from, how it was derived, calculated, or presented.  No one attests to
>>>>>> its veracity, yet we all use it on faith, which just ain't good enough.
>>>>>>
>>>>>> This is why we have the W3C Data on the Web Best Practices Working
>>>>>> Group <https://www.w3.org/2013/dwbp/wiki/Main_Page> - to create new
>>>>>> vocabulary and metadata standards that attach citations and lineage,
>>>>>> attestations and data quality metrics to Open Data so that everyone can
>>>>>> understand where it came from, how much to trust it, and even how to
>>>>>> improve it.
>>>>>>
>>>>>> At the end of the evening, we also discussed IBM Smarter Cities, the
>>>>>> Portland System Dynamics Demo, and the possibility of hosting a BetaNYC
>>>>>> meetup at IBM at 590 Madison Avenue.  It was a fascinating evening and I
>>>>>> encourage all to check out the links provided in this writeup and get out
>>>>>> and join a meetup near you.
>>>>>>
>>>>>> Talk to you tomorrow.
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> Motto: "Do First, Think, Do it Again"
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Ig Ibert Bittencourt
>>>>>> Associate Professor III - Universidade Federal de Alagoas (UFAL)
>>>>>> Vice-Coordinator of the Special Committee on Informatics in Education
>>>>>> Leader of the Center of Excellence in Social Technologies
>>>>>> Co-founder of the startup MeuTutor Soluções Educacionais LTDA.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Researcher
>>>>>> +31 (0)6 14576494
>>>>>> christophe.gueret@dans.knaw.nl
>>>>>>
>>>>>> Data Archiving and Networked Services (DANS)
>>>>>> DANS promotes sustained access to digital research data. See
>>>>>> www.dans.knaw.nl <http://www.dans.knaw.nl/> for more information.
>>>>>> DANS is an institute of KNAW and NWO.
>>>>>>
>>>>>> Please note: as of 1 January we have a new address:
>>>>>> DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509
>>>>>> AB Den Haag | +31 70 349 44 50 | info@dans.knaw.nl | www.dans.knaw.nl
>>>>>>
>>>>>> Let's build a World Wide Semantic Web!
>>>>>> http://worldwidesemanticweb.org/
>>>>>>
>>>>>> e-Humanities Group (KNAW) <http://www.ehumanities.nl/>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Ig Ibert Bittencourt
>>>>>> Associate Professor III - Universidade Federal de Alagoas (UFAL)
>>>>>> Vice-Coordinator of the Special Committee on Informatics in Education
>>>>>> Leader of the Center of Excellence in Social Technologies
>>>>>> Co-founder of the startup MeuTutor Soluções Educacionais LTDA.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Bernadette Farias Lóscio
>>>>> Centro de Informática
>>>>> Universidade Federal de Pernambuco - UFPE, Brazil
>>>>> ----------------------------------------------------------------------------
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Ig Ibert Bittencourt
>>>> Associate Professor III - Universidade Federal de Alagoas (UFAL)
>>>> Vice-Coordinator of the Special Committee on Informatics in Education
>>>> Leader of the Center of Excellence in Social Technologies
>>>> Co-founder of the startup MeuTutor Soluções Educacionais LTDA.
>>>>
>>>
>>>
>>>
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> FOAF: http://www.ivan-herman.net/foaf
>
>
>
>
>

Received on Wednesday, 12 March 2014 12:42:53 UTC