Re: DBPedia (was Re: Use Case: BetaNYC 3/5) from Steven Adler on 2014-03-18 (public-dwbp-wg@w3.org from March 2014)

From: Steven Adler <adler1@us.ibm.com>
Date: Tue, 18 Mar 2014 16:30:59 -0400
To: Ig Ibert Bittencourt <ig.ibert@gmail.com>
Cc: Eric Stephan <ericphb@gmail.com>, Ivan Herman <ivan@w3.org>, Phil Archer <phila@w3.org>, Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <OFF19A5C3C.B9949707-ON85257C9F.00706796-85257C9F.0070B350@us.ibm.com>
Yes, great timing that Phil will meet them in Athens!


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"



From:
Ig Ibert Bittencourt <ig.ibert@gmail.com>
To:
Steven Adler/Somers/IBM@IBMUS
Cc:
Eric Stephan <ericphb@gmail.com>, Ivan Herman <ivan@w3.org>, Phil Archer 
<phila@w3.org>, Public DWBP WG <public-dwbp-wg@w3.org>
Date:
03/18/2014 12:44 PM
Subject:
Re: DBPedia (was Re: Use Case: BetaNYC 3/5)



Hi Steven,

I think Phil will meet them in Athens this week. 

Perhaps the first thing is to find out their opinion regarding the DCAT 
extension and DBpedia. If it makes sense to them, I think we could try 
to invite them to a Use Case Webinar. Does that make sense to you?

Best,
Ig


2014-03-17 16:58 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
Is anyone taking the todo to contact DBpedia and ask them to participate 
in a Use Case Webinar?  I think the best way to engage people is in real 
conversation on the phone or in person.  Sounds like several members of 
our group already have relationships with DBpedia.  Would someone like to 
call them? 


Best Regards,

Steve

Motto: "Do First, Think, Do it Again" 


From: 
Ig Ibert Bittencourt <ig.ibert@gmail.com> 
To: 
Eric Stephan <ericphb@gmail.com> 
Cc: 
Ivan Herman <ivan@w3.org>, Phil Archer <phila@w3.org>, Public DWBP WG <public-dwbp-wg@w3.org> 
Date: 
03/13/2014 09:00 AM 
Subject: 
Re: DBPedia (was Re: Use Case: BetaNYC 3/5)





Hi Phil, +1 

That's excellent. This meeting in Athens is a great opportunity. 

Ghislain, thank you for sending the SWJ paper. 

Best, 
Ig 




2014-03-12 9:42 GMT-03:00 Eric Stephan <ericphb@gmail.com>: 
I was working on the CSV working group use cases and trying to catch up 
:-) 

>> - how can we describe the veracity and reliability of crowd-sourced 
data in the DCAT extension?

Phil - very interesting, I have to admit I haven't looked into the
background of the creators of DBPedia. 

>> I will take the opportunity to seek their views on making use of 
additional vocabs in the ongoing work.

It would be fascinating to hear their perspective.

Eric 


On Wed, Mar 12, 2014 at 4:56 AM, Ivan Herman <ivan@w3.org> wrote:
>
> On 12 Mar 2014, at 09:34 , Phil Archer <phila@w3.org> wrote:
>
>> Picking up on Antoine's much appreciated comment that it is important
>> to be disciplined on mailing lists so that the subject matter is clear,
>> I've made the subject line here more explicit.
>>
>> No one knows more about the Linked Data vocabulary landscape than the 
creators of DBPedia*. The project is evolving in various ways (I 
understand that there's now a new legal entity supporting it for example) 
and I'll see several of the 'DBPedians' next week in Athens. I will take 
the opportunity to seek their views on making use of additional vocabs in 
the ongoing work.
>>
>> The issues as I see them would be:
>> - given that DBPedia is auto-generated from Wikipedia, how realistic is 
it to make use of other vocabs?
>
> Without getting into the other issues below: the DBpedia mapping is not
> just a blind dump. They do mappings to other vocabularies, they actually
> generate their own vocabulary for many things (that they reuse in the
> data), etc. I.e., if somebody convinces them of the advantages of a
> particular vocabulary, it is certainly technically possible...
>
> (They actually have a mapping language that was documented in a paper 
somewhere, some sort of a precursor of R2RML, that can be easily modified 
if needed.)
>
> Ivan
>
>> - would their use be semantically accurate?
>> - if so, would the benefit of adding the extra triples outweigh the 
disadvantage of increasing the number of triples without actually 
increasing the informational content?
>>
>> These challenges/questions would apply to any existing large scale 
dataset. One that comes specifically from DBPedia:
>>
>> - how can we describe the veracity and reliability of crowd-sourced 
data in the DCAT extension?
>>
>> WDYT?
>>
>> Phil
>>
>> * I assume everyone is familiar with DBpedia. If not, please do some 
background reading - it is *the* seminal work in Data on the Web and is 
closely associated with TimBL's original principles of Linked Data and the 
5 stars of Linked Open Data. What may be less well known, especially to 
non-European members, is that its creators are extremely well known in the 
Linked Data community (people like Sören Auer, Chris Bizer, Sebastian 
Hellmann, etc.). DBpedia is at the centre of the LOD cloud diagram that, 
along with Anja Jentsch, Richard Cyganiak created while working on DBPedia 
(Richard named it and owns the domain name). He was also one of the 
originators of DCAT and has been an active member of many W3C WGs for many 
years, including the one that standardised the DCAT, ORG and QB 
vocabularies.
>>
>>
>> On 11/03/2014 11:51, Bernadette Farias Lóscio wrote:
>>> Hi Ig and Steve,
>>>
>>> I also think that this is a good idea! I also agree that the most important
>>> task is using a vocab to foster trust and to describe metadata (schema).
>>>
>>> This is not an easy task, but I think it is a plausible one! However, it
>>> is important to keep in mind what kind of description would be useful,
>>> considering that descriptions can relate to the whole dataset but
>>> also to specific concepts. Does that make sense to you?
>>>
>>> Cheers,
>>> Bernadette
>>>
>>>
>>> 2014-03-10 19:34 GMT-03:00 Ig Ibert Bittencourt <ig.ibert@gmail.com>:
>>>
>>>> Hi Bernadette,
>>>>
>>>> Thanks.
>>>>
>>>> Yes. I know DBpedia provides an ontology and, as far as I know, it reuses
>>>> some vocabs (e.g. FOAF, Schema.org and BIBO), but only a few annotations
>>>> about the classes are provided, such as rdfs:label and rdfs:comment.
>>>> There is nothing related to metadata describing where the data came from,
>>>> how it was derived, and so on (see the first e-mail).
>>>>
>>>> So, I am talking about vocabs like DC, ORG (perhaps aligning with
>>>> schema.org) and BIBO (extending its use). But I think the most important
>>>> thing is to use a vocab to foster trust. This is directly connected to the
>>>> Quality and Granularity Description Vocabulary (again, see the charter).
>>>> That's why I think a use case describing it could be interesting.
>>>>
>>>> Please let me know whether this is plausible or not.
>>>>
>>>> All the best,
>>>> Ig
>>>>
>>>>
>>>>
>>>> 2014-03-10 17:35 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
>>>>
>>>> Hi Ig,
>>>>>
>>>>> DBpedia already uses a cross-domain ontology [1] to describe the concepts
>>>>> and relationships available in the DBpedia dataset. In this case, what kind
>>>>> of vocabs do you think could be useful together with DBpedia?
>>>>> Could you please give some examples?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers,
>>>>> Bernadette
>>>>>
>>>>> [1] http://wiki.dbpedia.org/Ontology
>>>>>
>>>>>
>>>>>
>>>>> 2014-03-10 14:21 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
>>>>>
>>>>> So let's talk to DBpedia about that.  They already use RDF ...
>>>>>>
>>>>>> http://wiki.dbpedia.org/Datasets
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> Motto: "Do First, Think, Do it Again"
>>>>>>
>>>>>>
>>>>>> From: Ig Ibert Bittencourt <ig.ibert@gmail.com>
>>>>>> To: Christophe Guéret <christophe.gueret@dans.knaw.nl>
>>>>>> Cc: Steven Adler/Somers/IBM@IBMUS, Public DWBP WG <public-dwbp-wg@w3.org>
>>>>>> Date: 03/10/2014 10:42 AM
>>>>>> Subject: Re: Use Case: BetaNYC 3/5
>>>>>> ------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Christophe,
>>>>>>
>>>>>> Thank you for your answer.
>>>>>>
>>>>>> You are right, and I think that is Steve's proposal: to get DBpedia to
>>>>>> use the vocabs and build a use case on that. For example, a discussion
>>>>>> along these lines is happening on the public GLD list [1].
>>>>>>
>>>>>> Well, perhaps it is still early, but one reason for suggesting the
>>>>>> use of the vocabs is that we are going to propose an extension of DCAT
>>>>>> [2] (according to the charter [3]), namely the Quality and Granularity
>>>>>> Description Vocabulary. Maybe this is not the best way, but I believe we
>>>>>> need to understand such vocabs deeply.
>>>>>>
>>>>>> All the Best,
>>>>>> Ig
>>>>>>
>>>>>> [1] http://lists.w3.org/Archives/Public/public-gld-comments/2014Mar/
>>>>>> [2] http://www.w3.org/TR/vocab-dcat/
>>>>>> [3] http://www.w3.org/2013/05/odbp-charter
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-03-10 6:54 GMT-03:00 Christophe Guéret <christophe.gueret@dans.knaw.nl>:
>>>>>> Hi,
>>>>>>
>>>>>> Don't you think we should create some use cases focused on the usage of
>>>>>> PROV-O, QB, DCAT, ORG... ?
>>>>>> This sounds a bit awkward to me. I would have expected that the usage of
>>>>>> the vocabulary would be derived from the use-cases, and not the inverse.
>>>>>> If we make up use-cases with the aim of illustrating some best practices,
>>>>>> these BPs may be disconnected from what actually happens in practice...
>>>>>> Rather, if we would like an existing use-case to use some vocabulary
>>>>>> instead of something of their own, we can suggest this change and try to
>>>>>> get it implemented, and/or understand why this situation exists.
>>>>>>
>>>>>> Cheers,
>>>>>> Christophe
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Ig
>>>>>>
>>>>>>
>>>>>> 2014-03-06 12:51 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
>>>>>>
>>>>>> Last night, I attended another BetaNYC Hackathon in Brooklyn, where 
I
>>>>>> met another group of passionate citizens developing, and learning 
to
>>>>>> develop, fascinating apps for Smarter Cities.  This week we were 
about 15
>>>>>> people in the room, and we started with a lightning round of "what 
are you
>>>>>> working on" descriptions from project leads.  There were only three 
people
>>>>>> in the room who had participated in the hackathon the week prior, 
and this
>>>>>> is pretty normal.  BetaNYC has 1600 developers registered in their 
network
>>>>>> and every week coders rotate in and out of meetups and projects in 
an
>>>>>> endless and unplanned cycle that continuously inspires creativity 
and
>>>>>> motivation by showcasing new projects.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The first project we heard about came from a local nonprofit called
>>>>>> Tomorrow Lab <http://tomorrow-lab.com/>, who have designed hardware that
>>>>>> counts how many bikes travel on the streets they monitor.  It uses simple
>>>>>> hardware and open source software that connects two sensors with a
>>>>>> pneumatic tube that measures impressions for weight and axle distance to
>>>>>> differentiate between bikes and cars.  It's called WayCount.  The text
>>>>>> below is from their website.  In the room we discussed how WayCount data
>>>>>> could be combined with NYPD crash reports to more accurately identify the
>>>>>> spots in NYC where bike accidents occur, relative to the number of bikes,
>>>>>> and identify ways to remediate.
>>>>>>
>>>>>> WayCount is a platform for crowd-sourcing massive amounts of near
>>>>>> real-time automobile and bicycle traffic data from a nodal network 
of
>>>>>> inexpensive hardware devices.   For the first time ever, you can 
gather
>>>>>> accurate volume, rate, and speed measurements of automobiles and 
bicycles,
>>>>>> then easily upload and map the information to a central online 
database.
>>>>>> The WayCount device works like other traffic counters, but has two key
>>>>>> differences: lower cost and open data. At 1/5th the price of the least
>>>>>> expensive comparable product, WayCount is affordable. The WayCount Data
>>>>>> Uploader allows you to seamlessly upload and map your latest traffic count
>>>>>> data, making it instantly available to anyone online.
>>>>>> Collectively, the WayCount user community has the potential to 
build a
>>>>>> rich repository of traffic count data for bike paths, city alley 
ways,
>>>>>> neighborhood streets, and busy boulevards from around the world. 
With a
>>>>>> better understanding of automobile and bicycle ridership patterns, 
we can
>>>>>> inform the design of better cities and towns.
>>>>>>
>>>>>> The WayCount platform is an important addition to the process of
>>>>>> measuring the impact of transportation design, and creating livable 
streets
>>>>>> by adding bicycle lanes, public spaces, and developing smart 
transportation
>>>>>> management systems. By creating open-data, we can increase 
governmental
>>>>>> transparency, and provide constituencies with the essential data 
they need
>>>>>> to advocate for rational and necessary improvements to the design,
>>>>>> maintenance, and policy of transportation systems.
>>>>>>
>>>>>> The hardware and software of the WayCount device and website were
>>>>>> designed and engineered by Tomorrow Lab.
>>>>>>
>>>>>> WayCount devices are currently for sale on the website, WayCount.com <http://waycount.com/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> We also discussed some ideas to provide policy makers with better
>>>>>> sources of Open Data to guide policy discussions, and then broke up into
>>>>>> four groups focusing on different projects.  One group discussed how to
>>>>>> save the New York Library on 42nd Street from the imminent transformation
>>>>>> of its main reading room and function as a lending library.  Another group
>>>>>> scraped web pages for NYPD crash data for an app comparing accident rates
>>>>>> across the 5 boroughs.  Some people just spent time talking about who they
>>>>>> are and what they want to work on, what they want to learn, and how to get
>>>>>> more involved.
>>>>>>
>>>>>> I spent an hour with a young programmer who had worked on the NYC
>>>>>> Property Tax Map I shared with you last week.  He showed me a 
Chrome Plugin
>>>>>> he is working on that provides data about leading politicians 
whenever
>>>>>> their names are mentioned on a webpage.  It is called Data Explorer 
for US
>>>>>> Politics and it provides some nifty data on things like campaign
>>>>>> contributions compared to committee assignments.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I asked him where he got his data and he showed me DBpedia
>>>>>> <http://dbpedia.org/About>, which "is a crowd-sourced community effort to
>>>>>> extract structured information from Wikipedia <http://wikipedia.org/> and
>>>>>> make this information available on the Web. DBpedia allows you to ask
>>>>>> sophisticated queries against Wikipedia, and to link the different data
>>>>>> sets on the Web to Wikipedia data. We hope that this work will make it
>>>>>> easier for the huge amount of information in Wikipedia to be used in some
>>>>>> new interesting ways. Furthermore, it might inspire new mechanisms for
>>>>>> navigating, linking, and improving the encyclopedia itself."
>>>>>>
>>>>>> Then I asked him how he knows that DBpedia data is accurate and reliable
>>>>>> and he just looked at me.  "It's on the internet..."  Yeah, and so were
>>>>>> weapons of mass destruction in Iraq.  But they were only on the internet
>>>>>> and never in Iraq.  And herein lies a huge problem about Open Data on the
>>>>>> Web; there is no corroboration of fact, no metadata describing where it
>>>>>> came from, how it was derived, calculated, presented.  No one attests to
>>>>>> its veracity, yet we all use it on faith which just ain't good enough.
>>>>>>
>>>>>> This is why we have the W3C Data on the Web Best Practices Working
>>>>>> Group <https://www.w3.org/2013/dwbp/wiki/Main_Page> - to create new
>>>>>> vocabulary and metadata standards that attach citations and lineage,
>>>>>> attestations and data quality metrics to Open Data so that everyone can
>>>>>> understand where it came from, how much to trust it, and even how to
>>>>>> improve it.
>>>>>>
>>>>>> At the end of the evening, we also discussed IBM Smarter Cities, 
the
>>>>>> Portland System Dynamics Demo, and the possibility of hosting a 
BetaNYC
>>>>>> meetup at IBM on 590 Madison Avenue.  It was a fascinating evening 
and I
>>>>>> encourage all to check out the links provided in this writeup and 
get out
>>>>>> and join a meetup near you.
>>>>>>
>>>>>> Talk to you tomorrow.
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> Motto: "Do First, Think, Do it Again"
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Ig Ibert Bittencourt
>>>>>> Adjunct Professor III - Universidade Federal de Alagoas (UFAL)
>>>>>> Vice-Coordinator of the Special Committee on Informatics in Education
>>>>>> Leader of the Center of Excellence in Social Technologies
>>>>>> Co-founder of the startup MeuTutor Soluções Educacionais LTDA.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Researcher
>>>>>> +31(0)6 14576494
>>>>>> christophe.gueret@dans.knaw.nl
>>>>>>
>>>>>> Data Archiving and Networked Services (DANS)
>>>>>> DANS promotes sustainable access to digital research data. See
>>>>>> www.dans.knaw.nl <http://www.dans.knaw.nl/> for more information.
>>>>>> DANS is an institute of KNAW and NWO.
>>>>>>
>>>>>> Please note: as of 1 January we have a new address:
>>>>>> DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509
>>>>>> AB Den Haag | +31 70 349 44 50 | info@dans.knaw.nl | www.dans.knaw.nl
>>>>>>
>>>>>> Let's build a World Wide Semantic Web!
>>>>>> http://worldwidesemanticweb.org/
>>>>>>
>>>>>> e-Humanities Group (KNAW)
>>>>>> <http://www.ehumanities.nl/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Bernadette Farias Lóscio
>>>>> Centro de Informática
>>>>> Universidade Federal de Pernambuco - UFPE, Brazil
>>>>> 
----------------------------------------------------------------------------
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> FOAF: http://www.ivan-herman.net/foaf
>
>
>
>
>




Received on Tuesday, 18 March 2014 20:31:35 UTC