Re: DBPedia (was Re: Use Case: BetaNYC 3/5) from Ig Ibert Bittencourt on 2014-03-13 (public-dwbp-wg@w3.org from March 2014)

From: Ig Ibert Bittencourt <ig.ibert@gmail.com>
Date: Thu, 13 Mar 2014 09:59:09 -0300
To: Eric Stephan <ericphb@gmail.com>
Cc: Ivan Herman <ivan@w3.org>, Phil Archer <phila@w3.org>, Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CAKNDvRXVRmux-FxCb7b4p6vUowRFuxnH3aAVJ=nvCuah1H6p9Q@mail.gmail.com>
Hi Phil, +1

That's excellent. This meeting in Athens is a great opportunity.

Ghislain, thank you for sending the SWJ paper.

Best,
Ig




2014-03-12 9:42 GMT-03:00 Eric Stephan <ericphb@gmail.com>:

> I was working on the CSV working group use cases and trying to catch up :-)
>
> >> - how can we describe the veracity and reliability of crowd-sourced
> data in the DCAT extension?
>
> Phil - very interesting, I have to admit I haven't looked into the
> background of the creators of DBPedia.
>
> >> I will take the opportunity to seek their views on making use of
> additional vocabs in the ongoing work.
>
> It would be fascinating to hear their perspective.
>
> Eric
>
>
> On Wed, Mar 12, 2014 at 4:56 AM, Ivan Herman <ivan@w3.org> wrote:
> >
> > On 12 Mar 2014, at 09:34 , Phil Archer <phila@w3.org> wrote:
> >
> >> Picking up on Antoine's much appreciated comment that it is important
> >> to be disciplined on mailing lists so that the subject matter is clear,
> >> I've made the subject line here more explicit.
> >>
> >> No one knows more about the Linked Data vocabulary landscape than the
> creators of DBPedia*. The project is evolving in various ways (I understand
> that there's now a new legal entity supporting it for example) and I'll see
> several of the 'DBPedians' next week in Athens. I will take the opportunity
> to seek their views on making use of additional vocabs in the ongoing work.
> >>
> >> The issues as I see them would be:
> >> - given that DBPedia is auto-generated from Wikipedia, how realistic is
> it to make use of other vocabs?
> >
> > Without getting into the other issues below: the dbpedia mapping is not
> just a blind dump. They do mappings to other vocabularies, they actually
> generate their own vocabulary for many things (that they reuse in the
> data), etc. I.e., if somebody convinces them of the advantages of a
> particular vocabulary, it is certainly technically possible...
> >
> > (They actually have a mapping language that was documented in a paper
> somewhere, some sort of a precursor of R2RML, that can be easily modified
> if needed.)
> >
> > Ivan
> >
> >> - would their use be semantically accurate?
> >> - if so, would the benefit of adding the extra triples outweigh the
> disadvantage of increasing the number of triples without actually
> increasing the informational content?
> >>
> >> These challenges/questions would apply to any existing large scale
> dataset. One that comes specifically from DBPedia:
> >>
> >> - how can we describe the veracity and reliability of crowd-sourced
> data in the DCAT extension?
> >>
> >> WDYT?
> >>
> >> Phil
> >>
> >> * I assume everyone is familiar with DBpedia. If not, please do some
> >> background reading - it is *the* seminal work in Data on the Web and is
> >> closely associated with TimBL's original principles of Linked Data and the
> >> 5 stars of Linked Open Data. What may be less well known, especially to
> >> non-European members, is that its creators are extremely well known in the
> >> Linked Data community (people like Sören Auer, Chris Bizer, Sebastian
> >> Hellmann etc.). DBpedia is at the centre of the LOD cloud diagram, which
> >> Richard Cyganiak created with Anja Jentzsch while working on DBpedia
> >> (Richard named it and owns the domain name). He was also one of the
> >> originators of DCAT and has been an active member of many W3C WGs for many
> >> years, including the one that standardised the DCAT, ORG and QB
> >> vocabularies.
> >>
> >>
> >> On 11/03/2014 11:51, Bernadette Farias Lóscio wrote:
> >>> Hi Ig and Steve,
> >>>
> >>> I also think that this is a good idea! I also agree that the most
> >>> important task is to use a vocab to foster trust and to describe
> >>> metadata (schema).
> >>>
> >>> This is not an easy task, but I think it is a plausible one! However, it
> >>> is important to keep in mind what kind of description could be
> >>> interesting, considering that descriptions can relate to the whole
> >>> dataset but also to some specific concepts. Does that make sense to you?
> >>>
> >>> Cheers,
> >>> Bernadette
> >>>
> >>>
> >>> 2014-03-10 19:34 GMT-03:00 Ig Ibert Bittencourt <ig.ibert@gmail.com>:
> >>>
> >>>> Hi Bernadette,
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Yes. I know DBpedia provides an ontology, but as far as I know, it
> >>>> reuses some vocabs (e.g. FOAF, Schema.org and BIBO), and only a few
> >>>> annotations about the classes are provided, such as rdfs:label and
> >>>> rdfs:comment. However, there is nothing related to metadata describing
> >>>> where the data came from, how it was derived, and so on (see first
> >>>> e-mail).
> >>>>
> >>>> So, I am talking about vocabs like DC, ORG (perhaps aligning with
> >>>> schema.org) and BIBO (extending its use). But I think the most important
> >>>> thing is to use a vocab to foster trust. This is directly connected to
> >>>> the Quality and Granularity Description Vocabulary (again, see the
> >>>> charter). That's why I think a use case describing it could be
> >>>> interesting.
> >>>>
> >>>> Please let me know if it is plausible or not.
> >>>>
> >>>> All the best,
> >>>> Ig
> >>>>
> >>>>
> >>>>
> >>>> 2014-03-10 17:35 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
> >>>>
> >>>>> Hi Ig,
> >>>>>
> >>>>> DBpedia already uses a cross-domain ontology [1] to describe the
> >>>>> concepts and relationships available in the DBpedia dataset. In this
> >>>>> case, what kind of vocabs do you think could be useful alongside
> >>>>> DBpedia? Could you please give some examples?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Cheers,
> >>>>> Bernadette
> >>>>>
> >>>>> [1] http://wiki.dbpedia.org/Ontology
> >>>>>
> >>>>>
> >>>>>
> >>>>> 2014-03-10 14:21 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
> >>>>>
> >>>>>> So let's talk to DBpedia about that. They already use RDF ...
> >>>>>>
> >>>>>> http://wiki.dbpedia.org/Datasets
> >>>>>>
> >>>>>>
> >>>>>> Best Regards,
> >>>>>>
> >>>>>> Steve
> >>>>>>
> >>>>>> Motto: "Do First, Think, Do it Again"
> >>>>>>
> >>>>>>
> >>>>>> From: Ig Ibert Bittencourt <ig.ibert@gmail.com>
> >>>>>> To: Christophe Guéret <christophe.gueret@dans.knaw.nl>
> >>>>>> Cc: Steven Adler/Somers/IBM@IBMUS, Public DWBP WG <public-dwbp-wg@w3.org>
> >>>>>> Date: 03/10/2014 10:42 AM
> >>>>>> Subject: Re: Use Case: BetaNYC 3/5
> >>>>>> ------------------------------
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Hi Christophe,
> >>>>>>
> >>>>>> Thank you for your answer.
> >>>>>>
> >>>>>> You are right, and I think that's Steve's proposal: to get DBpedia to
> >>>>>> use the vocabs and build a use case on that. For example, one such
> >>>>>> discussion is happening on the public GLD comments list [1].
> >>>>>>
> >>>>>> Well, perhaps it is still early, but one reason for suggesting the use
> >>>>>> of these vocabs is that we are going to propose an extension of DCAT
> >>>>>> [2] (according to the charter [3]) for the Quality and Granularity
> >>>>>> Description Vocabulary. Maybe this is not the best way, but I believe
> >>>>>> we need to understand such vocabs in depth.
> >>>>>>
> >>>>>> All the Best,
> >>>>>> Ig
> >>>>>>
> >>>>>> [1] http://lists.w3.org/Archives/Public/public-gld-comments/2014Mar/
> >>>>>> [2] http://www.w3.org/TR/vocab-dcat/
> >>>>>> [3] http://www.w3.org/2013/05/odbp-charter
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2014-03-10 6:54 GMT-03:00 Christophe Guéret <christophe.gueret@dans.knaw.nl>:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Don't you think we should create some use cases focused on the
> usage of
> >>>>>> PROV-O, QB, DCAT, ORG... ?
> >>>>>> This sounds a bit awkward to me. I would have expected the usage of
> >>>>>> the vocabularies to be derived from the use cases, and not the other
> >>>>>> way round. If we make up use cases with the aim of illustrating some
> >>>>>> best practices, these BPs may be disconnected from what actually
> >>>>>> happens in practice...
> >>>>>> Rather, if we would like an existing use case to use some vocabulary
> >>>>>> instead of something of their own, we can suggest this change and try
> >>>>>> to get it implemented, and/or understand why this situation exists.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Christophe
> >>>>>>
> >>>>>>
> >>>>>> Best,
> >>>>>> Ig
> >>>>>>
> >>>>>>
> >>>>>> 2014-03-06 12:51 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
> >>>>>>
> >>>>>> Last night, I attended another BetaNYC Hackathon in Brooklyn, where
> I
> >>>>>> met another group of passionate citizens developing, and learning to
> >>>>>> develop, fascinating apps for Smarter Cities.  This week we were
> about 15
> >>>>>> people in the room, and we started with a lightning round of "what
> are you
> >>>>>> working on" descriptions from project leads.  There were only three
> people
> >>>>>> in the room who had participated in the hackathon the week prior,
> and this
> >>>>>> is pretty normal.  BetaNYC has 1600 developers registered in their
> network
> >>>>>> and every week coders rotate in and out of meetups and projects in
> an
> >>>>>> endless and unplanned cycle that continuously inspires creativity
> and
> >>>>>> motivation by showcasing new projects.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> The first project we heard about came from a local nonprofit called
> >>>>>> Tomorrow Lab <http://tomorrow-lab.com/>, who have designed hardware
> >>>>>> that measures how many bikes travel on the streets where it is
> >>>>>> installed. It uses simple hardware and open source software: two
> >>>>>> sensors are connected by a pneumatic tube that registers pressure
> >>>>>> impressions for weight and axle distance, which differentiate between
> >>>>>> bikes and cars. It's called WayCount. The text below is from their
> >>>>>> website. In the room we discussed how WayCount data could be combined
> >>>>>> with NYPD crash reports to more accurately identify the spots in NYC
> >>>>>> with the most bike accidents relative to the number of bikes, and to
> >>>>>> identify ways to remediate them.
> >>>>>>
> >>>>>> WayCount is a platform for crowd-sourcing massive amounts of near
> >>>>>> real-time automobile and bicycle traffic data from a nodal network of
> >>>>>> inexpensive hardware devices. For the first time ever, you can gather
> >>>>>> accurate volume, rate, and speed measurements of automobiles and
> >>>>>> bicycles, then easily upload and map the information to a central
> >>>>>> online database. The WayCount device works like other traffic
> >>>>>> counters, but has two key differences: lower cost and open data. At
> >>>>>> one-fifth the price of the least expensive comparable product,
> >>>>>> WayCount is affordable. The WayCount Data Uploader allows you to
> >>>>>> seamlessly upload and map your latest traffic count data, making it
> >>>>>> instantly available to anyone online.
> >>>>>> Collectively, the WayCount user community has the potential to build
> >>>>>> a rich repository of traffic count data for bike paths, city alleyways,
> >>>>>> neighborhood streets, and busy boulevards from around the world. With
> >>>>>> a better understanding of automobile and bicycle ridership patterns, we
> >>>>>> can inform the design of better cities and towns.
> >>>>>>
> >>>>>> The WayCount platform is an important addition to the process of
> >>>>>> measuring the impact of transportation design, and creating livable
> streets
> >>>>>> by adding bicycle lanes, public spaces, and developing smart
> transportation
> >>>>>> management systems. By creating open-data, we can increase
> governmental
> >>>>>> transparency, and provide constituencies with the essential data
> they need
> >>>>>> to advocate for rational and necessary improvements to the design,
> >>>>>> maintenance, and policy of transportation systems.
> >>>>>>
> >>>>>> The hardware and software of the WayCount device and website were
> >>>>>> designed and engineered by Tomorrow Lab.
> >>>>>>
> >>>>>> WayCount devices are currently for sale on the website, WayCount.com
> >>>>>> <http://waycount.com/>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> We also discussed some ideas to provide policy makers with better
> >>>>>> sources of Open Data to guide policy discussions, and then broke up
> >>>>>> into four groups focusing on different projects. One group discussed
> >>>>>> how to save the New York Public Library on 42nd Street from the
> >>>>>> imminent transformation of its main reading room and function as a
> >>>>>> lending library. Another group scraped web pages for NYPD crash data
> >>>>>> for an app comparing accident rates across the 5 boroughs. Some people
> >>>>>> just spent time talking about who they are, what they want to work on,
> >>>>>> what they want to learn, and how to get more involved.
> >>>>>>
> >>>>>> I spent an hour with a young programmer who had worked on the NYC
> >>>>>> Property Tax Map I shared with you last week.  He showed me a
> Chrome Plugin
> >>>>>> he is working on that provides data about leading politicians
> whenever
> >>>>>> their names are mentioned on a webpage.  It is called Data Explorer
> for US
> >>>>>> Politics and it provides some nifty data on things like campaign
> >>>>>> contributions compared to committee assignments.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I asked him where he got his data and he showed me DBpedia
> >>>>>> <http://dbpedia.org/About>, which "is a crowd-sourced community effort
> >>>>>> to extract structured information from Wikipedia <http://wikipedia.org/>
> >>>>>> and make this information available on the Web. DBpedia allows you to
> >>>>>> ask sophisticated queries against Wikipedia, and to link the different
> >>>>>> data sets on the Web to Wikipedia data. We hope that this work will make
> >>>>>> it easier for the huge amount of information in Wikipedia to be used in
> >>>>>> some new interesting ways. Furthermore, it might inspire new mechanisms
> >>>>>> for navigating, linking, and improving the encyclopedia itself."
> >>>>>>
> >>>>>> Then I asked him how he knows that DBpedia data is accurate and
> >>>>>> reliable, and he just looked at me. "It's on the internet..." Yeah, and
> >>>>>> so were weapons of mass destruction in Iraq. But they were only on the
> >>>>>> internet and never in Iraq. And herein lies a huge problem with Open
> >>>>>> Data on the Web: there is no corroboration of fact, no metadata
> >>>>>> describing where it came from or how it was derived, calculated, or
> >>>>>> presented. No one attests to its veracity, yet we all use it on faith,
> >>>>>> which just ain't good enough.
> >>>>>>
> >>>>>> This is why we have the W3C Data on the Web Best Practices Working
> >>>>>> Group <https://www.w3.org/2013/dwbp/wiki/Main_Page> - to create new
> >>>>>> vocabulary and metadata standards that attach citations and lineage,
> >>>>>> attestations and data quality metrics to Open Data so that everyone can
> >>>>>> understand where it came from, how much to trust it, and even how to
> >>>>>> improve it.
> >>>>>>
> >>>>>> At the end of the evening, we also discussed IBM Smarter Cities, the
> >>>>>> Portland System Dynamics Demo, and the possibility of hosting a
> BetaNYC
> >>>>>> meetup at IBM on 590 Madison Avenue.  It was a fascinating evening
> and I
> >>>>>> encourage all to check out the links provided in this writeup and
> get out
> >>>>>> and join a meetup near you.
> >>>>>>
> >>>>>> Talk to you tomorrow.
> >>>>>>
> >>>>>> Best Regards,
> >>>>>>
> >>>>>> Steve
> >>>>>>
> >>>>>> Motto: "Do First, Think, Do it Again"
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Ig Ibert Bittencourt
> >>>>>> Adjunct Professor III - Universidade Federal de Alagoas (UFAL)
> >>>>>> Vice-Coordinator of the Special Committee on Computers in Education
> >>>>>> Head of the Centre of Excellence in Social Technologies
> >>>>>> Co-founder of the startup MeuTutor Soluções Educacionais LTDA.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Researcher
> >>>>>> +31(0)6 14576494
> >>>>>> christophe.gueret@dans.knaw.nl
> >>>>>>
> >>>>>> Data Archiving and Networked Services (DANS)
> >>>>>> DANS promotes sustainable access to digital research data. See
> >>>>>> www.dans.knaw.nl for more information.
> >>>>>> DANS is an institute of KNAW and NWO.
> >>>>>>
> >>>>>> Please note: as of 1 January we have a new address:
> >>>>>> DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | PO Box 93067 |
> >>>>>> 2509 AB Den Haag | +31 70 349 44 50 | info@dans.knaw.nl |
> >>>>>> www.dans.knaw.nl
> >>>>>>
> >>>>>> Let's build a World Wide Semantic Web!
> >>>>>> http://worldwidesemanticweb.org/
> >>>>>>
> >>>>>> e-Humanities Group (KNAW) <http://www.ehumanities.nl/>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Bernadette Farias Lóscio
> >>>>> Centro de Informática
> >>>>> Universidade Federal de Pernambuco - UFPE, Brazil
> >>>>>
> ----------------------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >> --
> >>
> >>
> >> Phil Archer
> >> W3C Data Activity Lead
> >> http://www.w3.org/2013/data/
> >>
> >> http://philarcher.org
> >> +44 (0)7887 767755
> >> @philarcher1
> >>
> >
> >
> > ----
> > Ivan Herman, W3C
> > Digital Publishing Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > GPG: 0x343F1A3D
> > FOAF: http://www.ivan-herman.net/foaf
> >
> >
> >
> >
> >
>
>


Received on Thursday, 13 March 2014 12:59:59 UTC