Re: [Linked Life Data] Minutes from May 21 Janos Hajagos presentation

Attached are formatted minutes.  I don't know why they weren't generated
as expected.  The only weird thing I noticed in the log was one line
that said:

  <Jun> <mscottm, I have to leave for another meeting>

Possibly the angle brackets around "mscottm, I have to leave for another
meeting" confused the minutes generator, since angle brackets are
normally only expected around IRC names.
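
To make the angle-bracket theory concrete, here is a hypothetical sketch in Python. It is not the actual W3C minutes generator; both rules below are assumptions made for illustration. It contrasts a deliberately naive rule, which treats every <...> token as a nick, with a stricter rule that accepts only one leading <nick> containing no spaces. The function names naive_speaker and strict_parse are made up.

```python
import re
from typing import Optional, Tuple

# Deliberately naive, hypothetical rule: every "<...>" token is a nick, and
# the last one on the line is taken as the speaker.
NICK_TOKEN = re.compile(r"<([^<>]+)>")

# Stricter, equally hypothetical rule: only one leading "<nick>" with no
# spaces counts; everything after it is message text, brackets included.
LEADING_NICK = re.compile(r"^<([^<>\s]+)>\s?(.*)$")


def naive_speaker(line: str) -> Optional[str]:
    tokens = NICK_TOKEN.findall(line)
    return tokens[-1] if tokens else None


def strict_parse(line: str) -> Optional[Tuple[str, str]]:
    m = LEADING_NICK.match(line)
    return (m.group(1), m.group(2)) if m else None


line = "<Jun> <mscottm, I have to leave for another meeting>"
print(naive_speaker(line))  # mscottm, I have to leave for another meeting
print(strict_parse(line))   # ('Jun', '<mscottm, I have to leave for another meeting>')
```

Under the naive rule, the log line loses its real speaker (Jun) and gains a bogus one, which is the kind of confusion suspected here.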

David


On Tue, 2012-05-22 at 07:49 +0200, M. Scott Marshall wrote:
> Hi Eric,
> 
> Something seems to have gone wrong with the generation of the minutes
> again. Could you please take a look at it?
> I am pasting the log from my mIRC buffer below for those who want to
> take a look in the meantime.
> 
> Cheers,
> Scott
> 
> RRSagent, draft minutes
> <RRSAgent> I have made the request to generate
> http://www.w3.org/2012/05/21-HCLS-minutes.html mscottm
> <mscottm>  rrsagent, make log world-visible
> <RRSAgent> I have made the request, mscottm
> 
> <Zakim> On the phone I see Tony, +1.510.705.aaaa, tlebo,
> +1.631.444.aabb, ??P9, Scott_Marshall, ??P1, +46.7.08.13.aadd, ??P13,
> Chimezie
> <Zakim> On IRC I see ram, matthias_samwald, Jun, amrapali, RRSAgent,
> Zakim, Janos, egombocz, rkiefer, bbalsa, achille_zappa, mscottm, ericP
> * matthias_samwald1 (m@128.131.167.8) has joined #hcls
> <Janos> http://dl.dropbox.com/u/21690634/Quantifying%20RDF%20data%20sets.pdf
> <Zakim> +??P15
> <Zakim> + +1.412.623.aaee
> * BrianLowe (8054fe2e@78.129.202.38) has joined #hcls
> * HarryH (8231b338@64.62.228.82) has joined #hcls
> * chimezie (chimezie@99.59.109.32) has joined #hcls
> <chimezie> Zakim, who is on the phone?
> <Zakim> On the phone I see Tony, +1.510.705.aaaa, tlebo,
> +1.631.444.aabb, ??P9, Scott_Marshall, ??P1, +46.7.08.13.aadd, ??P13,
> Chimezie, ??P15, +1.412.623.aaee
> * matthias_samwald (m@149.148.241.249) Quit (Ping timeout)
> <HarryH> 412 623  is me - Harry Hochheiser Pittsburgh
> <mscottm> Brian Lowe: Developer on VIVO project, Susan Mitchell also
> works as developer / ontology on VIVO
> <mscottm> Harry Hochheiser - University of Pittsburgh, interested in HCLS
> <mscottm> Brian Lowe: Developer on VIVO project, Stella Mitchell also
> works as developer / ontology on VIVO
> * Stella (80fd5743@78.129.202.38) has joined #hcls
> <ram> Ram from Metaome - We have a life science search engine called
> DistilBio (distilbio.com)
> <Jun> scribe: Jun
> * michael (d17cbd27@64.62.228.82) has joined #hcls
> <Jun> s/Susan/Stella/
> <Zakim> + +1.206.732.aaff
> <mscottm> Chimezie Ogbuji - Cleveland Clinic, Case Western, Recently
> started a startup
> <Janos> http://dl.dropbox.com/u/21690634/Quantifying%20RDF%20data%20sets.pdf
> <Zakim> + +1.857.250.aagg
> <chimezie> Zakim, mute me
> <Zakim> Chimezie should now be muted
> <Jun> Scott: introduces Janos' talk: it's important to differentiate
> RDF datasets by more than their content, licenses, etc.
> * mattgamble (801e06c7@64.62.228.82) has joined #hcls
> <mscottm> VIVO - scientific research network ontology
> <Jun> Janos: one of the members of CTSA Connect graduate programme, to
> connect two major ontologies, VIVO and ***, to connect clinical
> sciences data
> <chimezie> yes, I do
> <Zakim> +Tony.a
> <Janos> http://dl.dropbox.com/u/21690634/Quantifying%20RDF%20data%20sets.pdf
> * BobF (81b0c518@78.129.202.38) has joined #hcls
> <Jun> Slide 1: a lot of further work. this just presents a start
> <Jun> slide 2
> <Jun> Janos: Semantic Web is based on RDF, a graph-based data model
> <mscottm> CTSA Connect: http://www.ctsaconnect.org/about-us
> <Jun> ... more flexible than relational DBs by allowing parallel edges
> <Jun> slide 3
> <Jun> Janos: a paper submitted to the Triple Challenge 2010
> <Jun> ... they did some quantification of datasets, looking into the
> internal structure of the data
> <Jun> ... drew on some of the approaches of this paper
> <Jun> ... took a look at the datasets of the challenge, and did some
> structural and other analysis
> <Jun> slide 4
> * AmitSheth (826c0136@64.62.228.82) has joined #hcls
> <Jun> Janos: a basic Python library to parse N-Triples. it's a memory-based
> approach that does some processing. runs on PyPy
> * AmitSheth (826c0136@64.62.228.82) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> <Jun> ... PyPy for just-in-time compilation, to speed up the processing
> * Amit (826c0136@109.169.29.95) has joined #hcls
> <Amit> conference is full! cannot join by voice
> <Jun> ... just some basic statistical analysis, then started to do
> some pattern-matching analysis, not using a SPARQL endpoint
> <Jun> ... each file is treated as its own graph. didn't use Named Graphs
> <Jun> Q: on scalability
> <Jun> Janos: largest one is LinkedCT
> <Jun> ... 28 million triples; took 30% of 64 GB of memory
> <Jun> ... SPARQL 1.1 might promise better performance
> <Jun> slide 5
> <Jun> ... started with some basic counts
> <Jun> slide 6
> <Jun> Janos: does some simple fraction calculations
> <Jun> ... e.g., how many literals are in your triples?
> <Jun> ... how many literals are unique?
> <Jun> ... how many objects are unique?
> <Jun> ... structure measurement, computed by taking out type
> information and literals
> * egonw_ (egonw@145.20.139.203) has joined #HCLS
> <Jun> ... subject/object coverage: is a dataset more pointing, or more pointed to?
> <Jun> ... more concrete examples to follow
> <Jun> slide 7
> <mscottm> scribenick: Jun
> <Jun> Janos: computed it against a couple of LOD datasets: 4 from
> LODD (DailyMed, LinkedCT, DrugbankRDF, RxNorm)
> <Jun> ... BioGrid database: an open access DB on Protein and Genetic
> Interactions
> <Jun> ... BioPAX: pathways in BioPAX format
> <Jun> ... BioGrid can be downloaded in OWL format
> <Jun> ... VIVO: NIH funded project for scientific networking
> <Jun> .... got n-triples for VIVO dataset
> * amrapali (8b120872@64.62.228.82) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> <Jun> ... going through them in descending order of number of triples
> <Jun> slide 8
> <Jun> Janos: top subjects, top classes, predicates, etc
> * markthompson (9158d121@78.129.202.38) has joined #hcls
> <Jun> ... give you a good idea of how people use ontologies
> * egonw_ (egonw@145.20.139.203) Quit (Ping timeout)
> <Jun> ... LinkedCT: 40% are literals, objects have 80% repetition
> <Jun> ... three dominant classes
> <Jun> Michael: have you done this analysis on the GO ontology?
> <Jun> Janos: not yet
> <Jun> Michael: expecting more diverse coverage
> <Jun> Janos: would be interesting to look at
> <Jun> slide 9
> <Jun> Janos: BioGrid in BioPAX
> <Jun> ... 50 MB in OWL but 40 million triples in N-Triples format
> <Jun> ... again, subject, object coverage, and top classes. they are not LOD yet
> * egonw_ (egonw@145.20.139.203) has joined #HCLS
> <Jun> ... get a good sense of what's actually in the content
> <Jun> slide 10
> <Jun> Janos: RxNorm
> <Jun> ... only 6 classes. pretty small
> <Jun> ... quite a lot of literals. the structure measure is higher than in other datasets
> <Jun> Q: do you see big structural differences between these datasets?
> <Jun> Janos: TBD
> * egonw_ (egonw@145.20.139.203) Quit (Ping timeout)
> <Jun> slide 11
> <Jun> Janos: 1.2 million triples
> <Jun> ... data about publications, such as Authorship, Person ...
> <Zakim> -??P15
> <Jun> ... publications are the dominant data there. pretty good
> subject/object coverage
> <Jun> slide 12
> <Jun> Janos: it has a lot of links to outside datasets, so it has a much
> higher object coverage
> <Jun> slide 13
> <Jun> Janos: top predicate: owl:sameAs. again has a lot of links to
> outside datasets
> * matthias_samwald1 (m@128.131.167.8) Quit (Ping timeout)
> <Jun> Scott: any idea about how one type of matrix could be more
> useful than another, or searching for others?
> <Jun> s/Scott/mscottm/
> <Jun> slide 14
> <mscottm> s/matrix/metric/
> <Jun> Janos: there are a lot of tools for graph vis and analysis, but
> not so good with RDF data
> * ericP Zakim please dial ericP-office
> * ericP Zakim, please dial ericP-office
> * Zakim ok, ericP; the call is being made
> <Zakim> +EricP
> <Jun> slide 15
> <Jun> Janos: the twist is to allow multiple paths between 2 nodes
> <Jun> slide 16
> <Jun> Janos: there are ways to collapse the parallel edges, or put RDF
> into XML, in order to use some graph analysis tools
> <Jun> slide 17
> <Jun> Janos: show some examples
> <Jun> ... keep only co-authors who are members of one site, to get a
> smaller co-author network
> <Jun> slide 18
> <Jun> Janos: do some basic graph analysis using Mathematica
> <Jun> ... basic in-degrees, out-degrees, histograms, one/two degree
> separation etc
> <mscottm> Nice!
> <Jun> slide 19
> <Jun> Janos: Gephi doesn't support parallel edges. you have to do some
> pre-processing
> <Zakim> -tlebo
> <Jun> slide 20
> <Jun> Janos: some links
> <Jun> q+ have you thought about encoding some of the statistics using VoID?
> * Zakim Jun, you typed too many words without commas; I suspect you
> forgot to start with 'to ...'
> <michael> thanks, janos, i need to drop off
> <Zakim> - +1.206.732.aaff
> * rkiefer (81b0c518@207.192.75.252) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> <Zakim> -Tony
> <Jun> Eric: any further analysis on some of the results, like the
> social network?
> <Jun> <mscottm, I have to leave for another meeting>
> <mattgamble> First how do you work out which metrics are useful?
> * ericP away again
> <Zakim> -EricP
> <Zakim> - +46.7.08.13.aadd
> * bbalsa (d5723e05@109.169.29.95) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> <egombocz> Our Knowledge Explorer also provides metrics for weighing
> of connections in several ways
> * BobF (81b0c518@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax
> IRC Client)
> <Zakim> -??P1
> <chimezie> Zakim, unmute me
> <Zakim> Chimezie should no longer be muted
> <mscottm> Chime - would you please jot your comment/question into IRC?
> I received an urgent call exactly when you started.. :(
> <Zakim> -Tony.a
> <chimezie> My question was whether he had considered using rdflib
> (https://github.com/RDFLib)
> * BrianLowe (8054fe2e@78.129.202.38) Quit (Ping timeout)
> <Zakim> - +1.510.705.aaaa
> * egombocz (42758fa2@207.192.75.252) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> * michael (d17cbd27@64.62.228.82) has left #hcls
> <Zakim> - +1.857.250.aagg
> * mattgamble (801e06c7@64.62.228.82) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> * markthompson (9158d121@78.129.202.38) Quit (Quit:
> http://www.mibbit.com ajax IRC Client)
> * Jun (81431aef@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax
> IRC Client)
> * Amit (826c0136@109.169.29.95) Quit (Quit: http://www.mibbit.com ajax
> IRC Client)
> <Zakim> -Chimezie
> * chimezie (chimezie@99.59.109.32) Quit (Quit: chimezie)
> <mscottm> CTSA Connect - ISF - Integrated Semantic Framework: core is
> combining VIVO ontology and eagle-i ontology
> <HarryH> Thanks , Janos - very interesting!
> <ram> Thanks Janos
> * HarryH (8231b338@64.62.228.82) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> <Stella> thanks all, bye
> <Zakim> - +1.412.623.aaee
> * ram (31f97127@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax
> IRC Client)
> <Zakim> -Scott_Marshall
> <Zakim> -??P13
> <Zakim> - +1.631.444.aabb
> <mscottm> bye all
> <Zakim> -??P9
> <Zakim> SW_HCLS(BioRDF)11:00AM has ended
> <Zakim> Attendees were Tony, +1.510.705.aaaa, tlebo, +1.631.444.aabb,
> +46.7.08.13.aacc, Scott_Marshall, +46.7.08.13.aadd, Chimezie,
> +1.412.623.aaee, +1.206.732.aaff, +1.857.250.aagg, EricP
> * Stella (80fd5743@78.129.202.38) Quit (Quit: http://www.mibbit.com
> ajax IRC Client)
> <mscottm> Zakim, please draft minutes
> <Zakim> I don't understand 'please draft minutes', mscottm
> <mscottm> RRSagent, draft minutes
> <RRSAgent> I have made the request to generate
> http://www.w3.org/2012/05/21-HCLS-minutes.html mscottm
> <mscottm>  rrsagent, make log world-visible
> <RRSAgent> I have made the request, mscottm
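
For anyone who wants to try the per-dataset measures described in the log (slides 6 to 8) on their own N-Triples files, here is a minimal sketch using only the Python standard library. It is not Janos's library. The line parser is simplified (it does not validate IRIs or cover the full N-Triples grammar), and the exact definitions are assumptions based on the scribed notes: "coverage" is read as the share of non-literal nodes that appear as a subject or as an object, and the "structure" fraction counts triples that are neither rdf:type statements nor literal-valued. Like Janos's approach, it holds everything in memory. The names parse_ntriples, dataset_profile and the sample IRIs are hypothetical.

```python
import re
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Simplified N-Triples term: IRI, blank node, or literal with an optional
# language tag or datatype IRI. Not a complete N-Triples grammar.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
TRIPLE = re.compile(r"^" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.$")


def parse_ntriples(lines):
    """Yield (s, p, o) strings; skip blank, comment, and unparseable lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = TRIPLE.match(line)
        if m:
            yield m.groups()


def is_literal(term):
    return term.startswith('"')


def dataset_profile(triples, top=3):
    triples = list(triples)  # memory-based: the whole dataset is held in RAM
    n = len(triples)
    if n == 0:
        return {}
    objects = [o for _, _, o in triples]
    literals = [o for o in objects if is_literal(o)]
    subjects = {s for s, _, _ in triples}
    node_objects = {o for o in objects if not is_literal(o)}
    nodes = subjects | node_objects  # literals are not counted as nodes
    structural = [t for t in triples if t[1] != RDF_TYPE and not is_literal(t[2])]
    return {
        "triples": n,
        "literal_fraction": len(literals) / n,
        "unique_literal_fraction": len(set(literals)) / len(literals) if literals else 0.0,
        "unique_object_fraction": len(set(objects)) / n,
        "structure_fraction": len(structural) / n,
        "subject_coverage": len(subjects) / len(nodes),
        "object_coverage": len(node_objects) / len(nodes),
        "top_predicates": Counter(p for _, p, _ in triples).most_common(top),
        "top_classes": Counter(o for _, p, o in triples if p == RDF_TYPE).most_common(top),
    }


sample = [
    '<http://ex.org/t1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Trial> .',
    '<http://ex.org/t1> <http://ex.org/title> "Aspirin study"@en .',
    '<http://ex.org/t2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Trial> .',
    '<http://ex.org/t2> <http://ex.org/title> "Aspirin study"@en .',
    '<http://ex.org/t2> <http://ex.org/cites> <http://ex.org/t1> .',
]

if __name__ == "__main__":
    for key, value in dataset_profile(parse_ntriples(sample)).items():
        print(key, value)
```

For a real dump, pass an open file instead of the sample list, e.g. dataset_profile(parse_ntriples(open(path, encoding="utf-8"))). Chimezie's rdflib suggestion would give a full parser in place of the regex; the standard library is used here only to keep the sketch self-contained.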
> 
> On Tue, May 22, 2012 at 1:22 AM, M. Scott Marshall
> <mscottmarshall@gmail.com> wrote:
> > Here are the minutes from today's meeting:
> > http://www.w3.org/2012/05/21-HCLS-minutes.html
> >
> > I see large potential for using tools like Janos's to describe /
> > characterize datasets and their RDF representations for selection and
> > eventual query patterns - something essential in a linked data
> > marketplace.
> >
> > My apologies for not anticipating more participants and raising the
> > default participant limit for Zakim. If you were unable to join and have
> > questions or comments after reading the minutes, please send them to
> > the list or I will do my best to answer (or find someone who can).
> >
> > Thanks to Jun for scribing! Thanks to Janos for presenting!
> >
> > Richard Boyce will present work to us in early June. Stay tuned for details.
> >
> > Cheers,
> > Scott
> 
> 
> 
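
Slides 15 to 19 in the log note that RDF allows parallel edges (several predicates linking the same two nodes) and that tools such as Gephi need them collapsed first. A minimal Python sketch of that preprocessing step, plus the in-/out-degree histograms Janos computed in Mathematica, might look like this. The merge rule (one edge per subject/object pair, weighted by the number of distinct predicates, literal objects dropped) is an assumption, not necessarily the preprocessing Janos used, and the function names and ex: IRIs are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_parallel_edges(triples):
    """Merge all (s, p, o) edges sharing the same (s, o) pair into one edge.

    Returns {(s, o): (weight, sorted predicates)}, where weight is the number
    of distinct predicates. Literal objects are skipped because they are not
    treated as graph nodes here.
    """
    merged = defaultdict(set)
    for s, p, o in triples:
        if o.startswith('"'):
            continue
        merged[(s, o)].add(p)
    return {pair: (len(preds), sorted(preds)) for pair, preds in merged.items()}


def degree_histograms(edges):
    """Return (in_hist, out_hist): degree value -> number of nodes with it."""
    edges = list(edges)
    out_deg = Counter(s for s, _ in edges)
    in_deg = Counter(o for _, o in edges)
    nodes = set(out_deg) | set(in_deg)
    # Counter returns 0 for missing keys, so pure sinks/sources get degree 0.
    in_hist = Counter(in_deg[n] for n in nodes)
    out_hist = Counter(out_deg[n] for n in nodes)
    return in_hist, out_hist


triples = [
    ("<ex:a>", "<ex:coauthorOf>", "<ex:b>"),
    ("<ex:a>", "<ex:knows>", "<ex:b>"),      # parallel edge a -> b
    ("<ex:b>", "<ex:coauthorOf>", "<ex:c>"),
    ("<ex:a>", "<ex:name>", '"Alice"'),      # literal object, dropped
]
edges = collapse_parallel_edges(triples)
in_hist, out_hist = degree_histograms(edges)
print(edges)
print(in_hist, out_hist)
```

Keeping the merged predicate names alongside the weight means the collapse is not lossy for later inspection, even if the graph tool only consumes the weight.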

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Received on Tuesday, 22 May 2012 13:23:46 UTC