- From: M. Scott Marshall <mscottmarshall@gmail.com>
- Date: Tue, 22 May 2012 07:49:13 +0200
- To: HCLS <public-semweb-lifesci@w3.org>, "Eric Prud'hommeaux" <eric@w3.org>
Hi Eric,

Something seems to have gone wrong with the generation of the minutes again. Could you please take a look at it? I am pasting the log from my mIRC buffer below for those who want to take a look in the meantime.

Cheers,
Scott

RRSagent, draft minutes
<RRSAgent> I have made the request to generate http://www.w3.org/2012/05/21-HCLS-minutes.html mscottm
<mscottm> rrsagent, make log world-visible
<RRSAgent> I have made the request, mscottm
<Zakim> On the phone I see Tony, +1.510.705.aaaa, tlebo, +1.631.444.aabb, ??P9, Scott_Marshall, ??P1, +46.7.08.13.aadd, ??P13, Chimezie
<Zakim> On IRC I see ram, matthias_samwald, Jun, amrapali, RRSAgent, Zakim, Janos, egombocz, rkiefer, bbalsa, achille_zappa, mscottm, ericP
* matthias_samwald1 (m@128.131.167.8) has joined #hcls
<Janos> http://dl.dropbox.com/u/21690634/Quantifying%20RDF%20data%20sets.pdf
<Zakim> +??P15
<Zakim> + +1.412.623.aaee
* BrianLowe (8054fe2e@78.129.202.38) has joined #hcls
* HarryH (8231b338@64.62.228.82) has joined #hcls
* chimezie (chimezie@99.59.109.32) has joined #hcls
<chimezie> Zakim, who is on the phone?
<Zakim> On the phone I see Tony, +1.510.705.aaaa, tlebo, +1.631.444.aabb, ??P9, Scott_Marshall, ??P1, +46.7.08.13.aadd, ??P13, Chimezie, ??P15, +1.412.623.aaee
* matthias_samwald (m@149.148.241.249) Quit (Ping timeout)
<HarryH> 412 623 is me - Harry Hochheiser Pittsburgh
<mscottm> Brian Lowe: Developer on VIVO project, Susan Mitchell also works as developer / ontology on VIVO
<mscottm> Harry Hochheiser - University of Pittsburgh, interested in HCLS
<mscottm> Brian Lowe: Developer on VIVO project, Stella Mitchell also works as developer / ontology on VIVO
* Stella (80fd5743@78.129.202.38) has joined #hcls
<ram> Ram from Metaome - We have a life science search engine called DistilBio (distilbio.com)
<Jun> scribe: Jun
* michael (d17cbd27@64.62.228.82) has joined #hcls
<Jun> s/Susan/Stella/
<Zakim> + +1.206.732.aaff
<mscottm> Chimezie Ogbuji - Cleveland Clinic, Case Western, Recently started a startup
<Janos> http://dl.dropbox.com/u/21690634/Quantifying%20RDF%20data%20sets.pdf
<Zakim> + +1.857.250.aagg
<chimezie> Zakim, mute me
<Zakim> Chimezie should now be muted
<Jun> Scott: introduce Janos' talk: it's important to differentiate RDF datasets apart from by their content, licenses, etc
* mattgamble (801e06c7@64.62.228.82) has joined #hcls
<mscottm> VIVO - scientific research network ontology
<Jun> Janos: one of the members of CTSA Connect graduate programme, to connect two major ontologies, VIVO and ***, to connect clinical sciences data
<chimezie> yes, I do
<Zakim> +Tony.a
<Janos> http://dl.dropbox.com/u/21690634/Quantifying%20RDF%20data%20sets.pdf
* BobF (81b0c518@78.129.202.38) has joined #hcls
<Jun> Slide 1: a lot of further work. this just presents a start
<Jun> slide 2
<Jun> Janos: Semantic Web is based on RDF, a graph-based data model
<mscottm> CTSA Connect: http://www.ctsaconnect.org/about-us
<Jun> ... more flexible than relational DBs by allowing parallel edges
<Jun> slide 3
<Jun> Janos: a paper submitted to the Triple Challenge 2010
<Jun> ... they did some quantification of datasets, looking into the internal structure of the data
<Jun> .... drew some of the approaches of this paper
<Jun> ... took a look of the datasets of the challenge, and did some structural analysis and others
<Jun> slide 4
* AmitSheth (826c0136@64.62.228.82) has joined #hcls
<Jun> Janos: a basic python library to parse n-triples. it's a memory based approach, and do some processing. based on PyPy
* AmitSheth (826c0136@64.62.228.82) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<Jun> .... PyPy for just-in-time compiling. speed up the processing
* Amit (826c0136@109.169.29.95) has joined #hcls
<Amit> conference is full! cannot join by voice
<Jun> .... just some basic statistical analysis, then started to do some pattern matching analysis. not by using SPARQL endpoint
<Jun> ... each file is treated as its own graph. didn't use Named Graphs
<Jun> Q: on scalability
<Jun> Janos: largest one is LinkedCT
<Jun> ... 28 millions triples. took 30% of a 64G memory
<Jun> ... SPARQL1.1 might provide better performance promises
<Jun> slide 5
<Jun> ... started with some basic counts
<Jun> slide 6
<Jun> Janos: do some simple fractions calculations
<Jun> ... e.g, how many literals in your triples
<Jun> ... how many literals are unique?
<Jun> ... how many objects are unique?
<Jun> ... structure measurement, by taking out the typing sort of information and literals
* egonw_ (egonw@145.20.139.203) has joined #HCLS
<Jun> ... subject/object coverage, more pointing or more pointed?
<Jun> ... more concrete examples to follow
<Jun> slide 7
<mscottm> scribenick: Jun
<Jun> Janos: computed it against a couple of LOD datasets, 4 of the LODD, DailyMed, LinkedCT, DrugbankRDF, RxNorm
<Jun> ... BioGrid database: an open access DB on Protein and Genetic Interactions
<Jun> ... BioPAX: pathways in BioPAX format
<Jun> ... bioGrid can be downloaded via OWL format
<Jun> ... VIVO: NIH funded project for scientific networking
<Jun> .... got n-triples for VIVO dataset
* amrapali (8b120872@64.62.228.82) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<Jun> ... go through by the number of triples desc
<Jun> slide 8
<Jun> Janos: top subjects, top classes, predicates, etc
* markthompson (9158d121@78.129.202.38) has joined #hcls
<Jun> ... give you a good idea of how people use ontologies
* egonw_ (egonw@145.20.139.203) Quit (Ping timeout)
<Jun> ... LinkCT: 40% are literals, objects have 80% repetition
<Jun> ... three dominant classes
<Jun> Michael: have you done this analysis on the GO ontology?
<Jun> Janos: not yet
<Jun> Michael: expecting more diverse coverage
<Jun> Janos: would be interesting to look at
<Jun> slide 9
<Jun> Janos: BioGrid in BioPAX
<Jun> ... 50MB in owl but 40 millions triples in n-triple format
<Jun> ... again, subject, object coverage, and top classes. they are not LOD yet
* egonw_ (egonw@145.20.139.203) has joined #HCLS
<Jun> ... get a good sense of what's actually in the content
<Jun> slide 10
<Jun> Janos: RxNorm
<Jun> ... only 6 classes. pretty small
<Jun> ... quite a bit of literals. structure data is higher than other datasets
<Jun> Q: do you see a big structure differences from these datasets?
<Jun> Janos: TBD
* egonw_ (egonw@145.20.139.203) Quit (Ping timeout)
<Jun> slide 11
<Jun> Janos: 1.2 million triples
<Jun> ... data about publications, such as Authorship, Person ...
<Zakim> -??P15
<Jun> ... publication is dominant data source there. pretty good subject/object coverage
<Jun> slide 12
<Jun> Janos: it has a lot of links to outside datasets, have a much higher object coverage
<Jun> slide 13
<Jun> Janos: top predicate: owl:sameAs. again has a lot of links to outside datasets
* matthias_samwald1 (m@128.131.167.8) Quit (Ping timeout)
<Jun> Scott: any idea about how one type of matrix could be more useful than another, or searching for others?
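
The counts and fractions discussed on slides 5 and 6 (share of literals, unique literals, unique objects, and subject/object coverage after removing typing triples and literals) might be computed along the following lines. This is only a minimal sketch, not Janos's library: the function name `basic_metrics`, the representation of terms as N-Triples strings, and in particular the definitions of "structural" triples and of subject/object coverage are assumptions made for illustration, since the log does not define them.

```python
# Terms are assumed to be plain strings in N-Triples syntax:
# IRIs as "<...>", literals starting with a double quote.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def is_literal(term):
    """N-Triples literals always begin with a double quote."""
    return term.startswith('"')


def frac(part, whole):
    """Safe fraction that returns 0.0 for an empty denominator."""
    return part / whole if whole else 0.0


def basic_metrics(triples):
    """Compute simple count-based metrics over (s, p, o) string triples."""
    triples = list(triples)
    total = len(triples)
    objects = [o for _, _, o in triples]
    literals = [o for o in objects if is_literal(o)]

    # Assumption: "structural" triples are those left after dropping
    # rdf:type statements and literal-valued statements.
    structural = [
        (s, p, o) for s, p, o in triples
        if p != RDF_TYPE and not is_literal(o)
    ]
    sources = {s for s, _, _ in structural}   # nodes that point
    targets = {o for _, _, o in structural}   # nodes that are pointed at
    nodes = sources | targets

    return {
        "triples": total,
        "literal_fraction": frac(len(literals), total),
        "unique_literal_fraction": frac(len(set(literals)), len(literals)),
        "unique_object_fraction": frac(len(set(objects)), total),
        "structural_fraction": frac(len(structural), total),
        # Assumption: coverage = share of structural nodes in that role.
        "subject_coverage": frac(len(sources), len(nodes)),
        "object_coverage": frac(len(targets), len(nodes)),
    }


# Tiny hypothetical dataset, for illustration only.
sample = [
    ("<ex:a>", RDF_TYPE, "<ex:Person>"),
    ("<ex:a>", "<ex:name>", '"Ann"'),
    ("<ex:b>", "<ex:name>", '"Ann"'),
    ("<ex:a>", "<ex:knows>", "<ex:b>"),
]
metrics = basic_metrics(sample)
```

Under these assumed definitions, a dataset with high object coverage and low subject coverage would be "more pointed" than "pointing", which is one way to read the distinction drawn on slide 6.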
<Jun> s/Scott/mscottm/
<Jun> slide 14
<mscottm> s/matrix/metric/
<Jun> Janos: there are a lot of tools for graph vis and analysis, but not so good with RDF data
* ericP Zakim please dial ericP-office
* ericP Zakim, please dial ericP-office
* Zakim ok, ericP; the call is being made
<Zakim> +EricP
<Jun> slide 15
<Jun> Janos: the twist is to allow multiple paths between 2 nodes
<Jun> slide 16
<Jun> Janos: there are ways to collapse the parallel edges, or put RDF into XML, in order to use some graph analysis tools
<Jun> slide 17
<Jun> Janos: show some examples
<Jun> ... get co-authors that are only members of a site, to get a smaller co-author network
<Jun> slide 18
<Jun> Janos: do some basic graph analysis using Mathematica
<Jun> ... basic in-degrees, out-degrees, histograms, one/two degree separation etc
<mscottm> Nice!
<Jun> slide 19
<Jun> Janos: Gephi doesn't support parallel edges. you have to do some pre-processing
<Zakim> -tlebo
<Jun> slide 20
<Jun> Janos:some links
<Jun> q+ have you thought about encoding some of the statistics using VoID?
* Zakim Jun, you typed too many words without commas; I suspect you forgot to start with 'to ...'
<michael> thanks, janos, i need to drop off
<Zakim> - +1.206.732.aaff
* rkiefer (81b0c518@207.192.75.252) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<Zakim> -Tony
<Jun> Eric: any further analysis on some of the results, like the social network?
<Jun> <mscottm, I have to leave for another meeting>
<mattgamble> First how do you work out which metrics are useful?
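
The pre-processing mentioned on slides 16 to 19 (collapsing parallel edges so that tools without multigraph support can be used, then computing in-degrees and out-degrees) could look roughly like this. It is a sketch under stated assumptions rather than the approach shown in the talk: the function names are hypothetical, the input is assumed to be (subject, predicate, object) triples with literal-valued triples already filtered out, and keeping the number of collapsed edges as a weight is one possible choice among several.

```python
from collections import Counter


def collapse_parallel_edges(triples):
    """Merge all predicates between the same (subject, object) pair.

    Returns a Counter mapping (subject, object) to the number of
    parallel edges that were collapsed into that single edge.
    """
    weights = Counter()
    for s, _predicate, o in triples:
        weights[(s, o)] += 1
    return weights


def degrees(edge_weights):
    """In- and out-degree per node in the collapsed (simple) graph."""
    in_deg, out_deg = Counter(), Counter()
    for s, o in edge_weights:
        out_deg[s] += 1
        in_deg[o] += 1
    return in_deg, out_deg


# Hypothetical example: two different predicates link a to b.
edges = [
    ("a", "coAuthorOf", "b"),
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("c", "knows", "b"),
]
weights = collapse_parallel_edges(edges)
in_deg, out_deg = degrees(weights)
```

Keeping the weight preserves how many predicates linked two nodes, which is the information otherwise lost when a multigraph is flattened for a tool that accepts only one edge per node pair.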
* ericP away again
<Zakim> -EricP
<Zakim> - +46.7.08.13.aadd
* bbalsa (d5723e05@109.169.29.95) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<egombocz> Our Knowledge Explorer also provides metrics for weighing of connections in several ways
* BobF (81b0c518@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<Zakim> -??P1
<chimezie> Zakim, unmute me
<Zakim> Chimezie should no longer be muted
<mscottm> Chime - would you please jot your comment/question into IRC? I received an urgent call exactly when you started.. :(
<Zakim> -Tony.a
<chimezie> My question was whether he had considered using rdflib (https://github.com/RDFLib)
* BrianLowe (8054fe2e@78.129.202.38) Quit (Ping timeout)
<Zakim> - +1.510.705.aaaa
* egombocz (42758fa2@207.192.75.252) Quit (Quit: http://www.mibbit.com ajax IRC Client)
* michael (d17cbd27@64.62.228.82) has left #hcls
<Zakim> - +1.857.250.aagg
* mattgamble (801e06c7@64.62.228.82) Quit (Quit: http://www.mibbit.com ajax IRC Client)
* markthompson (9158d121@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax IRC Client)
* Jun (81431aef@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax IRC Client)
* Amit (826c0136@109.169.29.95) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<Zakim> -Chimezie
* chimezie (chimezie@99.59.109.32) Quit (Quit: chimezie)
<mscottm> CTSA Connect - ISF - Integrated Semantic Framework: core is combining VIVO ontology and eagle-i ontology
<HarryH> Thanks , Janos - very interesting!
<ram> Thanks Janos
* HarryH (8231b338@64.62.228.82) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<Stella> thanks all, bye
<Zakim> - +1.412.623.aaee
* ram (31f97127@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<Zakim> -Scott_Marshall
<Zakim> -??P13
<Zakim> - +1.631.444.aabb
<mscottm> bye all
<Zakim> -??P9
<Zakim> SW_HCLS(BioRDF)11:00AM has ended
<Zakim> Attendees were Tony, +1.510.705.aaaa, tlebo, +1.631.444.aabb, +46.7.08.13.aacc, Scott_Marshall, +46.7.08.13.aadd, Chimezie, +1.412.623.aaee, +1.206.732.aaff, +1.857.250.aagg, EricP
* Stella (80fd5743@78.129.202.38) Quit (Quit: http://www.mibbit.com ajax IRC Client)
<mscottm> Zakim, please draft minutes
<Zakim> I don't understand 'please draft minutes', mscottm
<mscottm> RRSagent, draft minutes
<RRSAgent> I have made the request to generate http://www.w3.org/2012/05/21-HCLS-minutes.html mscottm
<mscottm> rrsagent, make log world-visible
<RRSAgent> I have made the request, mscottm

On Tue, May 22, 2012 at 1:22 AM, M. Scott Marshall <mscottmarshall@gmail.com> wrote:
> Here are the minutes from today's meeting:
> http://www.w3.org/2012/05/21-HCLS-minutes.html
>
> I see large potential for using tools like Janos's to describe /
> characterize datasets and their RDF representations for selection and
> eventual query patterns - something essential in a linked data
> marketplace.
>
> My apologies for not anticipating more participants and increasing the
> default parameter for Zakim. If you were unable to join and have
> questions or comments after reading the minutes, please send them to
> the list or I will do my best to answer (or find someone who can).
>
> Thanks to Jun for scribing! Thanks to Janos for presenting!
>
> Richard Boyce will present work to us in early June. Stay tuned for details.
>
> Cheers,
> Scott

--
M. Scott Marshall, PhD
https://plus.google.com/u/0/114642613065018821852/posts
http://www.linkedin.com/pub/m-scott-marshall/5/464/a22
Received on Tuesday, 22 May 2012 05:50:04 UTC