Re: [Linked Life Data] May 21 RDF graph analytics for linked data metrics - Janos Hajagos

<Blush> Thanks for asking that! - I meant to do a reply all. -Scott
On May 16, 2012 5:44 PM, "Jun Zhao" <jun.zhao@zoo.ox.ac.uk> wrote:

> Very interesting!
>
> Is this meant to be a private message to me?
>
> I put this call down to my calendar:)
>
> -- Jun
>
>
> On 16/05/2012 14:55, M. Scott Marshall wrote:
>
>> Hi Jun,
>>
>> Janos sent me an updated abstract and title (below). The initial
>> announcement was something that I cobbled up out of e-mail with Janos.
>>
>> The intention of RDF graph metrics, as I imagine applying them, is to
>> provide measures that can be used to characterize the RDF data set and its
>> contents, so that potential consumers (and their agents) in the 'data
>> marketplace' can evaluate whether to pursue access, as well as get cues as
>> to which query patterns will be most effective for their purposes. Data
>> sniffing, if you like. Ed Chi coined a nice term for this: 'information
>> scent' (http://www-users.cs.umn.edu/~echi/). At least, that's my
>> motivation. I will let Janos explain his perspective.
>>
>> Cheers,
>> Scott
>>
>> Title: Quantifying RDF data sets
>>
>>
>> Abstract:
>>
>> The Semantic Web is built on the Resource Description Framework (RDF).
>> RDF is a graph model, so one would expect that a wide range of network
>> analysis tools could be applied directly to an RDF data set. However,
>> most network algorithms assume that a graph has no parallel edges,
>> whereas the RDF graph model allows them (the same two nodes can be linked
>> by several different predicates). Two approaches will be examined: direct
>> measures of RDF graph structure using ratios, and extraction of graphs
>> from an RDF data set. Py-Triple-Simple
>> (http://code.google.com/p/py-triple-simple/), an experimental pure
>> Python library, can extract “well behaved” graphs from an N-Triples file
>> and can quantify RDF graph structure using ratios.
>>
>>
>> Bio: Janos Hajagos is a Senior Programmer/Analyst at Stony Brook
>> University School of Medicine, New York. He is the principal data analyst
>> for the New York State Department of Health Modernization of Medicaid
>> Initiatives and the campus lead for CTSA Connect
>> (http://www.ctsaconnect.org/). He received his Ph.D. in Ecology and
>> Evolution from Stony Brook University.
>>
>>
>> On Wed, May 16, 2012 at 10:25 AM, Jun Zhao <jun.zhao@zoo.ox.ac.uk> wrote:
>>
>>> Hi Scott and Janos,
>>>
>>> From the abstract, I understand that Janos' metrics are closely related
>>> to the quality of RDF data. Is that right?
>>>
>>> Cheers,
>>>
>>> Jun
>>>
>>>
>>> On 15/05/2012 23:31, M. Scott Marshall wrote:
>>>
>>>> Next Monday, Janos Hajagos will present his work on direct analytics on
>>>> RDF graphs at 11AM ET / 5PM CET in the BioRDF / LODD teleconference
>>>> timeslot.
>>>>
>>>> Janos: "We need to better understand the RDF data that we are publishing
>>>> and its internal structure. If we want to improve the quality of the RDF
>>>> that we publish, we need actual metrics that go beyond the 5 stars of
>>>> Linked Data. I think a direction where the LODD/BioRDF group could make
>>>> significant progress is developing methodology for analyzing the quality
>>>> of RDF data sets. Part of these metrics would tie into some of the
>>>> ontology improvements that could be made to describe the publishing
>>>> process, as was mentioned on the call, e.g. how often is this data
>>>> refreshed."
>>>>
>>>> http://code.google.com/p/py-triple-simple/
>>>>
>>>>
>>>> Cheers,
>>>> Scott
>>>>
>>>>
>>>
>>
>
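
The abstract quoted above contrasts two approaches: measuring RDF graph structure directly with ratios, and extracting "well behaved" graphs that ordinary network tools can handle. The following is a minimal Python sketch of both ideas, assuming a small, well-formed N-Triples input. The regex, the function names (`parse_ntriples`, `parallel_edges`, `ratios`, `extract_graph`) and the particular ratios are hypothetical illustrations only; they are not the Py-Triple-Simple API and not the metrics Janos presented.

```python
import re
from collections import defaultdict

# Hypothetical minimal pattern for one N-Triples statement. It accepts IRIs,
# blank nodes and simple literals (optionally with a language tag or datatype)
# but no escaped quotes, so it is NOT a complete N-Triples parser.
TRIPLE_RE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+'                            # subject
    r'(<[^>]*>)\s+'                                      # predicate
    r'(<[^>]*>|_:\S+|"[^"]*"(?:@[\w-]+|\^\^<[^>]*>)?)'   # object
    r'\s*\.\s*$'
)


def parse_ntriples(lines):
    """Return a set of (subject, predicate, object) tuples.

    Blank lines and '#' comments are skipped. An RDF graph is a set of
    triples, so a statement repeated in the file is stored only once.
    """
    triples = set()
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.add(match.groups())
    return triples


def is_literal(term):
    return term.startswith('"')


def parallel_edges(triples):
    """Map each (subject, object) node pair linked by 2+ predicates to them.

    Literal objects are ignored, since they are values rather than nodes.
    """
    predicates = defaultdict(set)
    for s, p, o in triples:
        if not is_literal(o):
            predicates[(s, o)].add(p)
    return {pair: ps for pair, ps in predicates.items() if len(ps) > 1}


def ratios(triples):
    """A few illustrative structural ratios (assumed examples only)."""
    n = len(triples)
    if n == 0:
        return {}
    subjects = {s for s, _, _ in triples}
    predicates = {p for _, p, _ in triples}
    literal_objects = sum(1 for _, _, o in triples if is_literal(o))
    return {
        "triples_per_subject": n / len(subjects),
        "distinct_predicates_per_triple": len(predicates) / n,
        "literal_object_fraction": literal_objects / n,
    }


def extract_graph(triples, predicate):
    """Directed edge set for a single predicate, dropping literal objects.

    Because the triples form a set, fixing one predicate can never yield
    two parallel edges between the same ordered pair of nodes.
    """
    return {(s, o) for s, p, o in triples if p == predicate and not is_literal(o)}
```

For example, if a file states that a hypothetical `<http://ex.org/a>` both `<http://ex.org/knows>` and `<http://ex.org/cites>` `<http://ex.org/b>`, `parallel_edges` reports that pair, while `extract_graph(triples, "<http://ex.org/knows>")` returns a plain directed edge set that standard network libraries can consume. Selecting one predicate at a time is just one simple way to get a graph without parallel edges; collapsing all predicates between a node pair into a single weighted edge would be another.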

Received on Wednesday, 16 May 2012 16:22:35 UTC