Re: [Linked Life Data] Announce: HCLS LLD task force W3C Note, Metadata policies, Monday 5PM CET

Hm, just a note, it seems the LLD acronym is a collision here.  It is also being used for "Linked Library Data"…

  http://www.w3.org/2005/Incubator/lld/


- Sands Fish
- Senior Software Engineer / Data Scientist
- MIT Libraries
- sands@mit.edu



On Dec 7, 2012, at 12:06 PM, "M. Scott Marshall" <mscottmarshall@gmail.com> wrote:

Metadata of Linked Open Drug Data = dbcatalog description as N3 = namedgraph reflection

CKAN folks invited!

On Monday, Dec. 10 HCLS IG Linked Life Data task force will have a teleconference at 11AM ET / 5PM CET. We will be ratifying http://www.w3.org/2001/sw/hcls/notes/hcls-rdf-guide/ and discussing computable data descriptions. Sorry for cross-posting but this is relevant to all of the above lists.

Computable data descriptions
We need a standardized approach to disclosing data descriptions for RDF renderings of datasets (where the master copy is sometimes in another form). All are welcome. Although our focus is on health care and life science data, the initial topic will be generic data description metadata, so it is also relevant to the CKAN Datahub.

Our trigger issue is the ongoing migration of linked data from fu-berlin to UMannheim (still hoping to hear precise status and expected status from Chris Bizer) and how to better synchronize aggregated metadata such as that maintained by CKAN or LODD (predecessor of this task force) on the HCLS wiki.

Due to the potentially expansive nature of the discussion (first one should be short) and limited teleconference bridge capacity, please let me know if you plan to attend. Feel free to continue the discussion that we've been having on the hcls list.

Some propositions:
* Create tools that enable simplified "publishing" of datasets (automatically add metadata triples to graph itself, with graphURI as subject)
* Write synchronizer that updates aggregate indexes such as CKAN with metadata from the graph
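
A minimal sketch of the first proposition, assuming a placeholder vocabulary: the namespace http://example.org/meta#, the property names (versionOf, updateFrequency, lastUpdate) and the graph and dataset URIs are all hypothetical, since no vocabulary has been agreed on. It emits N-Quads in which the named graph URI is both the subject and the graph label, so the description travels with the graph.

```python
from datetime import date

# Hypothetical vocabulary namespace; no vocabulary has been agreed on.
EX = "http://example.org/meta#"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def describe_graph(graph_uri, source_uri, frequency_term, last_update):
    """Return N-Quads lines that describe a named graph inside that same graph."""

    def quad(predicate, obj):
        # The named graph URI is used as the subject and as the graph label.
        return f"<{graph_uri}> <{EX}{predicate}> {obj} <{graph_uri}> ."

    return [
        quad("versionOf", f"<{source_uri}>"),
        # The frequency is a term URI, not a string literal.
        quad("updateFrequency", f"<{EX}{frequency_term}>"),
        quad("lastUpdate", f'"{last_update.isoformat()}"^^<{XSD_DATE}>'),
    ]


# Placeholder URIs for illustration only.
quads = describe_graph(
    "http://example.org/graph/drugbank-2012-12",
    "http://example.org/dataset/DrugBank",
    "monthly",
    date(2012, 12, 7),
)
print("\n".join(quads))
```

A synchronizer for an aggregate index could then read these quads back from the graph instead of relying on a separately maintained catalog entry.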

Test case:
How do I write a SPARQL query to retrieve all versions of DrugBank that have an update frequency that is more frequent than yearly?

Note that there is no assumption of only one version of the RDF. There could be tens of RDF versions of DrugBank, or more.

Teleconference Information:
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Participant Access Code: 4257 ("HCLS")
IRC Channel: http://irc.w3.org port 6665 channel #HCLS

Agenda:
* ratification of IG Note http://www.w3.org/2001/sw/hcls/notes/hcls-rdf-guide/ (10 min?) - All
* status of fu-berlin LODD data, actions? (10 min?) - Any!
* approach to create more up-to-date metadata (5 min?) - All
* well-defined standardized metadata to provide information about data update frequency (10 min?)

-Scott

--
M. Scott Marshall, PhD
MAASTRO clinic, http://www.maastro.nl/en/1/
http://eurecaproject.eu/
https://plus.google.com/u/0/114642613065018821852/posts
http://www.linkedin.com/pub/m-scott-marshall/5/464/a22

On Thu, Nov 22, 2012 at 11:51 AM, Pablo N. Mendes <pablomendes@gmail.com> wrote:

FYI, it has been the practice to put the metadata on a CKAN instance hosted at TheDataHub.org. There has been work to share the metadata as RDF: http://lod2.eu/BlogPost/1095-ckan-rdf-output.html

We have also talked about adding more quality-related metadata (including how current the data is, more fine grained descriptions of what is in there, what links where, etc.) [1,2,3]. If you are putting together a consortium, you may want to try to channel some of that support towards the CKAN folks, so that changes can be made on the source (i.e. the catalog that holds all of the LOD cloud metadata).

I am cross posting this message to CKAN-discuss to see if anybody picks up from there.

Cheers,
Pablo

[1] http://webcache.googleusercontent.com/search?q=cache:cfLMuj9qNbMJ:wiki.ckan.org/Linked_Data+&cd=2&hl=en&ct=clnk
[2] http://wiki.planet-data.eu/uploads/c/c0/D4.1.pdf
[3] http://wiki.planet-data.eu/uploads/d/d7/D2.1.pdf



On Wed, Nov 21, 2012 at 4:35 PM, M. Scott Marshall <mscottmarshall@gmail.com> wrote:
Thanks Keith. I was aware of VoID and the lov engine. That's the right direction but not the level of detail that we have in mind.

In the scenario of a (de-centralized) data marketplace, the metadata/description for a dataset stored in a quad store could be expressed as triples with the named graph URI as the subject.

A requirement for unambiguous representation of the dataset description would be that values come from a namespace, preferably one provided by an ontology so that machine reasoning can be used on the constraints.

One test case for update frequency could be:

How do I write a SPARQL query to retrieve all versions of DrugBank that have an update frequency that is more frequent than yearly?

Note that there is no assumption of only one version of the RDF. There could be tens of versions, or more.

where the update frequency could be more specific than the current possibilities in http://purl.org/NET/dady#UpdateFrequency. Presumably, you would have a term URI for each of hourly, daily, weekly, monthly, yearly, etc. making the value machine consumable and 'machine reasonable'. So, the values should not be string literals.
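
A rough sketch of that test case, under stated assumptions: the namespace http://example.org/meta#, the properties versionOf and updateFrequency, the frequency term URIs and the graph URIs are hypothetical placeholders, not terms from dady or any existing vocabulary. The SPARQL 1.1 string is illustrative and is not executed; the plain-Python filter beside it shows the same selection over an in-memory list of triples, and why term URIs (which can be ranked) are easier to work with than free-text literals.

```python
# Hypothetical namespace and dataset URI; the thread does not fix a vocabulary.
EX = "http://example.org/meta#"
DRUGBANK = "http://example.org/dataset/DrugBank"

# Frequency terms ordered from most frequent to least frequent.
FREQUENCY_ORDER = ["hourly", "daily", "weekly", "monthly", "yearly"]
RANK = {EX + term: position for position, term in enumerate(FREQUENCY_ORDER)}

# Illustrative SPARQL 1.1 query (not executed here). It assumes each named
# graph carries its own description with the graph URI as the subject.
QUERY = """
PREFIX ex: <http://example.org/meta#>
SELECT ?g WHERE {
  GRAPH ?g {
    ?g ex:versionOf <http://example.org/dataset/DrugBank> ;
       ex:updateFrequency ?freq .
  }
  VALUES ?freq { ex:hourly ex:daily ex:weekly ex:monthly }
}
"""


def more_frequent_than(triples, dataset, threshold):
    """Return graph URIs that are versions of `dataset` and are updated
    more often than `threshold` (one of FREQUENCY_ORDER)."""
    versions = {s for s, p, o in triples if p == EX + "versionOf" and o == dataset}
    limit = RANK[EX + threshold]
    return sorted(
        s
        for s, p, o in triples
        # Unknown frequency terms fall back to the limit and are excluded.
        if s in versions and p == EX + "updateFrequency" and RANK.get(o, limit) < limit
    )


# Two made-up RDF versions of DrugBank with different update frequencies.
triples = [
    ("http://example.org/graph/drugbank-a", EX + "versionOf", DRUGBANK),
    ("http://example.org/graph/drugbank-a", EX + "updateFrequency", EX + "monthly"),
    ("http://example.org/graph/drugbank-b", EX + "versionOf", DRUGBANK),
    ("http://example.org/graph/drugbank-b", EX + "updateFrequency", EX + "yearly"),
]
print(more_frequent_than(triples, DRUGBANK, "yearly"))
```

If the vocabulary itself stated an ordering between the frequency terms, the VALUES list could be replaced by a pattern over that ordering, which is the kind of machine reasoning meant above.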

BTW, there are other types of information, such as license type that we also would like to encode with precise URIs for the values. These types of information are of great importance to some consumers of the data such as pharmaceutical companies.

Cheers,
Scott


On Wed, Nov 21, 2012 at 3:12 PM, Keith Alexander <keithalexander@keithalexander.co.uk> wrote:
Hi

On Wed, Nov 21, 2012 at 1:58 PM, M. Scott Marshall <mscottmarshall@gmail.com> wrote:
In discussions at the Biohackathon 2011 (Kyoto), we agreed that a standard data set description would make it easier to consume distributed data such as LOD. We created a wishlist of metadata that we would like to be able to consume via SPARQL, including date of last update of RDF rendering and date of last update of source data (if the RDF is an additional representation of that data source). We also discussed update frequency as something that we would like to represent in RDF.

See
http://rdfs.org/ns/void
http://www.w3.org/TR/void/


Does anybody know of a good way of representing periodicity in a generic fashion (appropriate ontology/namespace)? Of course, just being able to represent hourly, daily, weekly, monthly, annually and provide it to software agents via SPARQL would be an improvement on having to ask around. :)

http://vocab.deri.ie/dady# ?

there is also the RSS 1 module
http://web.resource.org/rss/1.0/modules/syndication/

sy:updatePeriod
"Describes the period over which the channel format is updated. Acceptable values are: hourly, daily, weekly, monthly, yearly. If omitted, daily is assumed."
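
As a small illustration of that definition, here is a sketch that maps an sy:updatePeriod string onto a term URI, applying the "daily if omitted" default. The target namespace http://example.org/meta# and the function name are hypothetical; only the five allowed values and the default come from the module text quoted above.

```python
# Allowed values quoted from the RSS 1.0 syndication module definition.
ALLOWED_PERIODS = ("hourly", "daily", "weekly", "monthly", "yearly")

# Hypothetical namespace for the frequency term URIs.
EX = "http://example.org/meta#"


def period_to_term(value=None):
    """Map an sy:updatePeriod string to a term URI; a missing value means daily."""
    if value is None or not value.strip():
        return EX + "daily"
    period = value.strip().lower()
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"not an sy:updatePeriod value: {value!r}")
    return EX + period


print(period_to_term("weekly"))
print(period_to_term())
```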

btw, if you don't know it, http://lov.okfn.org/dataset/lov/ is a really handy vocabulary search engine.

Best

Keith

Cheers,
Scott



On Tue, Nov 20, 2012 at 4:49 PM, Sands Alden Fish <sands@mit.edu> wrote:
Yes, I'd be curious to know the update frequency as well.  This being from September, 2011, we'd be anticipating a new cut right now.



On Nov 20, 2012, at 8:52 AM, Michael Hausenblas <michael.hausenblas@deri.org> wrote:

>
>> What's the update frequency of this effort?
>
> AFAIK roughly once per year up to now but Richard would be the more competent person to provide you with an answer ;)
>
> Cheers,
>          Michael
>
> --
> Dr. Michael Hausenblas, Research Fellow
> DERI - Digital Enterprise Research Institute
> NUIG - National University of Ireland, Galway
> Ireland, Europe
> Tel.: +353 91 495730
> http://mhausenblas.info/
>
> On 20 Nov 2012, at 13:48, Kingsley Idehen wrote:
>
>> On 11/20/12 7:59 AM, Michael Hausenblas wrote:
>>>> I would like to ask whether you can tell me, for the Linked Open Data project, which data sets refer to which other data sets and how many links there are between them.
>>> http://lod-cloud.net/state/
>>
>> Michael,
>>
>> What's the update frequency of this effort?
>>
>> Kingsley
>>>
>>>
>>> Cheers,
>>>        Michael
>>>
>>> --
>>> Dr. Michael Hausenblas, Research Fellow
>>> DERI - Digital Enterprise Research Institute
>>> NUIG - National University of Ireland, Galway
>>> Ireland, Europe
>>> Tel.: +353 91 495730
>>> http://mhausenblas.info/
>>>
>>> On 19 Nov 2012, at 15:42, Mary Koutraki wrote:
>>>
>>>> Dear all,
>>>>
>>>> I would like to ask whether you can tell me, for the Linked Open Data project, which data sets refer to which other data sets and how many links there are between them.
>>>>
>>>> Thank you in advance.
>>>>
>>>> --
>>>> Mary Koutraki
>>>> PhD Student on Semantic Web
>>>> UVSQ - ETIS Lab
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> Regards,
>>
>> Kingsley Idehen
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca handle: @kidehen
>> Google+ Profile: https://plus.google.com/112399767740508618350/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>
>>
>>
>>
>>
>
>







--

Pablo N. Mendes
http://pablomendes.com

Received on Friday, 7 December 2012 20:20:55 UTC