Re: owl:sameAs - Harmful to provenance? from Michel Dumontier on 2013-04-08 (public-semweb-lifesci@w3.org from April 2013)

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Mon, 8 Apr 2013 14:41:29 -0400
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: "Bhat, Talapady N." <talapady.bhat@nist.gov>, Phillip Lord <phillip.lord@newcastle.ac.uk>, Oliver Ruebenacker <curoli@gmail.com>, David Booth <david@dbooth.org>, Pat Hayes <phayes@ihmc.us>, Peter Ansell <ansell.peter@gmail.com>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-ID: <CALcEXf6wX6e2Od762+52B2TrjwJm1eypf4JpwakMpqkWyhgk0g@mail.gmail.com>

On Mon, Apr 8, 2013 at 2:23 PM, Alan Ruttenberg <alanruttenberg@gmail.com> wrote:

>
>
>
> On Mon, Apr 8, 2013 at 2:10 PM, Michel Dumontier <
> michel.dumontier@gmail.com> wrote:
>
>>
>>
>>
>> On Mon, Apr 8, 2013 at 1:23 PM, Alan Ruttenberg <alanruttenberg@gmail.com
>> > wrote:
>>
>>> Nicely pointed out, TN.
>>>
>>> Thinking about "metadata" as some other category of data is usually a
>>> bad sign. I've often found it to mean, in practice, "data I care less
>>> about".
>>>
>>> Phil, to make the case that RDF helps here, we would want to compare how
>>> easy it is to do significant work using the ill-represented examples you
>>> find versus raw text, versus xml, versus tab-delimited files.
>>>
>>> While there is some limited benefit to getting rid of the surface syntax
>>> problem, it's not clear how much of a problem that ever was.
>>>
>> surely you're joking Mr. Ruttenberg!
>>
>> anybody who has worked with more than one file format clearly
>> understands the challenges and productivity death in dealing with multiple
>> ad-hoc syntaxes that require specialized parsers (I assume you are familiar
>> with this).
>>
>
> Quite. So I make the above statement as an expert in the field. Compared
> to understanding what is asserted in a database, or resolving identifiers,
> syntax is very easy.
>

then let us solve the syntax problem with a single, web-friendly syntax.
comparing it with the harder problems isn't meaningful. it is simply a
separate issue.
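
to make "syntax first, naming second" concrete, here is a minimal sketch
(not Bio2RDF's actual code) that turns a tab-delimited table into N-Triples
and mints one canonical URI per record. the base URI, the vocab: prefix, the
function names and the sample row are all hypothetical, and it assumes the
first column holds the identifier and that column names are IRI-safe.

```python
import csv
import io
from typing import List

BASE = "http://example.org/"  # hypothetical base URI, for illustration only


def normalize_id(namespace: str, local_id: str) -> str:
    """Build one canonical URI per record: lowercase namespace, trimmed identifier."""
    return f"<{BASE}{namespace.strip().lower()}:{local_id.strip()}>"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def tsv_to_ntriples(text: str, namespace: str) -> List[str]:
    """Emit one triple per non-empty, non-key column of a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    key = reader.fieldnames[0]  # assumption: the first column is the identifier
    triples = []
    for row in reader:
        subject = normalize_id(namespace, row[key])
        for column, value in row.items():
            if column == key or not value:
                continue
            predicate = f"<{BASE}vocab:{column.strip().lower()}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


sample = "id\tlabel\n P12345 \tExample protein\n"
for line in tsv_to_ntriples(sample, "UniProt"):
    print(line)
```

the point is only that once every source goes through a step like this, one
generic parser replaces the pile of ad-hoc ones, and " P12345 " under
"UniProt" or "uniprot" ends up with the same name. it says nothing about what
the statements mean, which is the separate issue.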

>
>
>> Bio2RDF primarily exists just to normalize syntax first (RDF), and then
>> to ensure referential integrity second (naming).
>>
>
> I have little more to say about Bio2RDF.
>

I think we've made a lot of progress over the last year, and there's much
work to be done. But go ahead, we're listening to all constructive
criticisms.

>
>
>>  Other projects can now take these normalized data and transform them
>> into unifying schema and vocabulary (e.g. we use SIO to do this), and
>> others (e.g. cytoscape, virtuoso, etc) can build tools to analyze and make
>> pretty views of data.
>>
>
> Odd, that's what I thought the semantic web was for.
>

it is, and we have exactly the semantic web technologies to do this now.
SPARQL + OWL.

>
>>  It's pretty clear to me that this effort is not an all-or-nothing
>> proposition. Standardization and agreement at every level brings benefits,
>> but it's a non-starter to wait for full agreement just as much as it's a
>> non-starter for a small group of people to claim (and be solely recognized
>> for) their "community" standard.
>>
>
> Like the PROV group, which I assume is what you are referring to ;-)
>

ha!  if only it were one group. With hundreds if not thousands of
terminologies and ontologies now in play, each covering some part of what
we need to express any one dataset, the challenge is really how to both
consolidate efforts (PROV is a great example of grassroots-driven
consolidation) and also manage an increase in representational diversity
and complexity (e.g. mapping, query rewriting, etc).

Since we have neither a complete understanding of, nor a comprehensive
guide to, how to optimally represent knowledge for all possible use cases,
we can expect a plurality of approaches to be explored in the foreseeable
future.  We can engineer social structures (like OBO Foundry and other
organizations) and technological projects (like Bio2RDF and other projects)
to increase our return on any data/knowledge that is produced, regardless
of its lack of conformance to a perceived solution.

m.

>
> -Alan
>
>
>>
>> m.
>>
>>
>>
>>> -Alan
>>>
>>>
>> --
>> Michel Dumontier
>> Associate Professor of Bioinformatics, Carleton University
>> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest
>> Group
>> http://dumontierlab.com
>>
>
>

-- 
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

Received on Monday, 8 April 2013 18:42:21 UTC