Re: CAS, DUNS and LOD (was Re: Cost/Benefit Anyone? Re: Vote for my Semantic Web presentation at SXSW)

John,

On 8/23/2011 9:05 AM, John Erickson wrote:
> This is an important discussion that (I believe) foreshadows how
> canonical identifiers are managed moving forward.
>
> Both CAS and DUNS numbers are good examples. Consider the challenge
> of linking EPA data; it's easy to create a list of toxic chemicals
> that are common across many EPA datasets. Based on those chemical
> names, it's possible to further find (in most cases) references in
> DBPedia and other sources, such as PubChem:
>
> * ACETALDEHYDE
> * http://dbpedia.org/page/Acetaldehyde
> * http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=177
> * etc...
>
> Now, add to this a sensible agency-rooted URI design and a
> DBPedia-like infrastructure and one has a very powerful hub that
> strengthens the Linked Data ecosystem. It would arguably be stronger
> if CAS identifiers were also (somehow) included, but even the bits of
> linking shown above change the value proposition of traditional
> proprietary naming schemes...
Quite so, and I did not mean to imply otherwise. Yes, gathering
government agency URI identifiers for toxic chemicals is a value-add
proposition.

I am curious whether you find that different offices within agencies use
the same URIs. Or did they have other identifiers in their records prior
to the URIs?

That is, will the URIs map to the identifiers used in EPA datasets, for
example?

Despite its obvious value, I don't agree that the project "change[s] the 
value proposition of traditional proprietary naming schemes..."

Mostly because it does not address the *prior* use of other identifiers 
in the published literature. However convenient it may be to pretend 
that we are starting off fresh, in fact we are not, in any information 
system.

The fact remains that even if we switched (miraculously) today to all
new URI identifiers, we would still be accessing literature that uses
the prior identifiers for a very long time. I suspect hundreds of years.

BTW, who bridges between the new URI schemes and the CAS identifiers? 
For searching traditional literature?
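
To make the question concrete: at its simplest, such a bridge is a
lookup table keyed on the CAS number. What follows is only a minimal
sketch under my own assumptions. The table layout and the function names
are hypothetical, the two URIs are the ones you listed for acetaldehyde,
and 75-07-0 is the CAS number for acetaldehyde. It is not a description
of any existing registry.

```python
import re

# Crosswalk from a CAS Registry Number to URIs for the same substance.
# The table layout and function names are illustrative assumptions,
# not an existing service or API.
CAS_TO_URIS = {
    "75-07-0": [  # acetaldehyde
        "http://dbpedia.org/page/Acetaldehyde",
        "http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=177",
    ],
}

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(cas)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from right to left;
    # the sum modulo 10 must equal the check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def uris_for_cas(cas):
    """Return the known URIs for a CAS number, or an empty list."""
    if not is_valid_cas(cas):
        raise ValueError(f"not a valid CAS number: {cas!r}")
    return CAS_TO_URIS.get(cas, [])


def cas_for_uri(uri):
    """Reverse lookup: find the CAS number a URI has been mapped to."""
    for cas, uris in CAS_TO_URIS.items():
        if uri in uris:
            return cas
    return None
```

The reverse lookup is the direction that matters for searching
traditional literature: start from a new URI, recover the CAS number,
and search on that. The hard part is who populates and maintains the
table, not the code.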

> John
> PS: At TWC we are about to go live with a registry called "Instance
> Hub" that will demonstrate the association of agency-based URI schemes
> --- think EPA, HHS, DOE, USDA, etc --- with instance data over which
> the agency has some authority or interest...More very soon!
Looking forward to it!

Hope you are having a great day!

Patrick


>
> On Tue, Aug 23, 2011 at 8:31 AM, Patrick Durusau <patrick@durusau.net> wrote:
>> David,
>>
>> On 8/22/2011 9:55 PM, David Booth wrote:
>>
>>> On Mon, 2011-08-22 at 20:27 -0400, Patrick Durusau wrote:
>>> [ . . . ]
>>>
>>>> The use of CAS identifiers supports searching across vast domains of
>>>> *existing* literature. Not all, but most of it for the last 60 or so
>>>> years.
>>>>
>>>> That is non-trivial and should not be lightly discarded.
>>>>
>>>> BTW, your objection is that "non-licensed systems" cannot use CAS
>>>> identifiers? Are these commercial systems that are charging their
>>>> customers? Why would you think such systems should be able to take
>>>> information created by others?
>>>
>>> Using the information associated with an identifier is one thing; using
>>> the identifier itself is another.  I'm sure the CAS numbers have added
>>> non-trivial value that should not be ignored.  But their business model
>>> needs to change.  It is ludicrous in this web era to prohibit the use of
>>> the identifiers themselves.
>>>
>>> If there is one principle we have learned from the web, it is the enormous
>>> value and importance of freely usable universal identifiers.  URIs rule!
>>> http://urisrule.org/
>>>
>>> :)
>>
>> Well, I won't take the bait on URIs, ;-), but will note that re-use of
>> identifiers of a sort was addressed quite a few years ago.
>>
>> See: Feist Publications, Inc., v. Rural Telephone Service Co., 499 U.S. 340
>> (1991) or follow this link:
>>
>> http://en.wikipedia.org/wiki/Feist_v._Rural
>>
>> The circumstances with CAS numbers are slightly different because, to get
>> access to the full set of CAS numbers, I suspect you have to sign a licensing
>> agreement on re-use, which makes it a matter of *contract* law and not
>> copyright.
>>
>> Perhaps they should increase the limits beyond 10,000 identifiers, but the
>> only people who want the whole monty, as it were, are potential commercial
>> competitors.
>>
>> The people who publish the periodical "Brain" for example at $10,000 a year.
>> Why should I want the complete set of identifiers to be freely available to
>> help them?
>>
>> Personally I think given the head start that the CAS maintainers have on the
>> literature, etc., that different models for use of the identifiers might
>> suit their purposes just as well. Universal identifiers change over time and
>> my concern is with the least semantic friction and not as much with how we
>> get there.
>>
>> Hope you are having a great day!
>>
>> Patrick

-- 
Patrick Durusau
patrick@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau

Received on Tuesday, 23 August 2011 13:48:07 UTC