Re: Question on User Story S33: Normalizing data

On 11/24/14 4:56 AM, Eric Prud'hommeaux wrote:
> This wasn't about repairing the data, merely identifying a conformant
> dataset over which SPARQL queries can be executed without exhaustive
> error checking. The example I provided would be pretty trivial to
> repair (I edited it to clarify that it's a simplification), but there
> are lots of ways the data can be broken and executing rules to
> normalize that data requires serious babysitting, and would generally
> be decoupled from analysis. Medical record custodians are typically
> risk-averse and researchers are typically happy with representative
> subsets of the data. The same validation can be used by the custodians,
> if they ever decide they'd like to clean up.
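
(To make that concrete for myself, with an entirely made-up 
example.org vocabulary: I read it as something like the following 
SPARQL Update request, which copies just the records that pass a 
minimal check into a separate graph; analysis queries then target 
that graph without per-record error checking.)

  PREFIX ex:  <http://example.org/schema#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

  # Copy records with a well-typed date of birth into a "conformant"
  # graph; queries over that graph can skip datatype checks.
  INSERT { GRAPH <http://example.org/conformant> {
             ?record ex:patient ?patient ;
                     ex:dateOfBirth ?dob } }
  WHERE {
    ?record ex:patient ?patient ;
            ex:dateOfBirth ?dob .
    FILTER (datatype(?dob) = xsd:date)
  }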

There's a similar, though not identical, use case in the DC work where 
incoming data passes through an "enhancement" step (one that includes 
cleanup and additions, such as normalizing dates) before going on to 
the "real" application. It occurs to me that this could carry a 
requirement to mark certain properties for enhancement. A typical 
enhancement in our environment is to take the literal representing a 
creator and look it up against data stores that can provide standard 
URIs for those entities; something like the sketch below.
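
(Very roughly, with everything under example.org made up, and with 
the caveat that a real lookup service is fuzzier than an exact label 
match:)

  PREFIX dc:   <http://purl.org/dc/elements/1.1/>
  PREFIX dct:  <http://purl.org/dc/terms/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

  # Where the creator is a plain literal, find an authority entry
  # whose preferred label matches it and add the standard URI.
  INSERT { ?record dct:creator ?authority }
  WHERE {
    ?record dc:creator ?name .
    FILTER isLiteral(?name)
    GRAPH <http://example.org/authorities> {
      ?authority skos:prefLabel ?label .
      FILTER (str(?label) = str(?name))  # ignore language tags
    }
  }

The point isn't the match logic, which would really be done by an 
external service; it's that dc:creator would need to be marked as a 
property that gets this treatment.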

kc
-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet/+1-510-984-3600

Received on Monday, 24 November 2014 15:26:53 UTC