- From: Irene Polikoff <irene@topquadrant.com>
- Date: Mon, 24 Nov 2014 22:19:48 -0500
- To: "'Eric Prud'hommeaux'" <eric@w3.org>, "'Holger Knublauch'" <holger@topquadrant.com>
- Cc: <public-data-shapes-wg@w3.org>
So, is this about validating that the assessment test with two results is OK as long as both results have the same coding term and the same assessor, and not about reshaping the data? Or is it about actually changing the data to collapse the two results into a single one, along the lines of pre-processing before validation, as Karen may be suggesting?

-----Original Message-----
From: Eric Prud'hommeaux [mailto:eric@w3.org]
Sent: Monday, November 24, 2014 7:56 AM
To: Holger Knublauch
Cc: public-data-shapes-wg@w3.org
Subject: Re: Question on User Story S33: Normalizing data

* Holger Knublauch <holger@topquadrant.com> [2014-11-21 09:38+1000]
> Hi Eric,
>
> I have a question on the User Story S33 that you added recently:
>
> https://www.w3.org/2014/data-shapes/wiki/User_Stories#S33:_Normalizing_data_patterns_for_simple_query
>
> You describe the requirement to normalize data - I guess automatically
> to drop extra duplicate entries? Could you clarify how this would work
> in practice: is your assumption that if there are two identical blank
> nodes (like in your example) then the system could delete one of them?
> What about cases where the two blank nodes have slight differences -
> would this also be covered and how? Is this about automatically fixing
> constraint violations?

This wasn't about repairing the data, merely about identifying a conformant dataset over which SPARQL queries can be executed without exhaustive error checking. The example I provided would be pretty trivial to repair (I edited it to clarify that it's a simplification), but there are lots of ways the data can be broken, and executing rules to normalize that data requires serious babysitting and would generally be decoupled from analysis. Medical record custodians are typically risk-averse, and researchers are typically happy with representative subsets of the data. The same validation can be used by the custodians if they ever decide they'd like to clean up.
> Thanks for clarification
> Holger
>

--
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than email address distribution.

There are subtle nuances encoded in font variation and clever layout which can only be seen by printing this message on high-clay paper.
Received on Tuesday, 25 November 2014 03:20:26 UTC