- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Mon, 24 Nov 2014 07:56:08 -0500
- To: Holger Knublauch <holger@topquadrant.com>
- Cc: public-data-shapes-wg@w3.org
* Holger Knublauch <holger@topquadrant.com> [2014-11-21 09:38+1000]
> Hi Eric,
>
> I have a question on the User Story S33 that you added recently:
>
> https://www.w3.org/2014/data-shapes/wiki/User_Stories#S33:_Normalizing_data_patterns_for_simple_query
>
> You describe the requirement to normalize data - I guess
> automatically to drop extra duplicate entries? Could you clarify how
> this would work in practice: is your assumption that if there are
> two identical blank nodes (like in your example) then the system
> could delete one of them? What about cases where the two blank nodes
> have slight differences - would this also be covered and how? Is
> this about automatically fixing constraint violations?

This wasn't about repairing the data, merely identifying a conformant
dataset over which SPARQL queries can be executed without exhaustive
error checking. The example I provided would be pretty trivial to
repair (I edited it to clarify that it's a simplification), but there
are many ways the data can be broken, and executing rules to normalize
that data requires serious babysitting and would generally be
decoupled from analysis. Medical record custodians are typically
risk-averse, and researchers are typically happy with representative
subsets of the data. The same validation can be used by the
custodians if they ever decide they'd like to clean up.

> Thanks for clarification
> Holger
>
>

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Monday, 24 November 2014 12:56:15 UTC