- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Tue, 25 Nov 2014 01:49:22 -0500
- To: Irene Polikoff <irene@topquadrant.com>
- Cc: public-data-shapes-wg <public-data-shapes-wg@w3.org>, Holger Knublauch <holger@topquadrant.com>
- Message-ID: <CANfjZH3yVeDg176we1wh7P6Mm_qKQTGEqLG_N2kqBJKFh3p_vg@mail.gmail.com>
On Nov 25, 2014 4:20 AM, "Irene Polikoff" <irene@topquadrant.com> wrote:

> So, is this about validating that the assessment test with two results is OK
> as long as both results have the same coding term and the same assessor? And
> not about reshaping the data?

In my use case, any data that did not pass the designated shape would be
rejected. That includes the example broken data.

> Or is it about actually changing the data to collapse the two results into a
> single one, along the lines of pre-processing before the validation, as Karen
> may be suggesting?

Where mine is about selecting conformant data, Karen's is probably more easily
accomplished by selecting non-conformant data, which could then be normalized
via various heuristics and rules.

> -----Original Message-----
> From: Eric Prud'hommeaux [mailto:eric@w3.org]
> Sent: Monday, November 24, 2014 7:56 AM
> To: Holger Knublauch
> Cc: public-data-shapes-wg@w3.org
> Subject: Re: Question on User Story S33: Normalizing data
>
> * Holger Knublauch <holger@topquadrant.com> [2014-11-21 09:38+1000]
> > Hi Eric,
> >
> > I have a question on User Story S33, which you added recently:
> >
> > https://www.w3.org/2014/data-shapes/wiki/User_Stories#S33:_Normalizing_data_patterns_for_simple_query
> >
> > You describe the requirement to normalize data - I guess automatically
> > to drop extra duplicate entries? Could you clarify how this would work
> > in practice: is your assumption that if there are two identical blank
> > nodes (like in your example) then the system could delete one of them?
> > What about cases where the two blank nodes have slight differences -
> > would those also be covered, and how? Is this about automatically fixing
> > constraint violations?
>
> This wasn't about repairing the data, merely about identifying a conformant
> dataset over which SPARQL queries can be executed without exhaustive error
> checking. The example I provided would be pretty trivial to repair (I edited
> it to clarify that it's a simplification), but there are many ways the data
> can be broken, and executing rules to normalize that data requires serious
> babysitting and would generally be decoupled from analysis. Medical record
> custodians are typically risk-averse, and researchers are typically happy
> with representative subsets of the data. The same validation can be used by
> the custodians, if they ever decide they'd like to clean up.
>
> > Thanks for the clarification
> > Holger
>
> --
> -ericP
>
> office: +1.617.599.3509
> mobile: +33.6.80.80.35.59
>
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.
>
> There are subtle nuances encoded in font variation and clever layout which
> can only be seen by printing this message on high-clay paper.
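[Editorial note: the "select non-conformant data" strategy Eric attributes to Karen's use case can be sketched in SPARQL. The query below is a minimal, hypothetical illustration: the `ex:` vocabulary and the `ex:result`, `ex:coding`, and `ex:assessor` property names are assumptions for this sketch, not terms defined in S33 or in this thread.]

```sparql
# Hypothetical sketch: flag assessment tests whose duplicate results
# disagree on coding term or assessor. The ex: vocabulary is illustrative;
# neither S33 nor this thread defines these property names.
PREFIX ex: <http://example.org/>

SELECT DISTINCT ?test
WHERE {
  ?test ex:result ?r1, ?r2 .
  FILTER (?r1 != ?r2)
  ?r1 ex:coding ?c1 ; ex:assessor ?a1 .
  ?r2 ex:coding ?c2 ; ex:assessor ?a2 .
  # Non-conformant case: the two results differ in coding term or
  # assessor, so no simple merge rule can safely collapse them.
  FILTER (?c1 != ?c2 || ?a1 != ?a2)
}
```

[Eric's own approach, selecting only conformant data, would invert this test (for example, with FILTER NOT EXISTS over the same disagreement pattern), keeping records that pass the shape and leaving the rejected ones for the custodians' later cleanup.]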
Received on Tuesday, 25 November 2014 06:49:50 UTC