RE: Question on User Story S33: Normalizing data

So, is this about validating that the assessment test with two results is OK
as long as both results have the same coding term and the same assessor? And
not about reshaping the data?

Or is it about actually changing the data to collapse the two results into a
single one, along the lines of pre-processing before validation, as Karen
may be suggesting?

* Holger Knublauch <> [2014-11-21 09:38+1000]
> Hi Eric,
> I have a question on the User Story S33 that you added recently:
> _data_patterns_for_simple_query
> You describe the requirement to normalize data - I guess automatically 
> to drop extra duplicate entries? Could you clarify how this would work 
> in practice: is your assumption that if there are two identical blank 
> nodes (like in your example) then the system could delete one of them? 
> What about cases where the two blank nodes have slight differences - 
> would this also be covered and how? Is this about automatically fixing 
> constraint violations?

This wasn't about repairing the data, merely about identifying a conformant
dataset over which SPARQL queries can be executed without exhaustive error
checking. The example I provided would be pretty trivial to repair (I edited
it to clarify that it's a simplification), but there are lots of ways the
data can be broken, and executing rules to normalize that data requires
serious babysitting and would generally be decoupled from analysis. Medical
record custodians are typically risk-averse, and researchers are typically
happy with representative subsets of the data. The same validation can be
used by the custodians if they ever decide they'd like to clean up.
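
To make the distinction concrete, here is a minimal sketch in plain Python
rather than in any shapes language. Everything in it is an assumption made
for illustration: the record layout, the field names (results, coding_term,
assessor) and the rule that a conformant assessment has exactly one result
carrying both a coding term and an assessor. It only selects the conformant
subset and leaves the source records untouched.

```python
def is_conformant(assessment):
    """Hypothetical shape: exactly one result, with a coding term and an assessor."""
    results = assessment.get("results", [])
    if len(results) != 1:
        return False
    result = results[0]
    return bool(result.get("coding_term")) and bool(result.get("assessor"))


def partition(assessments):
    """Split records into (conformant, rejected) without modifying any of them."""
    conformant, rejected = [], []
    for assessment in assessments:
        (conformant if is_conformant(assessment) else rejected).append(assessment)
    return conformant, rejected


# Illustrative data only; identifiers and values are made up.
records = [
    {"id": "a1", "results": [{"coding_term": "t1", "assessor": "dr-x"}]},
    # Duplicate result, as in the simplified example: rejected, not repaired.
    {"id": "a2", "results": [{"coding_term": "t1", "assessor": "dr-x"},
                             {"coding_term": "t1", "assessor": "dr-x"}]},
    # Missing assessor: also rejected.
    {"id": "a3", "results": [{"coding_term": "t2"}]},
]

conformant, rejected = partition(records)
```

A researcher would then query only the conformant list, while the rejected
list is the same report a custodian could use later for clean-up.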

> Thanks for clarification
> Holger


