[ISSUE-34] "Pieces of information" for quality for agreement

Following the discussion in yesterday's meeting, I am sending the following list of "pieces of information" to check whether we agree on them. Once we agree on which pieces we are implementing, we can return to the question of how they are actually represented.

It seems that we will need two separate data categories, locQualityProfile and locQualityIssue. (Items in green are those on which we do not seem to have consensus about adopting them.)

The structure for locQualityProfile is:

locQualityProfileDescrip: A QName that provides a prefix for the profile (which can be used to refer to the profile) and a URI where more information about the tool/profile can be found. (Default: human:human)
locQualityProfileScore (optional): A score as generated by the tool or model referenced in locQualityProfileDescrip. No default value defined.
locQualityProfileThreshold (optional): Defines what score constitutes a "passing" score according to the model/tool used.
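To make the intended semantics a bit more concrete, here is a minimal Python sketch of how a consuming tool might hold these three pieces. It is not part of the proposal: the class name, the assumption that locQualityProfileDescrip is serialized as `prefix:URI` (split on the first colon only, since the URI itself usually contains colons), the example URI, and the assumption that a higher score is better are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocQualityProfile:
    """Hypothetical in-memory form of the three locQualityProfile pieces."""
    descrip: str = "human:human"       # default from the proposal
    score: Optional[float] = None      # optional; no default defined
    threshold: Optional[float] = None  # optional

    @property
    def prefix(self) -> str:
        # Assumption: serialized as "prefix:URI"; split on the first colon only.
        return self.descrip.split(":", 1)[0]

    @property
    def uri(self) -> str:
        parts = self.descrip.split(":", 1)
        return parts[1] if len(parts) == 2 else ""

    def passes(self) -> Optional[bool]:
        """Return None when score or threshold is missing.

        Assumption: higher scores are better. The proposal leaves this to
        the model/tool, so a real consumer would have to learn the
        direction from the profile itself.
        """
        if self.score is None or self.threshold is None:
            return None
        return self.score >= self.threshold


p = LocQualityProfile("acme:http://example.com/qa-profile", score=92.5, threshold=90)
print(p.prefix, p.uri, p.passes())
# acme http://example.com/qa-profile True
print(LocQualityProfile().prefix, LocQualityProfile().passes())
# human None
```

Note that `passes()` returns None instead of guessing when either optional value is absent, which mirrors the fact that no default score is defined.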

Open question: can the above be treated as provenance or otherwise unified with text analytics, which has needs similar to those of this category (see ISSUE-42)?

The structure for locQualityIssue consists of at least one of the following:

locQualityIssueProfileRef: Contains a text pointer to a prefix defined by locQualityProfileDescrip, binding the locQualityIssue to a specific profile. Default is human. Normal inheritance applies: for example, if the code <body its-loc-quality-issue-profile="something"> appears in an HTML file, then all locQualityIssue instances within the body inherit the value "something" unless it is specifically overridden. (I realize this is already implementation-specific, but it illustrates the point.)
locQualityIssueType: A value from the picklist that identifies the generic issue type. (Default: unclassified)
locQualityIssueCode: A tool-specific code that corresponds to the value of locQualityIssueType. (Note: Yves now thinks this is unnecessary because the values are not constrained. Arle thinks this is needed even if the values are not constrained…)
locQualityIssueComment: A human-readable note about the issue.
locQualityIssueSeverity: A value corresponding to the severity of the error. (The initial proposal was for this to be a numeric value, but Des and David both argue that it should be a free value. If so, there is no guarantee of interoperability between values at all. E.g., what would a tool make of a value such as "severe" if it has no way to map "severe" onto its own system? It is conceivable that the document pointed to in the profile could define values, but we are not defining what the profile itself looks like.)
locQualityIssueSuggestion: A machine-readable suggestion for how to resolve the issue. (Felix is concerned that the complexity of a machine-readable solution might be too high.)
locQualityIssueStatus: An indicator of whether an issue is active, resolved, or rejected. Possible values: active|resolved|rejected
locQualityIssueStage: An indication of where in a workflow the issue is. (Des notes that we do not want fixed values for this. Arle questions whether it is needed if we have the issue status, since open values are not interoperable.)
locQualityIssueAgent: An identifier for the agent that produced the issue. Possible values: human|machine. (Arle: if we have locQualityIssueProfileRef, I think we don't need this, since that is a more robust solution.)
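The inheritance rule described for locQualityIssueProfileRef can be sketched with Python's standard `html.parser`. This is only an illustration under stated assumptions: it uses the `its-loc-quality-issue-profile` attribute from the example above, invents a hypothetical `its-loc-quality-issue-type` attribute to mark elements that carry an issue, falls back to the proposed defaults (human for the profile ref, unclassified for the type), and assumes well-formed markup in which every non-void element is closed.

```python
from html.parser import HTMLParser

# Attribute taken from the <body> example in the message above.
PROFILE_ATTR = "its-loc-quality-issue-profile"
# Hypothetical attribute, used here only to mark an element as carrying an issue.
TYPE_ATTR = "its-loc-quality-issue-type"
# HTML void elements never get an end tag, so they must not be pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IssueCollector(HTMLParser):
    """Record (tag, issue type, effective profile ref) for each flagged element."""

    def __init__(self):
        super().__init__()
        self.stack = ["human"]  # proposed default profile ref
        self.issues = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Use the local value if present; otherwise inherit from the nearest ancestor.
        profile = a.get(PROFILE_ATTR) or self.stack[-1]
        if TYPE_ATTR in a:
            # An empty or valueless type falls back to the proposed default.
            self.issues.append((tag, a[TYPE_ATTR] or "unclassified", profile))
        if tag not in VOID:
            self.stack.append(profile)

    def handle_endtag(self, tag):
        # Never pop the document-level default.
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()


doc = """
<body its-loc-quality-issue-profile="something">
  <p>Text with <span its-loc-quality-issue-type="terminology">an issue</span>.</p>
  <div its-loc-quality-issue-profile="acme">
    <span its-loc-quality-issue-type="">another one</span>
  </div>
</body>
"""

collector = IssueCollector()
collector.feed(doc)
collector.close()
print(collector.issues)
# [('span', 'terminology', 'something'), ('span', 'unclassified', 'acme')]
```

The second span shows an override: the `div` sets its own profile ref, so the issue inside it is bound to `acme` rather than inheriting `something` from `body`.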

To move forward with this: if you are considering implementing these data categories, which pieces do you consider essential enough to implement? As long as we have the two parts (a profile and an issue), it seems that locQualityIssueProfileRef (what a horrible name!) and locQualityProfileDescrip are required, since the structure falls apart without them. But beyond those, will we have commitments to implement any of these particular pieces?

Best,

Arle

Received on Friday, 10 August 2012 10:46:17 UTC