RE: [ISSUE-34] "Pieces of information" for quality for agreement

Hi Arle,

Comments inline.

From: Arle Lommel [mailto:arle.lommel@dfki.de] 
Sent: Friday, August 10, 2012 4:46 AM
To: Multilingual Web LT Public List
Subject: [ISSUE-34] "Pieces of information" for quality for agreement

Based on the discussion in yesterday's meeting, I am sending the following list of "pieces of information" to see if we have agreement on them. When we have agreement on which pieces we are implementing, then we can return to the actual structure of how they are represented.

It seems that we will need two separate data categories: locQualityProfile and locQualityIssue. (Issues in green are ones where we do not seem to have consensus on adopting them.)

YS> It seems your note about locQualityIssueProfileRef and locQualityProfileDescrip at the bottom implies that, if we have two data categories, implementers must implement both. That would be a first in ITS, where data categories can normally work alone.

The structure for locQualityProfile is:

* locQualityProfileDescrip: A QName that provides a prefix for the profile (which can be used to refer to the profile) and a URI where more information about the tool/profile can be found. (Default: human:human)
* locQualityProfileScore (optional): A score as generated by the tool or model referenced in locQualityProfileDescrip. No default value defined.
* locQualityProfileThreshold (optional): Defines what score constitutes a "passing" score according to the model/tool used.
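
For illustration only, the three locQualityProfile pieces could be held in a structure like the Python sketch below. It assumes that locQualityProfileDescrip is serialized as a single "prefix:URI" string and that a score at or above locQualityProfileThreshold counts as passing (the proposal does not say whether higher or lower scores are better). The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocQualityProfile:
    """Hypothetical in-memory model of the proposed locQualityProfile pieces."""

    prefix: str = "human"              # prefix part of locQualityProfileDescrip
    uri: str = "human"                 # URI part; "human:human" is the proposed default
    score: Optional[float] = None      # locQualityProfileScore (optional, no default)
    threshold: Optional[float] = None  # locQualityProfileThreshold (optional)

    @classmethod
    def from_descrip(cls, descrip: str, **kwargs) -> "LocQualityProfile":
        # Split only on the first colon: the prefix has none, the URI may have several.
        prefix, sep, uri = descrip.partition(":")
        if not sep or not prefix or not uri:
            raise ValueError(f"expected 'prefix:URI', got {descrip!r}")
        return cls(prefix=prefix, uri=uri, **kwargs)

    def passes(self) -> Optional[bool]:
        # Assumption: higher is better, and reaching the threshold passes.
        # Without both a score and a threshold, pass/fail is undefined.
        if self.score is None or self.threshold is None:
            return None
        return self.score >= self.threshold
```

Under these assumptions, LocQualityProfile.from_descrip("checker:http://example.com/qa-profile", score=82, threshold=75).passes() returns True (the prefix and URI are made-up examples), while a profile with no score or threshold returns None instead of guessing.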

Open question: can the above be treated as provenance, or otherwise unified with text analytics, which has a similar need (see Issue-42)?

The structure for locQualityIssue is at least one of the following:

* locQualityIssueProfileRef: Contains a text pointer to a locQualityProfileDescrip-defined prefix to bind locQualityIssue to a specific profile. Default is human. Normal inheritance applies. For example, if the code <body its-loc-quality-issue-profile="something"> appears in an HTML file, then all locQualityIssue instances within the body would inherit the value of "something" unless it is specifically overridden. (I realize this is already implementation-specific, but it illustrates the point.)
* locQualityIssueType: A value from the picklist that identifies the generic issue type. (Default: unclassified)
* locQualityIssueCode: A tool-specific code that corresponds to the value of locQualityIssueType. (Note: Yves now thinks this is unnecessary because the values are not constrained. Arle thinks this is needed even if the values are not constrained…)

YS> Note that if we drop locQualityIssueType, we don’t need QNames: locQualityIssueProfileRef can be just a URI and can truly separate the profile from the issues.
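
A minimal sketch of the "normal inheritance" described for locQualityIssueProfileRef, assuming a plain tree of elements rather than a real HTML DOM: the element itself or its nearest ancestor carrying a value wins, and "human" is used when nothing is set anywhere. Element and resolve_profile_ref are hypothetical names, and the attribute is modeled as a simple field.

```python
from typing import List, Optional

DEFAULT_PROFILE_REF = "human"  # proposed default for locQualityIssueProfileRef


class Element:
    """Minimal document node. profile_ref stands in for an attribute such as
    its-loc-quality-issue-profile (an illustrative name, not a final spec)."""

    def __init__(self, name: str, profile_ref: Optional[str] = None):
        self.name = name
        self.profile_ref = profile_ref
        self.parent: Optional["Element"] = None
        self.children: List["Element"] = []

    def append(self, child: "Element") -> "Element":
        child.parent = self
        self.children.append(child)
        return child


def resolve_profile_ref(node: Optional[Element]) -> str:
    # Walk up the ancestors: the nearest explicit value wins;
    # with no value anywhere, fall back to the default.
    while node is not None:
        if node.profile_ref is not None:
            return node.profile_ref
        node = node.parent
    return DEFAULT_PROFILE_REF
```

For the body example in the locQualityIssueProfileRef item, a <p> inside <body its-loc-quality-issue-profile="something"> resolves to "something", a descendant with its own value overrides it, and an element outside the body (such as <head>) falls back to "human".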

* locQualityIssueComment: A human-readable note about the issue
* locQualityIssueSeverity: A value corresponding to the severity of the error. (The initial proposal was for this to be a numeric value, but Des and David both argue that it should be a free value. If so, there is no guarantee of interoperability between values at all. E.g., what would a tool make of a value such as "severe" if it has no way of knowing what "severe" corresponds to in its own system? It is conceivable that the document pointed to in the profile could define the values, but we are not defining what the profile itself looks like.)

YS> IMO the values for locQualityIssueSeverity should be 0-100 or some similar numeric range. I think most forms of severity can be mapped to that. For example, CheckMate uses “high”, “medium” and “low” (actually we display colors, but internally it’s a three-value system), and I think I can map those to ranges of values. Sure, the implementation will require some tuning to store the original ITS values so that they are preserved, but that’s a price we’ll happily pay for interoperability.

As Arle points out, using a free value would break most severity-related operations on issues coming from different tools. For example, how would we sort them?
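
To make the sorting argument concrete, here is a Python sketch of the mapping approach: each tool's severity is mapped onto a shared 0-100 range while the original value is kept so it can be preserved. The three-level mapping values (10/50/90), the tool names, and all function names are assumptions for illustration, not agreed values.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping of a three-level tool scale onto a shared 0-100 range.
# The numbers are illustrative, not values anyone has agreed on.
THREE_LEVEL_SCALE: Dict[str, float] = {"low": 10.0, "medium": 50.0, "high": 90.0}


@dataclass
class Issue:
    tool: str
    comment: str
    severity: float         # normalized value in the shared 0-100 range
    original_severity: str  # the tool's own value, kept so it can be preserved


def from_levels(tool: str, comment: str, level: str,
                scale: Dict[str, float] = THREE_LEVEL_SCALE) -> Issue:
    # A free value with no known mapping (e.g. "severe") cannot be normalized.
    if level not in scale:
        raise ValueError(f"no mapping for severity {level!r} from {tool}")
    return Issue(tool, comment, scale[level], level)


def from_numeric(tool: str, comment: str, value: float) -> Issue:
    if not 0 <= value <= 100:
        raise ValueError(f"severity {value} outside 0-100")
    return Issue(tool, comment, float(value), str(value))


def sort_by_severity(issues: List[Issue]) -> List[Issue]:
    # Only possible because every issue carries a comparable number.
    return sorted(issues, key=lambda issue: issue.severity, reverse=True)
```

Issues from different tools can then be sorted on one numeric key, while a free value such as "severe" with no known mapping is rejected instead of being silently misplaced.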

* locQualityIssueSuggestion: A machine-readable suggestion for how to resolve the issue. (Felix is concerned that the complexity of a machine-readable solution might be too high.)
* locQualityIssueStatus: An indicator of whether an issue is active or resolved. Possible values: active|resolved|rejected

YS> I don’t agree with the current locQualityIssueStatus. IMO a simple enabled/disabled flag is a better way to go: it provides what is needed to handle false positives when doing recurring checks. I wouldn’t know what to do with a workflow-type status.
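
The following sketch shows one way an enabled/disabled flag could support recurring checks: issues the user disabled as false positives in a previous run stay disabled when the same check reports them again. Matching issues by (segment, type, comment) is an assumption made for this example; a real tool would need its own notion of issue identity. All names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class CheckIssue:
    segment_id: str
    issue_type: str
    comment: str
    enabled: bool = True  # stands in for the proposed locQualityIssueEnabled yes/no flag

    def key(self) -> Tuple[str, str, str]:
        # Hypothetical identity used to match the same issue across runs.
        return (self.segment_id, self.issue_type, self.comment)


def carry_over_flags(new_run: List[CheckIssue],
                     previous_run: List[CheckIssue]) -> List[CheckIssue]:
    # Issues disabled last time (false positives) stay disabled
    # when a recurring check reports them again.
    disabled = {issue.key() for issue in previous_run if not issue.enabled}
    return [replace(issue, enabled=False) if issue.key() in disabled else issue
            for issue in new_run]


def active(issues: List[CheckIssue]) -> List[CheckIssue]:
    return [issue for issue in issues if issue.enabled]
```

Only the active(...) issues would be shown to the user; disabled ones are kept so that the decision survives the next run.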

* locQualityIssueStage: An indication of where in a workflow the issue is. (Des notes that we do not want fixed values for this. Arle questions whether it is needed if we have the issue status, since open values are not interoperable.)

YS> I wouldn’t know what to do with that one.

* locQualityIssueAgent: An identifier for the agent that produced the issue. Possible values: human|machine (Arle: if we have the locQualityIssueProfileRef, I think we don't need this since that is a more robust solution.)

To move forward with this: if you are considering implementing these data categories, which pieces do you consider essential enough to implement? As long as we have the two parts (a profile and an issue), it seems that locQualityIssueProfileRef (what a horrible name!) and locQualityProfileDescrip are required, since the structure falls apart without them. But beyond those, will we have commitments to implement any of these particular pieces?

YS> So, as far as implementation goes, here is my best guess:

CheckMate has no use for the locQualityProfile data category, so we might implement it if time and resources permit, but we would limit that to the ITS engine library and not use it ourselves.

We would certainly be very keen on implementing the locQualityIssue data category.

The minimal attributes IMO would be: locQualityIssueComment and locQualityIssueType.

locQualityIssueSeverity would be a big plus (as long as the values are interoperable).

locQualityIssueSuggestion would be nice too.

A locQualityIssueEnabled flag (yes/no) instead of locQualityIssueStatus would be nice as well.

And last, locQualityIssueProfileRef (ugly name indeed).

We would handle any other information because it is part of the data category, but we would not use it.

I hope this helps,

-yves

Received on Friday, 10 August 2012 14:24:17 UTC