[All implementors question] Re: [ISSUE-34] "Pieces of information" for quality for agreement from Felix Sasaki on 2012-08-10 (public-multilingualweb-lt@w3.org from August 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 10 Aug 2012 17:16:49 +0200
To: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czrH+cvWc-rqV0=P-AV7Gjs7D7YNrb6aY187wO0ge3FQFg@mail.gmail.com>
Hi Yves, all,

With my co-chair hat on: see the question below, which is relevant to all
implementors of the quality data categories.

2012/8/10 Yves Savourel <ysavourel@enlaso.com>

> Hi Arle,
>
> Comments inline.
>
> *From:* Arle Lommel [mailto:arle.lommel@dfki.de]
> *Sent:* Friday, August 10, 2012 4:46 AM
> *To:* Multilingual Web LT Public List
> *Subject:* [ISSUE-34] "Pieces of information" for quality for agreement
>
> Based on the discussion in yesterday's meeting, I am sending the following
> list of "pieces of information" to see if we have agreement on them. When
> we have agreement on which pieces we are implementing, then we can return
> to the actual structure of how they are represented.
>
> It seems that we will need two separate data categories, locQualityProfile
> and locQualityIssue. (Issues in green are ones where we do not seem to have
> consensus on adopting them.)
>
> YS> It seems your note about locQualityIssueProfileRef and
> locQualityProfileDescrip, at the bottom, implies that if we have two data
> categories, the implementers must implement both. That would be a first in
> ITS, where normally data categories can work alone.
>
> The structure for *locQualityProfile* is:
>
>    - *locQualityProfileDescrip*: A QName that provides a prefix for the
>    profile (which can be used to refer to the profile) and a URI where more
>    information about the tool/profile can be found. (Default: human:human)
>    - *locQualityProfileScore* (optional): A score as generated by the
>    tool or model referenced in locQualityProfileDescrip. No default value
>    defined.
>    - *locQualityProfileThreshold* (optional): Defines what score
>    constitutes a "passing" score according to the model/tool used.
>
> *Open question: can the above be treated as provenance or otherwise
> unified with text analytics, which has a similar need as this category (see
> Issue-42)?*
>
> The structure for *locQualityIssue* is *at least* one of the following:
>
>    - *locQualityIssueProfileRef*: Contains a text pointer to a
>    locQualityProfileDescrip-defined prefix to bind locQualityIssue to a
>    specific profile. Default is human. Normal inheritance applies. For
>    example, if the code <body its-loc-quality-issue-profile="something">
>    appears in an HTML file, then all locQualityIssue instances within the
>    body would inherit the value of "something" unless it is specifically
>    overridden. (I realize this is already implementation-specific, but it
>    illustrates the point.)
>    - *locQualityIssueType*: A value from the picklist that identifies the
>    generic issue type. (Default: unclassified)
>    - *locQualityIssueCode*: A tool-specific code that corresponds to the
>    value of locQualityIssueType. *(Note: Yves now thinks this is
>    unnecessary because the values are not constrained. Arle thinks this is
>    needed even if the values are not constrained…)*
>
> YS> Note that if we drop locQualityIssueType we don’t need QNames;
> locQualityIssueProfileRef can be just a URI and can truly separate profile
> from issues.
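
The inheritance behaviour described above for locQualityIssueProfileRef could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the attribute name is the one from Arle's example, the default of "human" is the one stated in the list, and the function name and sample markup are made up for the sketch.

```python
import xml.etree.ElementTree as ET

# Attribute name taken from the example in the thread; the final
# serialization was still undecided, so treat it as an assumption.
ATTR = "its-loc-quality-issue-profile"
DEFAULT_PROFILE = "human"  # default stated in the proposal


def resolve_profiles(element, inherited=DEFAULT_PROFILE, out=None):
    """Return (tag, effective profile) pairs in document order.

    An element uses its own attribute value if present; otherwise it
    inherits the value in effect on its parent.
    """
    if out is None:
        out = []
    current = element.get(ATTR, inherited)
    out.append((element.tag, current))
    for child in element:
        resolve_profiles(child, current, out)
    return out


# Hypothetical sample document (well-formed so ElementTree can parse it).
doc = ET.fromstring(
    '<html><body its-loc-quality-issue-profile="something">'
    '<p>text</p>'
    '<p its-loc-quality-issue-profile="other"><span>x</span></p>'
    '</body></html>'
)
profiles = resolve_profiles(doc)
```

Here the first paragraph inherits "something" from the body, while the second paragraph and its child use the overriding value "other".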
>
>    - *locQualityIssueComment*: A human-readable note about the issue.
>    - *locQualityIssueSeverity*: A value corresponding to the severity of
>    the error. *(The initial proposal was for this to be a numeric value,
>    but Des and David both argue that this should be a free value. If this is
>    the case, there is no guarantee of interoperability at all between values.
>    E.g., what would a tool make of a value such as "severe" if there is no
>    correlate to know what severe means in its own system? It is conceivable
>    that the document pointed to in the profile could define values, but we are
>    not defining what the profile itself looks like.)*
>
> YS> IMO the values for locQualityIssueSeverity should be 0-100 or some
> similar numeric range. I think most forms of severity can be mapped to
> that. For example, CheckMate uses “high”, “medium” and “low” (actually we
> use colors, but internally it’s a three-value system). I think I can map
> those display values to ranges of values. Sure, the implementation will
> require some tune-up to store the original ITS values to make sure they are
> preserved, but that’s the price we’ll happily pay for interoperability.
>
> As Arle points out, using a free value would break most severity-related
> operations on issues coming from different tools. For example, how would
> one sort them?
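
To make the mapping idea above concrete, here is a minimal sketch of how a three-level display scale might sit on top of a 0-100 severity value. The cut points (0-33, 34-66, 67-100) and all names are assumptions for illustration; the thread proposes only the numeric range, not how a tool should divide it.

```python
from dataclasses import dataclass

# Assumed cut points: the thread suggests 0-100 but does not say how a
# three-level scale should be split across that range.
RANGES = {"low": (0, 33), "medium": (34, 66), "high": (67, 100)}


def label_for(score: float) -> str:
    """Map a numeric severity (0-100) to a three-level display label."""
    if not 0 <= score <= 100:
        raise ValueError(f"severity out of range: {score}")
    for label, (_, upper) in RANGES.items():
        if score <= upper:
            return label
    raise AssertionError("unreachable: the ranges cover 0-100")


def score_for(label: str) -> int:
    """Pick a representative score (the top of its range) for a label."""
    return RANGES[label][1]


@dataclass
class Issue:
    comment: str
    severity: float  # original numeric value, kept unchanged for round-tripping


def sort_by_severity(issues):
    """Most severe first; works across tools because the values are numeric."""
    return sorted(issues, key=lambda issue: issue.severity, reverse=True)
```

Because each issue keeps the numeric value it arrived with, a tool can show its own label via label_for() and still write the original value back out, while issues from different tools remain sortable on one scale.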
>
>    - *locQualityIssueSuggestion*: A machine-readable suggestion for how
>    to resolve the issue. *(Felix is concerned that the complexity of a
>    machine-readable solution might be too high.)*
>    - *locQualityIssueStatus*: An indicator of whether an issue is active
>    or resolved. Possible values: active|resolved|rejected
>
> YS> I don’t agree with the current locQualityIssueStatus. IMO a simple
> enabled/disabled flag is a better way to go. It provides the necessary
> means to handle false positives when doing recurring checks. I wouldn’t
> know what to do with a workflow-type status.
>
>    - *locQualityIssueStage*: An indication of where in a workflow the
>    issue is. *(Des notes that we do not want fixed values for this. Arle
>    questions whether it is needed if we have the issue stage since open values
>    are not interoperable.)*
>
> YS> I wouldn’t know what to do with that one.
>
>    - *locQualityIssueAgent*: An identifier for the agent that produced
>    the issue. Possible values: human|machine *(Arle: if we have the
>    locQualityIssueProfileRef, I think we don't need this since that is a more
>    robust solution.)*
>
> To move forward with this, if you are considering implementing these data
> categories, which pieces do you consider essential enough to implement? As
> long as we have the two parts (a profile and an issue), it seems that the
> *locQualityIssueProfileRef* (what a horrible name!) and the
> *locQualityProfileDescrip* are required since the structure falls apart
> without them. But beyond those, will we have commitments to implement any
> of these particular pieces?
>
> YS> So as far as implementation goes, here is my best guess:
>
> CheckMate does not have a use for the locQualityProfile data category, so
> we might implement it if time and resources permit, but we would limit that
> to the ITS engine library and not use it ourselves.
>
> We would certainly be very keen on implementing the locQualityIssue data
> category.
>
> The minimal attributes IMO would be: locQualityIssueComment and
> locQualityIssueType.
>
> locQualityIssueSeverity would be a big plus (as long as the values are
> interoperable).
>
> locQualityIssueSuggestion would be nice too.
>
> A locQualityIssueEnabled=’yes/no’ instead of the locQualityIssueStatus
> would be nice as well.
>
> And last, locQualityIssueProfileRef (ugly name indeed).
>
> Any other information we would handle because it’s part of the data
> category, but we would not use it.
>


Who would implement - in addition to Enlaso - locQualityIssueComment,
locQualityIssueType, locQualityIssueSeverity, locQualityIssueSuggestion,
locQualityIssueEnabled=’yes/no’, locQualityIssueProfileRef?
A subset of these is fine too. We basically need to know: for which of
these items would we have at least two implementations?

Another question: who would need and implement additional items? Which ones?

Best,

Felix



> I hope this helps,
>
> -yves
>



-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Friday, 10 August 2012 15:17:15 UTC