mlw-lt-track-ISSUE-43 (Tool/process/model signatures): Unified model for tool, process, or model “signatures” [MLW-LT Standard Draft]

http://www.w3.org/International/multilingualweb/lt/track/issues/43

Raised by: Arle Lommel
On product: MLW-LT Standard Draft

In looking at the localization quality data category, I realized that a split we discussed early on is needed and that we really have two data categories, not one.

1. The attributes locQualityProfile and locQualityScore really tell us something about an entire document and its state. E.g., "this document was checked by CheckMate" or "this document has a quality score of 96% according to the LISA QA Model".

2. The attributes locQualityType, locQualityCode, locQualitySeverity, and locQualityComment all pertain to specific items identified using that tool/model/process. It makes no sense to declare these on nodes larger than a typical span, and certainly not on an entire document (except, I suppose, in the rare case where the document consists of only a few words, but that is a reductio ad absurdum).

So it would be possible to split these into two data categories (e.g., "Localization Quality Profile" and "Localization Quality Issue") and provide a mechanism for the Issue level to refer to the Profile level (i.e., for an issue to say "I was generated by CheckMate" without requiring CheckMate itself to be redeclared each time).
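To make the proposed split concrete, here is a minimal sketch of the two levels as Python data classes. It assumes that a profile is declared once per document and that each issue carries only a short key pointing to it; the class names, the `profile_ref` key, the string type chosen for severity, and the sample issue types ("terminology", "spelling") are all hypothetical and are not part of any draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LocQualityProfile:
    """Document-level data: which tool/model/process was used, plus an optional overall score."""
    profile: str                   # e.g. "CheckMate" or "LISA QA Model"
    score: Optional[float] = None  # e.g. 96.0 for "a quality score of 96%"


@dataclass(frozen=True)
class LocQualityIssue:
    """Span-level data about one item flagged by a declared tool/model/process."""
    profile_ref: str                # hypothetical key pointing to a declared profile
    quality_type: str               # plays the role of locQualityType
    code: Optional[str] = None      # locQualityCode
    severity: Optional[str] = None  # locQualitySeverity (value format assumed)
    comment: Optional[str] = None   # locQualityComment


def profile_for(issue: LocQualityIssue,
                profiles: Dict[str, LocQualityProfile]) -> Optional[LocQualityProfile]:
    """Resolve the profile an issue points to; None if the key was never declared."""
    return profiles.get(issue.profile_ref)


# The profile is declared once for the whole document...
profiles = {"checkmate": LocQualityProfile("CheckMate")}

# ...and each issue carries only a short reference to it (illustrative values).
issues = [
    LocQualityIssue("checkmate", "terminology", comment="illustrative issue"),
    LocQualityIssue("checkmate", "spelling"),
]

for issue in issues:
    print(issue.quality_type, "->", profile_for(issue, profiles).profile)
```

The point of the design is that CheckMate appears once in `profiles`, however many issues it generates.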

For example, in an HTML5 document the Profile information could be declared in meta elements, each defining a prefix (as in a QName), and any element carrying the Issue-level attributes could then reference that prefix to indicate who/what generated it. So if I have the following:

<meta name="its-loc-quality-profile" content="abc:http://www.abcnotreal.org" />
<meta name="its-loc-quality-profile" content="grammar:http://www.somegrammarchecker.org" />

Then a dedicated attribute could have a value of "abc" to "sign" the element and its attributes for ABC, while a value of "grammar" would sign it for the other option.
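The following sketch shows how a consumer might resolve such prefixes, using Python's standard `html.parser`. The meta name `its-loc-quality-profile` and the `prefix:URI` content format come from the example above; the "signing" attribute name `its-profile-ref` and the `ProfileResolver` class are hypothetical, since no attribute name has been proposed yet. It assumes the meta declarations appear before the elements that reference them, and an undeclared prefix resolves to `None`.

```python
from html.parser import HTMLParser

META_NAME = "its-loc-quality-profile"  # meta name from the example above
REF_ATTR = "its-profile-ref"           # hypothetical "signing" attribute


class ProfileResolver(HTMLParser):
    """Collects prefix declarations from <meta> and resolves element signatures."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}  # prefix -> URI of the tool/model/process
        self.signed = []    # (tag, prefix, resolved URI or None)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == META_NAME:
            # content is "prefix:URI"; split only on the first colon,
            # because the URI itself contains colons
            prefix, sep, uri = (a.get("content") or "").partition(":")
            if sep:
                self.prefixes[prefix] = uri
        elif REF_ATTR in a:
            prefix = a[REF_ATTR]
            self.signed.append((tag, prefix, self.prefixes.get(prefix)))


doc = """
<html><head>
<meta name="its-loc-quality-profile" content="abc:http://www.abcnotreal.org" />
<meta name="its-loc-quality-profile" content="grammar:http://www.somegrammarchecker.org" />
</head><body>
<p>Some <span its-profile-ref="grammar">text</span> and
<span its-profile-ref="xyz">more</span>.</p>
</body></html>
"""

r = ProfileResolver()
r.feed(doc)
print(r.prefixes)
print(r.signed)
```

Here the first span resolves to the grammar checker's URI, while the second uses an undeclared prefix and resolves to `None`, which a validator could report as an error.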

So far so good, but it seems that this is actually a general issue for a number of categories (e.g., text analytics annotation) that need to refer to what (tool/process/model) generated some individual piece of content, but where it may be desirable to refer to that entity for the document as a whole rather than for each node.

So I am wondering whether we should consider unifying all of these "signature" issues into a single model (i.e., a separate data category) that can be shared by multiple categories. Doing this would eliminate non-essential redundancy across categories and would promote a consistent way of doing things in ITS 2.0.

So the question is whether we can do this and whether others agree. If there is agreement, then we need to work on this new category ASAP.

Received on Thursday, 9 August 2012 13:50:05 UTC