RE: [ISSUE 34] Quality error category proposal

>> - when is "later"? after the summer or for ITS 3.0?
> We're actually starting work on this topic in another DFKI 
> project now, but I do not anticipate seeing anything 
> suitable until later this year at the earliest 
> (certainly later than September).

My question is: would the final definition of the additional info be in ITS 2.0?
I'm assuming yes.

> I think the URI pointer will be vital, even if it 
> doesn't take the form described, because we will 
> still need a way to point to the machine-readable
> definition of the error, however it will be defined.

If the info the URI is pointing to is to be part of ITS 2.0, then I have no problem with the model. Actually it may even make the XLIFF 2.0 mapping a lot easier.

My concern was that if the additional info were not part of ITS 2.0, then having just a very simplistic set of information for quality-error would not be very usable.

>> type/code of error,
> This is actually what I see the URL as supplying. Just a 
> code by itself does not support interoperability. However,
> if we point to machine-readable information (and define 
> what that information looks like, which obviously won't 
> happen in the next week or so), then we can work towards 
> interoperability. Would that be OK for you, or are 
> you thinking of (a) a defined picklist or (b) simple native 
> codes (like the ones you supplied me a while back)? I'm 
> really hoping that your answer is that you don't want A or B.

If the goal is to define the full set of information for ITS 2.0, then I have no problem doing it step by step. I just think it shouldn't be deferred until after ITS 2.0.

As for the value of the type/code of the issue:

It seems we keep running into this pattern of needing a main finite list of values for interoperability and at the same time a way to optionally provide user-defined values.

The category + sub-category model we have talked about several times may work here as well. Actually it would probably work very well.

The first part of the composite value (the so-called category) would be a pre-defined, finite ITS list. Something like: inline-code, whitespace, grammar, terminology, spelling, date-format, number-format, etc. Any tool can likely decide which of these broad values the specific issue belongs to.

Then they can, if they want to, supplement this with their more specialized type. That value would be composed of some authority identifier and the actual value, using a QName-like format for example.

So used together we would have something like:

issueType="whitespace" + issueSubType="enlaso:MISSING_LEADINGWS"

issueType="inline-code" + issueSubType="enlaso:EXTRA_CODE"

The actual notation using a single attribute or two is secondary. The idea is that the main category is mandatory if the sub-category is used, so tools can always fall back to the broad type of issue.
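
To make the fallback idea concrete, here is a rough Python sketch. It is only an illustration: the category list is just the examples given above, and the names ITS_CATEGORIES, Issue, parse_issue and display_type are hypothetical, not anything defined by ITS.

```python
from typing import NamedTuple, Optional

# Hypothetical finite list, taken only from the examples in this message.
ITS_CATEGORIES = {
    "inline-code", "whitespace", "grammar", "terminology",
    "spelling", "date-format", "number-format",
}


class Issue(NamedTuple):
    category: str              # mandatory broad type
    authority: Optional[str]   # e.g. "enlaso"; None if no sub-type given
    sub_type: Optional[str]    # e.g. "MISSING_LEADINGWS"


def parse_issue(issue_type: str, issue_sub_type: Optional[str] = None) -> Issue:
    """Validate the broad category and split the QName-like sub-type."""
    if issue_type not in ITS_CATEGORIES:
        raise ValueError(f"unknown category: {issue_type!r}")
    if issue_sub_type is None:
        return Issue(issue_type, None, None)
    authority, sep, value = issue_sub_type.partition(":")
    if not sep or not authority or not value:
        raise ValueError(f"sub-type must look like authority:value, got {issue_sub_type!r}")
    return Issue(issue_type, authority, value)


def display_type(issue: Issue, known_authorities: set) -> str:
    """Use the specialized type if the tool knows the authority, else fall back."""
    if issue.authority in known_authorities:
        return f"{issue.authority}:{issue.sub_type}"
    return issue.category
```

A tool that knows the "enlaso" codes would show enlaso:MISSING_LEADINGWS, while any other tool would still see a plain whitespace issue.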

>> and a flag indicating if the given issue is active or not.
> Good suggestion. What values would you suggest? I think there 
> is more to it than whether it is active or not. For instance,
> a reviewer might catch an error and flag it in the file 
> (making it active). It then goes back to the translator, 
> who cannot resolve it or needs confirmation about the 
> proposed resolution, in which case it is still active but 
> you would treat it differently than in the first case. 

The simpler the better. Something like enabled="yes|no" would do fine IMO. It just says whether this issue is currently disabled or enabled; that's all users care about, as far as my experience goes. It's mostly used to flag false positives when the same user re-runs the check after fixing a set of problems.

I would add one more piece of information: an attribute to store a possible suggested replacement text. Quite a few issues can be fixed automatically, or with a simple human validation. That attribute would hold the content to substitute for the content selected for the annotation.
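
As a rough sketch of how the two attributes could work together in a tool (Python, illustration only): the QualityIssue class, its field names, and the use of character offsets to represent the annotated content are all my assumptions here, not a proposed notation. It also assumes the annotated spans do not overlap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QualityIssue:
    start: int                        # assumed: offset where the annotated content begins
    end: int                          # assumed: offset just past the annotated content
    issue_type: str                   # broad category, e.g. "spelling"
    enabled: bool = True              # False marks e.g. a false positive
    suggestion: Optional[str] = None  # proposed replacement text, if any


def apply_suggestions(text: str, issues: List[QualityIssue]) -> str:
    """Apply suggested replacements for enabled issues only."""
    # Work from the end of the text backwards so earlier offsets stay valid.
    for issue in sorted(issues, key=lambda i: i.start, reverse=True):
        if issue.enabled and issue.suggestion is not None:
            text = text[:issue.start] + issue.suggestion + text[issue.end:]
    return text
```

An issue the user has disabled is simply skipped, so re-running the check and the fix does not touch content already judged to be fine.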


Received on Wednesday, 18 July 2012 09:31:39 UTC