action-221 summary of overriding discussion

Hi all,

During today's call we discussed the overriding issue. The summary below
does not record who said what, just the main points. Below that
there is a proposal for a solution.

1)
In ITS 1.0 we specified overriding of ITS information (given via one of:
local markup, global rules, inheritance or defaults) to be "complete". That
can be seen in the ITS 1.0 test suite files. They specify for each node
the approach (local, global etc.) by which its ITS information was created.
Example:

<o:node path="/{}msgList/{}body[1]/{}msg[3]" outputType="new-value-global">
 <o:output o:locNoteType="alert">
  <o:locNoteText>The variable <code>{0}</code> has three possible values:
'printer','stacker' and 'stapler options'.</o:locNoteText>
 </o:output>
</o:node>

That only makes sense with complete overriding: a single outputType per
node assumes that all of the node's ITS information comes from one source.

2) In ITS 2.0 there are data categories like mtConfidence or
textAnalysisAnnotation that would benefit a lot from not having to repeat
information like "which tool produced this annotation" or "which tool
created this machine translation" together with the other information, e.g.
the MT confidence score or the actual named entity annotation.

3) Partial overriding might be a solution to the requirement of 2). One
could state once, e.g. at the top of a document, which tools have been
used, and take that into account for the complete document.

4) Partial overriding would create backwards compatibility issues (see 1)
above) and problems for other data categories. E.g. for localization
quality issue, there are a lot of mutually exclusive options for using
metadata. Partial overriding can get very complex if we start to use
pointers (e.g. localizationQualityTypePointer) or standoff (again for
localization quality issue). Also, sometimes an inherited value might not
be what the creator of the metadata intended, e.g. having an inherited
"alert" localization note type.

5) Another solution might be a compound data category. That was mentioned
during the call, but we ran out of time to discuss it.

6) For the data categories in question (the tool-specific part of
mtConfidence and text analysis annotation), it seems that there is no need
for global rules or inline markup. It would be sufficient to have a means
to say: "for a given document the following tooling was used".
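
To make the difference between points 1), 3) and 4) concrete, here is a
minimal sketch in Python. It assumes, purely for illustration, that the ITS
information each source contributes for a node is a flat dictionary. The
function names and the dictionary keys are hypothetical and are not taken
from the ITS specification or from any ITS processor.

```python
# Hypothetical model: each source (defaults, inheritance, global rules,
# local markup) contributes a dict of ITS information for one node.
# Sources are listed from lowest to highest precedence.

def complete_override(sources):
    """Complete overriding: the highest-precedence source that provides
    anything wins as a whole; nothing from lower sources leaks through."""
    for info in reversed(sources):
        if info:
            return dict(info)
    return {}


def partial_override(sources):
    """Partial overriding: values are merged key by key, so a key missing
    in a higher-precedence source is filled from a lower one."""
    merged = {}
    for info in sources:
        merged.update(info)
    return merged


# Made-up data: the parent carries an alert, the local note does not.
inherited = {"locNoteType": "alert", "locNote": "Check the variable values."}
local = {"locNote": "Keep this string short."}

complete = complete_override([inherited, local])
partial = partial_override([inherited, local])
```

With complete overriding, `complete` contains only the local note. With
partial overriding, `partial` contains the local note text plus the
inherited "alert" type, which is the unintended effect described in 4).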

Any comments on that summary?


SOLUTION PROPOSAL.

My proposal would be that we do not change overriding behavior at all. From
6), it seems that the tool-specific information does not need selection
at all, so there is no need for local markup, global rules, inheritance or
defaults. It is sufficient to express: "When processing data category X,
the ITS-aware processor MUST pass the information given about tooling to
the application".
So we could define a description of tooling information, based on
Christian's proposal, and say each description MUST make clear what data
category it relates to. The effect would be:
- Dropping the tool-specific part of text analysis annotation
- Dropping the tool-specific part of mtConfidence
- Having a tool description format that is not part of the data to be
processed or of global rules, but that is independent. In addition to the
tooling information in Christian's proposal, the format should state what
data category it refers to.
Influence on processing ITS: processing would take a parameter
providing the tool-specific information. Note that this is different from
its:param, which is a parameter for XPath processing.
Influence on conformance: an application implementing the data categories
in question MUST be able to ship the information to an application
consuming ITS information. There is no influence on the selection
mechanisms, that is, existing conformance clauses stay "as is".
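
The processing and conformance points can be sketched roughly as follows,
in Python. Everything here is an assumption for illustration: the ToolInfo
record, the process function and the field names are hypothetical, and the
actual description format would be based on Christian's proposal. The sketch
only shows that tool information is looked up per data category for the
whole document and handed to the application next to the selected values,
without touching selection or overriding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolInfo:
    """Hypothetical document-level tool description."""
    data_category: str  # the data category this description relates to
    tool: str           # free-form identification of the tool


def process(data_category, selected_values, tool_infos):
    """Return what the application receives for one data category.

    selected_values: node path -> value, as produced by the unchanged
    selection mechanisms (local markup, global rules, inheritance, defaults).
    tool_infos: document-level descriptions, independent of selection.
    """
    tooling = [t for t in tool_infos if t.data_category == data_category]
    return {"values": dict(selected_values), "tooling": tooling}


# Example with made-up data.
tool_infos = [
    ToolInfo("mtConfidence", "example-mt-engine"),
    ToolInfo("textAnalysisAnnotation", "example-ner-tool"),
]
result = process("mtConfidence", {"/doc/p[1]": 0.87}, tool_infos)
```

Here `result["tooling"]` holds only the description that targets
mtConfidence, while `result["values"]` is passed through unchanged.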
Influence on test suite: in addition to the current format or as a kind of
"header" in it, we would define a block with the tool specific info.

Thoughts? If people agree on this, I'd be happy to draft a section and some
test cases.

Best,

Felix


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Thursday, 20 September 2012 17:05:19 UTC