- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 10 Aug 2012 15:46:14 +0200
- To: public-multilingualweb-lt@w3.org
- Cc: Daniel Naber <naber@danielnaber.de>
- Message-ID: <CAL58czq7TgjTtAqx869s+hdxfHirHhyFPJdr+evjgce-cagTUA@mail.gmail.com>
Hi all,

today Arle and I met with the lead developer of languagetool http://www.languagetool.org/ , Daniel Naber (see CC).

Background: we wanted to discuss how the output of languagetool could be re-used as part of quality data category information.

In languagetool, XML files are one means (in addition to Java rules and tool configuration) to specify rules for checking, see e.g.
http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/rules/en/grammar.xml?content-type=text%2Fplain

The XML files don't have information for categorizing errors: names of categories, rules and rule groups are not standardized and not mapped to our top level quality types.

First, Daniel agreed to add an attribute to grammar.xml for specifying locQualityType in languagetool. The value list for that attribute is not yet fixed, but we will keep Daniel in the loop so that later, when we have finalized the list, he can start implementing the attribute. This does not mean that all rules in languagetool are mapped to locQualityType, but it provides a hook to do the mapping. We can then work on implementing the mapping in community efforts or in other, more specific efforts.

The attribute will be available in various parts of grammar.xml: at the "category" level, for rule groups and for rules. Settings on a lower level override settings on the upper level(s). In that way, the writers of grammar rules can have their own mappings for single rules, even if the general mappings are different.

A second topic we discussed was the output from languagetool. The current output of languagetool looks like this, in an XML representation:

<error fromy="0" fromx="0" toy="0" tox="5" ruleId="UPPERCASE_SENTENCE_START" msg="This sentence does not start with an uppercase letter" replacements="This" context="this is a test." contextoffset="0" errorlength="4"/>

To get the mapping to locQualityType, one could take the ruleId, go back to grammar.xml and look up the mapping there.
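For illustration, here is a minimal sketch of that lookup, including the "lower level overrides upper level" behaviour. It is not how languagetool does or will do it. The grammar.xml fragment is heavily simplified, and the attribute name "locQualityType", its values, and all rule ids in the fragment are placeholders, since the value list is not fixed yet. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified grammar.xml fragment. The attribute name and
# its values are placeholders; the real value list is not finalized.
GRAMMAR_XML = """
<rules lang="en">
  <category name="Example category" locQualityType="typographical">
    <rule id="RULE_A"/>
    <rulegroup id="GROUP_B" locQualityType="grammar">
      <rule/>
      <rule id="RULE_C" locQualityType="style"/>
    </rulegroup>
  </category>
  <category name="Unmapped category">
    <rule id="RULE_D"/>
  </category>
</rules>
"""

ATTR = "locQualityType"  # assumed attribute name


def build_type_index(root):
    """Map each rule / rule group id to its effective quality type.

    A value set on a rule overrides the one on its rule group, which in
    turn overrides the one on its category. Unmapped ids map to None.
    """
    index = {}

    def visit(element, inherited):
        # Use the element's own setting if present, else the inherited one.
        effective = element.get(ATTR, inherited)
        element_id = element.get("id")
        if element.tag in ("rule", "rulegroup") and element_id:
            index[element_id] = effective
        for child in element:
            visit(child, effective)

    visit(root, None)
    return index


def quality_type_for_error(error_xml, index):
    """Look up the quality type for one <error .../> output element."""
    error = ET.fromstring(error_xml)
    return index.get(error.get("ruleId"))


index = build_type_index(ET.fromstring(GRAMMAR_XML.strip()))
print(quality_type_for_error('<error ruleId="RULE_C" msg="example"/>', index))
print(quality_type_for_error('<error ruleId="NOT_IN_FILE" msg="example"/>', index))
```

A ruleId that is not defined in the file simply yields no mapping in this sketch.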
But the disadvantage of going back to grammar.xml is that you need access at the file level, and you will miss rules that come from the system, general Java rules etc. Daniel proposed a better solution: he would add two more attributes to the output. One will convey the languagetool version (this relates to issue-42). The other attribute will output (if available) the locQualityType, from grammar.xml or whatever source was used to create the message.

We also discussed other topics, like the bi-text module in languagetool, see
http://languagetool.wikidot.com/checking-translations-bilingual-texts
or the structure of error messages. Here we didn't conclude anything that is relevant for our topic.

All, please let me know what you think, and Daniel, Arle, correct me if I missed something or got something wrong.

We also agreed to put Daniel into the acknowledgement section, for contributing an implementation of the quality data category - without any support. Thanks a lot for that, Daniel!

Best,

Felix

--
Felix Sasaki
DFKI / W3C Fellow
Received on Friday, 10 August 2012 13:46:44 UTC