W3C Workshop in Pisa "Content on the Multilingual Web" - ITS-related Takeaways

Hi there,
 
I presented work by Felix Sasaki, Yves Savourel, and myself at the W3C Workshop in Pisa "Content on the Multilingual Web" (see http://www.multilingualweb.eu/en/documents/pisa-workshop/program for general information on the workshop, and http://www.w3.org/International/multilingualweb/pisa/slides/lieske.pdf for the presentation).
 
While listening to the other presenters, I took the following ITS-related notes, which I would like to share:
 
1. Several speakers mentioned that it would be good if content could be categorized in a standard way as "Generated by Machine Translation (MT)". I guess there are various ways of looking at this from an ITS point of view:
	a. an additional data category with semantics such as "generatedBy"
	b. a special, BCP47-compliant value for the existing ITS data category "Language Information"; that special value might actually need to be a composite one, since there may be a need to capture information such as the following:
 
	o   Name of the MT system that generated the content
	o   Quality of the input
	o   (Semi-)official quality rating of the system (BLEU score or the like)

2. Several speakers explained that it would be good if content could be categorized in a standard way as "OK to be submitted to Natural Language Processing (NLP)". Example: the Web is deemed an invaluable resource for building models for statistical Machine Translation. However, there seems to be some uncertainty as to whether this use of Web-based content is permitted. A standardized categorization could help. I guess there are various ways of looking at this from an ITS point of view:
 
	a. an additional data category with semantics such as "nlpOK"
	b. something similar to the existing ITS data category "Localization Note" (namely one that captures information for machine processing, not for human consumption; see the discussion at http://www.w3.org/Bugs/Public/show_bug.cgi?id=3460)
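To make option (a) a bit more concrete, here is a minimal sketch of how a processor might honour an "nlpOK" flag. Everything here is hypothetical: the namespace http://example.org/its-ext, the attribute name nlpOK, the function nlp_usable_text, and the choice of "no" as the conservative default are illustrative assumptions, not part of ITS 1.0. The sketch assumes the flag is inherited by descendant elements unless overridden, similar to how the existing ITS "Translate" data category applies to an element's content including its children.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace and attribute; "nlpOK" is NOT an ITS 1.0 data category.
EXT_NS = "http://example.org/its-ext"
NLP_OK = f"{{{EXT_NS}}}nlpOK"


def nlp_usable_text(root, default="no"):
    """Collect text whose (inherited) hypothetical nlpOK value is "yes".

    A value set on an element applies to its descendants unless a descendant
    overrides it (assumed inheritance, modelled on ITS "Translate").
    """
    usable = []

    def walk(elem, inherited):
        value = elem.get(NLP_OK, inherited)  # local value overrides inherited one
        if value == "yes" and elem.text and elem.text.strip():
            usable.append(elem.text.strip())
        for child in elem:
            walk(child, value)
            # A child's tail text lies in the parent's scope, so use the parent's value.
            if value == "yes" and child.tail and child.tail.strip():
                usable.append(child.tail.strip())

    walk(root, default)
    return usable


doc = ET.fromstring(
    '<doc xmlns:x="http://example.org/its-ext" x:nlpOK="yes">'
    "<p>Public text.</p>"
    '<p x:nlpOK="no">Licensed text.</p>'
    "<p>More public text.</p>"
    "</doc>"
)
print(nlp_usable_text(doc))  # ['Public text.', 'More public text.']
```

Defaulting to "no" means unmarked content is never harvested, which seems the safer reading given the uncertainty about permission.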
 
3. Charles McCathieNevile mentioned the addition of the notion of a default locale to the Widget Packaging and Configuration specification (see http://www.w3.org/TR/widgets/#widget-package ). This made me wonder whether "defaultLocale" might be useful in quite a number of contexts, and thus be a candidate for an additional ITS data category. The Widget document also prompted another localization-related thought: it should be required reading for anyone who works on standardized packaging for translation-related processes.
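One way to picture what a "defaultLocale" buys is a lookup that tries the requested locales first and falls back to the default only when nothing matches. The sketch below uses an RFC 4647 "Lookup"-style truncation (drop subtags from the right, also dropping a singleton left at the end); the function name pick_locale and its parameters are hypothetical, and the Widget specification's own localization algorithm may differ in detail.

```python
def pick_locale(available, preferred, default_locale):
    """Return the best available locale for the preferred list, else default_locale.

    Matching is case-insensitive; each preferred tag is progressively
    truncated from the right, as in RFC 4647 Lookup.
    """
    avail = {tag.lower(): tag for tag in available}  # lowercase -> original casing
    for tag in preferred:
        subtags = tag.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in avail:
                return avail[candidate]
            subtags.pop()
            # Drop a trailing singleton (e.g. "x" of "x-test") together with what followed it.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default_locale


print(pick_locale(["en", "fr", "pt-BR"], ["de-AT", "pt-BR-x-test"], "en"))  # pt-BR
print(pick_locale(["en", "fr", "pt-BR"], ["ja"], "en"))  # en
```

In this reading, "defaultLocale" is simply the value returned when every preferred tag has been exhausted, which is why it could be meaningful well beyond widgets.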
 
Cheers,
Christian

Received on Monday, 2 May 2011 14:55:59 UTC