- From: Lieske, Christian <christian.lieske@sap.com>
- Date: Tue, 30 Nov 2010 09:25:22 +0100
- To: "multilingualweb-partners@w3.org" <multilingualweb-partners@w3.org>, "public-i18n-its-ig@w3.org" <public-i18n-its-ig@w3.org>
- CC: Felix Sasaki <felix.sasaki@dfki.de>
- Message-ID: <8EA44C66E2911C4AB21558F4720695DC5E65D27A37@DEWDFECCR01.wdf.sap.corp>
Hi there,

The first workshop of the W3C-coordinated Thematic Network "Multilingual Web" (see http://www.multilingualweb.eu/documents/madrid-workshop/slides-video-irc-notes) revived some thoughts that have been nagging Felix and myself for some time. In particular, Felix's talk and my own (see http://www.w3.org/International/multilingualweb/madrid/slides/sasaki.pdf and http://www.w3.org/International/multilingualweb/madrid/slides/lieske.pdf) made us wonder how the following might be related to forthcoming standards-based Natural Language Processing applications on the web:

1. W3C Internationalization Tag Set (ITS)
2. A standard "packaging" format (as one contribution towards covering some of the 3 gaps Felix has mentioned)

As you may remember, we have already thrown out some ideas related to this (see http://www.localisation.ie/xliff/resources/presentations/2010-10-04_xliff-its-secret-marriage.pdf, slides 22 and 23).

This time around, we got stuck at the insight that very often there are two separate steps between the original-language content (e.g. a set of source XML files) and Natural Language Processing:

1. Preparation related to individual objects - this may for example involve inserting local or global "term"-related ITS markup
2. Preparation related to packages of objects - this may for example involve packaging all translation-relevant objects into a container

With this in mind, we arrive at the following ideas for standards and tools that we might be lacking for forthcoming standards-based Natural Language Processing on the web:

1. Something that could be called "Mark-Up Plug-in (MUP)" - This may for example be a plug-in for a browser-based editor that allows authors, for example, to mark certain parts with "its:translate='no'" (this marking may result in local or global ITS markup).
2. Something that could be called "Standard Packing Format for Multilingual Processing (STAMP)" - This may for example be something akin to ePUB (one of the formats used in eReaders).
3. Something that could be called "Resource Annotation Workbench (RAW)" - This may for example be a special capability for an application like Rainbow (see http://okapi.opentag.com/applications.html#rainbow) that allows the following:
   a. Create RDF-based metadata (embedded into the original files, or as additional, standalone/sidecar files) for objects that have to be processed
   b. Package the translatables, the supplementary files, and the aforementioned "sidecars" into a standardized NLP-processing format

Any thoughts on this?

Cheers,
Christian (and Felix)
Received on Tuesday, 30 November 2010 08:26:03 UTC