- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 4 Jul 2012 10:39:56 +0200
- To: "Dr. David Filip" <David.Filip@ul.ie>
- Cc: Arle Lommel <arle.lommel@dfki.de>, Yves Savourel <ysavourel@enlaso.com>, "Pedro L. Díez Orzas" <pedro.diez@linguaserve.com>, "<public-multilingualweb-lt@w3.org>" <public-multilingualweb-lt@w3.org>, "Giuseppe Deriard [Linguaserve I.S. SA]" <giuseppe.deriard@linguaserve.com>
- Message-ID: <CAL58czrymz_JeDrjYD9n2Aw4XBfvwnwXk8Xr4fafxXBLQZ2JuA@mail.gmail.com>
Hi David, all,

would it be OK to give one of the people who are in the XLIFF TC an action
item to make sure that we don't re-invent the wheel, and, if possible, just
have a reference to Fredrik Estreen's solution, without creating a new data
category? Otherwise people creating XLIFF roundtripping scenarios will be
unsure what to use - native XLIFF or a similar ITS data category.

Best,

Felix

2012/7/3 Dr. David Filip <David.Filip@ul.ie>

> Hi all, I believe that length restrictions are important metadata and,
> importantly, metadata that should be preserved throughout the localization
> roundtrip, ergo the XLIFF roundtrip.
>
> Fredrik Estreen is currently working on a draft for this, and there are
> chances that his solution will make it into core XLIFF 2.0.
>
> It is more or less in line with Yves's thinking that he posted in this
> thread. Basically we need to discern between display size and storage size.
> Storage size seems more basic, as it can be easily calculated if you know
> the encoding, so encoding might be a required attribute here.
> The display size is more complicated, and simply counting code points has
> limited usability if you come to think of it.
> So the display limitation mechanism (if used at all) should be open to
> private extensions handling sophisticated display rules including area size
> and shape, fonts etc. (again, this sort of extensibility will be specified
> in Fredrik's draft).
>
> Regarding the banned characters: it seems an unrelated topic, but worth
> encoding nevertheless. As in many cases, we should not prescribe what regexp
> machine people use. Prescribing implementation details is a discouraged
> standardization practice. Instead the user should be able to specify which
> regexp machine they are using. While Perl might seem nice, ICU is kind of
> the canonical implementation of a Unicode-compliant regexp machine. So I
> would not really exclude either here and let people choose what they want
> to use.
>
> Rgds
> dF
>
> Dr. David Filip
> =======================
> LRC | CNGL | LT-Web | CSIS
> University of Limerick, Ireland
> telephone: +353-6120-2781
> *cellphone: +353-86-0222-158*
> facsimile: +353-6120-2734
> mailto: david.filip@ul.ie
>
> On Tue, Jul 3, 2012 at 7:33 AM, Arle Lommel <arle.lommel@dfki.de> wrote:
>
>> For what it’s worth, it seems that Perl5 regexes enjoy broad acceptance
>> and the syntax is more compact and easier to read than POSIX in some
>> cases, so I would favor that one.
>>
>> Arle
>>
>> --
>> Arle Lommel
>> Berlin, Germany
>> Skype: arle_lommel
>> Phone (US): +1 707 709 8650
>>
>> Sent from a mobile device. Please excuse any typos.
>>
>> On Jul 3, 2012, at 8:24, Yves Savourel <ysavourel@enlaso.com> wrote:
>>
>> > Hi Pedro, Giuseppe, all,
>> >
>> > Thanks for the details for this data category.
>> > Here are a few questions/notes:
>> >
>> > - For 'maxLengthChar' and 'maxLengthCharWord': I assume the unit is a
>> > Unicode code point. Is that correct?
>> >
>> > - My understanding is that 'maxLengthChar' indicates the maximum size
>> > the text can have when serialized in its storage and 'maxLengthCharWord'
>> > is a maximum display size of sorts. Is that correct? If that is the case,
>> > 'maxLengthCharWord' could be renamed something like 'maxDisplayLength'
>> > and 'maxLengthChar' could be something like 'maxFieldSize' or
>> > 'maxStorageSize'.
>> >
>> > - For 'charRestricted': I would suggest the value of this attribute to
>> > be a regular expression that matches the forbidden characters. We would
>> > have to specify what regular expression 'standard' should be used
>> > (POSIX, ICU, Java, Perl5, etc.)
>> >
>> > - For 'charRestricted': It may also be better to name this attribute
>> > something like 'allowedChars' (and reverse the regex value), as
>> > 'restricted' is not very clear (it can be read as 'char restricted to'
>> > and a list of the only chars allowed). Or call it 'forbiddenChars'.
>> >
>> > - While I see the relationship between restrictions of length and
>> > content, it seems those could be separate data categories. But I'm not
>> > sure if it's worth separating them either.
>> >
>> > Cheers,
>> > -yves
>> >
>> >
>> > From: Pedro L. Díez Orzas [mailto:pedro.diez@linguaserve.com]
>> > Sent: Friday, June 29, 2012 4:56 PM
>> > To: public-multilingualweb-lt@w3.org
>> > Cc: Giuseppe Deriard [Linguaserve I.S. SA]
>> > Subject: [ACTION-135] specialRequirements flesh out
>> >
>> > Hi all,
>> >
>> > Giuseppe sent me this about ACTION 135. Please mind that the currently
>> > accepted “localizationNote” is human-readable information, while
>> > specialRequirements can be used by machines without human intervention.
>> > We see this data category as something quite “basic” and consequently
>> > necessary. Also, to confirm that we will already provide one
>> > implementation for specialRequirements in WP3, so we would need only
>> > another one.
>> >
>> > Here is the specialRequirements flesh-out.
>> >
>> > maxLengthChar
>> > Declare a limitation on the number of characters allowed in the field.
>> >
>> > maxLengthCharWord
>> > Declare a word length limitation. For example, text shown on a display
>> > panel with a maximum width of 30 characters.
>> >
>> > charRestricted
>> > Declare a ban on use of a character. For example: Do not use the single
>> > quote in the translated text, do not use “<” or “>”.
>> >
>> > <its:specialRequirements maxLengthChar="200" maxLengthCharWord="30"
>> > charRestricted="’">
>> > Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
>> > eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
>> > minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
>> > ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
>> > voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
>> > sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
>> > mollit anim id est laborum.
>> > </its:specialRequirements>
>> >
>> >
>> > <span its-specialRequirements="maxLengthChar:200; maxLengthCharWord:30;
>> > charRestricted:’">
>> > Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
>> > eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
>> > minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
>> > ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
>> > voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
>> > sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
>> > mollit anim id est laborum.
>> > </span>
>> >
>> > Cheers,
>> >
>> > Giuseppe Deriard
>> > IT Director
>> > Linguaserve I.S. S.A.
>> > Tel.: +34 91 761 64 60
>> > Mob.: +34 657 958 677
>> > www.linguaserve.com
>> > giuseppe.deriard@linguaserve.com
>> > es.linkedin.com/in/gderiard
>> > "According to the provisions set forth in articles 21 and 22 of Law
>> > 34/2002 of July 11 regarding Information Society and eCommerce Services,
>> > we will store and use your personal data with the sole purpose of
>> > marketing the products and services offered by LINGUASERVE
>> > INTERNACIONALIZACIÓN DE SERVICIOS, S.A. If you do not wish your personal
>> > data to be stored and handled, or you do not wish to receive further
>> > information regarding products and services offered by our company,
>> > please e-mail us to clients@linguaserve.com. Your request will be
>> > processed immediately."
>> > ________________________________________
>> >
>> > Best,
>> > Pedro
>> >

--
Felix Sasaki
DFKI / W3C Fellow
Received on Wednesday, 4 July 2012 08:40:25 UTC
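
The distinction drawn in the thread between a code-point limit, a storage-size limit that depends on the encoding, and a regular expression matching forbidden characters could be sketched roughly as below. This is only an illustration under stated assumptions, not part of any ITS or XLIFF draft: the function name check_restrictions and all of its parameters are hypothetical, and Python's re module merely stands in for whichever regex engine a user declares. Display size is deliberately left out, since it depends on fonts and layout rather than on the string alone.

```python
import re


def check_restrictions(text, max_code_points=None, max_storage_bytes=None,
                       encoding="utf-8", forbidden_chars=None):
    """Return a list of violation messages (empty if the text passes).

    Hypothetical helper: the parameter names only mirror the ideas in the
    thread (maxLengthChar, storage size, charRestricted); they are not
    defined by ITS or XLIFF.
    """
    problems = []

    # len() on a Python 3 str counts Unicode code points.
    if max_code_points is not None and len(text) > max_code_points:
        problems.append(f"{len(text)} code points > {max_code_points}")

    # Storage size depends on the encoding, so encode first, then count bytes.
    if max_storage_bytes is not None:
        size = len(text.encode(encoding))
        if size > max_storage_bytes:
            problems.append(f"{size} bytes in {encoding} > {max_storage_bytes}")

    # forbidden_chars is a regular expression that matches banned characters.
    if forbidden_chars is not None:
        found = sorted({m.group(0) for m in re.finditer(forbidden_chars, text)})
        if found:
            problems.append("forbidden characters: " + " ".join(found))

    return problems
```

For instance, check_restrictions("D\u00edez", max_storage_bytes=4) reports a violation in UTF-8, where the string needs 5 bytes, but passes with encoding="latin-1", where it needs 4. That is one reason the encoding would have to be known before a storage limit can be checked.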