Re: [ACTION-135] specialRequirements flesh out

Sounds like a good idea.

Phil.





From:   Felix Sasaki <fsasaki@w3.org>
To:     "Dr. David Filip" <David.Filip@ul.ie>, 
Cc:     Arle Lommel <arle.lommel@dfki.de>, Yves Savourel 
<ysavourel@enlaso.com>, "Pedro L. Díez Orzas" 
<pedro.diez@linguaserve.com>, "<public-multilingualweb-lt@w3.org>" 
<public-multilingualweb-lt@w3.org>, "Giuseppe Deriard [Linguaserve I.S. 
SA]" <giuseppe.deriard@linguaserve.com>
Date:   04/07/2012 09:40
Subject:        Re: [ACTION-135] specialRequirements flesh out



Hi David, all,

would it be OK to give one of the people who are in the XLIFF TC an action 
item to make sure that we don't re-invent the wheel, and, if possible, 
just have a reference to Fredrik Estreen's solution, without creating a 
new data category? Otherwise people creating XLIFF roundtripping scenarios 
will be unsure what to use - native XLIFF or a similar ITS data category.

Best,

Felix 

2012/7/3 Dr. David Filip <David.Filip@ul.ie>
Hi all, I believe that length restrictions are important metadata and, 
importantly, metadata that should be preserved throughout the localization 
roundtrip, and therefore throughout the XLIFF roundtrip.

Fredrik Estreen is currently working on a draft for this, and there is a 
chance that his solution will make it into core XLIFF 2.0.

It is more or less in line with Yves's thinking as posted in this 
thread. Basically we need to distinguish between display size and storage 
size. Storage size seems more basic, as it can be easily calculated if you 
know the encoding, so encoding might be a required attribute here.
The display size is more complicated, and simply counting code points is 
of limited use if you think about it.
So the display limitation mechanism (if used at all) should be open to 
private extensions handling sophisticated display rules, including area 
size and shape, fonts etc. (again, this sort of extensibility will be 
specified in Fredrik's draft).
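
To illustrate the storage size point, here is a minimal Python sketch. It 
is not taken from Fredrik's draft; the sample string and the helper name 
fits_storage are assumptions made up for this illustration. It shows that 
the same text has one code point count but a different byte count per 
encoding, which is why a storage limit is only meaningful together with 
the encoding.

```python
# Sketch only: the sample text and fits_storage() are illustrative assumptions.
# The text is written with escapes so that the code point count is unambiguous.
text = "Gr\u00f6\u00dfe \u5bf8\u6cd5 \U0001F600"

code_points = len(text)                      # a Python str is a sequence of code points
utf8_bytes = len(text.encode("utf-8"))       # storage size in UTF-8
utf16_bytes = len(text.encode("utf-16-le"))  # "-le" so that no BOM is counted


def fits_storage(s: str, max_bytes: int, encoding: str) -> bool:
    """Return True if s, serialized in the given encoding, needs at most max_bytes bytes."""
    return len(s.encode(encoding)) <= max_bytes


print(code_points, utf8_bytes, utf16_bytes)  # prints: 10 19 22
```

With a limit of 20 bytes this text fits when stored as UTF-8 but not when 
stored as UTF-16, although the code point count is 10 in both cases. 
Display size is a different matter and is not covered by this sketch.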

Regarding the banned characters: it seems an unrelated topic, but worth 
encoding nevertheless. As in many cases, we should not prescribe which 
regexp engine people use. Prescribing implementation details is a 
discouraged standardization practice. Instead the user should be able to 
specify which regexp engine they are using. While Perl might seem nice, 
ICU is a kind of canonical implementation of a Unicode-compliant regexp 
engine. So I would not really exclude either here, and would let people 
choose what they want to use.
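
As a rough sketch of what "let the user specify the engine" could look 
like, the Python snippet below pairs the forbidden-character pattern with 
a label naming the regex flavour it is written in. The names 
forbiddenChars and regexEngine and the label "python-re" are hypothetical 
and are not proposed anywhere in this thread. Python's built-in re module 
stands in for the engine here; it is neither Perl5 nor ICU.

```python
import re

# Hypothetical rule record: 'forbiddenChars' and 'regexEngine' are
# illustrative names, not attributes defined anywhere in this thread.
RULE = {"forbiddenChars": r"['<>]", "regexEngine": "python-re"}


def find_forbidden(text: str, rule: dict) -> list:
    """Return every forbidden character found in text, in order of appearance."""
    engine = rule["regexEngine"]
    if engine != "python-re":
        # A real tool would dispatch to a Perl5- or ICU-compatible engine here.
        raise NotImplementedError(f"no matcher available for {engine!r}")
    return re.findall(rule["forbiddenChars"], text)


print(find_forbidden("Don't use <b>", RULE))  # prints: ["'", '<', '>']
```

A consumer that does not support the declared engine can refuse the rule 
instead of silently applying a pattern with different semantics.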

Rgds
dF



Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158 
facsimile: +353-6120-2734
mailto: david.filip@ul.ie



On Tue, Jul 3, 2012 at 7:33 AM, Arle Lommel <arle.lommel@dfki.de> wrote:
For what it’s worth, it seems that Perl5 regexes enjoy broad acceptance and 
the syntax is more compact and easier to read than POSIX in some cases, so 
I would favor that one.

Arle

--
Arle Lommel
Berlin, Germany
Skype: arle_lommel
Phone (US): +1 707 709 8650


On Jul 3, 2012, at 8:24, Yves Savourel <ysavourel@enlaso.com> wrote:

> Hi Pedro, Giuseppe, all,
>
> Thanks for the details for this data category.
> Here are a few questions/notes:
>
> - For 'maxLengthChar' and 'maxLengthCharWord': I assume the unit is a 
Unicode code point. Is that correct?
>
> - My understanding is that 'maxLengthChar' indicates the maximum size 
the text can have when serialized in its storage, and 'maxLengthCharWord' 
is a maximum display size of sorts. Is that correct? If that is the case, 
'maxLengthCharWord' could be renamed to something like 'maxDisplayLength' 
and 'maxLengthChar' to something like 'maxFieldSize' or 
'maxStorageSize'.
>
> - For 'charRestricted': I would suggest that the value of this attribute 
be a regular expression that matches the forbidden characters. We would 
have to specify which regular expression 'standard' should be used (POSIX, 
ICU, Java, Perl5, etc.)
>
> - For 'charRestricted': It may also be better to name this attribute 
something like 'allowedChars' (and reverse the regex value), as 
'restricted' is not very clear (it can be read as 'char restricted to' and 
a list of the only chars allowed.) Or call it 'forbiddenChars'.
>
> - While I see the relationship between restrictions on length and on 
content, it seems those could be separate data categories. But I'm not 
sure it's worth separating them either.
>
> Cheers,
> -yves
>
>
> From: Pedro L. Díez Orzas [mailto:pedro.diez@linguaserve.com]
> Sent: Friday, June 29, 2012 4:56 PM
> To: public-multilingualweb-lt@w3.org
> Cc: Giuseppe Deriard [Linguaserve I.S. SA]
> Subject: [ACTION-135] specialRequirements flesh out
>
> Hi all,
>
> Giuseppe sent me this about ACTION-135. Please bear in mind that the 
currently accepted “localizationNote” is human-readable information, while 
specialRequirements can be used by machines without human intervention. We 
see this data category as something quite “basic” and consequently 
necessary. We can also confirm that we will already provide one 
implementation of specialRequirements in WP3, so only one more 
would be needed.
>
> Here is the fleshed-out specialRequirements proposal.
>
> maxLengthChar
> Declare a limitation on the number of characters allowed in the field.
>
> maxLengthCharWord
> Declare a word length limitation. For example, the text is shown on a 
display panel with a maximum width of 30 characters.
>
> charRestricted
> Declare a ban on the use of a character. For example: do not use the 
single quote in the translated text, or do not use “<” or “>”.
>
> <its:specialRequirements maxLengthChar="200" maxLengthCharWord="30" 
charRestricted="’">
> Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod 
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim 
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea 
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate 
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat 
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id 
est laborum.
> </its:specialRequirements>
>
>
> <span its-specialRequirements="maxLengthChar:200; maxLengthCharWord:30; 
charRestricted:’">
> Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod 
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim 
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea 
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate 
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat 
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id 
est laborum.
> </span>
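
A minimal Python sketch of how a tool might read the attribute value shown 
above and check a translation against it. The function names are 
hypothetical, the list is assumed to be separated by ";", maxLengthChar is 
assumed to count Unicode code points, and charRestricted is assumed to be a 
plain list of forbidden characters. maxLengthCharWord is parsed but not 
checked, because its exact meaning is not settled in this proposal.

```python
def parse_special_requirements(value: str) -> dict:
    """Split a 'name:value; name:value' string into a dict of strings."""
    pairs = {}
    for item in value.split(";"):
        if not item.strip():
            continue  # tolerate a trailing or doubled separator
        name, _, raw = item.partition(":")
        pairs[name.strip()] = raw.strip()
    return pairs


def check(text: str, value: str) -> list:
    """Return a list of problems; an empty list means the text passes."""
    req = parse_special_requirements(value)
    problems = []
    # Assumption: the limit counts code points, which is what len() gives for a str.
    if "maxLengthChar" in req and len(text) > int(req["maxLengthChar"]):
        problems.append(f"length {len(text)} exceeds maxLengthChar {req['maxLengthChar']}")
    # Assumption: each character listed in charRestricted is forbidden literally.
    for ch in req.get("charRestricted", ""):
        if ch in text:
            problems.append(f"forbidden character {ch!r}")
    return problems


VALUE = "maxLengthChar:200; maxLengthCharWord:30; charRestricted:'"
print(check("Lorem ipsum dolor sit amet", VALUE))  # prints: []
```

This simple format cannot forbid ";" or a space, since both are consumed by 
the parsing itself. A regular-expression value would not have that 
limitation.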
>
> Cheers,
>
> Giuseppe Deriard
> IT Director
> Linguaserve I.S. S.A.
> Tel.:    +34 91 761 64 60
> Mob.: +34 657 958 677
> www.linguaserve.com
> giuseppe.deriard@linguaserve.com
> es.linkedin.com/in/gderiard
>
> Best,
> Pedro
>
>





-- 
Felix Sasaki
DFKI / W3C Fellow




Received on Wednesday, 4 July 2012 09:59:41 UTC