Re: [ACTION-135] specialRequirements flesh out

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 4 Jul 2012 10:39:56 +0200
Message-ID: <CAL58czrymz_JeDrjYD9n2Aw4XBfvwnwXk8Xr4fafxXBLQZ2JuA@mail.gmail.com>
To: "Dr. David Filip" <David.Filip@ul.ie>
Cc: Arle Lommel <arle.lommel@dfki.de>, Yves Savourel <ysavourel@enlaso.com>, "Pedro L. Díez Orzas" <pedro.diez@linguaserve.com>, "<public-multilingualweb-lt@w3.org>" <public-multilingualweb-lt@w3.org>, "Giuseppe Deriard [Linguaserve I.S. SA]" <giuseppe.deriard@linguaserve.com>
Hi David, all,

would it be OK to give one of the people who are in the XLIFF TC an action
item to make sure that we don't re-invent the wheel, and, if possible, just
have a reference to Fredrik Estreen's solution, without creating a new data
category? Otherwise people creating XLIFF roundtripping scenarios will be
unsure what to use - native XLIFF or a similar ITS data category.

Best,

Felix

2012/7/3 Dr. David Filip <David.Filip@ul.ie>

> Hi all, I believe that length restrictions are important metadata and,
> importantly, metadata that should be preserved throughout the localization
> roundtrip, and therefore the XLIFF roundtrip.
>
> Fredrik Estreen is currently working on a draft for this, and there is a
> chance that his solution will make it into core XLIFF 2.0.
>
> It is more or less in line with Yves's thinking that he posted in this
> thread. Basically we need to distinguish between display size and storage
> size. Storage size seems more basic, as it can be easily calculated if you
> know the encoding, so encoding might be a required attribute here.
> The display size is more complicated, and simply counting code points has
> limited usability if you come to think of it.
> So the display limitation mechanism (if used at all) should be open to
> private extensions handling sophisticated display rules including area size
> and shape, fonts etc. (again, this sort of extensibility will be specified
> in Fredrik's draft).
>
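To illustrate the storage versus display distinction made above, here is a minimal Python sketch. The function names are hypothetical and not taken from any draft; it only shows why the encoding has to be known before a storage limit can be checked, and why a code point count is a different number.

```python
def code_point_length(text: str) -> int:
    """Number of Unicode code points (what len() counts for a Python str)."""
    return len(text)


def storage_size(text: str, encoding: str) -> int:
    """Number of bytes the text occupies once serialized in `encoding`."""
    return len(text.encode(encoding))


def fits_storage(text: str, max_bytes: int, encoding: str) -> bool:
    """True if the serialized text fits a field of `max_bytes` bytes."""
    return storage_size(text, encoding) <= max_bytes


name = "D\u00edez"  # "Díez", written with an escape to pin the code point
print(code_point_length(name))          # 4 code points
print(storage_size(name, "utf-8"))      # 5 bytes: U+00ED takes two bytes
print(storage_size(name, "utf-16-le"))  # 8 bytes: two bytes per code point here
```

The same string therefore passes or fails a 4-unit limit depending on whether the unit is a code point or a byte, and on which encoding is declared. Display size is not covered here, since it depends on fonts and layout.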
> Regarding the banned characters: it seems an unrelated topic, but worth
> encoding nevertheless. In many cases we should not prescribe what regexp
> machine people use. Prescribing implementation details is a discouraged
> standardization practice. Instead the user should be able to specify which
> regexp machine they are using. While Perl might seem nice, ICU is kind of
> the canonical implementation of a Unicode-compliant regexp machine. So I
> would not really exclude either here and would let people choose what they
> want to use.
>
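One way to read "let the user specify the regexp machine" is that the pattern travels together with an engine name, and a consumer refuses patterns whose engine it does not support. The sketch below is only an assumption about how that could look: the registry, the engine label "python-re", and the function name are all hypothetical, and only Python's built-in `re` module is wired up.

```python
import re
from typing import Callable, Dict

# Hypothetical registry: declared engine name -> compile function.
# Only Python's built-in `re` is registered here; an ICU or Perl binding
# would be added under its own name by whoever needs it.
ENGINES: Dict[str, Callable[[str], re.Pattern]] = {
    "python-re": re.compile,
}


def compile_restriction(pattern: str, engine: str) -> re.Pattern:
    """Compile `pattern` with the declared engine, or fail loudly."""
    try:
        compiler = ENGINES[engine]
    except KeyError:
        raise ValueError(f"unsupported regex engine: {engine!r}") from None
    return compiler(pattern)


banned = compile_restriction("[<>]", "python-re")
print(bool(banned.search("a < b")))  # True: the text contains a banned character
```

Failing on an unknown engine, rather than guessing, avoids silently applying a pattern with the wrong syntax rules.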
> Rgds
> dF
>
>
>
> Dr. David Filip
> =======================
> LRC | CNGL | LT-Web | CSIS
> University of Limerick, Ireland
> telephone: +353-6120-2781
> cellphone: +353-86-0222-158
> facsimile: +353-6120-2734
> mailto: david.filip@ul.ie
>
>
>
> On Tue, Jul 3, 2012 at 7:33 AM, Arle Lommel <arle.lommel@dfki.de> wrote:
>
>> For what it’s worth, it seems that Perl5 regexes enjoy broad acceptance and
>> the syntax is more compact and easier to read than POSIX in some cases, so
>> I would favor that one.
>>
>> Arle
>>
>> --
>> Arle Lommel
>> Berlin, Germany
>> Skype: arle_lommel
>> Phone (US): +1 707 709 8650
>>
>> Sent from a mobile device. Please excuse any typos.
>>
>> On Jul 3, 2012, at 8:24, Yves Savourel <ysavourel@enlaso.com> wrote:
>>
>> > Hi Pedro, Giuseppe, all,
>> >
>> > Thanks for the details for this data category.
>> > Here are a few questions/notes:
>> >
>> > - For 'maxLengthChar' and 'maxLengthCharWord': I assume the unit is a
>> Unicode code point. Is that correct?
>> >
>> > - My understanding is that 'maxLengthChar' indicates the maximum size
>> the text can have when serialized in its storage and 'maxLengthCharWord' is
>> a maximum display size of sorts. Is that correct? If that is the case,
>> 'maxLengthCharWord' could be renamed to something like 'maxDisplayLength' and
>> 'maxLengthChar' could be something like 'maxFieldSize' or 'maxStorageSize'.
>> >
>> > - For 'charRestricted': I would suggest the value of this attribute to
>> be a regular expression that matches the forbidden characters. We would
>> have to specify what regular expression 'standard' should be used (POSIX,
>> ICU, Java, Perl5, etc.)
>> >
>> > - For 'charRestricted': It may also be better to name this attribute
>> something like 'allowedChars' (and reverse the regex value), as
>> 'restricted' is not very clear (it can be read as 'char restricted to' and
>> a list of the only chars allowed.) Or call it 'forbiddenChars'.
>> >
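The two naming options above ('forbiddenChars' versus 'allowedChars' with the regex reversed) lead to two slightly different checks. The sketch below uses Python's `re` module purely as a stand-in engine, since the thread has not settled on one; the function names are hypothetical, and it assumes each pattern matches a single character.

```python
import re


def forbidden_hits(text: str, forbidden: str) -> list:
    """Characters in `text` that match the `forbidden` pattern."""
    return [m.group(0) for m in re.finditer(forbidden, text)]


def disallowed_hits(text: str, allowed: str) -> list:
    """Characters in `text` that do NOT match the `allowed` pattern."""
    return [ch for ch in text if not re.fullmatch(allowed, ch)]


# 'forbiddenChars' style: the pattern lists what must not appear.
print(forbidden_hits("Don't use <b>", "['<>]"))

# 'allowedChars' style: the pattern lists the only characters permitted.
print(disallowed_hits("abc-1", "[a-z]"))
```

Either form can express the same restriction; the choice mostly affects which pattern is shorter to write for a given use case.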
>> > - While I see the relationship between restrictions of length and
>> content, it seems those could be separate data categories. But I'm not sure
>> if it's worth separating them either.
>> >
>> > Cheers,
>> > -yves
>> >
>> >
>> > From: Pedro L. Díez Orzas [mailto:pedro.diez@linguaserve.com]
>> > Sent: Friday, June 29, 2012 4:56 PM
>> > To: public-multilingualweb-lt@w3.org
>> > Cc: Giuseppe Deriard [Linguaserve I.S. SA]
>> > Subject: [ACTION-135] specialRequirements flesh out
>> >
>> > Hi all,
>> >
>> > Giuseppe sent me this about ACTION-135. Please note that the currently
>> accepted “localizationNote” carries human-readable information, while
>> specialRequirements can be used by machines without human intervention. We
>> see this data category as something quite “basic” and consequently
>> necessary. Also, we confirm that we will provide one implementation for
>> specialRequirements in WP3, so we would need only one more.
>> >
>> > Here is the specialRequirements flesh out.
>> >
>> > maxLengthChar
>> > Declare a limitation on the number of characters allowed in the field.
>> >
>> > maxLengthCharWord
>> > Declare a word length limitation. For example, text shown on a
>> display panel with a maximum width of 30 characters.
>> >
>> > charRestricted
>> > Declare a ban on use of a character. For example: Do not use the single
>> quote in the translated text, do not use “<” or ”>”
>> >
>> > <its:specialRequirements maxLengthChar="200" maxLengthCharWord="30"
>> charRestricted="’">
>> > Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
>> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
>> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
>> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
>> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
>> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
>> est laborum.
>> > </its:specialRequirements>
>> >
>> >
>> > <span its-specialRequirements="maxLengthChar:200; maxLengthCharWord:30;
>> charRestricted:’">
>> > Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
>> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
>> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
>> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
>> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
>> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
>> est laborum.
>> > </span>
>> >
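For the HTML form of the proposal, a consumer would first have to split the attribute value into its parts. The following is a rough sketch under stated assumptions: pairs are separated by ";", names and values by ":", and 'charRestricted' holds a literal character as in the example (not a regular expression). The function names are hypothetical. 'maxLengthCharWord' is parsed but not checked, because its exact meaning is still open in this thread.

```python
def parse_special_requirements(value: str) -> dict:
    """Split 'name:value; name:value' pairs into a dict of strings."""
    pairs = {}
    for chunk in value.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, sep, raw = chunk.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {chunk!r}")
        pairs[name.strip()] = raw.strip()
    return pairs


def check(text: str, reqs: dict) -> list:
    """Return human-readable problems; an empty list means the text passes."""
    problems = []
    max_len = reqs.get("maxLengthChar")
    if max_len is not None and len(text) > int(max_len):
        problems.append("longer than maxLengthChar")
    banned = reqs.get("charRestricted")
    if banned and banned in text:
        problems.append("contains a restricted character")
    return problems


reqs = parse_special_requirements(
    "maxLengthChar:200; maxLengthCharWord:30; charRestricted:'"
)
print(check("It's fine", reqs))
```

Note that this simple syntax cannot carry a restricted value that itself contains ":" at the start of a pair or ";" anywhere, which is one argument for a more robust value format.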
>> > Cheers,
>> >
>> > Giuseppe Deriard
>> > IT Director
>> > Linguaserve I.S. S.A.
>> > Tel.:    +34 91 761 64 60
>> > Mob.: +34 657 958 677
>> > www.linguaserve.com
>> > giuseppe.deriard@linguaserve.com
>> > es.linkedin.com/in/gderiard
>> > "According to the provisions set forth in articles 21 and 22 of Law
>> 34/2002 of July 11 regarding Information Society and eCommerce Services, we
>> will store and use your personal data with the sole purpose of marketing
>> the products and services offered by LINGUASERVE INTERNACIONALIZACIÓN DE
>> SERVICIOS, S.A. If you do not wish your personal data to be stored and
>> handled, or you do not wish to receive further information regarding
>> products and services offered by our company, please e-mail us to
>> clients@linguaserve.com. Your request will be processed immediately."
>> > ________________________________________
>> >
>> > Best,
>> > Pedro
>> >
>> >
>>
>>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Wednesday, 4 July 2012 08:40:25 UTC
