Re: Re: Re: [ACTION-160] (related to [ACTION-135] too) Summarize specialRequirements from Felix Sasaki on 2012-07-10 (public-multilingualweb-lt@w3.org from July 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 10 Jul 2012 18:45:31 +0200
To: Michael Kruppa <Michael.Kruppa@cocomore.com>
Cc: "ysavourel@enlaso.com" <ysavourel@enlaso.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "fredrik.estreen@lionbridge.com" <fredrik.estreen@lionbridge.com>
Message-ID: <CAL58czoi7P_x2EgSZij_0PauoJcdrt6cg3=J9CbdQ0hADY46xA@mail.gmail.com>

2012/7/10 Michael Kruppa <Michael.Kruppa@cocomore.com>

>  Hi Felix, all,
>
>  from our rather technical point of view, the forbidden characters are
> highly relevant, if not absolutely necessary, in order to avoid
> certain problems. If we cannot agree on an approach based on regular
> expressions due to the inherent complexity, we would definitely opt for a
> solution that would at least allow us to enumerate forbidden characters
> (using Unicode code points as you suggested).
>
>  For us, the regex solution would be of potential interest, but the
> simple enumeration approach would suffice for the current purpose we have
> in mind.
>


Great, that would resolve most of my concerns. Again, the one important
point is coordination with the solutions developed within other groups,
at least XLIFF.
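
To make the enumeration idea concrete, here is a minimal sketch of how a
consumer might check content against a list of forbidden code points. It is
only an illustration: the space-separated "U+XXXX" notation and the function
names parse_code_points and find_forbidden are assumptions made for this
example, not something agreed in this thread or defined by ITS or XLIFF.

```python
from __future__ import annotations


def parse_code_points(spec: str) -> set[int]:
    """Parse a space-separated list such as 'U+002A U+003F' into integers.

    The 'U+XXXX' notation is an assumption made for this sketch.
    """
    points: set[int] = set()
    for token in spec.split():
        if not token.upper().startswith("U+"):
            raise ValueError(f"not a code point: {token!r}")
        value = int(token[2:], 16)  # hexadecimal digits after 'U+'
        if not 0 <= value <= 0x10FFFF:
            raise ValueError(f"outside the Unicode range: {token!r}")
        points.add(value)
    return points


def find_forbidden(text: str, forbidden: set[int]) -> list[tuple[int, str]]:
    """Return (index, character) pairs for every forbidden character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) in forbidden]


if __name__ == "__main__":
    forbidden = parse_code_points("U+002A U+003F")  # '*' and '?'
    print(find_forbidden("a*b?c", forbidden))
```

Because the check is a plain set lookup on code points, it should behave the
same in any environment, which is the interoperability argument for
enumeration over regular expressions.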

Best,

Felix


>
>  Best
>
>  Micha
>
>
>
>
> Sent from Samsung tablet
>
> Felix Sasaki <fsasaki@w3.org> wrote:
>    Hi Yves, all,
>
> 2012/7/10 Yves Savourel <ysavourel@enlaso.com>
>
>> Hi Felix,
>>
>> In the case of forbiddenChars I think the matter of which regex syntax to
>> use can be solved by either:
>>
>> a) Selecting a single syntax (maybe that of XSD, as Shaun noted). But
>> since I think the data will be validated outside of XML most of the time,
>> using XSD’s syntax may not be a good idea.
>>
>
>  I agree, but which one to choose? Won't we just postpone the interop
> issue?
>
>
>>
>> b) Having an extra attribute to specify which syntax is used (like
>> Giuseppe did in his latest proposal)
>>
>
>
>  Mmm ... but what identifiers would you use for the syntax? There is no
> stable identifier for regex syntaxes. If we invent our own, that again may
> lead us down the "charclass" path that we don't want to take ...
>
>
>>
>> c) Defining the subset of regular expressions that can be used, and making
>> sure it’s compatible across most regex engines. That’s, I think, the simplest
>> and most interoperable solution. The drawback is that someone has to take
>> the time to define that list once.
>>
>
>
>  Indeed. Or, if the main use case is to have characters, we say that this
> is a list of disallowed Unicode code points, nothing more. That doesn't
> have the power of regexes, but the code points are stable after all.
>
>
>



-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 10 July 2012 16:45:56 UTC