- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 10 Jul 2012 18:45:31 +0200
- To: Michael Kruppa <Michael.Kruppa@cocomore.com>
- Cc: "ysavourel@enlaso.com" <ysavourel@enlaso.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "fredrik.estreen@lionbridge.com" <fredrik.estreen@lionbridge.com>
- Message-ID: <CAL58czoi7P_x2EgSZij_0PauoJcdrt6cg3=J9CbdQ0hADY46xA@mail.gmail.com>
2012/7/10 Michael Kruppa <Michael.Kruppa@cocomore.com>

> Hi Felix, all,
>
> from our rather technical point of view, the forbidden characters are
> highly relevant, if not to say absolutely necessary, in order to avoid
> certain problems. If we cannot agree on an approach based on regular
> expressions due to the inherent complexity, we would definitely opt for a
> solution that would at least allow us to enumerate forbidden characters
> (using unicode pointers as you suggested).
>
> For us, the regex solution would be of potential interest, but the
> simple enumeration approach would suffice for the current purpose we have
> in mind.
>

Great, that would resolve most of my concerns. Again, the only important point is coordination with the solutions developed within other groups, at least XLIFF.

Best,

Felix

>
> Best
>
> Micha
>
>
> Sent from Samsung tablet
>
> Felix Sasaki <fsasaki@w3.org> wrote:
> Hi Yves, all,
>
> 2012/7/10 Yves Savourel <ysavourel@enlaso.com>
>
>> Hi Felix,
>>
>> In the case of forbiddenChars I think the matter of which regex syntax to
>> use can be solved by either:
>>
>> a) Selecting a single syntax (maybe the one of XSD like Shaun noted). But
>> I think the data will be validated outside of XML most of the time, so using
>> XSD’s may not be a good idea.
>>
>
> I agree, but which one to choose? Won't we just postpone the interop
> issue?
>
>>
>> b) Having an extra attribute to specify which syntax is used (like
>> Giuseppe did in his latest proposal)
>>
>
> Mmm ... but what identifiers would you use for the syntax? There is no stable
> identifier for regex syntaxes. If we invent our own, that again may lead to
> the "charclass" path that we don't want to go down ...
>
>>
>> c) Defining the sub-set of regex expressions that can be used, and making
>> sure it’s compatible across most regex engines. That’s I think the simplest
>> and most interoperable solution. The drawback is that someone has to take
>> the time to define that list once.
>>
>
> Indeed. Or, if the main use case is to have characters, we say that this
> is a list of disallowed Unicode code points, nothing more. That doesn't
> have the power of regex, but the code points are stable after all.
>

--
Felix Sasaki
DFKI / W3C Fellow
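For illustration only, the "list of disallowed Unicode code points" option discussed above could be checked roughly as in the sketch below. The space-separated `U+XXXX` notation and the function names `parse_code_points` and `find_forbidden` are assumptions made for this example; nothing in the thread fixes a syntax for the list.

```python
def parse_code_points(spec: str) -> set[int]:
    """Parse a space-separated list such as 'U+0026 U+003C' into integers.

    The 'U+XXXX' notation is an assumption for this sketch, not a syntax
    agreed on in the thread.
    """
    points = set()
    for token in spec.split():
        if not token.upper().startswith("U+"):
            raise ValueError(f"not a code point: {token!r}")
        points.add(int(token[2:], 16))
    return points


def find_forbidden(text: str, forbidden: set[int]) -> list[tuple[int, str]]:
    """Return (index, character) pairs for every forbidden character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) in forbidden]


if __name__ == "__main__":
    forbidden = parse_code_points("U+0026 U+003C")  # '&' and '<'
    print(find_forbidden("a<b & c", forbidden))
```

Because the check compares integer code points only, it does not depend on any regex engine, which is the interoperability argument made for this option.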
Received on Tuesday, 10 July 2012 16:45:56 UTC