AW:Re: Re: [ACTION-160] (related to [ACTION-135] too) Summarize specialRequirements from Michael Kruppa on 2012-07-10 (public-multilingualweb-lt@w3.org from July 2012)

From: Michael Kruppa <Michael.Kruppa@cocomore.com>
Date: Tue, 10 Jul 2012 16:16:08 +0000
To: "fsasaki@w3.org" <fsasaki@w3.org>, "ysavourel@enlaso.com" <ysavourel@enlaso.com>
CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, "fredrik.estreen@lionbridge.com" <fredrik.estreen@lionbridge.com>
Message-ID: <dbxsha129xnyj45tuxarjkln.1341936965936@email.android.com>

Hi Felix, all,

From our rather technical point of view, the forbidden characters are highly relevant, if not absolutely necessary, in order to avoid certain problems. If we cannot agree on an approach based on regular expressions due to the inherent complexity, we would definitely opt for a solution that would at least allow us to enumerate forbidden characters (using Unicode code points, as you suggested).

For us, the regex solution would be of potential interest, but the simple enumeration approach would suffice for the current purpose we have in mind.

Best

Micha

Sent from Samsung tablet

Felix Sasaki <fsasaki@w3.org> wrote:
Hi Yves, all,

2012/7/10 Yves Savourel <ysavourel@enlaso.com>
Hi Felix,

In the case of forbiddenChars I think the matter of which regex syntax to use can be solved by either:

a) Selecting a single syntax (maybe that of XSD, as Shaun noted). But I think the data will be validated outside of XML most of the time, so using XSD’s syntax may not be a good idea.

I agree, but which one to choose? Won't we just postpone the interop issue?

b) Having an extra attribute to specify which syntax is used (like Giuseppe did in his latest proposal)

Mmm ... but what identifiers would you use for the syntax? There is no stable identifier for regex syntaxes. If we invent our own, that again may lead to the "charclass" path that we don't want to go down ...

c) Defining the subset of regular expression constructs that can be used, and making sure it’s compatible across most regex engines. That is, I think, the simplest and most interoperable solution. The drawback is that someone has to take the time to define that list once.

Indeed. Or, if the main use case is to have characters, we say that this is a list of disallowed Unicode code points, nothing more. That doesn't have the power of regexes, but the code points are stable after all.
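
To illustrate the plain code point list idea, here is a minimal Python sketch. It is not a proposed syntax: the whitespace-separated "U+XXXX" notation and the function names parse_code_points and find_forbidden are assumptions made up for this example, since the thread does not fix how the list would be written.

```python
def parse_code_points(spec: str) -> set[int]:
    """Parse a whitespace-separated list such as 'U+003C U+0026'.

    The 'U+XXXX' notation is an assumption for this sketch only.
    """
    points = set()
    for token in spec.split():
        if not token.upper().startswith("U+"):
            raise ValueError(f"not a code point: {token!r}")
        points.add(int(token[2:], 16))
    return points


def find_forbidden(text: str, forbidden: set[int]) -> list[tuple[int, str]]:
    """Return (index, character) pairs for each forbidden character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) in forbidden]


# Hypothetical forbidden set: '<', '>' and '&'
forbidden = parse_code_points("U+003C U+003E U+0026")
print(find_forbidden("a < b & c", forbidden))
```

Because the check only compares integer code points, it behaves the same in any environment, which is the interoperability argument made above. The cost is that ranges and character classes cannot be expressed without listing each code point.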

Received on Tuesday, 10 July 2012 16:16:49 UTC