- From: Pablo Nieto Caride <pablo.nieto@linguaserve.com>
- Date: Thu, 4 Apr 2013 17:12:17 +0200
- To: <public-multilingualweb-lt@w3.org>
- Message-ID: <05a401ce3146$cb8d9b50$62a8d1f0$@linguaserve.com>
Hi all,

I made some headway on action 385. Just to summarise:

Yves raised an issue (https://www.w3.org/International/multilingualweb/lt/track/issues/67) on Allowed Characters (http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#allowedchars) stating that using the XML Schema Character Class regular expression syntax reduces interoperability (http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0000.html).

Shaun did some research for Action-385 (https://www.w3.org/International/multilingualweb/lt/track/actions/385) and came up with a small subset of common regular expression syntax supported by most engines (http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0180.html), which is more or less the one Yves had suggested before:

1. character classes [abc] [a-zA-Z_\-]
2. ranges [a-c]
3. negations [^abc]
4. "^" and "]" must never appear unless backslash-escaped
5. "-" may be backslash-escaped
6. escape sequences "\n", "\r", "\t", "\d", and "\D"
7. literal "\" is escaped as "\\"

Subsequently he developed a regex, which Felix corrected, that is:

^(\.|\[\^?-?(([&#x9;&#xA;&#xD;&#x20;-,.-[_-&#xD7FF;&#xE000;-&#xFFFD;&#x10000;-&#x10FFFF;]|\\n|\\r|\\t|\\]|\\^|\\-|\\\\)(-([&#x9;&#xA;&#xD;&#x20;-,.-[_-&#xD7FF;&#xE000;-&#xFFFD;&#x10000;-&#x10FFFF;]|\\n|\\r|\\t|\\]|\\^|\\-|\\\\))?)+-?\])?$

but it doesn't seem to work.

Since I don't quite understand the regex structure chosen by Shaun, I took the liberty of adapting it a little. I think it is now simpler, and it supports the subset (character classes, ranges, negations, etc.) plus greedy and lazy operators (which I can drop if they're not needed, but I believe most engines support them and they are usually helpful). I'm still working on it because it needs more work; for instance, points 4, 5 and 7 work, but without limitations.

So, just like Shaun did, here is the proposed regular expression escaped with XML numeric character entities, as if it were put into an XML document:

^(\.(\*|\+)?\??|\[\^?(([&#x9;&#xA;&#xD;&#x20;-,.-[_-&#xD7FF;&#xE000;-&#xFFFD;&#x10000;-&#x10FFFF;])*-?([&#x9;&#xA;&#xD;&#x20;-,.-[_-&#xD7FF;&#xE000;-&#xFFFD;&#x10000;-&#x10FFFF;])+)+-?\])$

And here is a regular expression that matches a subset of our subset, limited to Plane 1, with the \u escape (I tested it with PHP and JavaScript and it works):

^(\.(\*|\+)?\??|\[\^?(([\u0009\u000A\u000D\u0020-\u002C\u002E-\u005B\u005F-\uD7FF\uE000-\uFFFD\u10000-\u10FFFF])*-?([\u0009\u000A\u000D\u0020-\u002C\u002E-\u005B\u005F-\uD7FF\uE000-\uFFFD\u10000-\u10FFFF])+)+-?\])$

Implementers and anyone else who is interested, please give feedback if necessary so I can move forward and evolve the regex.

Cheers,
__________________________________
Pablo Nieto Caride
Dpto. Técnico/I+D+i
Linguaserve Internacionalización de Servicios, S.A.
Tel.: +34 91 761 64 60 ext. 0422
Fax: +34 91 542 89 28
E-mail: pablo.nieto@linguaserve.com
www.linguaserve.com
__________________________________
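For anyone who wants to try the proposed expression against sample values, here is a minimal sketch in Python. It is an adaptation, not the expression tested in the message: the function name is_allowed_chars_value is hypothetical, and the supplementary range is written as \U00010000-\U0010FFFF because Python's \u escape takes exactly four hex digits. The nested quantifiers can backtrack heavily on long non-matching input, so this is only meant for short test strings.

```python
import re

# Characters accepted unescaped inside a class, following the ranges in the
# message: tab, LF, CR, U+0020-U+002C, U+002E-U+005B, U+005F-U+D7FF,
# U+E000-U+FFFD and U+10000-U+10FFFF. This leaves out "-", "\", "]" and "^".
# Assumption: \U escapes replace the \u10000-\u10FFFF notation of the message.
CLASS = (
    r"[\u0009\u000A\u000D\u0020-\u002C\u002E-\u005B\u005F-\uD7FF"
    r"\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Either "." with optional greedy/lazy operators, or a bracketed class
# with optional negation, ranges and a trailing "-".
PATTERN = re.compile(
    r"^(\.(\*|\+)?\??|\[\^?((" + CLASS + r")*-?(" + CLASS + r")+)+-?\])$"
)


def is_allowed_chars_value(value: str) -> bool:
    """Return True if value is accepted by the proposed expression (hypothetical helper)."""
    return PATTERN.match(value) is not None


if __name__ == "__main__":
    for sample in [".", ".*?", "[abc]", "[a-zA-Z_]", "[^abc]", "[]", "[a\\]", "abc"]:
        print(repr(sample), is_allowed_chars_value(sample))
```

Note that a backslash is not part of the accepted class here, so values using the backslash escapes from points 4 to 7 are rejected by this sketch.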
Received on Thursday, 4 April 2013 15:12:53 UTC