- From: Felix Sasaki <fsasaki@w3.org>
- Date: Sun, 06 Jan 2013 16:40:42 +0100
- To: public-multilingualweb-lt-comments@w3.org
Hi Yves, Jörg, all,

co-chair hat on: Yves, you write below that you don't use an XSD engine. I assume the same for Karl, and for Philip I assume that he is just wrapping Okapi. So all three implementers at https://docs.google.com/spreadsheet/ccc?key=0AgIk0-aoSKOadG5HQmJDT2EybWVvVC1VbnF5alN2S3c#gid=0 do not fully conform to http://www.w3.org/TR/2012/WD-its20-20121206/#allowedchars

This gives us three options:

1) Declare the "allowed characters" data category "at risk" and remove it after the testing period (end of "candidate recommendation", see sec. 2.2 at http://www.w3.org/2011/12/mlw-lt-charter.html ).

2) Change the definition of allowed characters so that it follows the simple regex subset Yves described at http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0296.html

3) Convince the implementers to use XSD engines.

I don't see a majority in the group for 1), dropping allowed characters. 2) would very likely mean a substantive change, that is, another last call period. It would also mean that we need tests (positive and negative) for the regex subset. 3) would be a burden on implementers, but would not mean new tests: we can defer that to XML Schema, just as we don't provide tests for XPath. So no matter what we do, we cannot just reject the comment: some action is required. 1) would be needed to avoid the "SRX" situation.

co-chair hat off: I would not underestimate the burden of 2), creating tests for our "own" regex syntax. Without such tests, creators of "allowed characters" regexes would very likely just do what they want, and sometimes the regex would work, sometimes not, as Jirka said at http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0298.html . The "use XSD" approach puts a burden on implementers (for sure), but it has a benefit for users and interoperability.
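[A minimal sketch of the dialect gap under discussion. The class name and the sample pattern are illustrative only; XML Schema's character-class subtraction syntax has no direct counterpart in java.util.regex, where the nearest equivalent is intersection with a negated class.]

```java
import java.util.regex.Pattern;

public class RegexDialects {
    public static void main(String[] args) {
        // XML Schema regexes allow character-class subtraction, e.g.
        // [a-z-[aeiou]] (lowercase consonants). java.util.regex has no
        // subtraction operator; the closest equivalent is intersection
        // with a negated class:
        Pattern consonants = Pattern.compile("[a-z&&[^aeiou]]+");

        // XSD patterns are implicitly anchored to the whole value, so a
        // conforming checker must test the whole string with matches(),
        // not search for a substring with find():
        System.out.println(consonants.matcher("bcd").matches()); // true
        System.out.println(consonants.matcher("abc").matches()); // false
    }
}
```

This is the kind of silent divergence the "tests for our own syntax" concern is about: a pattern written in one dialect may compile in another engine and simply mean something different.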
I won't object to going the path of 2), but we would need volunteers to do testing (including providing the tests) and we would run the risk of going through another last call (3 weeks delay).

co-chair hat on again: so far we don't have a single comment from outside the working group. This is a separate topic, but getting out of the last call period without feedback from outside will be very hard too. So tomorrow we need to start an initiative to nudge many, many people about making comments.

Best,

Felix

On 04.01.13 14:45, Jörg Schütz wrote:
> Hi Yves, Felix, and all,
>
> Now I have browsed through the discussions related to action-189, and
> indeed the new issue 67 will entirely repeat this previous discussion;
> it doesn't add any new evidence to the apparently still unresolved
> concerns.
>
> Given the description, and in particular the examples, in the
> "Allowed Characters" section of the specification, I would claim that
> most regex engines could be used with the given syntax and the
> intended use cases if we take so-called text-directed engines (aka DFAs).
>
> Therefore, what about simply adding a short paragraph on "processing
> expectations"? Otherwise, I fear that we will also discuss the
> extensions or flavors available in XQuery and XPath on top of the XSD
> regexes... ;-)
>
> Thanks and cheers -- Jörg
>
> On Jan 04, 2013, at 14:14 (UTC+1), Yves Savourel wrote:
>> Hi Felix, all,
>>
>>> I *think* in the discussion around action-189, all the arguments you
>>> give below were discussed; Jirka and, in part, I gave counterarguments,
>>> e.g. a counterargument to your SRX example at
>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0302.html
>>> saying that each XML Schema 1.0 or 1.1 implementation correctly
>>> implements the regular expressions we refer to - so no danger of an
>>> "SRX situation" here.
>>> What is the "new evidence" in your argumentation below?
>>> I'm just trying to avoid repetition for those who followed the
>>> previous discussion.
>>
>> I have no new evidence. In my opinion the old ones are still valid:
>> the RE engines for Java and many other programming languages do not
>> support the same syntax as XSD REs. There are therefore two options
>> for the implementers:
>>
>> a) Use an RE engine that supports XSD REs: there is at least one in
>> Java, but as an implementer I don't want to have a dependency on yet
>> another component just to support rarely used features when the same
>> functionality could be obtained with the runtime RE engine if a
>> simple subset were the prescribed syntax.
>>
>> b) Create a schema on the fly with RE restrictions for the content
>> parts that need to be verified, and use an XSD validator to verify
>> those snippets of data. Such a solution would be complicated to
>> implement and far from efficient.
>>
>>> so no danger of an "SRX situation" here.
>>
>> Actually we already have that situation: we've implemented Allowed
>> Characters (the use of the RE, not just its parsing) in Okapi, and
>> we do not use an XSD RE engine. It's a not-completely-conformant
>> implementation, and it will stay that way unless there are
>> overwhelming reasons for that component to have a dependency on an
>> XSD RE engine at some point (very unlikely), or Java suddenly starts
>> to support that syntax.
>>
>> The main argument is that, while the ITS markup lives in the
>> HTML5/XML world, the consumers of the data categories may have no
>> relation to the XML technology stack. For every other data category
>> the information we pass to the consumers is simple: numbers, labels,
>> identifiers, all highly interoperable. The exception is Allowed
>> Characters, where the information is a regular expression. I would
>> simply like to have an RE syntax that is common to most RE engines,
>> not specific to one.
>>
>> My question is: why make the implementers' lives more difficult when
>> you can have the same functionality with a more interoperable
>> solution?
>>
>> Cheers,
>> -yves
>>
>
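[Yves' option (b) above, generating a schema on the fly and delegating the pattern check to an XSD validator, can be sketched with standard JAXP as follows. The class name and the `conforms` helper are illustrative, and the sketch assumes the pattern contains no characters needing XML escaping; it does show why Yves calls the approach heavyweight compared to a plain runtime regex call.]

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class AllowedCharsViaXsd {
    // Wrap the allowedCharacters pattern in a one-element throwaway
    // schema and let the JAXP XSD validator do the matching.
    static boolean conforms(String pattern, String content) {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='t'><xs:simpleType>"
          + "<xs:restriction base='xs:string'>"
          + "<xs:pattern value='" + pattern + "'/>"
          + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";
        try {
            Validator v = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator();
            v.validate(new StreamSource(
                new StringReader("<t>" + content + "</t>")));
            return true;   // content matched the XSD pattern
        } catch (org.xml.sax.SAXException | java.io.IOException e) {
            return false;  // pattern violation (or schema error)
        }
    }

    public static void main(String[] args) {
        // XSD character-class subtraction, a construct java.util.regex
        // does not support:
        System.out.println(conforms("[a-z-[aeiou]]*", "bcd")); // true
        System.out.println(conforms("[a-z-[aeiou]]*", "abc")); // false
    }
}
```

Building and compiling a schema per checked value is exactly the "complicated to implement and far from efficient" cost mentioned above, which is why a simple common regex subset was proposed instead.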
Received on Sunday, 6 January 2013 15:41:07 UTC