Re: Re: [ACTION-160] (related to [ACTION-135] too) Summarize specialRequirements from Felix Sasaki on 2012-07-10 (public-multilingualweb-lt@w3.org from July 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 10 Jul 2012 17:36:36 +0200
To: Yves Savourel <ysavourel@enlaso.com>
Cc: Michael Kruppa <Michael.Kruppa@cocomore.com>, public-multilingualweb-lt@w3.org, fredrik.estreen@lionbridge.com
Message-ID: <CAL58czrBziX914one6mWkeE11yoqZbwP4wb+_p2iaRZp3-zrvw@mail.gmail.com>

Hi Yves, all,

2012/7/10 Yves Savourel <ysavourel@enlaso.com>

> Hi Felix,
>
> In the case of forbiddenChars I think the matter of which regex syntax to
> use can be solved by either:
>
> a) Selecting a single syntax (maybe XSD's, as Shaun noted). But since I
> think the data will be validated outside of XML most of the time, using
> XSD’s may not be a good idea.
>

I agree, but which one to choose? Won't we just postpone the interop issue?
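An illustration of the interop concern, as a minimal Python sketch (the helper name compiles_in_python_re is hypothetical). The Unicode category escape \p{L} ("any letter") is valid in XML Schema regular expressions and in Java's java.util.regex, but Python's standard re module rejects it. Picking one syntax therefore still leaves other engines to translate or reject the pattern.

```python
import re

def compiles_in_python_re(pattern: str) -> bool:
    """Return True if Python's standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# \p{L} ("any letter") is valid in XML Schema regexes and java.util.regex,
# but Python's re rejects it with "bad escape \p".
print(compiles_in_python_re(r"\p{L}"))   # False
# A plain character class is understood the same way by all three.
print(compiles_in_python_re(r"[<>&]"))   # True
```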


>
> b) Having an extra attribute to specify which syntax is used (like
> Giuseppe did in his latest proposal)
>


Mmm ... but what identifiers would you use for the syntax? There is no stable
identifier for regex syntaxes. If we invent our own, that again may lead us
down the "charclass" path that we don't want to take ...


>
> c) Defining the subset of regex expressions that can be used, and making
> sure it’s compatible across most regex engines. That’s, I think, the simplest
> and most interoperable solution. The drawback is that someone has to take
> the time to define that list once.
>


Indeed. Or, if the main use case is to forbid specific characters, we could
say that this is a list of disallowed Unicode code points, nothing more. That
doesn't have the power of regexes, but code points are stable, after all.
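A minimal Python sketch of the code-point approach, assuming the constraint is carried as inclusive ranges of disallowed code points (the range list and the function name find_forbidden are hypothetical). No regex dialect is involved, so any tool can apply the list the same way. Because Python strings are sequences of code points, a supplementary character such as U+1F600 is checked as one unit rather than as a UTF-16 surrogate pair.

```python
from typing import List, Tuple

# Hypothetical constraint: disallowed code points as inclusive ranges,
# here the C0 control characters plus '<' and '>'.
FORBIDDEN_RANGES: List[Tuple[int, int]] = [
    (0x0000, 0x001F),  # C0 controls
    (0x003C, 0x003C),  # '<'
    (0x003E, 0x003E),  # '>'
]

def find_forbidden(text: str, ranges=FORBIDDEN_RANGES) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') for every code point that falls in a range."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in ranges):
            hits.append((i, f"U+{cp:04X}"))
    return hits

print(find_forbidden("a<b\tc"))  # [(1, 'U+003C'), (3, 'U+0009')]
```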



>
>
> > I understand your argument that currently you
> > do the checking later in the chain. But what about
> > doing the checking earlier, with schematron, and then
> > just passing the results to the application(s)?
>
> That makes little sense to me: that type of check is best done
> interactively, or at some point when the translation can be fixed without
> having consequences in the process, so either when translating, or just
> after. But not when the document is back in its original format.
>

OK, I see.


>
> I have nothing against Schematron, but if the original data is a database
> and the tool used by the translator doesn't know anything about XML, why
> force the use of XML for this? Sure, XML is involved in the process, but only
> as an intermediary.
>
>
> > Another benefit is that you can get content creators
> > at least from the XML realm to provide and use
> > these schematron files.
>
> For me, I'd rather have them spend time making sure the constraints are
> set and can be passed to the translation process in a standardized way, so
> the checks can be done where it's best and by whatever tool the process
> owner deems the best.
>
> Besides, even when using Schematron, having a single parameterized script
> that uses the ITS data as input would be better than hard-coding the checks
> in the script. Write once and re-use it forever.
>

If we use forbidden characters in the "regex" approach, I doubt that
"re-use forever" would work, see above.

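To make the "write once" idea concrete without depending on any regex dialect, here is a minimal Python sketch of a single checker whose behavior is driven entirely by data. The Constraints fields and the check function are hypothetical stand-ins for values that might be extracted from ITS markup, and the sketch assumes that storage size means a maximum number of bytes in a stated encoding.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Constraints:
    """Hypothetical per-segment constraints, e.g. extracted from ITS markup."""
    storage_size: Optional[int] = None       # maximum size in bytes, if set
    storage_encoding: str = "utf-8"          # encoding the size is measured in
    forbidden: FrozenSet[int] = frozenset()  # disallowed Unicode code points

def check(text: str, c: Constraints) -> List[str]:
    """Apply whichever constraints are set; nothing here is content-specific."""
    problems = []
    if c.storage_size is not None:
        size = len(text.encode(c.storage_encoding))
        if size > c.storage_size:
            problems.append(f"{size} bytes > storage size {c.storage_size}")
    for i, ch in enumerate(text):
        if ord(ch) in c.forbidden:
            problems.append(f"forbidden U+{ord(ch):04X} at index {i}")
    return problems

# The same function serves every document; only the data changes.
print(check("Grüße", Constraints(storage_size=5)))
# ['7 bytes > storage size 5']
print(check("a<b", Constraints(forbidden=frozenset({0x3C}))))
# ['forbidden U+003C at index 1']
```

Handling a different document only means supplying different Constraints values; the checking code itself stays the same.
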
But let me try to summarize the consensus we have reached so far, using
your summary at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0066.html
as input: I think we agree that we can move forward with
"storageSize", provided it is compatible with the to-be-developed XLIFF
approach. For "display length" and "forbidden characters" we'd need the
same compatibility and, I think, more mature proposals. With regard to
XLIFF compatibility, it would be helpful to understand the timeline on the
XLIFF side.

Best,

Felix




>
> Cheers,
> -yves
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 10 July 2012 15:37:12 UTC