Re: Re: [ACTION-160] (related to [ACTION-135] too) Summarize specialRequirements

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 10 Jul 2012 10:46:34 +0200
Message-ID: <CAL58czrJ_Cnn6sB0pzq0bZ6=weOYDxVkgZzBswRyhi2sybQpsQ@mail.gmail.com>
To: Yves Savourel <ysavourel@enlaso.com>
Cc: Michael Kruppa <Michael.Kruppa@cocomore.com>, public-multilingualweb-lt@w3.org, fredrik.estreen@lionbridge.com
Hi Yves, all,

thanks a lot for the explanation about charclass, that is very helpful.

I am a bit worried that with "forbidden characters" we might end up with
similar problems: each tool later in the chain can use its own regular
expression mechanism.

This again brings me to ... Schematron: if you have Schematron with XPath
2.0 (e.g. a processor like Saxon), the checking of "only katakana
characters" is very simple, see
http://www.xmlschemareference.com/regularExpression.html
\P{IsKatakana}
matches any character outside the Katakana block, since "Katakana" is a
fixed code block in Unicode.
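
A minimal sketch of such a rule, assuming Schematron with the XPath 2.0
query binding; the "gui" element name is only a placeholder. XPath 2.0
regular expressions name Unicode blocks with an "Is" prefix, so the sketch
tests for \P{IsKatakana}, i.e. any character outside the Katakana block
(U+30A0 to U+30FF).

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
   <pattern>
      <title>Katakana only</title>
      <!-- "gui" is only a placeholder element name for this sketch -->
      <rule context="gui">
         <!-- matches() is true as soon as one character lies outside the
              Katakana block, so the assertion holds only when none does -->
         <assert test="not(matches(., '\P{IsKatakana}'))">The string in the "gui" element should contain only Katakana characters.</assert>
      </rule>
   </pattern>
</schema>
```

Note that spaces, ASCII punctuation and half-width Katakana lie outside that
block, so a real rule would probably need a wider character class.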

I understand your argument that you currently do the checking later
in the chain. But what about doing the checking earlier, with schematron,
and then just passing the results to the application(s)? That is, using my
example from before

<schema xmlns="http://purl.oclc.org/dsdl/schematron">
   <pattern>
      <title>Display length</title>
      <rule context="gui">
         <assert test="string-length() &lt; 35">The length of the string in the "gui" element should be less than 35 characters.</assert>
      </rule>
   </pattern>
</schema>

You would do the checking on the original content, and then just pass the
error message "The length of the string in the "gui" element should be less
than 35 characters." to your application.
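
As a sketch of what "passing the results" could look like: Schematron
implementations commonly report failures in SVRL (Schematron Validation
Report Language). The fragment below is hand-written for illustration, not
the output of an actual run, and the location path is an assumption.

```xml
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
   <svrl:active-pattern/>
   <svrl:fired-rule context="gui"/>
   <!-- one failed-assert per violated assertion; "location" is a
        hypothetical path to the offending node -->
   <svrl:failed-assert test="string-length() &lt; 35" location="/doc/gui[2]">
      <svrl:text>The length of the string in the "gui" element should be less than 35 characters.</svrl:text>
   </svrl:failed-assert>
</svrl:schematron-output>
```

A downstream tool would then only need to read the svrl:text message and the
location, whatever regex or XPath dialect was used for the check itself.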

The benefit of this approach is that there will be no interoperability
problems about "what regex syntax has been used" etc. Another benefit
is that you can get content creators, at least from the XML realm, to provide
and use these schematron files. Accommodating the "testing later" scenario
of course requires re-serialization of the content. But you have to do
that anyway at some point.

I think that approach would be a higher implementation effort for you,
Pedro and others since you'd have to implement the "passing schematron
output through the pipeline" part. That's probably more difficult than
passing the various pieces of metadata that we are discussing here.
Nevertheless these pieces might lead to more interoperability problems than
we can think of now.

Best,

Felix

2012/7/9 Yves Savourel <ysavourel@enlaso.com>

> Hi Felix, all,
>
> XLIFF 1.2’s charclass is a typical example where XLIFF made the mistake of
> defining an attribute without defining a clear set of values (or an
> open-ended set).
>
> There is no way to really work transparently with this property.
>
> See https://lists.oasis-open.org/archives/xliff/200203/msg00014.html for
> a tentative set of values, and
> https://lists.oasis-open.org/archives/xliff/200204/msg00019.html . That's
> more than 10 years ago...
>
> I'm not sure the CSS-based value idea was a good one. A regex sounds a lot
> better now. But it never made it to the specification anyway.
>
> So I would not worry too much about charclass in 1.2. The intent is
> certainly a class of chars rather than a regex. Then we can use an ITS
> attribute for a regex. Since nothing is defined we can't be breaking any
> interoperability.
>
> We do have to make sure 2.0 is interoperable though.
>
> Pedro, Arle: charclass is a good example of why I don't like
> 'user-defined' lists and open-ended values :)
>
> cheers,
> -ys
>
>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Monday, July 09, 2012 6:37 PM
> To: Michael Kruppa
> Cc: Yves Savourel; public-multilingualweb-lt@w3.org;
> fredrik.estreen@lionbridge.com
> Subject: Re: Re: [ACTION-160] (related to [ACTION-135] too) Summarize
> specialRequirements
>
> Hi Michael, all,
>
> A question to the XLIFF people in this thread: how do "forbidden
> characters" relate to XLIFF? I see at
> http://docs.oasis-open.org/xliff/v1.2/cs02/xliff-core.html#charclass
> that there is a charclass attribute saying
> "This indicates that a translation is restricted to a subset of characters
> (i.e. ASCII only, Katakana only, uppercase only, etc.). "
> What would we do if there is an XLIFF file with charclass allowing some
> characters that are forbidden by a "forbidden characters" data category?
>
> Felix
> 2012/7/9 Michael Kruppa <Michael.Kruppa@cocomore.com>
> Hi Felix,
>
> max-size is relevant to us. But we would also like to support the
> forbidden chars since this may be relevant when storing translated content
> and certain characters may also lead to problems when integrating them with
> html and javascript code generated by the CMS.
>
> So, we hope that we can clarify the forbidden chars topic.
>
> Best
>
> Micha
>
>
>
>
> Sent from Samsung tablet
>
> Felix Sasaki <fsasaki@w3.org> wrote:
>
> Hi Michael,
>
> just a clarification question: do you mean max-size? So far for the other
> aspects of specialRequirements, we don't have a clear definition, so we may
> end up "just" with max-size.
>
> Best,
>
> Felix
> 2012/7/9 Michael Kruppa <Michael.Kruppa@cocomore.com>
> Hi all,
>
> I would just like to point out that for us, the specialRequirements meta
> tag appears to be of high relevance with respect to some of the business
> cases we have in mind for future usage of ITS. Therefore, we would like to
> declare our strong support for this meta tag and we will be happy to
> implement it on the CMS side.
>
> Cheers
>
> Micha
>
> ________________________________________
> Dr. Michael Kruppa, Senior IT-Consultant
> Tel.: +49 69 972 69 189 Fax: +49 69 972 69 204; E-Mail:
> michael.kruppa@cocomore.com
> Cocomore AG, Gutleutstraße 30, D-60329 Frankfurt
> Internet: http://www.cocomore.de Facebook:
> http://www.facebook.com/cocomore Google+: http://plus.cocomore.de
> Cocomore is an active member of the World Wide Web Consortium (W3C) and of
> the Bundesverband Digitale Wirtschaft (BVDW)
> Executive Board: Dr. Hans-Ulrich von Freyberg (Chair), Dr. Jens Fricke, Marc
> Kutschera, Chair of the Supervisory Board: Martin Velasco, Registered
> office: Frankfurt/Main, District Court Frankfurt am Main, HRB 51114
>
>
>
> -----Original Message-----
> From: Yves Savourel [mailto:ysavourel@enlaso.com]
> Sent: Monday, 9 July 2012 09:58
> To: 'Felix Sasaki'
> Cc: public-multilingualweb-lt@w3.org; 'Fredrik Estreen'
> Subject: RE: [ACTION-160] (related to [ACTION-135] too) Summarize
> specialRequirements
> Hi Felix, all,
>
> > So I think - if I understand you correctly - what you want to achieve
> > is that tools make use of metadata that is coming from various
> > sources (XLIFF, ITS, PO, ..). Currently that metadata is not coming at
> > all, or only in proprietary ways. The aim now is to have one agreed
> > metadata definition for max-size, right?
> > ...
> > ...
> > What worries me then is that we aim to create a single piece of
> > metadata, which is not part of the big picture. That raises several
> > questions / requirements:
>
> Maybe it'll help to go back to the root of this requirement (as far as I
> understand it):
>
> Sometimes a string to be translated has a limitation on how long it can
> be. The limitation can be in the storage (fixed length DB field in a CMS
> for example), or in the display: We are talking about the storage here.
>
> What I think ITS needs to provide is the way to pass that information down
> to the consumer tools so the limitation can be verified at some stage (for
> example: during the translation, and/or at a QA step after).
>
> That's the "big picture" for me. I'm not sure what you mean by "special
> purpose length solution". To me the proposal Giuseppe has for maxStorgeSize
> is rather general.
>
> But maybe I'm missing your point.
> -yves
>
> --
> Felix Sasaki
> DFKI / W3C Fellow


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Tuesday, 10 July 2012 08:47:00 UTC
