RE: Processing question for HTML default behavior

Hi Felix, Karl,

Thanks for the feedback.

Basically I think, like both of you, that this could be addressed in some BP document.
But if so, I think it needs to be very strong and clear.

The bottom line is: how MUST an ITS processor behave on an HTML file if you give it no rules?
Currently it behaves in a way that is not really usable: one has to add rules to process the file properly.
So having an official set of rules to complement the specification is good, but I'm almost thinking it should be mentioned in the specification itself (by pointing to the rules file, for example?)
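
To make the problem concrete, here is a minimal sketch in Python. It is not our filter's code: the function name is hypothetical, and the precedence order it uses (local markup, then global rules, then inheritance, then the default) is an assumption for illustration.

```python
from typing import Optional


def resolve_translate(local: Optional[str],
                      global_rule: Optional[str],
                      inherited: Optional[str],
                      default: str = "yes") -> str:
    """Return the first value that is set, in the assumed precedence order:
    local markup, global rule, inherited value, default."""
    for value in (local, global_rule, inherited):
        if value is not None:
            return value
    return default


# Input: <html translate="no"><body><p title="Tip">Text</p></body></html>

# Text of <p>: no local markup, no rule selects it, so it inherits "no".
p_text = resolve_translate(local=None, global_rule=None, inherited="no")

# title attribute of <p>: a tool-supplied default rule (hypothetical) marks
# every title attribute as translatable, and that rule beats inheritance.
p_title = resolve_translate(local=None, global_rule="yes", inherited="no")

print(p_text, p_title)
```

With no default rules the whole document stays non-translatable; with them, the title attribute leaks through even though the author marked <html> as translate='no'.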

Karl: the work-around you described would probably work, but it would be quite a hack. I don't think we'd like to go in that direction.


-----Original Message-----
From: Karl Fritsche
Sent: Wednesday, February 20, 2013 12:26 PM
Subject: Re: Processing question for HTML default behavior

Hi Yves,

I thought the best practice document should address exactly this topic.
It should describe a default rule set that authors can use.

But the problem you describe is a very good point, and I'm not sure you can solve it as long as you automatically add rules to the content. The author has to be aware of these rules and should send them to you, so that he can disable them when nothing should be translated.

The only thing I can currently think of would be a very hackish solution on your side. If you automatically add rules, then local attributes would carry more weight than any rule, even for inheritance and for attributes. As long as a local attribute applies, either directly or through inheritance, the translate rules don't apply.
I think this would only work for the translate data category and not for other categories, as their values are not just yes or no. And it would be nothing we could describe well in a standard, because from the standard's point of view, if you have these rules then they apply, which is totally correct. But that is not what you want when they are added automatically, as in your case, because you are being nice to your customers and mindful of people who don't know much about ITS.
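
As a rough sketch of that hack in Python (only one reading of the idea; the function and parameter names are hypothetical, and this is not behaviour described by the standard):

```python
from typing import Optional


def resolve_translate_workaround(local: Optional[str],
                                 inherited_local: Optional[str],
                                 auto_rule: Optional[str],
                                 default: str = "yes") -> str:
    """Non-standard order: a local attribute, or a value inherited from a
    local attribute, beats any automatically added rule."""
    for value in (local, inherited_local, auto_rule):
        if value is not None:
            return value
    return default


# title attribute inside <html translate="no">: the inherited local "no"
# wins over the automatically added "title is translatable" rule.
with_local_markup = resolve_translate_workaround(None, "no", "yes")

# Same attribute in a document without any local markup: the automatic
# rule still applies.
without_local_markup = resolve_translate_workaround(None, None, "yes")

print(with_local_markup, without_local_markup)
```

This only makes sense for translate, where the value is a plain yes or no.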

But in my mind you can't handle this the correct way when you're changing the content and adding rules to it. They have to send the rules to you; you should not have to add rules for them. If the author doesn't want you to translate any title tag, he also has to be aware that you add rules to translate the title tag, so that he can write a rule which disallows it and overrides yours.


On 20.02.2013 18:54, Yves Savourel wrote:
> Hi all,
> I'm running into a processing issue in our HTML filter because I'm trying to provide a set of default rules.
> Maybe some of you have run into the same issue and have found a solution.
> The problem:
> When our filter processes an HTML file we set a list of default ITS rules that correspond to what a user would expect from a normal extraction of HTML. For example title or alt attributes should be translatable, b, i, u, em, and many more elements should be seen as inline, etc.
> The user does not have to define those rules; they can modify them, but usually they would not.
> The issue comes when there is local ITS markup. For example a translate='no' on <html>. Such a document, when you look at it, should be completely non-translatable. But in our case, because we have default rules, anything that is defined globally as translatable in those rules is not inheriting the top-level translate='no' and therefore is seen as translatable.
> The problem then is that an author doesn't necessarily know what our default HTML rules are and therefore is not able to mark up his HTML accordingly.
> How do other people work with default-ITS behavior vs default HTML-expected behaviors?
> To some degree there is a disconnect between some of the default ITS behavior and the HTML reality. For example the specification explicitly says an HTML id attribute is the same as an ITS id attribute, so there is an expectation that you don't have to set a rule for it. But what about many other things, like for example the title and alt attributes? They should normally be translated, but ITS does not say that, so it's up to the tool to provide a way to do it.
> I think we really need to have a more formal way to define what the expectations for HTML are. Maybe not normative, but something written in stone that processors can rely on, otherwise we'll end up with different tools behaving differently on the same input HTML.
> Cheers,
> -yves

Received on Wednesday, 20 February 2013 22:54:16 UTC