Processing question for HTML default behavior

Hi all,

I'm running into a processing issue in our HTML filter because I'm trying to provide a set of default rules.
Maybe some of you have run into the same issue and have found a solution.

The problem:

When our filter processes an HTML file, we set a list of default ITS rules that correspond to what a user would expect from a normal extraction of HTML. For example, title and alt attributes should be translatable; b, i, u, em, and many more elements should be seen as inline; etc.

The user does not have to define those rules. They can modify them, but usually they would not.

The issue comes when there is local ITS markup, for example a translate='no' on <html>. When you look at such a document, it should be completely non-translatable. But in our case, because we have default rules, anything that those rules define globally as translatable does not inherit the top-level translate='no' and is therefore seen as translatable.
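
To make the precedence problem concrete, here is a minimal sketch in Python. It is a toy model, not our filter's actual code: Node, DEFAULT_RULES and resolve_translate are hypothetical names, rules match on a node name instead of an XPath selector, and the order assumed is local markup first, then global rules, then inheritance, then defaults.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """Toy stand-in for an element or attribute (attributes start with '@')."""
    name: str
    local_translate: Optional[bool] = None  # explicit translate= on this node
    parent: Optional["Node"] = None


# Hypothetical tool defaults: names that a global rule marks as translatable.
DEFAULT_RULES: Dict[str, bool] = {"@alt": True, "@title": True}


def resolve_translate(node: Node, rules: Dict[str, bool]) -> Tuple[bool, str]:
    """Return (translatable, source) using the assumed precedence order."""
    # 1. Explicit local markup on the node itself wins.
    if node.local_translate is not None:
        return node.local_translate, "local"
    # 2. A global rule that selects the node comes next.
    if node.name in rules:
        return rules[node.name], "global rule"
    # 3. Otherwise take whatever the parent resolved to, unless that
    #    was itself only a default.
    if node.parent is not None:
        value, source = resolve_translate(node.parent, rules)
        if source != "default":
            return value, "inherited"
    # 4. Defaults: elements translatable, attributes not.
    return (not node.name.startswith("@")), "default"


# <html translate='no'><body><p><img alt='...'></p></body></html>
html = Node("html", local_translate=False)
body = Node("body", parent=html)
p = Node("p", parent=body)
img = Node("img", parent=p)
alt = Node("@alt", parent=img)

print(resolve_translate(p, DEFAULT_RULES))    # inherits translate='no'
print(resolve_translate(alt, DEFAULT_RULES))  # default rule wins over inheritance
print(resolve_translate(alt, {}))             # without default rules it inherits
```

With the default rules in place, the p element resolves to non-translatable by inheritance, but the alt attribute is picked up by the global rule before inheritance is ever considered. Without the default rules, the same attribute follows the translate='no' on <html> in this model.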

The problem then is that an author doesn't necessarily know what our default HTML rules are and therefore is not able to mark up the HTML accordingly.

How do other people work with default-ITS behavior vs default HTML-expected behaviors?

To some degree there is a disconnect between some of the default ITS behavior and the HTML reality. For example, the specification explicitly says an HTML id attribute is the same as an ITS id attribute, so there is an expectation that you don't have to set a rule for it. But what about many other things, such as the title and alt attributes? Normally they should be translated, but ITS does not say that, so it's up to the tool to provide a way to do it.

I think we really need a more formal way to define what the expectations are for HTML. Maybe not normative, but something written in stone that processors can rely on; otherwise we'll end up with different tools behaving differently on the same input HTML.

Cheers,
-yves

Received on Wednesday, 20 February 2013 17:55:26 UTC