Re: [w3ctag/design-reviews] Serialization of natural language in data formats such as JSON [I18N] (#178)

Solution 1, which would require changes to JSON itself, isn't practical: changing every JSON parser would be too much of an ocean-boiling effort.

I think Solution 2, potentially with bidi control characters within string values, is workable.

> This seems related to the discussion currently happening in heycam/webidl#358, where we're attempting to add a shared primitive to Web IDL that all specs can use, and getting stuck. The point of contention is basically whether the pattern should be
> 
> ```
> someAPI({
>  lang: "...",
>  dir: "...",
>  label: "a string governed by the lang/dir",
>  name: "another string, governed by the same lang/dir"
> });
> ```
> (the "`Localizable` base dictionary" solution)
>
> or
>
> ```
> someAPI({
>  label: {
>    lang: "...",
>    dir: "...",
>    value: "a string governed by the lang/dir"
>  },
>  name: "another string, using the default lang/dir"
> });
> ```
> (the "`LocalizableString union` typedef" solution).
>
> The former makes it easier to say that all strings have the same lang/dir. The latter allows more granular decision making, at the cost of verbosity.

I would expect the former to face less resistance, because it just adds some key-value pairs without forcing a reorganization of a given JSON-based format compared to its lang/dir-unaware version. Moreover, considering JSON from the perspective of developers trying to escape XML, the added nesting/complexity of the latter would probably not be well received. Therefore, I think pushing the latter as the only option wouldn't be productive.

A third option would be:

```
someAPI({
  label_lang: "...",
  label_dir: "...",
  label: "a string governed by label_lang/label_dir",
  name_lang: "...",
  name_dir: "...",
  name: "another string, governed by name_lang/name_dir"
});
```
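As a rough sketch of what consuming this third shape might look like: the helper name `resolveLocalized` and the fallback to a dictionary-wide `lang`/`dir` are my assumptions for illustration, not something proposed in the Web IDL discussion. The `_lang`/`_dir` suffix convention is the one from the example above.

```javascript
// Hypothetical helper: find the language and direction that govern `key`.
// Per-field `<key>_lang` / `<key>_dir` win; otherwise fall back to the
// dictionary-wide `lang` / `dir` (an assumed default, as in the first option).
function resolveLocalized(dict, key) {
  return {
    value: dict[key],
    lang: dict[`${key}_lang`] ?? dict.lang ?? null,
    dir: dict[`${key}_dir`] ?? dict.dir ?? null,
  };
}

// Sample data invented for this sketch.
const payload = JSON.parse(`{
  "lang": "en",
  "dir": "ltr",
  "label_lang": "ar",
  "label_dir": "rtl",
  "label": "مرحبا",
  "name": "Example"
}`);

const label = resolveLocalized(payload, "label"); // uses label_lang / label_dir
const name = resolveLocalized(payload, "name");   // falls back to lang / dir
```

The point is that every value stays a plain string and the lang/dir-unaware consumer can keep reading `payload.label` unchanged.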

> whether we should be encouraging the use of HTML rather than text

I think using HTML in JSON makes sense for strings that carry multi-paragraph text with inline formatting (i.e. something that would make sense inside HTML `<body>`), but I think it wouldn't be good to recommend markup inside JSON for strings that are closer to HTML `<title>`, email subject line, name of a person, a GUI label, invoice/inventory line item, etc.

Even though HTML parsers are now widely available, a plain-text string is a significantly simpler thing for the consumer's _data model_ to deal with than a tree rooted at DOM `DocumentFragment` or equivalent in a non-DOM markup tree API.

People use JSON instead of XML to avoid various complexities of XML and to use a format that maps nicely to and from basic programming language data structures. Making shortish plainish strings (not just ones representing multi-paragraph text with inline formatting) in JSON potentially carry markup would defeat both avoiding XML mixed content complexity and having a format that maps nicely to and from basic programming language data structures.

When a JSON-based format wouldn't use markup in strings for non-bidi reasons, to the extent a base direction taken from an adjacent key-value pair isn't enough, I think finer-grained bidi control should use the bidi control characters instead of importing the full _data model_ complexity of markup for every (human-readable) string.
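To illustrate, here is a minimal sketch of finer-grained control using the Unicode directional isolate characters (U+2066 to U+2069, defined in UAX #9). The choice of isolates rather than the older embedding controls, the `isolate` helper, and the sample title are all my assumptions for the example.

```javascript
// Unicode directional isolate controls (UAX #9).
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical helper: wrap an inline run so that its direction does not
// interfere with the surrounding text. Unknown direction uses FSI.
function isolate(text, dir) {
  const open = dir === "rtl" ? RLI : dir === "ltr" ? LRI : FSI;
  return open + text + PDI;
}

// Base direction comes from the adjacent `dir` key; the embedded Hebrew
// word is isolated inside the otherwise left-to-right title.
const title =
  "Review of " + isolate("\u05E9\u05DC\u05D5\u05DD", "rtl") + " (2nd ed.)";

const wire = JSON.stringify({ dir: "ltr", title });
const roundTripped = JSON.parse(wire);
```

After the round trip, `roundTripped.title` is still an ordinary string in the consumer's data model; the control characters travel inside it without any markup tree.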

(Whereas bidi is intrinsic to whole scripts, ruby is a sometimes-used (even relatively rarely used) typographical device for the scripts with which it is used, so I think in cases where bidi doesn't justify the complexity of markup, ruby doesn't, either.)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/178#issuecomment-314220664

Received on Monday, 10 July 2017 20:07:42 UTC