It looks quite good in this example, which has only a single instance in a
rather long text. It might look worse with many more instances of Mongolian,
or with some long words.

One might guess that this convention is better suited to electronic media
(pixels and scrolling are essentially free), whereas the older convention is
better suited to physical books (paper costs money).

I'd say it's a well-established convention. It's actually pretty common, though I don't know the exact percentage.
I've attached some quick examples from older sources, taken from a Mongolian encoding discussion group. I'll ask for the sources' names if you're interested.
