I'm actually typing this from Japan, where I've spent the afternoon being shown some SGML work done here in the context of language research. It is noticeable that all the LOCALLY-produced DTDs use the full 16-bit Japanese character set for markup; otherwise the students couldn't always understand the markup. ht