- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Wed, 13 Sep 2006 11:38:52 +0300 (EEST)
- To: Jose <jose_stephen@cdactvm.in>
- cc: unicode@unicode.org, www-international@w3.org
On Wed, 13 Sep 2006, Jose wrote:

> Unicode Technical Report #20 (Unicode in XML and other Markup
> Languages) http://www.Unicode.org/Unicode/reports/tr20/ specifies that
> Zero-width Joiners/ nonjoiners (ZWJ and ZWNJ) are suitable for use with
> in the markup.

Yes, for affecting ligature and joining behavior. I mention this because there is a popular word processor that uses ZWJ and ZWNJ quite inappropriately for line break control.

Of course, the statement is general in nature: those characters are in principle suitable for use in marked-up text. It does not guarantee or prescribe that a particular markup system allows them, or that they will be interpreted according to their Unicode semantics.

> But when an xml file with the tags written in Malayalam
> using ZWJs (In Malayalam ZWJ is used to form certain characters) an
> error is reported that the tag contained an invalid character.

Reported by which program?

I first suspected that you may have tried to enter these characters but they do not appear correctly in the declared or implied character encoding. But reading again, I notice that you are referring to _tags_ and might actually mean the use of characters in element or attribute names, as opposed to their use in content between tags. UTR #20 discusses the latter, i.e. what you can use in document content proper - together with markup, not _inside_ markup (tags).

The use of characters in element and attribute names is governed by the rules of each markup language, basically by its _identifier_ syntax. Generally, and in XML 1.0, control characters are excluded from that syntax, and ZWJ and ZWNJ are (format) control characters by definition (General Category: Cf). Thus, an attempt to use them in element names would violate well-formedness constraints, and an XML parser would report an error - not about an invalid character per se but about a syntax error.

In XML 1.1, ZWJ and ZWNJ are allowed in identifiers, but this is probably of little practical value.
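To illustrate the content-versus-names distinction, here is a minimal Python sketch. It assumes the name in question is the older ZWJ-based spelling of a Malayalam chillu (NA + VIRAMA + ZWJ, used before atomic chillu characters were added in Unicode 5.1). The helpers `format_chars` and `is_xml11_name` are hypothetical, written for this illustration only. Whether a particular parser rejects such a name depends on which edition's name rules it implements, so the sketch checks the rules directly rather than relying on a specific parser's error message.

```python
import unicodedata
import xml.etree.ElementTree as ET

ZWJ = "\u200d"   # ZERO WIDTH JOINER, General Category Cf
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, General Category Cf

# Illustrative Malayalam sequence: NA + VIRAMA + ZWJ (ZWJ-based chillu N).
CHILLU_N = "\u0d28\u0d4d" + ZWJ

def format_chars(name):
    """Hypothetical pre-check: (index, code point) of every Cf character in name."""
    return [(i, "U+%04X" % ord(ch)) for i, ch in enumerate(name)
            if unicodedata.category(ch) == "Cf"]

# XML 1.1 NameStartChar ranges (XML 1.0 Fifth Edition later adopted the same ones).
_NAME_START = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra characters allowed in NameChar but not at the start: - . 0-9 #xB7 etc.
_NAME_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]

def _in(ranges, cp):
    """True if code point cp falls in any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_xml11_name(name):
    """Hypothetical helper: True if name matches the XML 1.1 Name production."""
    if not name or not _in(_NAME_START, ord(name[0])):
        return False
    return all(_in(_NAME_START, ord(c)) or _in(_NAME_EXTRA, ord(c))
               for c in name[1:])

if __name__ == "__main__":
    # 1. ZWJ inside element *content* is ordinary character data.
    root = ET.fromstring("<p>" + CHILLU_N + "</p>")
    print(root.text == CHILLU_N)
    # 2. Used as an element *name*, the same sequence contains a Cf character.
    print(format_chars(CHILLU_N))
    # 3. Under the XML 1.1 name rules, that name is nevertheless legal.
    print(is_xml11_name(CHILLU_N))
```

The range tables are copied from the XML 1.1 Name production rather than derived from General Category, because the XML name rules are defined as explicit code point ranges, not in terms of Unicode properties.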
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Wednesday, 13 September 2006 08:38:58 UTC