Re: Checking a UTF-16 instance against a UTF-8 Schema

Mark Feblowitz <mfeblowitz@frictionless.com> writes:

> Probably a FAQ, but not one I've encountered.
> 
> Can all available validating parsers validate an XML instance represented in
> UTF-16 against an XML Schema represented in UTF-8? 

Yes, unless they are non-conformant at the XML level: XML 1.0 requires
every processor to read both UTF-8 and UTF-16.  Infosets contain
characters, not encodings, so the encoding should be invisible at the
schema validation level.
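
To illustrate the point about characters versus encodings, here is a
minimal Python sketch using only the standard library. It does not run
a schema validator; it only shows that one document, serialised once as
UTF-8 and once as UTF-16, parses to the same element name, attribute
and text. The sample content and the helper name encode_document are
made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance content with accented French characters.
BODY = '<greeting lang="fr">Ça coûte très cher à Orléans</greeting>'


def encode_document(body: str, encoding: str) -> bytes:
    """Prefix an XML declaration naming `encoding`, then serialise to bytes."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    return (declaration + body).encode(encoding)


utf8_bytes = encode_document(BODY, "UTF-8")
# Python's "UTF-16" codec prepends a byte order mark, which the parser
# uses to detect the encoding.
utf16_bytes = encode_document(BODY, "UTF-16")

utf8_root = ET.fromstring(utf8_bytes)
utf16_root = ET.fromstring(utf16_bytes)

# The byte streams differ, but the parsed characters do not.
same_content = (
    utf8_root.tag == utf16_root.tag
    and utf8_root.attrib == utf16_root.attrib
    and utf8_root.text == utf16_root.text
)
```

A schema validator sits on top of exactly this parsed view, which is
why the schema's own encoding need not match the instance's.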

> Would anything special need to be done to achieve the validation, or would
> it suffice for each of them to have their encodings appropriately indicated?

That should do it.

> The intent here is to use a given Schema, represented in UTF-8, to validate
> an instance document that uses the same element and attribute labels, yet
> the element and/or attribute content requires UTF-16, e.g., strings
> containing accented French.

(That doesn't require UTF-16 -- UTF-8 works fine for any Unicode character.)
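
As a quick check of that claim, the following sketch round-trips some
accented French through both encodings with Python's built-in codecs.
The sample string is invented for the example.

```python
# Hypothetical sample of accented French text.
sample = "élève, façade, Noël, œuvre"

utf8_bytes = sample.encode("utf-8")
utf16_bytes = sample.encode("utf-16")

# Both encodings decode back to the identical character string.
from_utf8 = utf8_bytes.decode("utf-8")
from_utf16 = utf16_bytes.decode("utf-16")
round_trips = from_utf8 == sample and from_utf16 == sample
```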

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Tuesday, 2 July 2002 04:49:01 UTC