- From: Hans Teijgeler <hans.teijgeler@quicknet.nl>
- Date: Thu, 25 Sep 2003 22:43:37 +0200
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: xmlschema-dev@w3.org, "weitz, edi" <edi@agharta.de>, "paap, onno" <onno.paap@ezzysurf.com>
- Message-id: <3F735379.E6CC3866@quicknet.nl>
Dear Jeni,

Thank you so much for your extensive and thorough reply!

You asked for more information regarding the behaviour of Spy, so I made a very simple XML schema called middle-dot-test.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:simpleType name="xyz">
    <xs:restriction base="xs:Name">
      <xs:pattern value="([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9\.\-]+(·[a-zA-Z0-9\.\-]+)?"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="test">
    <xs:complexType>
      <xs:attribute name="abc" type="xyz" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

(I deliberately used phony names to keep it generic.) I then derived an XML document from that schema. In that document I entered an identifier containing the Trebuchet MS middle dot:

<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="D:\middle-dot-test.xsd" abc="ERDL__1234·a8"/>

After validation, the error message is:

  This file is not valid
  Invalid value for datatype Name in attribute 'identifier'

The question is: where are things going wrong? I hope you can help me out.

Regards,
Hans

======================================
Jeni Tennison wrote:
> Hi Hans,
>
> > 1. I still need some document in which the whole subject of the Regular
> > Expressions in XML Schema is explained. I read through the concept book of
> > Eric van der Vlist
> > (http://books.xmlschemata.org/relaxng/RngBookWxsRegExp.html ) but that book
> > assumes that I know much more than I do. I need something that starts at
> > zero, for dummies, with MANY examples. Any suggestions?
>
> Perhaps you should start off with something that addresses regular
> expressions more generally?
> A search for "regular expression tutorial"
> in Google comes up with a bunch of promising leads; many of them are
> written for Perl or Python, but don't let that put you off: the
> regular expression syntax in XML Schema is *fairly* standard, at least
> for the simple things.
>
> > 2. What is a "combiningchar" and what an "extender"? It is being talked about
> > in XML as being an allowable part of Namechar, but nowhere can I find what
> > it really IS and what it is used for. You guys/gals must have read
> > something that I haven't, so apparently you know it (if not, why didn't you
> > ask or complain?)
>
> I assume that you've been looking at the XML Recommendation and found
> these. In XML terms, a "CombiningChar" is defined as one of the
> characters listed at:
>
>   http://www.w3.org/TR/REC-xml#NT-CombiningChar
>
> and an "Extender" is one of the characters listed at:
>
>   http://www.w3.org/TR/REC-xml#NT-Extender
>
> In more abstract terms, combining characters and extenders are
> particular kinds of character as defined in Unicode. They are both
> kinds of characters that combine with preceding characters, creating
> different glyphs when you view a string.
>
> Combining characters are characters that add things like accents to
> preceding characters; for example, the character COMBINING RING ABOVE
> #x030A is a combining character; when you combine it with the
> character 'a' you see 'å'.
>
> Extenders are characters that extend the shape of preceding
> characters; for example, the character MIDDLE DOT #x00B7 is an
> extender; when you combine it with the character 'L' you see 'Ŀ'
> (which, if it doesn't show up in your font, is an L with a dot in the
> middle of the glyph).
>
> If you really want to know more, immerse yourself in www.unicode.org.
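Jeni's two examples can be tried out with Python's standard `unicodedata` module, which implements the Unicode normalization forms (UAX #15). The sketch below is only an illustration, and it brings out one distinction that matters for normalization: the ring is a genuine combining mark that NFC merges with the preceding 'a', whereas the middle dot has combining class 0 and stays a separate character. The precomposed 'Ŀ' (U+013F) is linked to 'L' + U+00B7 only through a compatibility decomposition.

```python
import unicodedata

# COMBINING RING ABOVE (U+030A) is a combining mark: NFC normalization merges
# "a" + U+030A into the single precomposed character U+00E5.
decomposed = "a\u030a"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), unicodedata.name(composed))
# 2 1 LATIN SMALL LETTER A WITH RING ABOVE

# MIDDLE DOT (U+00B7) has canonical combining class 0, so NFC leaves
# "L" + U+00B7 as two separate characters.
l_dot = "L\u00b7"
print(unicodedata.combining("\u030a") > 0, unicodedata.combining("\u00b7"))  # True 0
print(unicodedata.normalize("NFC", l_dot) == l_dot)  # True

# LATIN CAPITAL LETTER L WITH MIDDLE DOT (U+013F) is tied to "L" + U+00B7
# only by a compatibility decomposition, which NFKD applies.
print(unicodedata.normalize("NFKD", "\u013f") == l_dot)  # True
```

The visual effect Jeni describes (a dot drawn in the middle of the L) therefore depends on the font rendering the two characters together; at the level of code points the identifier simply contains U+00B7.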
> Personally, I found the most valuable information there concerning
> combining characters and extenders was the explanation of Unicode
> normalization, which you can find at:
>
>   http://www.unicode.org/reports/tr15
>
> > 3. I want to separate the first part of the identifier
> > ([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9.-]+ from the second (optional) part
> > ([a-zA-Z0-9.-]+)? by means of a character that normally isn't used in
> > system identifiers. So I chose the "middle dot" (#x00B7). I have three
> > questions:
> >    1. Is the way it has now been introduced in the above RegEx correct?
>
> Yes, that's fine, since you're using it in an XML document. You're
> using an XML character reference (&#x00B7;). This is interpreted when
> the XML Schema document is parsed; as far as the application (the
> schema validator) is concerned, the regular expression actually
> includes the MIDDLE DOT character itself.
>
> You will probably run into problems if you use that syntax in a
> regular expression that *isn't* held in an XML document, however. So
> if you're using the Regex Coach, for example, you need to use a
> different kind of escaping to include the character. I think that
> \u00B7 might work...
>
> >    2. If I make an XML document based on an XML Schema (e.g. in Spy), how
> >       can I fill in such a middle dot as part of a Name? I have tried
> >       everything I could think of, but with no success.
>
> Where does this Name appear? If it's in the value of an attribute or
> in text within an element, then you can use the character reference
> &#x00B7;. Again, this character reference will be interpreted when the
> document is parsed, and as far as the application can tell it's
> precisely the same as inserting the MIDDLE DOT character literally in
> the attribute or text.
>
> If you're using the identifier as the name of an element or attribute,
> then you can't use the character reference and have to insert the
> character literally in the XML document.
> If you're using Windows, you
> can do this using the Character Map utility or by typing Alt+0183 (on
> the numeric keypad).
>
> >    3. To what extent does the font type play a role? I found a middle dot in the
> >       Windows Character Map under Trebuchet MS (called U+00B7 Middle Dot),
> >       but Spy didn't accept that.
>
> The font determines whether a glyph is available for a particular
> character or not: if a font doesn't have a glyph for a character, you
> might see a question mark or an empty box or something instead of the
> actual character. (Beware that some fonts use glyphs for particular
> characters that are completely unrelated to what the character
> actually is: that's most obviously the case with the various Wingdings
> fonts, for example.)
>
> When you use the Character Map to select a character, it shouldn't
> make any difference what font you use when selecting the character; if
> the character isn't available in the font that you're using where you
> *paste* the character, you'll see the question mark or empty box
> appear.
>
> I'm not sure what XML Spy did when you tried to use that character --
> what "didn't accept that" actually means. If you provide more
> information about what you tried and what error XML Spy gave you, we
> might be able to help.
>
> FWIW, I found Mike Brown's "XML Tutorial", which focuses on issues of
> character encoding and so on, really helpful in getting me to
> understand how characters work in XML. You can find it at:
>
>   http://skew.org/xml/tutorial/
>
> Cheers,
>
> Jeni
>
> ---
> Jeni Tennison
> http://www.jenitennison.com/
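The points discussed in this thread can also be checked outside XML Spy. Below is a minimal Python sketch using only the standard library; the helper names `matches_pattern` and `code_points` are hypothetical, introduced here for illustration. It (a) tests the `xs:pattern` from middle-dot-test.xsd against the identifier, using `re.fullmatch` because XML Schema patterns always match the whole value, (b) shows that a literal middle dot, `&#xB7;` and `&#183;` (183 is the decimal code that Alt+0183 types) give the same attribute value, and (c) shows that the literal character is accepted in an element name while a character reference is not. It assumes that Python's `re` syntax coincides with XML Schema regular expressions for this particular pattern; the two dialects differ in general.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# The xs:pattern from middle-dot-test.xsd, with the middle dot written as the
# escape \u00b7 (which Python's re module understands). XML Schema patterns are
# implicitly anchored at both ends, so re.fullmatch is used rather than re.search.
PATTERN = r"([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9\.\-]+(\u00b7[a-zA-Z0-9\.\-]+)?"


def matches_pattern(value: str) -> bool:
    """Hypothetical helper: does the whole value match PATTERN?"""
    return re.fullmatch(PATTERN, value) is not None


def code_points(value: str) -> list:
    """Hypothetical helper: describe each character as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in value]


IDENT = "ERDL__1234\u00b7a8"  # the identifier from the instance document

# (a) Pattern check: the real MIDDLE DOT matches, a look-alike does not.
lookalike = "ERDL__1234\u22c5a8"  # U+22C5 DOT OPERATOR, chosen only as an example
print(matches_pattern(IDENT), matches_pattern(lookalike))  # True False

# (b) Attribute values: literal character, hex reference and decimal reference
# (Alt+0183 types code point 183 == 0xB7) all parse to the same string.
docs = [
    f'<test abc="{IDENT}"/>',
    '<test abc="ERDL__1234&#xB7;a8"/>',
    '<test abc="ERDL__1234&#183;a8"/>',
]
values = [ET.fromstring(d).get("abc") for d in docs]
print(all(v == IDENT for v in values))  # True

# (c) Element names: U+00B7 is an XML 1.0 Extender, so the literal character is
# a legal NameChar, but a character reference cannot appear inside a name.
literal_tag = ET.fromstring(f"<{IDENT}/>").tag
try:
    ET.fromstring("<ERDL__1234&#xB7;a8/>")
    reference_in_name_ok = True
except ET.ParseError:
    reference_in_name_ok = False
print(literal_tag == IDENT, reference_in_name_ok)  # True False

# Which dot is in the identifier? Character 10 is the separator.
print(code_points(IDENT)[10])  # U+00B7 MIDDLE DOT
```

Since xs:Name is defined by the same Name production that governs element names, a value that really contains U+00B7 should pass both the Name check and the pattern. Listing the code points of the value actually stored in the instance document is therefore one way to confirm which dot ended up in the file; the DOT OPERATOR above is only an example of a look-alike.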
Received on Thursday, 25 September 2003 16:40:06 UTC