- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Thu, 25 Sep 2003 17:28:21 +0100
- To: Hans Teijgeler <hans.teijgeler@quicknet.nl>
- Cc: xmlschema-dev@w3.org
Hi Hans,

> 1. I still need some document in which the whole subject of the Regular
> Expressions in XML Schema is explained. I read through the concept book of
> Eric van der Vlist
> (http://books.xmlschemata.org/relaxng/RngBookWxsRegExp.html ) but that book
> assumes that I know much more than I do. I need something that starts at
> zero, for dummies, with MANY examples. Any suggestions?

Perhaps you should start off with something that addresses regular
expressions more generally? A search for "regular expression tutorial"
in Google comes up with a bunch of promising leads; many of them are
written for Perl or Python, but don't let that put you off: the
regular expression syntax in XML Schema is *fairly* standard, at least
for the simple things.

> 2. What is a "combiningchar" and what an "extender"? It is being talked about
> in XML as being an allowable part of Namechar, but nowhere I can find what
> it really IS and what it is used for. You guys/gals must have read
> something that I haven't, so apparently you know it (if not, why didn't you
> ask or complain?)

I assume that you've been looking at the XML Recommendation and found
these. In XML terms, a "CombiningChar" is defined as one of the
characters listed at:

  http://www.w3.org/TR/REC-xml#NT-CombiningChar

and an "Extender" is one of the characters listed at:

  http://www.w3.org/TR/REC-xml#NT-Extender

In more abstract terms, combining characters and extenders are
particular kinds of character as defined in Unicode. They are both
kinds of characters that combine with preceding characters, creating
different glyphs when you view a string.

Combining characters are characters that add things like accents to
preceding characters; for example, the character COMBINING RING ABOVE
#x030A is a combining character; when you combine it with the
character 'a' you see 'å'.
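To make the 'a' plus COMBINING RING ABOVE example concrete, here is a
small Python sketch. It is only an illustration: the use of Python's
standard unicodedata module and of NFC normalization to join the two
characters is my choice for the demonstration, not something the XML
Recommendation prescribes.

```python
import unicodedata

RING = "\u030a"  # COMBINING RING ABOVE, the combining character from the example

# Two separate characters: the base letter followed by the combining mark.
decomposed = "a" + RING

# NFC normalization replaces the pair with the single precomposed character.
composed = unicodedata.normalize("NFC", decomposed)

if __name__ == "__main__":
    print(unicodedata.name(RING))       # the Unicode name of the mark
    print(unicodedata.category(RING))   # general category of the mark
    print(len(decomposed), len(composed))
    print(composed == "\u00e5")         # compare with LATIN SMALL LETTER A WITH RING ABOVE
```

On screen both strings look the same; the difference is only in how
many characters they contain.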
Extenders are characters that extend the shape of preceding
characters; for example, the character MIDDLE DOT #x00B7 is an
extender; when you combine it with the character 'L' you see 'Ŀ'
(which if it doesn't show up in your font is an L with a dot in the
middle of the glyph).

If you really want to know more, immerse yourself in www.unicode.org.
Personally, I found the most valuable information there concerning
combining characters and extenders was the explanation of Unicode
normalization, which you can find at:

  http://www.unicode.org/reports/tr15

> 3. I want to separate the first part of the identifier
> ([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9.-]+ from the second (optional) part
> ([a-zA-Z0-9.-]+)? by means of a character that normally isn't used in
> system identifiers. So I chose the "middle dot" (#x00B7). I have three
> questions:
>    1. Is the way it has now been introduced in the above RegEx correct?

Yes, that's fine, since you're using it in an XML document. You're
using an XML character reference (&#x00B7;). This is interpreted when
the XML Schema document is parsed; as far as the application (the
schema validator) is concerned, the regular expression actually
includes the MIDDLE DOT character itself.

You will probably run into problems if you use that syntax in a
regular expression that *isn't* held in an XML document, however. So
if you're using the Regex Coach, for example, you need to use a
different kind of escaping to include the character. I think that
\u00B7 might work...

>    2. If I make an XML document based on an XML Schema (e.g. in Spy), how
>       can I fill in such a middle dot as part of a Name? I have tried
>       everything I could think of, but with no success

Where does this Name appear? If it's in the value of an attribute or
in text within an element, then you can use the character reference
&#x00B7;.

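The behaviour of character references described above can be checked
with a short Python sketch. Several things in it are assumptions made
for illustration: the schema fragment is made up, the way the two
halves of the identifier are joined around the middle dot is a guess
(the full pattern isn't quoted in this message), and Python's re
module only stands in for a schema validator. XML Schema patterns
must match the whole value, so the sketch uses re.fullmatch.

```python
import re
import xml.etree.ElementTree as ET

# The two halves quoted in the question. Joining them with an optional
# "middle dot + second part" group is an assumption made for this sketch.
FIRST = r"([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9.-]+"
SECOND = r"[a-zA-Z0-9.-]+"

# Hypothetical schema fragment in which the pattern uses a character reference.
SCHEMA_FRAGMENT = (
    '<xs:pattern xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'value="' + FIRST + "(&#x00B7;" + SECOND + ')?"/>'
)

def pattern_seen_by_application(fragment: str) -> str:
    """Return the pattern value after the XML parser has resolved references."""
    return ET.fromstring(fragment).get("value")

def matches(pattern: str, candidate: str) -> bool:
    """True if the whole candidate string matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None

# What the application receives: the reference is already a literal MIDDLE DOT.
parsed = pattern_seen_by_application(SCHEMA_FRAGMENT)

# Outside XML the reference is not interpreted; a regex-level escape is needed.
escaped = FIRST + r"(\u00B7" + SECOND + ")?"

# Copying the XML syntax verbatim into a non-XML regex matches those literal
# characters ('&', '#', 'x', ...) rather than the middle dot.
verbatim = FIRST + "(&#x00B7;" + SECOND + ")?"

if __name__ == "__main__":
    sample = "abc__def\u00b7ghi"  # made-up identifier containing a middle dot
    print(matches(parsed, sample))
    print(matches(escaped, sample))
    print(matches(verbatim, sample))
```

The same applies to an instance document: a reference in an attribute
value or in element text reaches the application as the character
itself.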
Again, this character reference will be interpreted when the document
is parsed and as far as the application can tell it's precisely the
same as inserting the MIDDLE DOT character literally in the attribute
or text.

If you're using the identifier as the name of an element or attribute,
then you can't use the character reference and have to insert the
character literally in the XML document. If you're using Windows, you
can do this using the Character Map utility or by typing Alt+0183 (on
the numeric keypad).

>    3. In how far does the font type play a role? I found a middle dot in the
>       Windows Character Map under Trebuchet MS (called U+00B7 Middle Dot),
>       but Spy didn't accept that

The font determines whether a glyph is available for a particular
character or not: if a font doesn't have a glyph for a character, you
might see a question mark or an empty box or something instead of the
actual character. (You should beware of the fact that some fonts use
glyphs for particular characters that are completely unrelated to what
the character actually is: that's most obviously the case with the
various Wingdings fonts, for example.)

When you use the Character Map to select a character, it shouldn't
make any difference what font you use when selecting the character; if
the character isn't available in the font that you're using where you
*paste* the character, you'll see the question mark or empty box
instead.

I'm not sure what XML Spy did when you tried to use that character --
what "didn't accept that" actually means. If you provide more
information about what you tried and what error XML Spy gave you, we
might be able to help.

FWIW, I found Mike Brown's "XML Tutorial", which focuses on issues of
character encoding and so on, really helpful in getting me to
understand how characters work in XML. You can find it at:

  http://skew.org/xml/tutorial/

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Thursday, 25 September 2003 12:28:55 UTC