- From: Casey Jordan <casey.jordan@jorsek.com>
- Date: Sun, 2 May 2010 11:38:19 -0400
- To: Michael Kay <mike@saxonica.com>
- Cc: xmlschema-dev@w3.org
- Message-ID: <q2xbf585bb61005020838mcd7d54a5wf76f228b2ffb6c8d@mail.gmail.com>
Michael,

I agree, this situation does not violate the UPA rule. I suppose I should have been more specific. My question revolves around creating a PSVI from instances like this.

Let's look at the following example based on this schema, with possible insertions noted in [element_name].

Example: This is what I would get by stepping through the pattern and marking elements that can be inserted.

h
[h-sub]
section <--------- Satisfies the choice particle; however, I could technically insert a <section/> above this.
[section]
section
[section]

Here's the issue: given this schema I should also be able to insert a <section/> element after the h-sub, but it's not really possible to know that unless we look ahead, especially if the <section/> element in the sequence has a maxOccurs other than unbounded. This problem gets really complex when structures get deeply nested.

Kevin Braun summed the problem up here using regular expressions:

    Hi Casey,

    Just using reg exprs for convenience, suppose you have a grammar:

    Sentence ::= 'Z' ( 'a' 'b'+ End | '1' 'b'+ '2' )
    End ::= 'c' | '2'

    Then consider these sentences: Zabbbbbbbbbbbc and Zabbbbbbbb2. In the first case, the 'a' cannot be replaced because of the 'c' on the end. In the other case, the 'a' may be replaced with a '1', since there is a '2' on the end. You can't determine this without looking to the end of the potentially infinite string.

    You can, however, figure out that a 'Z' may be followed by either 'a' or '1' (there are sentences in which this occurs). This is what is called a follow set (as you probably know). I would think that as I edit a document, if the editor is going to make suggestions, it would suggest an 'a' and a '1' after a 'Z', and then mark what is wrong, if something becomes wrong, after I make the edit.

    Good luck!

So, based on this, I am left with a situation where, in order to determine where elements can actually be inserted into the document, I have to do the following:

1.) Assemble all possible elements that can be inserted before any given element or appended to any given element.
2.) Insert these elements one by one into the instance and re-validate the particle; if it fails validation, throw it out.
3.) Return all element names that did not cause the document to become invalid on insertion.

This leaves me in a tricky spot, since I am doing this all in JavaScript and this process could get really inefficient. I have tried to find an algorithm that would allow me to do this more efficiently but haven't found anything.

Is there a standard way of creating a PSVI when using an FSA method that I am missing? Or am I on the right track?

Thanks guys.

Cheers,

Casey

On Sat, May 1, 2010 at 6:45 AM, Michael Kay <mike@saxonica.com> wrote:

>
> Suppose I have a schema with a type like this:
>
> <xs:complexType name="my.type" mixed="false">
>   <xs:sequence>
>     <xs:element ref="h"/>
>     <xs:choice>
>       <xs:element ref="h-sub" maxOccurs="unbounded" />
>       <xs:element ref="section" />
>     </xs:choice>
>     <xs:element ref="section" minOccurs="0" maxOccurs="unbounded" />
>   </xs:sequence>
> </xs:complexType>
>
> When using finite automata, and the above pattern, while you can
> determine if a document is valid, it would be impossible to determine if a
> "section" element belonged to the xs:choice or the xs:sequence making it
> also impossible to provide a complete PSVI.
>
> I'm having difficulty seeing the problem. A <section> that immediately
> follows the <h> can only satisfy the choice. A <section> that immediately
> follows an <h-sub> or another <section> can only satisfy the final particle.
>
> If the choice were optional or repeatable, this content model would violate
> UPA. (Though Saxon would actually allow it through, since Saxon only
> attributes element instances to declarations, not to particles, and in this
> case the two particles refer to the same element declaration.)
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay
>

--
Casey Jordan
Jorsek Software LLC.
"CaseyDJordan" on LinkedIn, Twitter & Facebook
Cell (585) 771 0189
Office (585) 239 6060
Jorsek.com

This message is intended only for the use of the Addressee(s) and may contain information that is privileged, confidential, and/or exempt from disclosure under applicable law. If you are not the intended recipient, please be advised that any disclosure, copying, distribution, or use of the information contained herein is prohibited. If you have received this communication in error, please destroy all copies of the message, whether in electronic or hard copy format, as well as attachments, and immediately contact the sender by replying to this e-mail or by phone. Thank you.
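
The three-step insert-and-revalidate procedure described in the message above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: it is written in Python for brevity rather than the JavaScript the message mentions, it models the content model of my.type as a regular expression over one-letter codes, and the names CODES, CONTENT_MODEL, is_valid and insertable are hypothetical.

```python
import re

# Hypothetical one-letter codes for the element names used by my.type.
CODES = {"h": "h", "h-sub": "s", "section": "c"}

# The content model of my.type as a regex over those codes:
# h, then (h-sub+ | section), then section*.
CONTENT_MODEL = re.compile(r"h(?:s+|c)c*")


def is_valid(children):
    """Return True if the sequence of child element names matches the model."""
    encoded = "".join(CODES[name] for name in children)
    return CONTENT_MODEL.fullmatch(encoded) is not None


def insertable(children):
    """Map each insertion position to the names that keep the content valid.

    Position 0 is before the first child; position len(children) is after
    the last. Every candidate is tried and the whole sequence re-validated,
    which is the brute-force approach from steps 1 to 3.
    """
    result = {}
    for pos in range(len(children) + 1):
        result[pos] = [
            name
            for name in CODES
            if is_valid(children[:pos] + [name] + children[pos:])
        ]
    return result


if __name__ == "__main__":
    # The instance from the example: h, section, section.
    for pos, names in insertable(["h", "section", "section"]).items():
        print(pos, names)
```

For the instance h, section, section this reports h-sub and section as insertable after the h, and only section at the two later positions, which agrees with the bracketed markers in the example. The cost is one full re-validation per position per candidate name, which is the inefficiency the message is concerned about.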
Received on Sunday, 2 May 2010 15:38:52 UTC