Re: Implementing the DOM3 Val Spec in Javascript, problem with UPA and creating PSVI. from Casey Jordan on 2010-04-30 (xmlschema-dev@w3.org from April 2010)

From: Casey Jordan <casey.jordan@jorsek.com>
Date: Fri, 30 Apr 2010 16:34:38 -0400
To: Kevin Braun <kbraun@obj-sys.com>, xmlschema-dev@w3.org
Message-ID: <u2ibf585bb61004301334u7d9edde8w99a82b93982fba47@mail.gmail.com>
Kevin,

This is good info. I don't have a lot of experience with any of this stuff;
I just learned it over the past week, so any help like this is much
appreciated.

I think what I will have to do is compile the additions that are possible
(as you said, both 'a' and '1' in this case). Then, since it's not very
user friendly to offer a suggestion that is actually wrong, I will have to
come up with another algorithm that filters out additions that create
invalid markup, basically by creating a sandbox and validating against
that. Hopefully I can be clever enough to cache some of the results so the
processing is not too horrifically inefficient.
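
A rough sketch of that filtering step, using Kevin's toy grammar in place of
a real schema. Everything named here (SENTENCE, isValid, filterReplacements,
the cache key) is made up for illustration, and the regular expression is
only a stand-in for an actual XSD validator:

```javascript
// Kevin's toy grammar, written as one regular expression (a stand-in only):
//   Sentence ::= 'Z' ( 'a' 'b'+ End | '1' 'b'+ '2' )
//   End      ::= 'c' | '2'
const SENTENCE = /^Z(?:ab+[c2]|1b+2)$/;

// Hypothetical validator: a real one would run the automaton over the
// child elements of the node instead of testing a string.
function isValid(symbols) {
  return SENTENCE.test(symbols.join(''));
}

// Results are cached per (sequence, position, candidate list).
const cache = new Map();

// Try each candidate in a sandbox copy and keep only the ones that leave
// the whole sequence valid when they replace the symbol at `index`.
function filterReplacements(symbols, index, candidates) {
  const key = JSON.stringify([symbols, index, candidates]);
  if (cache.has(key)) {
    return cache.get(key);
  }
  const result = candidates.filter((candidate) => {
    const sandbox = symbols.slice(); // copy, so the live sequence is untouched
    sandbox[index] = candidate;
    return isValid(sandbox);
  });
  cache.set(key, result);
  return result;
}

// The follow set of 'Z' offers both 'a' and '1'; the sandbox pass narrows it.
console.log(filterReplacements('Zabbbc'.split(''), 1, ['a', '1'])); // [ 'a' ]
console.log(filterReplacements('Zabbb2'.split(''), 1, ['a', '1'])); // [ 'a', '1' ]
```

This still revalidates the whole sequence once per candidate, so the cache
only helps when the same position is queried again before the content
changes.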

Unless someone else has another suggestion, it looks like this is the best
solution for now.

Thanks Kevin!

On Fri, Apr 30, 2010 at 4:11 PM, Kevin Braun <kbraun@obj-sys.com> wrote:

>  Hi Casey,
>
> Just using reg exprs for convenience, suppose you have a grammar:
>
> Sentence ::= 'Z' ( 'a' 'b'+ End | '1' 'b'+ '2' )
> End ::= 'c' | '2'
>
> Then consider these sentences:
> Zabbbbbbbbbbbc and Zabbbbbbbb2.  In the first case, the 'a' cannot be
> replaced because of the 'c' on the end.  In the other case, the 'a' may be
> replaced with a '1', since there is a '2' on the end.  You can't determine
> this without looking to the end of the potentially infinite string.
>
> You can, however, figure out that a 'Z' may be followed by either 'a' or
> '1' (there are sentences in which this occurs).  This is what is called a
> follow set (as you probably know).
>
> I would think that as I edit a document, if the editor is going to make
> suggestions, it would suggest an 'a' and a '1' after a 'Z', and then mark
> what is wrong, if something becomes wrong, after I make the edit.
>
> Good luck!
> Kevin
>
>
> On 4/30/2010 3:39 PM, Casey Jordan wrote:
>
> Kevin,
>
> Thank you for the quick reply. I have read the spec several times now and
> still have some of the same questions as you do. However, since I have not
> been able to get in contact with the editors of the spec, I have made a
> logical assumption, as you did.
>
> My assumption, like yours, is based on the question "What would a user of
> an editor want to receive?" Most likely they are going to want to know what
> they can add, remove, or move next.
>
> In my opinion, exposed interfaces like allowedNextSiblings should supply a
> quick way to read the PSVI and give the user options as to what they can
> edit next and how.
>
> If this is not the case, I need to implement features like this for my
> parent project (which is an editor) anyway.
>
> I guess the next big question is, in situations like you and I have
> outlined, how do we determine the attribution of a particle efficiently?
>
> Right now, I validate the document using a DFSA method and build the PSVI
> as I go. However, because of these "fuzzy" attributions, I may have
> elements that, if added, will change the attribution and make the document
> invalid. Thus, to be totally sure that my "allowedNextSiblings" are
> accurate, I would need to actually insert them into the particle being
> validated and double check. This would be an efficiency nightmare from the
> standpoint of a web-based editor.
>
> I feel like there has to be an elegant solution here, I just haven't
> written anything like this before. I am just hoping someone with a little
> more experience here might be able to shed some light on the problem.
>
> In the meantime, I am going to run some test cases where I use a double
> pass, and see just how inefficient it might be.
>
> Thanks again!
>
> On Fri, Apr 30, 2010 at 2:49 PM, Kevin Braun <kbraun@obj-sys.com> wrote:
>
>>  Hi Casey,
>>
>> If I follow you, your question is how to determine, for example, what
>> should be in the allowedNextSiblings attribute.  The description in the DOM3
>> Validation Spec (which I am not familiar with) says:
>>
>> allowedNextSiblings: A NameList, as described in [DOM Level 3 Core
>> <http://www.w3.org/TR/2004/REC-DOM-Level-3-Val-20040127/DOM3-Val.html#references-DOMCore>],
>> of all element information items or wildcards
>> <http://www.w3.org/TR/2004/REC-DOM-Level-3-Val-20040127/DOM3-Val.html#validation-VAL-Interfaces-ElementEditVAL>
>> that can be inserted as a next sibling of this element, or
>> null if this element has no context or schema. Duplicate pairs of
>> {namespaceURI, name} are eliminated.
>>
>> My question is what does it mean to say "Y can be inserted as a next
>> sibling of X"?  Does that mean "what can I change the next sibling into
>> without making this document invalid", or "what can I insert after X without
>> changing anything else and still have a valid document" or "according to the
>> grammar, what are all the things that possibly follow X in any valid
>> sentence"?  For example, suppose you had something like:
>>
>> <xs:sequence>
>>     <xs:element name="one"/>
>>     <xs:choice>
>>         <xs:sequence>
>>             <xs:element name="two"/>
>>             <xs:element name="alpha"/>
>>             <xs:element name="three"/>
>>         </xs:sequence>
>>         <xs:sequence>
>>             <xs:element name="A"/>
>>             <xs:element name="alpha"/>
>>             <xs:element name="B"/>
>>         </xs:sequence>
>>     </xs:choice>
>> </xs:sequence>
>>
>> Given <one><two><alpha><three>, the allowedNextSiblings for <one> is
>> {<two>, <A>}, if you assume any other necessary changes will be made; it is
>> {} if you assume no other changes will be made; it is {<two>} if you assume
>> you are talking about replacing the current next sibling.
>>
>> In your example, inserting an <h-sub> is valid, it just happens to change
>> the particle attribution of the section element.  It seems it does belong in
>> h's allowedNextSiblings, under any interpretation.  What if <h-sub> were
>> already there?  You can't insert another one, so is <h-sub> still in the
>> allowedNextSiblings?
>>
>> I hope that helps some.  Perhaps something somewhere better explains what
>> allowedNextSiblings means, but I didn't see it based on a quick look at the
>> spec.  My guess is it is more along the lines of trying to expose the
>> aspects of the grammar so as to let an editor give suggestions to a user,
>> even if making an edit might produce an invalid document (i.e., what could
>> possibly follow, without respect to what actually does follow).
>>
>> My apologies if this is completely useless due to my unfamiliarity with
>> the DOM3 Validation spec.
>>
>> Regards,
>> Kevin
>>
>> --
>> Objective Systems, Inc.
>> REAL WORLD ASN.1 AND XML SOLUTIONS
>> Tel: +1 (484) 875-9841
>> Fax: +1 (484) 875-9830
>> Toll-free: (877) 307-6855 (USA only)
>> http://www.obj-sys.com
>>
>>
>>
>> On 4/30/2010 1:18 PM, Casey Jordan wrote:
>>
>> Hey guys/gals,
>>
>> Michael Kay suggested that I post a problem I am having here in the
>> hopes that someone might be able to help me.
>>
>> I am creating a cross-browser open source implementation of the DOM3
>> Validation Spec <http://www.w3.org/TR/2004/REC-DOM-Level-3-Val-20040127/DOM3-Val.html>.
>> At the moment it's just a JavaScript implementation of an XSD validator and
>> PSVI interface that conforms to the standard.
>>
>> I am using a method based on derivatives of regular expressions
>> (a deterministic finite automaton) and have encountered a really tricky
>> problem, which the example below illustrates:
>>
>> Suppose I have a schema with a type like this:
>>
>> <xs:complexType name="my.type" mixed="false">
>>     <xs:sequence>
>>         <xs:element ref="h"/>
>>         <xs:choice>
>>             <xs:element ref="h-sub" maxOccurs="unbounded"/>
>>             <xs:element ref="section"/>
>>         </xs:choice>
>>         <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
>>     </xs:sequence>
>> </xs:complexType>
>>
>> When using finite automata with the above pattern, you can determine
>> whether a document is valid, but it would be impossible to determine
>> whether a "section" element belonged to the xs:choice or the xs:sequence,
>> which also makes it impossible to provide a complete PSVI.
>>
>> For instance suppose I wanted to know what could be added to the following
>> xml fragment governed by this pattern:
>>
>> <h>
>> <section/>
>> <section/>
>>
>> If we assumed that the first <section/> element satisfied the xs:choice,
>> then all we can do is add more <section/> elements. However, if we assume
>> that both <section/> elements belong to the xs:sequence, then it's possible
>> to add an <h-sub/> element after the <h/>. This all becomes extremely
>> complex as we start nesting more patterns.
>>
>> So all that being said, I've been racking my brain trying to determine if
>> there is an effective way to compute a correct and complete PSVI in a
>> situation where this occurs, ideally without having to look ahead and
>> while remaining efficient.
>>
>>
>> More Details - For those interested.
>> -------------------------
>>
>> First I transform the schema into JSON, essentially patterns that
>> represent the FSA. So the above type would become the following particles:
>>
>> {
>>    type: 'sequence',
>>    minOccurs: 0,
>>    maxOccurs: 1,
>>    instance: [
>>       { type: 'element', ref: 'h', minOccurs: 0, maxOccurs: 1 },
>>       { type: 'choice', minOccurs: 0, maxOccurs: 1,
>>         instance: [
>>            { type: 'element', ref: 'h-sub', minOccurs: 0, maxOccurs: 1 },
>>            { type: 'element', ref: 'section', minOccurs: 0, maxOccurs: 1 }
>>         ]
>>       }
>>    ]
>> }
>>
>>
>> Then to validate a source node I apply a DFSA stepping through the pattern
>> and matching it to the source instance. Elements that are 'missing' or could
>> be added are inserted into a PSVI which can be exposed to find out
>> information like:
>>
>> Element.allowedNextSiblings
>> Element.allowedChildren
>> Element.allowedFirstChildren
>>
>> etc., as the spec describes.
>>
>> --
>> --
>> Casey Jordan
>> Jorsek Software LLC.
>> "CaseyDJordan" on LinkedIn, Twitter & Facebook
>> Cell (585) 771 0189
>> Office (585) 239 6060
>> Jorsek.com
>>
>>
>> This message is intended only for the use of the Addressee(s) and may
>> contain information that is privileged, confidential, and/or exempt from
>> disclosure under applicable law.  If you are not the intended recipient,
>> please be advised that any disclosure  copying, distribution, or use of
>> the information contained herein is prohibited.  If you have received
>> this communication in error, please destroy all copies of the message,
>> whether in electronic or hard copy format, as well as attachments, and
>> immediately contact the sender by replying to this e-mail or by phone.
>> Thank you.
>>
>>
>
>


Received on Friday, 30 April 2010 20:38:44 UTC