- From: John Boyer <boyerj@ca.ibm.com>
- Date: Thu, 16 Jun 2011 12:22:06 -0700
- To: Nick Van den Bleeken <Nick.Van.den.Bleeken@inventivegroup.com>
- Cc: Public Forms <public-forms@w3.org>
- Message-ID: <OF4C154731.30146F0D-ON882578B1.0063CF58-882578B1.006A649B@ca.ibm.com>
Hi Nick (and Steven),

1) One thing to note about both the "name attribute" and "unique escaping" mechanisms is that both appear to require the form author to obtain the XML for the JSON in order to figure out how to write their XPaths. Under "unique escaping" they need to obtain the special names of tags and don't have to write XPath predicates. Under the "name attribute" method, they need to determine whether to write an XPath predicate. Because they have to look either way, authoring the XPath does not seem any harder. The author can copy-paste the hex-encoded tag name as easily as he can write *[@name='blah'].

By comparison, the name attribute method has the following drawbacks:

i) The XPath expressions themselves are slower to execute because predicate testing every node is much more expensive than name matching.

ii) The author will have to pay special attention to encoding what goes inside the single quotes for comparison, in case special encoding is required for those characters. To fully appreciate this case, try writing an xforms input ref attribute to match the element produced by the following:
{ "single\'double\"quote" : "brutal name" }
Hint: For ref="*[@name='____']", you cannot fill in the blank with anything that works.

iii) The simple underscore escaping used when a name attribute is generated can conflict with the legitimate use of underscore. For example, you have the "non-causal" problem of having to write an exceptional and fragile XPath to access an element that does not even need a name attribute. Consider this example:
{ "simple$name":"cause", "simple_name":"effect" }
Now write an xforms:input ref attribute for the "simple_name". You have to know to write ref="simple_name[2]" or else the input will bind to the first element in the JSON object.

iv) If one does end up writing a companion XML schema for the XML corresponding to the JSON, then unique element names are needed for elements that will be of different types.
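The collision in drawback iii) can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the thread does not fix the exact hex format, so the `_xHHHH_` form used by `hex_escape` is hypothetical, both function names are invented here, and the name-character test is a simplified ASCII-only stand-in for the real XML name rules.

```python
import re

# Simplified, ASCII-only stand-in for XML name characters (an assumption;
# the real NCName production is larger and also restricts the first character).
_NAME_CHAR = re.compile(r"[A-Za-z0-9.\-]")


def underscore_escape(name: str) -> str:
    """Simple escaping: every character outside the name set becomes '_'.
    A literal '_' is kept as-is, which is what causes the collision."""
    return "".join(
        ch if ch == "_" or _NAME_CHAR.fullmatch(ch) else "_" for ch in name
    )


def hex_escape(name: str) -> str:
    """Hypothetical unique escaping: '_' is the escape character, so a literal
    '_' is escaped too, which keeps distinct JSON names distinct."""
    parts = []
    for ch in name:
        if _NAME_CHAR.fullmatch(ch):
            parts.append(ch)
        else:
            parts.append("_x{:04X}_".format(ord(ch)))
    return "".join(parts)


names = ["simple$name", "simple_name"]
print([underscore_escape(n) for n in names])  # both names collapse to one tag
print([hex_escape(n) for n in names])         # two different tags
```

With underscore escaping both JSON names yield the same element name, so the second one is only reachable by position. With the hex form each JSON name gets its own element name, at the cost of less readable tags.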

The bottom line is that the name attribute seems easier at first, but it ends up not being much simpler in practice, and it leaves technical bugs in the solution. It might be nice to have "name attribute" generation as an option, but then we should discuss whether "unique escaping" or "name attributes" should be the default. A third alternative, always generating the name attribute rather than generating it only when escaping, should also be considered.

2) On illegal XML chars, I would really love to have a way to include them. I propose omitting them only because there does not currently appear to be a solution that includes the illegal characters in some escaped form, since there would then also need to be a way to escape the escape character. If someone wants to propose a mechanism that fully works, that would be great. As an example, how does one distinguish the values of the two strings in the following object?
{ "formfeed" : "\f", "backslash_f" : "\\f" }
On the telecon, it was proposed that the first string would become <formfeed>\f</formfeed>, but our current solution also produces <backslash_f>\f</backslash_f>, so how could we recover the original JSON object?

3) On arrays, yes, I agree that "item" is nicer... if you speak English. My original spec used that word, but I proposed __ because it is the encoding for the unnamed element and is language neutral, whereas "item" may be no different from a pile of underscores for those who don't speak English. In the mapper spec I'm working on, we anticipated it would be hard to get everyone to agree on a name for array elements, so we'll be enabling a configuration entry that allows the consumer to set it. So, if we do settle on an English name like "item", that would still be OK. Either way, because it is configurable, I end up writing a wildcard in the XPath, e.g. arrayName/*[1].
Again, this raises the question: do we want to just use the "unnamed element" name, use a language-specific name, or provide an option to specify what name to use for array elements?

Cheers,
John M. Boyer, Ph.D.
Distinguished Engineer, IBM Forms and Smarter Web Applications
IBM Canada Software Lab, Victoria
E-Mail: boyerj@ca.ibm.com
Blog: http://www.ibm.com/developerworks/blogs/page/JohnBoyer
Blog RSS feed: http://www.ibm.com/developerworks/blogs/rss/JohnBoyer?flavor=rssdw

From: Nick Van den Bleeken <Nick.Van.den.Bleeken@inventivegroup.com>
To: John Boyer/CanWest/IBM@IBMCA
Cc: Public Forms <public-forms@w3.org>
Date: 06/16/2011 09:31 AM
Subject: Re: JSON Instances
Sent by: public-forms-request@w3.org

I re-read the JSON wiki page, and can live with all 'Discussion points' and the solutions for those. Nevertheless, I have a couple of remarks:

* Escaping of element names with special characters: I would prefer to add the name attribute; I think this will make life easier for form authors. They need to know whether a name contains special characters in both approaches (either use the name attribute or hex-escaped values in the selector), and you need extra machinery to calculate the hex values. If the escaped names are a must, we could maybe have a configuration option that controls whether the name attribute is added for escaped element names.

* Illegal XML characters: I also prefer that we drop those, because in my opinion it will make binding UI controls to those values too complicated (authors need to know the escape rules, especially for entering a \). If really necessary we could make this behavior configurable, but only if there is enough demand (in my opinion).

* Arrays: I think it is indeed the best solution to create extra depth for an array, so the array type attribute can be attached to the extra level. Personally I don't like the '__' element names and would prefer 'item'.
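The round-trip problem behind the illegal-character discussion, John's { "formfeed" : "\f", "backslash_f" : "\\f" } example, can be seen in a few lines of Python. This is a sketch only: `naive_escape`, `safe_escape` and `safe_unescape` are hypothetical helpers, and doubling the backslash is just one conventional way to "escape the escape character", not something the thread agreed on.

```python
import json


def naive_escape(text: str) -> str:
    """Write a form feed as backslash + 'f' without escaping backslash itself."""
    return text.replace("\f", "\\f")


def safe_escape(text: str) -> str:
    """Escape the escape character first, so the mapping stays reversible."""
    return text.replace("\\", "\\\\").replace("\f", "\\f")


def safe_unescape(text: str) -> str:
    """Invert safe_escape by reading backslash pairs from left to right."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\f" if nxt == "f" else nxt)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


doc = json.loads(r'{ "formfeed" : "\f", "backslash_f" : "\\f" }')

# Naive escaping maps both values to the same text, so the original is lost.
print(naive_escape(doc["formfeed"]) == naive_escape(doc["backslash_f"]))

# Escaping the backslash as well keeps the two values apart and reversible.
for value in doc.values():
    print(safe_unescape(safe_escape(value)) == value)
```

The reversible variant illustrates the cost Nick points out: a form author typing a literal backslash into a bound control would have to know to double it.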

Kind regards,

Nick Van den Bleeken
R&D Manager
Phone: +32 3 821 01 70
Office fax: +32 3 821 01 71
nick.van.den.bleeken@inventivegroup.com
www.inventivedesigners.com

On 06 Jun 2011, at 23:52, John Boyer wrote:

15) There are numerous examples in the wiki in which the JSON names are not surrounded by quotes. I see this a lot in JavaScript, so one can imagine that some systems will be lenient on this point, but we should decide whether we want to support that. According to the syntax at JSON.org, the name part must be surrounded by quotes. Here are two interesting examples:
{ "name" : "value" }
{ "" : "value"}

From: John Boyer/CanWest/IBM@IBMCA
To: "Forms WG" <public-forms@w3.org>
Date: 06/05/2011 03:48 PM
Subject: Re: JSON Instances
Sent by: public-forms-request@w3.org

Hi everyone,

It would help to get some urgency and focus on discussing this on the list so we can get some decisions on our next call. Although this isn't really "my" issue, there seems to be a sudden spike in interest in implementing conversions and getting the word out on "how it should be done", and so I've gotten involved just to make sure our interests are represented... but that means we have to have a solid, clear, complete solution...

I presented our approach to an internal IBM task force, amended by the many issues I raised in the thread below, and a few more issues popped up based on their feedback.

10) We need to be able to handle JSON rooted by an array, not just an object, e.g. [ 10, 20 ].

11) We need to be able to handle anonymously named objects and arrays at any level, not just the root level, e.g.
[[1,2], [3,4]]

Here's what this could look like:

<data name="" type="array">
  <data name="" array="true" type="array">
    <data array="true" name="" type="number">1</data>
    <data array="true" name="" type="number">2</data>
  </data>
  <data name="" array="true" type="array">
    <data array="true" name="" type="number">3</data>
    <data array="true" name="" type="number">4</data>
  </data>
</data>

12) We need to be able to distinguish various kinds of emptiness, not just null, and we need to be able to distinguish the emptiness indication from the type. For example, you can have emptiness meaning null (which we currently suggest representing with type=null), but how do we represent emptiness meaning empty object or empty array?
{ a:[] } or { a: {}}
It seems we need something more like empty="null | object | array | string | number | boolean" to explain what emptiness means if the element is empty. Then, it would be possible to subsequently assign a type to indicate what it should become if it becomes non-empty. Some XML applications will have a secondary source of information that describes this, and we don't want a type assignment (e.g. type="null") getting in the way.

13) It could be worthwhile using a hex encoding escape mechanism for illegal chars in names. Maybe use minus instead of underscore as the separator, because minus will be used less frequently in names. Then, you'll have to escape the escaping character. But if you have JSON names that are similar except for some illegal characters, then the tag names will still be unique, even though this is not strictly necessary when the name is preserved in a name attribute.

14) What about those characters that are illegal in XML? Can we devise an escaping mechanism to preserve them in names and content?
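The anonymous-array mapping from point 11 and the empty attribute proposed in point 12 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the working group's converter: the function name `anon_to_xml` is hypothetical, it covers only arrays, numbers, booleans, strings and null (objects are left out), and it omits the type attribute on empty values so that a type can be assigned later, as point 12 suggests.

```python
import json
import xml.etree.ElementTree as ET


def anon_to_xml(value, in_array=False):
    """Map an unnamed JSON value to a <data name=""> element."""
    el = ET.Element("data", {"name": ""})
    if in_array:
        el.set("array", "true")            # mark every array member
    if value is None:
        el.set("empty", "null")            # emptiness means null
    elif isinstance(value, list):
        if value:
            el.set("type", "array")
            for item in value:
                el.append(anon_to_xml(item, in_array=True))
        else:
            el.set("empty", "array")       # emptiness means empty array
    elif isinstance(value, bool):          # check bool before int
        el.set("type", "boolean")
        el.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        el.set("type", "number")
        el.text = json.dumps(value)
    elif isinstance(value, str):
        el.text = value                    # string is the default, no type
    else:
        raise TypeError("objects are outside the scope of this sketch")
    return el


root = anon_to_xml(json.loads("[[1,2], [3,4]]"))
print(ET.tostring(root, encoding="unicode"))
```

Attribute order in the printed XML may differ from the hand-written example, which does not matter to an XML processor.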
==================================

Here are some key points that are different from our current wiki content:

i) Use type="object|array|number|boolean|string"
ii) Use empty="null|object|array|number|boolean|string", with string the default, to help indicate the meaning of empty content for an element
iii) Use <data name=""> for anonymous (unnamed) values anywhere, including root
iv) Use a non-empty name attribute to record real names for JSON names that don't match NCName
v) Use a non-empty name attribute generally to indicate that quotes should appear around the JSON name
vi) Use an attribute to mark each array element, not just the start, so it will be clear for any element whether it is part of an array.

Here's what some more examples would translate to based on these:

{a:[]} becomes

<data name="" type="object">
  <a array="true" empty="array"></a>
</data>

{a:[""]} becomes

<data name="" type="object">
  <a array="true"></a>
</data>

{ a : [[1,2],[3,4]] } becomes

<data name="" type="object">
  <a array="true" type="array">
    <data array="true" name="" type="number">1</data>
    <data array="true" name="" type="number">2</data>
  </a>
  <a array="true" type="array">
    <data array="true" name="" type="number">3</data>
    <data array="true" name="" type="number">4</data>
  </a>
</data>

John M. Boyer, Ph.D.

From: John Boyer/CanWest/IBM@IBMCA
To: John Boyer/CanWest/IBM@IBMCA
Cc: "Nick Van den Bleeken" <Nick.Van.den.Bleeken@inventivegroup.com>, "Forms WG" <public-forms@w3.org>, "Steven Pemberton" <Steven.Pemberton@cwi.nl>
Date: 06/03/2011 12:25 AM
Subject: Re: JSON Instances
Sent by: public-forms-request@w3.org

P.P.S.
9) I do not think you can transliterate \b and \f because their code points are 0x08 and 0x0C, which are not allowed by XML Char. More generally, it sounds like all code points between 00 and 1F are out of bounds except 09, 0A and 0D. Does anyone know if JSON allows 00?

John M. Boyer, Ph.D.

From: John Boyer/CanWest/IBM@IBMCA
To: John Boyer/CanWest/IBM@IBMCA
Cc: "Nick Van den Bleeken" <Nick.Van.den.Bleeken@inventivegroup.com>, "Forms WG" <public-forms@w3.org>, "Steven Pemberton" <Steven.Pemberton@cwi.nl>
Date: 06/02/2011 09:27 PM
Subject: Re: JSON Instances
Sent by: public-forms-request@w3.org

P.S.

7) Why should the root element for the XML corresponding to the JSON data be <json>? Why not <data>?

8) The second bullet point, which contains the encoding instructions for the variable "name", should be further decomposed into a second-level bullet point list.

Thanks,
JB

From: John Boyer/CanWest/IBM@IBMCA
To: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
Cc: "Nick Van den Bleeken" <Nick.Van.den.Bleeken@inventivegroup.com>, "Forms WG" <public-forms@w3.org>
Date: 06/02/2011 04:08 PM
Subject: Re: JSON Instances
Sent by: public-forms-request@w3.org

Even if the problem were only about numbers and booleans, using a bind would make it a lot harder to roundtrip back to JSON, compared with decorating the data itself. I also agree that the array case pretty much quashes the idea.

More generally, it would be preferable if our JSON => XML => back to JSON conversion strategy didn't rely on anything outside of XML. If our conversion relied on another XForms construct, like bind, then people outside of XForms could not reuse it.

I have several other questions and suggestions related to round-tripping the JSON:

1) Add some way to tell whether the name part of the JSON should have quote marks. Right now it is clear that you need quote marks if the name includes non-NMCHAR characters. In this case, you get a name attribute on the XML tag. So maybe a way to always signal use of quote marks is to put a name attribute, like this:
{"size": 50} ==> <json><size name="size" type="number">50</size></json> ==> {"size": 50}
The converter becomes really easy too: use the name attribute if given and put the attribute value in quotes; otherwise use the element name, not in quotes. Finally, I think if you take this approach, then there would not be much point in debating whether we should use something better than just underscores for the illegal chars, right?

2) I think type="null" is a bit underpowered. I think you really mean type="object" because you're just trying to distinguish that the empty content means null rather than the string "". By the way, I recommend against using xsi:nil because it has to correspond to something being nillable="true" in a schema, and it must be manually changed to xsi:nil="false" if the element becomes non-empty.

3) You ask whether the type attribute should be replaced with xsi:type. I'd recommend against it. It seems better to separate the issue of converting JSON => XML from the issue of improving the XForms processing of the resultant XML. It would always be possible for an XForms author to add an XForms bind whose nodeset uses an XPath predicate to select nodes with a particular type assignment and then assign a type MIP to those nodes to attach a particular schema datatype, e.g.
<xf:bind nodeset="/descendant::*[@type='number']" type="xsd:double"/>
By the way, it does look like JavaScript number and xsd:double use the same 64-bit IEEE definition, so it is better to leave this flexible in case the form author wants to be more restrictive, e.g. restrict to integer inputs.
Finally, use of xsi:type would then require us to add the ugly xmlns:xsi namespace declaration to the json element.

4) Attaching starts="array" seems underpowered. Suppose I have a particular node and I need to know whether it is part of an array? Why not attach array="true" to each element from an array? Or would there be any value in setting the attribute array equal to the element name? Would there be a benefit to authors of being able to say nodeset="*[@array='size']" in order to grab all the nodes in the size array separately from array elements that might be at the same hierarchic level? One might think you could achieve the same effect with nodeset="size[@array='true']", so maybe the boolean is enough.

5) Is it just a wiki problem that is producing ?? for the translation of escaped characters? If so, I suggest using a hex notation, e.g. \b to 0x08, \f to 0x0C, \n to 0x0A, \r to 0x0D, and \t to 0x09.

6) Can you update the wiki to indicate what illegal XML characters you might be talking about? It seems it will be hard to decide what to do about the characters without having the research to indicate what they are. Maybe there are just a few?

Thank you,
John M. Boyer, Ph.D.

From: "Steven Pemberton" <Steven.Pemberton@cwi.nl>
To: "Steven Pemberton" <Steven.Pemberton@cwi.nl>, "Nick Van den Bleeken" <Nick.Van.den.Bleeken@inventivegroup.com>
Cc: "Forms WG" <public-forms@w3.org>
Date: 06/01/2011 07:04 AM
Subject: Re: JSON Instances
Sent by: public-forms-request@w3.org

The reason they are there is to allow serialization to roundtrip the data. That might work for numbers and boolean, but I don't see how it would work for arrays. (But I may be wrong).

Steven

On Wed, 01 Jun 2011 11:10:41 +0200, Nick Van den Bleeken <Nick.Van.den.Bleeken@inventivegroup.com> wrote:

> Steven,
>
> Couldn't we use auto generated binds that attach the type information to
> the nodes for that?
>
> Regards,
>
> Nick van den Bleeken
>
>
> On 30 May 2011, at 14:41, "Steven Pemberton" <Steven.Pemberton@cwi.nl>
> wrote:
>
>> I should note a slight difference with what we had earlier agreed that
>> dawned on me while firming it up, that got in the way of round-tripping.
>>
>> In the transformation of
>> {"size": 50} and {"size": "50"}
>> you can't tell the difference if you transform both to
>> <json><size>50</size></json>
>>
>> So I've used the type attribute to (arbitrarily) mark the numeric case:
>>
>> <json><size type="number">50</size></json>
>>
>> Similarly with the boolean and null cases.
>>
>> Steven
>>
>>
>> On Fri, 27 May 2011 16:51:50 +0200, Steven Pemberton
>> <Steven.Pemberton@cwi.nl> wrote:
>>
>>> I have rewritten the JSON section, according to my action item.
>>>
>>> http://www.w3.org/MarkUp/Forms/wiki/Json
>>>
>>> Comments gladly received.
>>>
>>> Steven
>>
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content by MailScanner, and is
>> believed to be clean.
>>
>
> ________________________________
>
> Inventive Designers' Email Disclaimer:
> http://www.inventivedesigners.com/email-disclaimer
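The round-tripping issue that started this thread, {"size": 50} versus {"size": "50"}, can be illustrated with a short Python sketch. It is an illustration only: `to_xml` and `to_json` are hypothetical names, the sketch handles flat objects whose values are strings or numbers, and it assumes every JSON name is already a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(obj, typed):
    """Map a flat JSON object to <json>...</json>, optionally marking numbers."""
    root = ET.Element("json")
    for key, value in obj.items():
        child = ET.SubElement(root, key)   # assumes key is a valid XML name
        if isinstance(value, str):
            child.text = value
        else:
            child.text = json.dumps(value)
            if typed:
                child.set("type", "number")
    return root


def to_json(root):
    """Rebuild the object, reading type="number" to restore numeric values."""
    result = {}
    for child in root:
        text = child.text or ""
        if child.get("type") == "number":
            result[child.tag] = json.loads(text)
        else:
            result[child.tag] = text
    return result


# Without the type attribute the two inputs produce identical XML.
same = ET.tostring(to_xml({"size": 50}, typed=False)) == ET.tostring(
    to_xml({"size": "50"}, typed=False)
)
print(same)

# With the type attribute both inputs survive the round trip.
print(to_json(to_xml({"size": 50}, typed=True)))
print(to_json(to_xml({"size": "50"}, typed=True)))
```

Because the type marker lives on the data itself, the conversion back to JSON needs nothing outside the XML, which is the argument made against generated binds earlier in the thread.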
Received on Thursday, 16 June 2011 19:23:28 UTC