Re: JSON data in the wild

I think fn:parse-json would also work:

   array { unparsed-text-lines("test.txt") ! parse-json(.) }
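
A slightly fuller sketch of the same idea, in case the input is less tidy. It assumes the placeholder file name test.txt from the expression above, and that blank lines and a trailing comma on each line (the comma-delimited variant quoted below) should be tolerated rather than rejected:

```xquery
xquery version "3.1";

(: Sketch only: "test.txt" is the placeholder file name used above. :)
array {
  for $line in unparsed-text-lines("test.txt")
  (: drop a trailing comma, as in the comma-delimited variant :)
  let $json := replace($line, ',\s*$', '')
  (: skip blank lines, which parse-json would reject :)
  where normalize-space($json) ne ''
  return parse-json($json)
}
```

Each array member is then a map, so something like ?*?name on the result should pick out the name values.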

I have seen line-separated JSON and XML in the Hadoop world, because a large file is easy to split among multiple tasks when documents are separated by a single character such as a newline.
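
The same property can be used inside a single query. As a rough sketch only (the file name and the batch size of 1000 are arbitrary assumptions, and this illustrates the idea rather than how Hadoop itself splits input), a tumbling window can cut the lines into independent batches, since every line is a complete document:

```xquery
xquery version "3.1";

(: Sketch only: file name and batch size are placeholders. :)
for tumbling window $batch in unparsed-text-lines("test.txt")
    start at $s when true()
    end at $e when $e - $s eq 999   (: close the window after 1000 lines :)
return array { $batch ! parse-json(.) }
```

Each batch comes back as its own array, and a final batch with fewer than 1000 lines is still returned because the end condition is not marked "only".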

Thanks,
Josh

> On Sep 9, 2015, at 4:11 PM, Michael Kay <mike@saxonica.com> wrote:
> 
> It turns out it’s easy enough to handle with
> 
> <j:array>{unparsed-text-lines($input) ! json-to-xml(.)}</j:array>
> 
> so we pass the test in terms of having the tools to deal with it.
> 
> But I was just wondering if JSON is a moving target...
> 
> Michael Kay
> Saxonica
> 
> 
>> On 9 Sep 2015, at 23:42, Robie, Jonathan <jonathan.robie@emc.com> wrote:
>> 
>> If we could handle sequences of JSON objects or arrays, with or without commas, and strings without delimiters, that would probably cover most of the cases.
>> 
>> We should ask some JSON experts whether they agree.
>> 
>> Obviously, this is only in the serialization format.
>> 
>> Jonathan
>> 
>> From: Jim Melton [jim.melton@oracle.com]
>> Sent: Wednesday, September 09, 2015 6:10 PM
>> To: Robie, Jonathan
>> Cc: Michael Kay; public-xsl-query@w3.org
>> Subject: Re: JSON data in the wild
>> 
>> I would remember that JSON was invented by and for people who don't like to admit that data formats must be precise and predictable (or, for that matter, documented) in order for them to be usefully shared. 
>> 
>> The standardizer in me says "reject out of hand such abominations". 
>> 
>> The practitioner in me says "that's just how it is; how can we adapt".
>> 
>> Jim
>> 
>> On 9/9/2015 3:42 PM, Robie, Jonathan wrote:
>>> 
>>> I think you'll see both this format and comma-delimited objects in the wild. 
>>> 
>>> {"_id":707860,"name":"Hurzuf","country":"UA","coord":{"lon":34.283333,"lat":44.549999}},
>>> {"_id":519188,"name":"Novinki","country":"RU","coord":{"lon":37.666668,"lat":55.683334}},
>>> {"_id":1283378,"name":"Gorkhā","country":"NP","coord":{"lon":84.633331,"lat":28}}
>>> 
>>> I don't think the JSON specification licenses either.  But you frequently want "a collection of objects", a concept that JSON does not define.  You can always put it in an array, but that's an extra node that you don't really want or need. 
>>> 
>>> [
>>> {"_id":707860,"name":"Hurzuf","country":"UA","coord":{"lon":34.283333,"lat":44.549999}},
>>> {"_id":519188,"name":"Novinki","country":"RU","coord":{"lon":37.666668,"lat":55.683334}},
>>> {"_id":1283378,"name":"Gorkhā","country":"NP","coord":{"lon":84.633331,"lat":28}}
>>> ]
>>> 
>>> You'll also see JSON files in which the keys are not quote delimited. They aren't valid JSON files, but you do see them.
>>> 
>>> My take: 
>>> 
>>> 1. It's OK to require people to make their files valid JSON (e.g. I don't think we need to support keys without quotes)
>>> 2. We really should have a serialization format for collections of objects. One format is sufficient, people can massage things into that format. 
>>> 
>>> Jonathan
>>> 
>>> From: Michael Kay [mike@saxonica.com]
>>> Sent: Wednesday, September 09, 2015 5:21 PM
>>> To: public-xsl-query@w3.org
>>> Subject: JSON data in the wild
>>> 
>>> I tried to download some real JSON data today - from openweathermap.org - and found that it’s in a format we can’t handle. Specifically, a sequence of maps/objects, newline-separated:
>>> 
>>> {"_id":707860,"name":"Hurzuf","country":"UA","coord":{"lon":34.283333,"lat":44.549999}}
>>> {"_id":519188,"name":"Novinki","country":"RU","coord":{"lon":37.666668,"lat":55.683334}}
>>> {"_id":1283378,"name":"Gorkhā","country":"NP","coord":{"lon":84.633331,"lat":28}}
>>> 
>>> I wonder if this is common and whether we should cater for it?
>>> 
>>> Michael Kay
>>> Saxonica
>> 
>> -- 
>> ========================================================================
>> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
>>   Chair, ISO/IEC JTC1/SC32 and W3C XML Query WG    Fax : +1.801.942.3345
>> Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
>> 1930 Viscounti Drive      Alternate email: jim dot melton at acm dot org
>> Sandy, UT 84093-1063 USA  Personal email: SheltieJim at xmission dot com
>> ========================================================================
>> =  Facts are facts.   But any opinions expressed are the opinions      =
>> =  only of myself and may or may not reflect the opinions of anybody   =
>> =  else with whom I may or may not have discussed the issues at hand.  =
>> ======================================================================== 

Received on Thursday, 10 September 2015 01:06:12 UTC