Re: [Poplus] Popolo and JSON-LD from James McKinney on 2015-04-13 (public-opengov@w3.org from April 2015)

From: James McKinney <james@opennorth.ca>
Date: Mon, 13 Apr 2015 11:28:08 -0400
To: Andreas Kuckartz <A.Kuckartz@ping.de>
Cc: poplus@googlegroups.com, public-opengov@w3.org
Message-Id: <3D987BCF-BD08-42A0-85BE-7EFAF95EA679@opennorth.ca>
> On Apr 13, 2015, at 1:19 AM, Andreas Kuckartz <A.Kuckartz@ping.de> wrote:
> 
> Am 08.04.2015 23:53, schrieb James McKinney:
>> All the Popolo-compliant data I’m aware of uses plain JSON. There’s
>> no comprehensive registry - I’m not sure what the utility of such a
>> registry would be, outside of focused projects/campaigns like
>> EveryPolitician.
> 
> Data can not be used if it can not be accessed. And it can not be
> accessed if it is not known where it is or that it does exist at all.
> 
> I would like to analyse the data, convert it to RDF and do something
> with that heap of Linked Data (and in combination with other Linked
> Data). Currently I can do none of these.

Thanks for explaining your use case. Having access to all available Popolo data is not a common use case, which is why there is no registry of all Popolo data. More common use cases are “I want a list of all Canadian MPs” or “I want an API for all votes in the House of Commons.” In such cases, the format is not important; the person will use whatever data they can find. Only a small number of people search specifically for Popolo data. Also, these use cases are geographically scoped, so what matters is local awareness and promotion, not international awareness and promotion, which is more difficult.

So, for your use case, I think you’ll need to do some research yourself; if you share the results with the community, others might contribute to it. Besides various PopIt deployments, I also know of:

http://docs.opencivicdata.org/en/latest/api/index.html (same API is at http://scrapers.herokuapp.com/)
https://github.com/KohoVolit/api.parldata.eu
https://github.com/tmtmtmtm/eduskunta-popolo


>>> 4. Is anyone aware of tools to convert pure Popolo JSON to a
>>> JSON-LD serialization ?
>> 
>> For any Popolo-compliant plain JSON document, you can (for most
>> properties) just apply the appropriate JSON-LD context; in other
>> words, most plain JSON documents are JSON-LD documents with an
>> implicit @context.
> 
> I am interested in machine readable Linked Data. Implicit contexts are
> out of band information and therefore prevent machine readability.

JSON-LD has not been the preferred serialization for Popolo data. The only way to avoid out-of-band contexts is to eliminate the plain JSON serialization, which I think will be an unpopular choice.
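For publishers who do offer a JSON-LD serialization, making the implicit context explicit can be as simple as embedding `@context` in each document. The Python sketch below assumes the Speech context URL from the example in this message; the helper name `with_explicit_context` is hypothetical. It only adds the context reference; properties the context cannot map, such as the *_id fields, still need separate treatment.

```python
import json

# Context URL taken from the Speech example in this message (assumption:
# other Popolo classes would need their own context documents).
SPEECH_CONTEXT = "http://www.popoloproject.com/contexts/speech.jsonld"


def with_explicit_context(plain_doc: dict, context_url: str) -> dict:
    """Return a copy of a plain JSON document with its @context made explicit.

    The input is not modified. If the document already declares a context,
    it is returned (copied) unchanged.
    """
    if "@context" in plain_doc:
        return dict(plain_doc)
    # Put @context first so it appears at the top of the serialized JSON.
    return {"@context": context_url, **plain_doc}


plain_speech = {"text": "Ask not what your country can do for you"}
print(json.dumps(with_explicit_context(plain_speech, SPEECH_CONTEXT), indent=2))
```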

> And if the implicit context does not apply to all properties there is
> an information loss, something I do not find acceptable because the
> JSON-LD serialization should contain at least as much information as
> the pure JSON version.

The JSON-LD serialization can contain the same information as the plain JSON serialization. It’s just that in some specific cases, the two serializations will look different. Those differences are not resolvable without making the plain JSON more complicated for no apparent reason. For example, a plain JSON Speech may be:

{
  "creator_id": "john-f-kennedy",
  "text": "Ask not what your country can do for you…",
  "audio": "http://example.com/audio/jfk-inauguration.ogg"
}

But the JSON-LD Speech would be:

{
  "@context": "http://www.popoloproject.com/contexts/speech.jsonld",
  "@type": "Speech",
  "@id": "http://example.com/speeches/jfk-inauguration",
  "creator": "http://example.com/speakers/john-f-kennedy",
  "text": "Ask not what your country can do for you…",
  "audio": {
    "url": "http://example.com/audio/jfk-inauguration.ogg"
  }
}

The two differences are (1) plain JSON allows project-specific identifiers in the *_id fields, whereas JSON-LD shouldn’t use any *_id fields, and (2) “audio” must be a schema:AudioObject in JSON-LD. To my knowledge, JSON-LD contexts don’t allow you to say, “alias the property ‘audio’ to a blank node of type schema:AudioObject with the property ‘url’.”
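Because a context cannot express these two transformations, a consumer would have to apply them in code. This Python sketch reproduces the JSON-LD Speech above from the plain JSON one. The function name and the speaker base URL are hypothetical (the thread does not say how *_id values map to URLs), and wrapping the audio link in an object with a `url` only mimics the shape shown; it does not validate anything as a schema:AudioObject.

```python
import json
from urllib.parse import quote

# Context URL from the Speech example above.
SPEECH_CONTEXT = "http://www.popoloproject.com/contexts/speech.jsonld"
# Hypothetical base URL for minting speaker URLs from project-specific IDs.
SPEAKER_BASE = "http://example.com/speakers/"


def speech_to_jsonld(plain: dict, speech_iri: str, speaker_base: str = SPEAKER_BASE) -> dict:
    """Convert a plain JSON Speech into the JSON-LD shape shown above."""
    doc = {"@context": SPEECH_CONTEXT, "@type": "Speech", "@id": speech_iri}
    for key, value in plain.items():
        if key == "creator_id":
            # Difference (1): the project-specific ID becomes a URL and the
            # *_id field is replaced by the plain property name.
            doc["creator"] = speaker_base + quote(value, safe="")
        elif key.endswith("_id"):
            # This sketch only knows how to mint URLs for creator_id.
            raise ValueError(f"no URL mapping defined for {key!r}")
        elif key == "audio":
            # Difference (2): the direct link is wrapped in an object with a
            # url, matching the AudioObject-style shape in the example.
            doc["audio"] = {"url": value}
        else:
            # All other properties are copied unchanged.
            doc[key] = value
    return doc


plain_speech = {
    "creator_id": "john-f-kennedy",
    "text": "Ask not what your country can do for you…",
    "audio": "http://example.com/audio/jfk-inauguration.ogg",
}
print(json.dumps(speech_to_jsonld(plain_speech, "http://example.com/speeches/jfk-inauguration"), indent=2))
```

The identifier-to-URL mapping is the project-specific part a publisher would have to supply; it is exactly the information an implicit context cannot carry.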

Allowing the use of internal identifiers without having to produce a URL for those identifiers has been a clear use case, hence the *_id fields. There has been no use case for adding additional properties about an audio file, which is why the plain JSON is just a direct link instead of an object as in JSON-LD.

To eliminate the difference in the audio property, I propose substituting og:audio for schema:audio.
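With og:audio, whose value in the Open Graph protocol is simply a URL to an audio file, a context could coerce the plain string to an IRI, so the plain JSON and JSON-LD forms of the property would be identical. The context fragment below is a hypothetical illustration of this proposal, not a published Popolo context, and `expand_simple_term` is a toy that expands a single simple term definition; a real consumer would use a full JSON-LD processor.

```python
# Hypothetical context fragment for the og:audio proposal: "audio" maps to
# og:audio, and "@type": "@id" coerces its string value to an IRI.
OG_AUDIO_CONTEXT = {
    "og": "http://ogp.me/ns#",
    "audio": {"@id": "og:audio", "@type": "@id"},
}


def expand_simple_term(term: str, value: str, context: dict) -> dict:
    """Toy JSON-LD expansion of one term with a simple definition.

    Handles only an "@id" (optionally a prefix:suffix compact IRI) and an
    optional "@type": "@id" coercion. NOT a full JSON-LD processor.
    """
    definition = context[term]
    iri = definition["@id"]
    prefix, sep, suffix = iri.partition(":")
    # Resolve a compact IRI such as og:audio against a prefix in the context.
    if sep and isinstance(context.get(prefix), str):
        iri = context[prefix] + suffix
    if definition.get("@type") == "@id":
        return {iri: [{"@id": value}]}
    return {iri: [{"@value": value}]}


# The plain JSON value needs no restructuring; it expands to an IRI node.
print(expand_simple_term("audio", "http://example.com/audio/jfk-inauguration.ogg", OG_AUDIO_CONTEXT))
```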

James
Received on Monday, 13 April 2015 15:28:43 UTC