Skip to content

Examples

Xavier Badosa edited this page Jan 30, 2017 · 37 revisions

WikiExamples

A Eurostat Example

Let's assume that we must build a spreadsheet table of unemployment rate by country (columns) and year (rows).

Sample spreadsheet

Steps

1. Retrieve the unemployment rate by country in recent years from Eurostat

You'll need to find the Eurostat dataset id. Go to

http://ec.europa.eu/eurostat/data/database

and then

Tables on EU policy
  > Employment performance monitor - indicators (tesem)
    > Unemployment rate (tesem120)

Connect to the JSON-stat Eurostat API to retrieve dataset tesem120:

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1

You can view the contents of the dataset at

http://json-stat.com/explorer/#/http%3A%2F%2Fec.europa.eu%2Feurostat%2Fwdds%2Frest%2Fdata%2Fv2.1%2Fjson%2Fen%2Ftesem120%3Fprecision%3D1

To download the dataset from the command line using Eurostat's API, run cURL:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 -o eurostat.jsonstat

JSON-stat keeps data and metadata completely apart because it is designed to be an efficient transport format. That probably makes it human-unfriendly. Fortunately, we can use jsonstat-conv to convert JSON-stat to other formats.

2. Convert JSON-stat to a more popular JSON data structure

If you don't have jsonstat-conv installed in your system or have a version that is lower than 0.5.4, please (re)install it:

npm install -g jsonstat-conv

To check your version of jsonstat-conv:

jsonstat2csv --version

jsonstat-conv includes a command to translate JSON-stat directly into CSV (jsonstat2csv). For example,

jsonstat2csv eurostat.jsonstat eurostat-comma.csv

will produce a comma-delimited CSV using dot as the decimal mark (eurostat-comma.csv).

If you need a semicolon-delimited CSV with comma as the decimal mark, use instead:

jsonstat2csv eurostat.jsonstat eurostat-semi.csv --column ";"

jsonstat2csv comes with several options that allow us to customize the output. For example, to include status information and rename the "Status" and "Value" columns to "symbol" and "rate":

jsonstat2csv eurostat.jsonstat eurostat-status.csv --status --vlabel rate --slabel symbol

jsonstat2csv, though, does not include options to change the structure of the data. These features are available only in jsonstat2arrobj which is the main way to export JSON-stat to other formats. In particular, jsonstat2arrobj converts JSON-stat into an array of objects (arrobj, from now on), which is probably the most popular structure used to exchange data in JSON. As a consequence, there are many tools to transform arrobjs into other JSON flavors or even to convert them to CSVs.

So instead of converting Eurostat's JSON-stat directly to CSV using jsonstat2csv we are going to translate it to an arrobj using jsonstat2arrobj and then convert it to CSV using open source tools.

To convert JSON-stat into an array of objects:

jsonstat2arrobj eurostat.jsonstat eurostat.json

3. Customize the array of objects

As you probably noticed, the eurostat.json does not have the structure we need in our spreadsheet (country in columns and year in rows). By default, jsonstat2arrobj stores the value for each combination of categories in a property called value:

[
  {
    "unit": "Percentage of active population",
    "sex": "Total",
    "age": "Total",
    "time": "2005",
    "geo": "Austria",
    "value": 5.6
  },
  ...
]

Instead, we need to have a property for each category of the geo dimension (where the country information is) to store their values for each combination of the categories of the rest of the dimensions:

[
  {
    "unit": "Percentage of active population",
    "sex": "Total",
    "age": "Total",
    "time": "2005",
    "AT": 5.6,
    "BE": 8.5,
    "BG": 10.1,
    "CY": 5.3,
    ...
  },
  ...
]

That is, we need to transpose the values by geo:

jsonstat2arrobj eurostat.jsonstat eurostat-transp.json --by geo

Dataset tesem120 contains several single-category dimensions: sex and age are always "Total" and unit is always "Percentage of active population". We can remove them from the transposed JSON:

jsonstat2arrobj eurostat.jsonstat eurostat-drop.json --by geo --drop sex,age,unit

4. Convert JSON to CSV

For this task, there are many tools out there. We will be using d3-dsv, which includes a json2csv command.

npm install -g d3-dsv

To convert our last JSON to CSV:

json2csv < eurostat-drop.json > eurostat.csv

The resulting CSV has comma as the column delimiter and dot as the decimal mark. If what you need is a CSV with comma as the decimal mark, first you must use the jsonstat2arrobj --comma (decimal mark) option:

jsonstat2arrobj eurostat.jsonstat eurostat-comma.json --by geo --drop sex,age,unit --comma

And then specify in json2csv a different column delimiter (for example, a semicolon):

json2csv < eurostat-comma.json > eurostat-semi.csv -w ";"

5. Altogether now

All the process has required three lines and three files (eurostat.jsonstat, eurostat-drop.json, eurostat.csv):

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 -o eurostat.jsonstat

jsonstat2arrobj eurostat.jsonstat eurostat-drop.json --by geo --drop sex,age,unit

json2csv < eurostat-drop.json > eurostat.csv

This is not necessary, though: all the process can be done in a single line, piping the output of a program to another program. For that, we need to enable jsonstat2arrobj stream interface (--stream) which will allow us to use pipes (|) and redirects (<, >).

In the stream interface, this command

jsonstat2arrobj eurostat.jsonstat eurostat.json

must be rewritten as

jsonstat2arrobj < eurostat.jsonstat > eurostat.json --stream

So to get a comma-delimited CSV with dot as the decimal mark in a single line:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj --by geo --drop sex,age,unit --stream | json2csv > eurostat.csv

Or a little shorter:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj -b geo -d sex,age,unit -t | json2csv > eurostat.csv

And to get a semicolon-delimited CSV with comma as the decimal mark:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj -b geo -d sex,age,unit -k -t | json2csv > eurostat-semi.csv -w ";"

A UNECE Example

Let's assume that we must build a spreadsheet table of unemployment rate by country (columns) and year (rows), but this time based on UNECE's data.

The requirements are the same as in Eurostat's example, so please read it first if you haven't.

Steps

1. Retrieve the unemployment rate by country in recent years from UNECE

First, we need to find the UNECE table with such information. Visit the UNECE Statistical Database

http://w3.unece.org/PXWeb2015/pxweb/en/STAT/

and then

UNECE Statistical Database
  > Country Overviews
    > UNECE Member Countries in Figures
      > Country Overview by Country and Time

On table Country Overview by Country and Time select Unemployment rate as Indicator, all countries for the Country dimension and all years for the Year dimension. Then build the table (Table - Layout 1) and select the About table tab.

At the bottom of the page, click on Make this table available in your application to get the URL and the JSON query needed to connect to UNECE's API and retrieve the unemployment rate for all countries and all years.

URL

http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px

JSON query

{
  "query": [
    {
      "code": "Indicator",
      "selection": {
        "filter": "item",
        "values": [ "c0000233" ]
      }
    },
    {
      "code": "Country",
      "selection": {
        "filter": "item",
        "values": [ "008", "051", "040", "031", "112", "056", "070", "100", "124", "191", "196", "203", "208", "233", "246", "250", "268", "276", "300", "348", "352", "372", "376", "380", "398", "417", "428", "440", "442", "470", "498", "499", "528", "578", "616", "620", "642", "643", "688", "703", "705", "724", "752", "756", "762", "807", "792", "795", "804", "826", "840", "860" ]
      }
    }
  ],
  "response": {
    "format": "px"
  }
}

The URL determines the table we are retrieving the information from. The JSON query specifies what data should be retrieved and how it should be formatted. In the example above, we are selecting Indicator c0000233 (unemployment rate) and all the available countries (008, 051, 040, 031, ...). There is no time filter, meaning all available years will be included in the response. Finally the response format by default is px. We need to change it to json-stat:

JSON query (long version)

{
  "query": [
    {
      "code": "Indicator",
      "selection": {
        "filter": "item",
        "values": [ "c0000233" ]
      }
    },
    {
      "code": "Country",
      "selection": {
        "filter": "item",
        "values": [ "008", "051", "040", "031", "112", "056", "070", "100", "124", "191", "196", "203", "208", "233", "246", "250", "268", "276", "300", "348", "352", "372", "376", "380", "398", "417", "428", "440", "442", "470", "498", "499", "528", "578", "616", "620", "642", "643", "688", "703", "705", "724", "752", "756", "762", "807", "792", "795", "804", "826", "840", "860" ]
      }
    }
  ],
  "response": {
    "format": "json-stat"
  }
}

This query can be simplified a little. The UNECE Statistical Database uses the PC-Axis API. In the example, countries are selected one by one using UNECE's country code (008, 051, 040, 031, ...), but the API supports a general syntax to select all the categories of a dimension ("filter": "all", "values": ["*"]):

JSON query (shorter version)

{
  "query": [
    {
      "code": "Indicator",
      "selection": {
        "filter": "item",
        "values": [
          "c0000233"
        ]
      }
    },
    {
      "code": "Country",
      "selection": {
        "filter": "all",
        "values": ["*"]
     }
    }
  ],
  "response": {
    "format": "json-stat"
  }
}

To download the dataset from the command line using the long version, let's run cURL:

curl -X POST -d '{ "query": [ { "code": "Indicator", "selection": { "filter": "item", "values": [ "c0000233" ] } }, { "code": "Country", "selection": { "filter": "item", "values": [ "008", "051", "040", "031", "112", "056", "070", "100", "124", "191", "196", "203", "208", "233", "246", "250", "268", "276", "300", "348", "352", "372", "376", "380", "398", "417", "428", "440", "442", "470", "498", "499", "528", "578", "616", "620", "642", "643", "688", "703", "705", "724", "752", "756", "762", "807", "792", "795", "804", "826", "840", "860" ] } } ], "response": { "format": "json-stat" } }' http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px -o unece.jsonstat

Or using the shorter version of the JSON query:

curl -X POST -d '{ "query": [ { "code": "Indicator", "selection": { "filter": "item", "values": [ "c0000233" ] } }, { "code": "Country", "selection": { "filter": "all", "values": [ "*" ] } } ], "response": { "format": "json-stat" } }' http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px -o unece.jsonstat

On Windows, quotes are treated differently: we'll need to rewrite the previous two lines as:

curl -X POST -d "{ \"query\": [ { \"code\": \"Indicator\", \"selection\": { \"filter\": \"item\", \"values\": [ \"c0000233\" ] } }, { \"code\": \"Country\", \"selection\": { \"filter\": \"item\", \"values\": [ \"008\", \"051\", \"040\", \"031\", \"112\", \"056\", \"070\", \"100\", \"124\", \"191\", \"196\", \"203\", \"208\", \"233\", \"246\", \"250\", \"268\", \"276\", \"300\", \"348\", \"352\", \"372\", \"376\", \"380\", \"398\", \"417\", \"428\", \"440\", \"442\", \"470\", \"498\", \"499\", \"528\", \"578\", \"616\", \"620\", \"642\", \"643\", \"688\", \"703\", \"705\", \"724\", \"752\", \"756\", \"762\", \"807\", \"792\", \"795\", \"804\", \"826\", \"840\", \"860\" ] } } ], \"response\": { \"format\": \"json-stat\" } }" http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px -o unece.jsonstat
curl -X POST -d "{ \"query\": [ { \"code\": \"Indicator\", \"selection\": { \"filter\": \"item\", \"values\": [ \"c0000233\" ] } }, { \"code\": \"Country\", \"selection\": { \"filter\": \"all\", \"values\": [ \"*\"] } } ], \"response\": { \"format\": \"json-stat\" } }" http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px -o unece.jsonstat

2. Convert JSON-stat to a more popular JSON data structure

This part is similar to the Eurostat one: we need to transpose by Country and drop Indicator (it's always Unemployment rate):

jsonstat2arrobj unece.jsonstat unece-transp.json --by Country --drop Indicator

Unfortunately, UNECE does not use 2-letter country codes. What country is 008? The result is not as user-friendly as Eurostat's.

Fortunately, we can use the label instead of the ID as the identifier of each category of the transposed dimension with the --bylabel (-l) option, added in jsonstat-conv version 0.5.4:

jsonstat2arrobj unece.jsonstat unece.json --by Country --bylabel --drop Indicator

If we need to use comma as the decimal mark:

jsonstat2arrobj unece.jsonstat unece-comma.json --by Country --bylabel --drop Indicator --comma

3. Convert JSON to CSV

Nothing new here: read Eurostat's example if you haven't.

json2csv < unece.json > unece.csv

If we want to use semicolon as the column delimiter:

json2csv < unece-comma.json > unece-semi.csv -w ";"

4. Altogether now

Comma-delimited CSV with dot as the decimal mark:

curl -X POST -d '{ "query": [ { "code": "Indicator", "selection": { "filter": "item", "values": [ "c0000233" ] } }, { "code": "Country", "selection": { "filter": "all", "values": [ "*" ] } } ], "response": { "format": "json-stat" } }' http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px | jsonstat2arrobj -b Country -d Indicator -l -t | json2csv > unece.csv

On Windows:

curl -X POST -d "{ \"query\": [ { \"code\": \"Indicator\", \"selection\": { \"filter\": \"item\", \"values\": [ \"c0000233\" ] } }, { \"code\": \"Country\", \"selection\": { \"filter\": \"all\", \"values\": [ \"*\"] } } ], \"response\": { \"format\": \"json-stat\" } }" http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px | jsonstat2arrobj -b Country -d Indicator -l -t | json2csv > unece.csv

Semicolon-delimited CSV with comma as the decimal mark:

curl -X POST -d '{ "query": [ { "code": "Indicator", "selection": { "filter": "item", "values": [ "c0000233" ] } }, { "code": "Country", "selection": { "filter": "all", "values": [ "*" ] } } ], "response": { "format": "json-stat" } }' http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px | jsonstat2arrobj -b Country -d Indicator -l -k -t | json2csv > unece-semi.csv -w ";"

On Windows:

curl -X POST -d "{ \"query\": [ { \"code\": \"Indicator\", \"selection\": { \"filter\": \"item\", \"values\": [ \"c0000233\" ] } }, { \"code\": \"Country\", \"selection\": { \"filter\": \"all\", \"values\": [ \"*\"] } } ], \"response\": { \"format\": \"json-stat\" } }" http://w3.unece.org/PXWeb2015/api/v1/en/STAT/10-CountryOverviews/01-Figures/ZZZ_en_CoSummary_r.px | jsonstat2arrobj -b Country -d Indicator -l -k -t | json2csv > unece-semi.csv -w ";"
Clone this wiki locally