Udit Vashisht
Author: Udit Vashisht


Convert JSON to CSV using Python

JSON से CSV कैसे बनायें?

  • Feb. 21, 2019, 2:58 p.m.
  • 5 minutes read
  • 6547 Views
Convert JSON to CSV using Python

JSON to CSV in Python

Converting large JSON files to CSV could be a difficult task. But python is a powerhouse and it has lots of built-in and third party modules which make data processing a lot easier. You can read/write/parse large json files, csv files, dataframes, excel, pdf and many other file-types.

Recently, while helping out a friend, I came across a set of large Json data files from which a CSV file was to be generated. So, this is the post about how Python made it so easy for me to parse large json files and convert them to CSV files. I used python’s inbuilt modules called json and csv to convert JSON to CSV.

So, I have a directory called ‘json_to_csv’ which has another directory called ‘descriptions’ in it, which contains some 250+ large json files, which I need to parse and convert it to csv.We will be using an inbuilt module called json module.

json_to_csv_saralgyaan_python_3.png

JSON to CSV Python Code

First, we will check what each json file has. Create a python file call ‘json_to_csv.py’ in the base directory and write the following code:-

import json
from pprint import pprint
# we are using pprint for making the output more readable.

with open('descriptions/ISIC_0000000') as f:
    data = json.load(f)
    pprint(data)

The output data is in the form of dictionary with keys:- ‘_id’, ‘_modelType’, ‘created’, ‘creator’, ‘dataset’, ‘meta’, ‘name’, ‘notes’, ‘updated’ and few values are further in the form of nested dictionaries:-

json_to_csv_saralgyaan_python_3_1.png

We need to parse the following columns from the json file and convert it to csv file:-

_id,_modelType,creator___id,creator__name,dataset___accessLevel,dataset___id,dataset__description,dataset__name,meta__acquisition__image_type,meta__acquisition__pixelsX,meta__acquisition__pixelsY,meta__clinical__age_approx,meta__clinical__benign_malignant,meta__clinical__diagnosis,meta__clinical__diagnosis_confirm_type,meta__clinical__melanocytic,meta__clinical__sex,meta__unstructured__diagnosis,meta__unstructured__race,name

So create an empty ‘output.csv’ file and copy paste the above line as the header in it:-

$ nano output.csv

We will be making it a re-useable script and hence, we will not be doing any hard coding for it. We will be using os module to browse through the directories.

First of all, we will skim through the directory ‘descriptions’ and add all the files (filenames) into a list:-

import os
import json
import csv


def get_list_of_json_files():

    list_of_files = os.listdir('descriptions')

    return list_of_files

Now, for each json file we will extract the data of the relevant columns e.g. ‘_id’, ‘_modelType’ etc. and append it to a list. We must note that few of these columns are the keys of second level dictionaries as shown in the pic above. Create the following function:-

def create_list_from_json(jsonfile):

    with open(jsonfile) as f:
        data = json.load(f)

    data_list = []  # create an empty list

    # append the items to the list in the same order.
    data_list.append(data['_id'])
    data_list.append(data['_modelType'])
    data_list.append(data['creator']['_id'])
    data_list.append(data['creator']['name'])
    data_list.append(data['dataset']['_accessLevel'])
    data_list.append(data['dataset']['_id'])
    data_list.append(data['dataset']['description'])
    data_list.append(data['dataset']['name'])
    data_list.append(data['meta']['acquisition']['image_type'])
    data_list.append(data['meta']['acquisition']['pixelsX'])
    data_list.append(data['meta']['acquisition']['pixelsY'])
    data_list.append(data['meta']['clinical']['age_approx'])
    data_list.append(data['meta']['clinical']['benign_malignant'])
    data_list.append(data['meta']['clinical']['diagnosis'])
    data_list.append(data['meta']['clinical']['diagnosis_confirm_type'])
    data_list.append(data['meta']['clinical']['melanocytic'])
    data_list.append(data['meta']['clinical']['sex'])
    data_list.append(data['meta']['unstructured']['diagnosis'])
    # In few json files, the race was not there so using KeyError exception to add '' at the place
    try:
        data_list.append(data['meta']['unstructured']['race'])
    except KeyError:
        data_list.append("")  # will add an empty string in case race is not there.
    data_list.append(data['name'])

    return data_list

There are few things to note in respect of the above code:-

(i) The Order of the columns must be kept exactly the same as desired in the output.

(ii) For nested dictionaries, you will have to call all the keys [key1][key2][key3].

(iii) On scrutiny of the json file, I observed that there are few files which do not contain the ‘[‘meta’][‘unstructured’][‘race’]’ and hence the function threw the KeyError. Use KeyError Exception to give it desired value in the result. I have given it “” (an empty string) as default value.

Note:- We can also use .get() method to avoid key errors the syntax for the same is data.get(key,default_value).

Now, the final task. We will create a function write_csv(), which will open the ‘output.csv’ in append mode and then loop through all the json files, create the lists of desired data and then write it to the csv file.

def write_csv():

    list_of_files = get_list_of_json_files()
    for file in list_of_files:
        row = create_list_from_json(f'descriptions/{file}')  # create the row to be added to csv for each file (json-file)
        with open('output.csv', 'a') as c:
            writer = csv.writer(c)
            writer.writerow(row)
        c.close()

And finally, run the script:-

if __name__==__main__:
    write_csv()

We will get the desired Csv file in matter of a second.

The complete JSON TO CSV GITHUB code.

We need your support to keep our servers running

If you like our content and want to help us running our servers please contribute here



Related Posts

Create your Own Customizable Email Spam Filter using Python
By Udit Vashisht | 10 months, 1 week ago

Use Python and Gmail API to create your own Customizable Email Spam Filter

There was a time when you will run to the mailbox outside your home at a fixed time to check your mail and segregate the crap out of it. But today, we all live in a digital ...

Read More
How to make a Twitter Bot using Python and Tweepy
By Udit Vashisht | 7 months, 1 week ago

How to make a Twitter Bot using Python and Tweepy

If you are new to python and looking for some fun python project , a twitter bot is a must try. So in this tutorial we will make a twitter bot using python and tweepy . ...

Read More
Search