Udit Vashisht
Author: Udit Vashisht


Matplotlib Tutorial in Python | Chapter 2 | Extracting Data from CSVs and plotting Bar Charts.

  • Aug. 25, 2019, 7:03 p.m.
  • 8 minutes read

In the last chapter, we learned to draw simple plots in Matplotlib and to customize them. In this chapter, we will learn to extract data from external sources such as CSV files and to create bar charts with Matplotlib.

Matplotlib Tutorials in Python - Creating Simple Bar Charts

Just like plt.plot(), you can use plt.bar() to create a Matplotlib bar chart.

# matplotlib_barchart_tutorial.py

import matplotlib.pyplot as plt
plt.style.use('ggplot')

ages = [12, 13, 14, 15, 16, 17, 18, 19, 20, 21]

total_population = [27877307, 24280683, 25258169, 25899454, 24592293, 21217467, 27958147, 20859088, 28882735, 19978972]

plt.bar(ages, total_population, color='b', label="Total Population")
plt.legend()
plt.xlabel("Age")
plt.ylabel("Total Population")
plt.title("Age-wise population of India")
plt.show()

As with plt.plot(), we can pass a color argument to plt.bar().

matplotlib_tutorial_barchart_simple.png

Matplotlib Tutorials in Python - Adding Line Plots to Bar Charts

We can also overlay line plots on a Matplotlib bar chart.

# matplotlib_barchart_tutorial.py
# Add these lines to the script above, before plt.legend(), so the legend picks them up

male_population = [14637892, 12563775, 13165128, 13739746, 13027935, 11349449, 15020851, 10844415, 14892165, 10532278]

female_population = [13239415, 11716908, 12093041, 12159708, 11564358, 9868018, 12937296, 10014673, 13990570, 9446694]

plt.plot(ages, male_population, color='g', linestyle='--', marker='o', label="Male Population")
plt.plot(ages, female_population, color='r', linestyle='-', marker='^', label="Female Population")

matplotlib_tutorial_barchart_simple_.png

Matplotlib Tutorials in Python - Adding more Bar Charts to the Matplotlib Plot

We can add more than one bar chart to a Matplotlib plot by calling plt.bar() once for each data series.
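The following is a minimal sketch of such a plot. It reuses the population lists from the snippets above, repeated here so the example runs on its own. It assumes that the three series are drawn in the order total, male, female, and that the non-GUI "Agg" backend is selected so the example also runs without a display. Both assumptions are ours.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-GUI backend so the sketch runs without a display
import matplotlib.pyplot as plt

ages = [12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
total_population = [27877307, 24280683, 25258169, 25899454, 24592293, 21217467, 27958147, 20859088, 28882735, 19978972]
male_population = [14637892, 12563775, 13165128, 13739746, 13027935, 11349449, 15020851, 10844415, 14892165, 10532278]
female_population = [13239415, 11716908, 12093041, 12159708, 11564358, 9868018, 12937296, 10014673, 13990570, 9446694]

plt.figure()  # start a new, empty figure

# All three series share the same x positions, so later calls paint over earlier ones.
# Drawing the tallest series first keeps the shorter bars visible in front of it.
total_bars = plt.bar(ages, total_population, color='b', label="Total Population")
male_bars = plt.bar(ages, male_population, color='g', label="Male Population")
female_bars = plt.bar(ages, female_population, color='r', label="Female Population")

plt.legend()
plt.xlabel("Age")
plt.ylabel("Total Population")
plt.title("Age-wise population of India")
# With a GUI backend (i.e. without the matplotlib.use() line), call plt.show() here
```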

matplotlib_tutorial_barchart_mutliple_.png

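In the plot above, for every age the female population is smaller than the male population, which in turn is smaller than the total population. The shorter bars therefore sit in front of the taller ones and every series remains visible. If the values were not ordered this consistently, the overlapping bars would hide one another and the plot would be much harder to read.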

Matplotlib Tutorials in Python - Placing bars side-by-side in a Matplotlib Bar Chart

To overcome this problem, we can use a small trick to place the bars side by side, which makes the chart easier to read. For this we will use NumPy. NumPy is a dependency of Matplotlib, so it is normally installed along with it. If it is not, you can install it with:

pip install numpy

Then, add the following code to display the bars side by side in the Matplotlib bar chart:

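# matplotlib_barchart_tutorial.py
# These calls replace the earlier plt.bar() and plt.plot() calls;
# keep plt.legend(), the axis labels, the title and plt.show()

import numpy as np
age_indexes = np.arange(len(ages))  # 0, 1, ..., len(ages) - 1
width = .30                         # width of each bar, also used as the offset

# total to the right of, and female to the left of, the male bars
plt.bar(age_indexes + width, total_population, width=width, color='b', label="Total Population")
plt.bar(age_indexes, male_population, width=width, color='g', label="Male Population")
plt.bar(age_indexes - width, female_population, width=width, color='r', label="Female Population")

# one tick per index, labelled with the actual age
plt.xticks(ticks=age_indexes, labels=ages)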

Let me walk you through the code step by step:

  1. We import numpy as np.
  2. We create an array of indexes 0, 1, ..., len(ages) - 1 with numpy.arange(), using len() to get the number of items in ages. These indexes, rather than the ages themselves, are now the x positions of the bars.
  3. We define the width of each bar as 0.30. The same value is used to shift the bars to the left and right of the central bar.
  4. In the plt.bar() calls, adding width shifts the total_population bars to the right of the male_population bars, and subtracting width shifts the female_population bars to the left. Passing width=width makes each bar exactly as wide as the shift, so neighbouring bars touch without overlapping.
  5. Finally, the x-axis now runs over the indexes (0 to len(ages) - 1), so we call plt.xticks() to place a tick at each index and label it with the corresponding age.
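To see what the offsets do, the short sketch below prints the x positions of the three bars for the first two ages. It is an illustration only and not part of the original script. It relies on plt.bar() centring each bar on its x value, which is Matplotlib's default (align='center').

```python
import numpy as np

ages = [12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
width = 0.30

age_indexes = np.arange(len(ages))  # array([0, 1, ..., 9])
female_x = age_indexes - width      # left bar of each group
male_x = age_indexes                # middle bar, centred on the tick
total_x = age_indexes + width       # right bar of each group

# Bars are centred on their x values, so each group reaches half a width beyond its outer bars
for age, f, m, t in zip(ages[:2], female_x, male_x, total_x):
    print(f"age {age}: female at {f:.2f}, male at {m:.2f}, total at {t:.2f}, "
          f"group spans {f - width / 2:.2f} to {t + width / 2:.2f}")
```

With width = 0.30, each group of three bars covers 0.90 units (from i - 0.45 to i + 0.45). This leaves a gap of 0.10 between neighbouring groups. A larger width would make the groups touch or overlap.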

matplotlib_tutorial_barchart_mutliple_side.png

Matplotlib Tutorials in Python - Extracting data from CSV and plotting it as a Matplotlib Bar Chart

In this section, we will parse data from a CSV file using Python's built-in csv module and then plot it as a Matplotlib bar chart using plt.bar().

For this, we will use data on the services provided by the various Regional Passport Offices (RPOs) in India. I downloaded the CSV file from here and renamed it to ‘data.csv’.

We will use the built-in csv module to parse the data from the file. First of all, let us have a look at what the data looks like:

# matplotlib_barchart_tutorial.py

import csv
with open('../data/data.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    row = next(csv_reader)
    print(row)

# Output

OrderedDict([('ServiceName', 'Applications Received - Scheme wise'), ('RpoName', 'RPO Ahmedabad'), ('SchemeType', 'Normal'), ('LastWeekCount', '14196'), ('LastMonthCount', '68775'), ('YearTillDate', '447831'), ('Date', '2019-08-25 04:22:41.094411')])

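Each row is returned as a dictionary keyed by the column names in the header row. On Python 3.6 and 3.7 this is an OrderedDict, as in the output above; from Python 3.8 onwards it is a plain dict. We will group the data by SchemeType, take the number of applications processed for that scheme from the YearTillDate column, and add up these numbers across all RPOs (RpoName). To do this, we create an empty dictionary called schemes_dict and fill in its keys and values as we read each row.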

# matplotlib_barchart_tutorial.py

schemes_dict = {}  # maps each SchemeType to its running YearTillDate total
with open('../data/data.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        # first row for this scheme: start its total; otherwise add to it
        if row['SchemeType'] not in schemes_dict:
            schemes_dict[row['SchemeType']] = int(row['YearTillDate'])
        else:
            schemes_dict[row['SchemeType']] += int(row['YearTillDate'])

This will give us the following dictionary.

# matplotlib_barchart_tutorial.py

print(schemes_dict)

# Output
{'Normal': 6839295, 'Tatkaal': 258603, 'FEMALE': 2586870, 'MALE': 4510888, 'TRANSGENDER': 249, 'Count of Applications': 7098007, '10TH PASS AND ABOVE': 2603471, '5TH PASS OR LESS': 33944, 'BETWEEN 6TH AND 9TH STANDARD': 38233, 'GRADUATE AND ABOVE': 2073126, 'Between_18_to_35': 3644579, 'Between_36_to_60': 1989386, 'GreaterThan60': 500636, 'LessThan18': 963406, 'Challan': 209902400, 'Credit/ Debit Card': 5000286450, 'Online': 3408535500, 'FRESH': 4519001, 'PCC': 333621, 'REISSUE': 2239541, 'No Verification': 931889, 'Post Verification': 1005761, 'Pre Verification': 4333366, 'More than 21 Days': 837586, 'Within 21 Days': 4210361}
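The same aggregation can be written more compactly with collections.defaultdict, which starts every new key at 0. This removes the need for the if/else check. So that the sketch runs on its own, it reads a tiny CSV string through io.StringIO instead of data.csv. The column names match the real file, but the rows and numbers are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows with the same columns as data.csv (names and numbers are made up)
sample_csv = """ServiceName,RpoName,SchemeType,LastWeekCount,LastMonthCount,YearTillDate,Date
Applications Received - Scheme wise,RPO A,Normal,10,40,300,2019-08-25 04:22:41
Applications Received - Scheme wise,RPO A,Tatkaal,2,8,50,2019-08-25 04:22:41
Applications Received - Scheme wise,RPO B,Normal,12,45,250,2019-08-25 04:22:41
"""

schemes_dict = defaultdict(int)  # a missing key starts at int(), i.e. 0
for row in csv.DictReader(io.StringIO(sample_csv)):
    schemes_dict[row['SchemeType']] += int(row['YearTillDate'])

print(dict(schemes_dict))  # {'Normal': 550, 'Tatkaal': 50}
```

With the real file, pass the file object opened from '../data/data.csv' to csv.DictReader instead of io.StringIO(sample_csv). collections.Counter would work the same way here, since it also returns 0 for missing keys.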

The values span a very wide range, so we will sort the items by value in descending order.

# matplotlib_barchart_tutorial.py

sorted_schemes = sorted(schemes_dict.items(), key=lambda kv: -kv[1])
print(sorted_schemes)

# Output

[('Credit/ Debit Card', 5000286450), ('Online', 3408535500), ('Challan', 209902400), ('Count of Applications', 7098007), ('Normal', 6839295), ('FRESH', 4519001), ('MALE', 4510888), ('Pre Verification', 4333366), ('Within 21 Days', 4210361), ('Between_18_to_35', 3644579), ('10TH PASS AND ABOVE', 2603471), ('FEMALE', 2586870), ('REISSUE', 2239541), ('GRADUATE AND ABOVE', 2073126), ('Between_36_to_60', 1989386), ('Post Verification', 1005761), ('LessThan18', 963406), ('No Verification', 931889), ('More than 21 Days', 837586), ('GreaterThan60', 500636), ('PCC', 333621), ('Tatkaal', 258603), ('BETWEEN 6TH AND 9TH STANDARD', 38233), ('5TH PASS OR LESS', 33944), ('TRANSGENDER', 249)]

Here, sorted() sorts the (key, value) pairs of the dictionary, and the lambda function tells it to sort by the negated value, i.e. by value in descending order. The output is a list of tuples, each containing a scheme and its total. The first three or four values are much larger than the rest, so we will take ten items starting from the fifth item (index 4) to get a readable bar chart.

# matplotlib_barchart_tutorial.py

scheme = []
total_number = []
for item in sorted_schemes[4:14]:
    scheme.append(item[0])
    total_number.append(item[1])

So, here we have created two lists of ten items each. The first contains the scheme types and the second contains the total number of applications processed under each scheme. Now we will plot this data.
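Before plotting, note two alternative idioms for the sorting and splitting steps. The sketch below uses them with schemes_dict copied from the output above. The first, reverse=True with operator.itemgetter, gives the same descending order as negating the key. It also works for keys that cannot be negated, such as strings. The second, zip(*pairs), splits the list of (scheme, total) tuples into two sequences in one step. These are standard-library alternatives we are suggesting, not the method used in this tutorial.

```python
from operator import itemgetter

# schemes_dict as printed above
schemes_dict = {
    'Normal': 6839295, 'Tatkaal': 258603, 'FEMALE': 2586870, 'MALE': 4510888, 'TRANSGENDER': 249,
    'Count of Applications': 7098007, '10TH PASS AND ABOVE': 2603471, '5TH PASS OR LESS': 33944,
    'BETWEEN 6TH AND 9TH STANDARD': 38233, 'GRADUATE AND ABOVE': 2073126,
    'Between_18_to_35': 3644579, 'Between_36_to_60': 1989386, 'GreaterThan60': 500636,
    'LessThan18': 963406, 'Challan': 209902400, 'Credit/ Debit Card': 5000286450,
    'Online': 3408535500, 'FRESH': 4519001, 'PCC': 333621, 'REISSUE': 2239541,
    'No Verification': 931889, 'Post Verification': 1005761, 'Pre Verification': 4333366,
    'More than 21 Days': 837586, 'Within 21 Days': 4210361,
}

# reverse=True sorts in descending order without negating the key
sorted_schemes = sorted(schemes_dict.items(), key=itemgetter(1), reverse=True)
same_as_lambda = sorted_schemes == sorted(schemes_dict.items(), key=lambda kv: -kv[1])

# zip(*pairs) transposes the list of (scheme, total) tuples into two tuples
scheme, total_number = zip(*sorted_schemes[4:14])
scheme, total_number = list(scheme), list(total_number)

print(same_as_lambda)  # True
print(scheme)
print(total_number)
```

One caveat: if the slice is empty, zip(*[]) yields nothing and the two-name unpacking raises a ValueError. The explicit loop with append() simply leaves both lists empty instead.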

# matplotlib_barchart_tutorial.py

import matplotlib.pyplot as plt
import csv

schemes_dict = {}
with open('../data/data.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        if row['SchemeType'] not in schemes_dict.keys():
            schemes_dict[row['SchemeType']] = int(row['YearTillDate'])
        else:
            schemes_dict[row['SchemeType']] += int(row['YearTillDate'])
sorted_schemes = sorted(schemes_dict.items(), key=lambda kv: -kv[1])

scheme = []
total_number = []
for item in sorted_schemes[4:14]:
    scheme.append(item[0])
    total_number.append(item[1])

plt.bar(scheme, total_number)
plt.xlabel("Scheme")
plt.ylabel("Total Applications")
plt.title("Scheme-wise Passport Applications processed in India")
plt.show()

We get the following Matplotlib plot:

matplotlib_tutorial_barchart_csv.png

Here, the x-axis labels overlap and are not legible, so let us fix that. We pass the rotation argument to plt.xticks() to rotate the labels by 90 degrees, i.e. vertically. You can use any angle, so feel free to experiment. Then we use plt.subplots_adjust() to enlarge the left and bottom margins of the plot.

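# matplotlib_barchart_tutorial.py
# Add these lines before plt.show()
plt.xticks(rotation=90)                   # rotate the x-axis labels to vertical
plt.subplots_adjust(left=.15, bottom=.4)  # enlarge the left and bottom margins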

matplotlib_tutorial_barchart_csv_.png

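Calling plt.tight_layout() before plt.show() improves it further, because it automatically adjusts the padding around the axes so that the labels fit inside the figure: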

# matplotlib_barchart_tutorial.py

plt.tight_layout()
plt.show()

matplotlib_tutorial_barchart_csv__.png
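If vertical labels are awkward to read, a common alternative is a smaller angle with the labels right-aligned, so that each label ends under its own bar. The following is a minimal sketch using the ten schemes selected above, with their totals copied from the sorted output. The 45-degree angle and the "Agg" backend are our own choices. ha is Matplotlib's standard short alias for the horizontalalignment text property.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-GUI backend so the sketch runs headless
import matplotlib.pyplot as plt

# The ten schemes selected above (sorted_schemes[4:14])
scheme = ['Normal', 'FRESH', 'MALE', 'Pre Verification', 'Within 21 Days',
          'Between_18_to_35', '10TH PASS AND ABOVE', 'FEMALE', 'REISSUE', 'GRADUATE AND ABOVE']
total_number = [6839295, 4519001, 4510888, 4333366, 4210361,
                3644579, 2603471, 2586870, 2239541, 2073126]

plt.figure()  # start a new, empty figure
plt.bar(scheme, total_number)
plt.xlabel("Scheme")
plt.ylabel("Total Applications")
plt.title("Scheme-wise Passport Applications processed in India")

# Rotate by 45 degrees and right-align each label so it ends under its own bar
locs, labels = plt.xticks(rotation=45, ha='right')
plt.tight_layout()
# With a GUI backend, call plt.show() here
```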

Matplotlib Tutorials in Python - Creating Horizontal Bar Charts

Creating a horizontal bar chart is as easy as using plt.barh() instead of plt.bar() and swapping the labels passed to plt.xlabel() and plt.ylabel().

# matplotlib_barchart_tutorial.py

plt.barh(scheme, total_number)
plt.xlabel("Total Applications")
plt.ylabel("Scheme")
plt.title("Scheme-wise Passport Applications processed in India")
plt.xticks(rotation=90)
plt.subplots_adjust(left=.15, bottom=.4)
plt.tight_layout()
plt.show()

matplotlib_tutorial_barchart_csv___.png
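With plt.barh(), the first category is drawn at the bottom of the y-axis. Our list is sorted in descending order, so the largest scheme ends up at the bottom. The following is a minimal sketch of one way to put it at the top: invert the y-axis with Axes.invert_yaxis(), getting the current axes through plt.gca(). The data is copied from the sorted output above, and the "Agg" backend is our assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-GUI backend so the sketch runs headless
import matplotlib.pyplot as plt

scheme = ['Normal', 'FRESH', 'MALE', 'Pre Verification', 'Within 21 Days',
          'Between_18_to_35', '10TH PASS AND ABOVE', 'FEMALE', 'REISSUE', 'GRADUATE AND ABOVE']
total_number = [6839295, 4519001, 4510888, 4333366, 4210361,
                3644579, 2603471, 2586870, 2239541, 2073126]

plt.figure()  # start a new, empty figure
bars = plt.barh(scheme, total_number)
plt.xlabel("Total Applications")
plt.ylabel("Scheme")
plt.title("Scheme-wise Passport Applications processed in India")

ax = plt.gca()
ax.invert_yaxis()  # flip the category order so the first (largest) scheme is at the top
plt.tight_layout()
# With a GUI backend, call plt.show() here
```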

If you liked this tutorial, there are various ways to support us; the easiest is to share this post. You can also follow us on Facebook, Twitter and YouTube.

If you have any questions, you can leave a comment below.

In the next chapter, we will learn to draw pie charts in Matplotlib.

Table of Contents of Matplotlib Tutorials for Python

Matplotlib Tutorial in Python | Chapter 1 | Introduction

Matplotlib Tutorial in Python | Chapter 2 | Extracting Data from CSVs and plotting Bar Charts

Pie Charts in Python | Matplotlib Tutorial in Python | Chapter 3

Matplotlib Stack Plots/Bars | Matplotlib Tutorial in Python | Chapter 4

Filling Area on Line Plots | Matplotlib Tutorial in Python | Chapter 5

Python Histograms | Matplotlib Tutorial in Python | Chapter 6

Scatter Plotting in Python | Matplotlib Tutorial | Chapter 7

Plot Time Series in Python | Matplotlib Tutorial | Chapter 8

Python Realtime Plotting | Matplotlib Tutorial | Chapter 9

Matplotlib Subplot in Python | Matplotlib Tutorial | Chapter 10

Python Candlestick Chart | Matplotlib Tutorial | Chapter 11


If you want to support our work, you can do so on Patreon.


