Python Pandas is one of the most widely used Python libraries for data analysis. In our previous post, we gave a detailed introduction to Python Pandas and explained how to install it on macOS, Windows, and Linux. In this post, we will learn how to set the index of a Pandas DataFrame.
Python Pandas Tutorial - Setting the index of a Pandas DataFrame
For the purpose of this tutorial, we will be using Stack Overflow’s developer survey data for 2019. You can download the data from here. Let’s start coding. We will create a Pandas Dataframe from a CSV file using pandas.read_csv().
#python-pandas-tutorial.py
import pandas as pd
df = pd.read_csv('data/survey_results_public.csv')
print(df.head())
Output
Respondent ... SurveyEase
0 1 ... Neither easy nor difficult
1 2 ... Neither easy nor difficult
2 3 ... Neither easy nor difficult
3 4 ... Easy
4 5 ... Easy
[5 rows x 85 columns]
If you look closely at the output, the first, unnamed column that starts at 0 is the index of the DataFrame. Since we have not set an index explicitly, pandas has assigned the default index, which runs from 0 to (n-1) for a DataFrame with n rows. We can also inspect the index directly:
#python-pandas-tutorial.py
df.index
Output
RangeIndex(start=0, stop=88883, step=1)
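To see the default index without downloading the survey file, here is a minimal sketch on a three-row DataFrame. The name `toy` and its rows are made up for illustration and only borrow the `Respondent` and `SurveyEase` column names from the survey data.

```python
import pandas as pd

# Hypothetical stand-in data; these rows are not taken from the survey file.
toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "SurveyEase": ["Neither easy nor difficult", "Easy", "Easy"],
})

# No index was specified, so pandas assigns a RangeIndex from 0 to n-1.
default_index = toy.index
print(default_index)
print(list(default_index))
```

The same check on the full survey DataFrame gives the `RangeIndex` shown above, just with a larger `stop` value.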
Since this DataFrame already has a column, ‘Respondent’, with unique values, we can use that column as the index with the following code.
#python-pandas-tutorial.py
df.set_index('Respondent')
Interestingly, this does not change df itself. set_index() returns a new DataFrame and leaves the original untouched, so a Jupyter notebook will display the re-indexed result, but printing df again will still show the old index. To make the change permanent, either assign the result back (df = df.set_index('Respondent')) or pass ‘inplace = True’ as an argument to the method.
#python-pandas-tutorial.py
df.set_index('Respondent', inplace = True)
print(df.head())
Output
Respondent ... SurveyEase
1 ... Neither easy nor difficult
2 ... Neither easy nor difficult
3 ... Neither easy nor difficult
4 ... Easy
5 ... Easy
[5 rows x 84 columns]
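The difference between the returned copy and the original DataFrame is easy to verify on a small example. This is only a sketch: `toy` and `indexed` are hypothetical names, and the rows are invented rather than taken from the survey.

```python
import pandas as pd

# Hypothetical stand-in data; these rows are not taken from the survey file.
toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "SurveyEase": ["Neither easy nor difficult", "Easy", "Easy"],
})

# set_index() returns a new DataFrame; toy keeps its default index.
indexed = toy.set_index("Respondent")
print(toy.index)      # still the default RangeIndex
print(indexed.index)  # labels now come from the Respondent column

# Assigning the result back is an alternative to inplace=True.
toy = toy.set_index("Respondent")

# Rows can now be looked up by respondent number instead of position.
print(toy.loc[2, "SurveyEase"])
```

Note that ‘Respondent’ stops being a regular column once it becomes the index, which is why the column count in the output above drops from 85 to 84.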
Resetting the index of a Pandas dataframe
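If you have set the index by mistake, you can use reset_index() to undo it: the index goes back to being a regular column and the default 0 to (n-1) index is restored. As with set_index(), you have to pass the ‘inplace = True’ argument (or assign the result back to df) for the change to persist.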
#python-pandas-tutorial.py
df.reset_index(inplace = True)
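As a rough illustration of what reset_index() does, the sketch below starts from a small DataFrame that is already indexed by ‘Respondent’. The data and the names `toy`, `restored`, and `dropped` are hypothetical. It also shows the `drop=True` option, which discards the old index instead of turning it back into a column.

```python
import pandas as pd

# Hypothetical stand-in data, already indexed by Respondent.
toy = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "SurveyEase": ["Neither easy nor difficult", "Easy", "Easy"],
}).set_index("Respondent")

# Default behaviour: the old index comes back as the first column.
restored = toy.reset_index()
print(restored.columns.tolist())
print(restored.index)

# drop=True throws the old index away instead of keeping it as a column.
dropped = toy.reset_index(drop=True)
print(dropped.columns.tolist())
```

Both calls return new DataFrames here, so `toy` itself is left indexed by ‘Respondent’.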
Setting the index of a Pandas dataframe while reading in the CSV file
Alternatively, if you already know which column of the CSV file should serve as the index, you can set it while reading in the source CSV file:
#python-pandas-tutorial.py
df = pd.read_csv('data/survey_results_public.csv', index_col='Respondent')
This command sets ‘Respondent’ as the index of the DataFrame as soon as the file is loaded. If you have any doubts, feel free to leave a comment.
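For readers who want to try the index_col approach without the survey file, here is a small sketch that reads CSV text from memory via `io.StringIO`. The CSV content and the names `csv_text` and `df_small` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for survey_results_public.csv.
csv_text = (
    "Respondent,SurveyEase\n"
    "1,Neither easy nor difficult\n"
    "2,Easy\n"
    "3,Easy\n"
)

# index_col names the column to use as the index while parsing.
df_small = pd.read_csv(io.StringIO(csv_text), index_col="Respondent")

print(df_small.index.name)
print(df_small.columns.tolist())
print(df_small.loc[1, "SurveyEase"])
```

The result should match what you get from calling read_csv() without index_col and then calling set_index('Respondent'), just in a single step.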
You can also read more about the data structures of Python Pandas, i.e. Pandas Series and Pandas DataFrames, from here.
We also have a video series on Python Pandas Tips and Tricks on our YouTube channel.