Author: Udit Vashisht


Python Pandas Objects - Pandas Series and Pandas Dataframe


In the last post, we covered the introduction to and installation of Python Pandas. In this post, we will learn about pandas' data structures (objects). Pandas provides two types of data structures:-

  • Pandas Series
  • Pandas Dataframe

Pandas Series

A Pandas Series is a one-dimensional array of indexed data that can hold integers, strings, booleans, floats, Python objects and so on. A Series has a single data type (dtype) at a time; when values of different types are mixed, pandas falls back to a dtype that can hold all of them, such as object. The axis labels of the data are collectively called the index of the series. The labels need not be unique but must be of a hashable type. The index can hold integers, strings and even time-series data (dates and times). In general, a Pandas Series is like a column of an Excel sheet, with the row labels being the index of the series.
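The properties described above can be checked with a small sketch. The values and variable names here are made up purely for illustration:

```python
import pandas as pd

# Mixing integers and a float: pandas picks one dtype that can hold every value
mixed = pd.Series([1, 2, 3.5])
print(mixed.dtype)  # float64

# Index labels may repeat; selecting a repeated label returns every match
repeated = pd.Series([10, 20, 30], index=['a', 'b', 'a'])
print(repeated['a'])

# The index can also hold dates (time-series data)
dated = pd.Series([1.0, 2.0, 3.0], index=pd.date_range('2021-01-01', periods=3))
print(dated.index)
```

Selecting the repeated label 'a' gives back a two-element Series rather than a single value, which is worth keeping in mind when labels are not unique.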

Different ways of creating/constructing a Pandas Series

We can create a Pandas Series by using the following pandas.Series() constructor:-

pandas.Series([data, index, dtype, name, copy, …])

The parameters for the constructor of a Python Pandas Series are detailed as under:-

Parameters | Remarks
data : array-like, Iterable, dict, or scalar value | Contains data stored in Series. Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.
index : array-like or Index (1d) | Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
dtype : str, numpy.dtype, or ExtensionDtype, optional | Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.
copy : bool, default False | Copy input data.
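As a rough illustration of how these parameters combine, the following sketch passes data, index, dtype and name together. The labels and the name 'price' are arbitrary choices for this example:

```python
import pandas as pd

# data is a list of ints, but dtype forces the values to be stored as floats
prices = pd.Series([1, 2, 3], index=['x', 'y', 'z'], dtype='float64', name='price')
print(prices)

# The name, dtype and index are all available as attributes afterwards
print(prices.name)
print(prices.dtype)
print(list(prices.index))
```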

How to create an empty Pandas Series?

You can create an empty Pandas Series using pandas.Series() as under:-

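import pandas as pd
# Pass dtype explicitly: the default dtype of an empty Series differs across pandas versions
empty_series = pd.Series(dtype='float64')
print(empty_series)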

# Output

Series([], dtype: float64)

How to create a Pandas Series from a list?

You can create a Pandas Series from a Python list by passing the list to pandas.Series() as under. In this case, pandas will set the default index of the Series:-

import pandas as pd
data = pd.Series(['a', 'b', 'c', 'd'])
print(data)

# Output

0    a
1    b
2    c
3    d
dtype: object

Values and index of a Pandas Series

A Pandas Series consists of two parts: an index and values. You can check them using Series.values and Series.index as under:-

print(data.values)
print(data.index)

# Output
['a' 'b' 'c' 'd']
RangeIndex(start=0, stop=4, step=1)

Setting an explicit index of a Pandas Series

In the above example, we did not specify any index for our Pandas Series, so a default index ranging from 0 to n-1 (n being the length of the data) was created. You can also explicitly define the index of a Pandas Series as under:-

import pandas as pd
data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])
print(data_2)

# Output
a      One
b      Two
c    Three
d     Four
dtype: object
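Once an explicit index is set, values can be looked up by label as well as by position. A minimal sketch, reusing the same Series as above:

```python
import pandas as pd

data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])

# Look up a single value by its label
print(data_2['b'])          # Two

# Positional access is still available through .iloc
print(data_2.iloc[0])       # One

# Label-based slices with .loc include both end points
print(list(data_2.loc['b':'c']))  # ['Two', 'Three']
```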

How to create a Pandas Series from a NumPy array?

You can also create a Pandas Series from a NumPy array by passing the array to pandas.Series() as under:-

import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
a_series = pd.Series(data)
print(a_series)

# Output

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

How to create a Pandas Series from a Python Dictionary?

You can create a Pandas Series from a dictionary by passing the dictionary to pandas.Series(). In this case, the keys of the dictionary become the index of the Series and the values of the dictionary become its values. In this sense, a Pandas Series is like a specialisation of a Python dictionary: a dictionary maps arbitrary keys to a set of arbitrary values, whereas a Pandas Series maps typed keys to a set of typed values. For example:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict)
print(a_series)

# Output

one      1
two      2
three    3
four     4
dtype: int64
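The dictionary analogy can be made concrete with a short sketch: several dict-style operations work on a Series, with the index labels playing the role of keys.

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
a_series = pd.Series(a_dict)

# Membership tests look at the index labels, just as `in` looks at dict keys
print('two' in a_series)     # True
print('five' in a_series)    # False

# Label lookup works like dict key lookup
print(a_series['three'])     # 3

# keys() returns the index, and to_dict() converts back to a plain dict
print(list(a_series.keys()))
print(a_series.to_dict() == a_dict)  # True
```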

You can also create a Pandas Series from only selected keys of the Python dictionary by explicitly passing the desired index labels to pd.Series() as under:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict, index = ['one', 'three'])
print(a_series)

# Output

one      1
three    3
dtype: int64

In the above example, though we passed the whole dictionary to pd.Series(), the resulting Series has ignored the key/value pairs whose keys are missing from the index argument.
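The reverse case is also worth knowing: if the index argument contains a label that is not a key of the dictionary, pandas keeps the label and fills its value with NaN. A small sketch, where the label 'five' is deliberately absent from the dictionary:

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2}

# 'five' is not a key of a_dict, so its value becomes NaN
a_series = pd.Series(a_dict, index=['one', 'five'])
print(a_series)

# NaN is a float, so the whole Series is stored as float64
print(a_series.dtype)                # float64
print(pd.isna(a_series['five']))     # True
```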

How to create a Pandas Series from scalar data?

You can also create a Pandas Series from scalar data. If you pass a single value with multiple index labels, the value is repeated for every label:-

import pandas as pd
a_series = pd.Series(5, index=[100, 200, 300])
print(a_series)

# Output

100    5
200    5
300    5
dtype: int64

Pandas Dataframe

The Pandas Dataframe is the primary data structure of pandas. It is a two-dimensional, size-mutable table with flexible row labels (the index) and flexible column names, and each column can have its own data type. In general, it is just like an Excel sheet or a SQL table. It can also be seen as a dict-like container of Series objects that share a common index.
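The "dict-like container of Series" view can be sketched as follows. The column names and values are invented for illustration only:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3.0, 4.0]}, index=['a', 'b'])

# Selecting a column by name returns a Series that shares the frame's index
col = df['x']
print(type(col).__name__)   # Series
print(col.index.equals(df.index))  # True

# Each column keeps its own dtype
print(df.dtypes)

# Size-mutable: a new column can be added by assignment, like a dict entry
df['z'] = df['x'] + df['y']
print(df)
```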

Different ways of creating a Pandas Dataframe

A Pandas Dataframe can be created/constructed using the following pandas.DataFrame() constructor:-

pandas.DataFrame([data, index, columns, dtype, copy])

A Pandas Dataframe can be created from:-

  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • A Series
  • Another DataFrame

The parameters for the constructor of a Pandas Dataframe are detailed as under:-

Parameters | Remarks
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame | Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: If data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order for Python 3.6 and later.
index : Index or array-like | Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.
columns : Index or array-like | Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dtype : dtype, default None | Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool, default False | Copy data from inputs. Only affects DataFrame / 2d ndarray input.

How to create an empty Pandas Dataframe in Python?

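You can create an empty Pandas Dataframe using pandas.DataFrame(). Later on, you can add columns by assignment (for example df['name'] = values) and append rows to it. Note that assigning df.columns = [list of column names] to a completely empty dataframe raises a length-mismatch error, because it has no columns to rename; to fix the column names up front, pass them to the constructor as pd.DataFrame(columns=[list of column names]).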

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> 
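One possible way to fill a dataframe that starts out empty is sketched below. The column names and row values are hypothetical; the sketch declares the columns in the constructor and then adds rows by label with .loc:

```python
import pandas as pd

# Declare the column names up front; the frame still has no rows
df = pd.DataFrame(columns=['name', 'score'])
print(len(df))  # 0

# Assigning to a row label that does not exist yet adds a new row
df.loc[0] = ['alice', 90]
df.loc[1] = ['bob', 85]
print(df)
```

Adding rows one at a time like this is convenient for small examples. For larger data it is usually better to collect the rows first and build the dataframe in one go.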

How to create a Pandas Dataframe from a single Series object?

We can create a Pandas Dataframe from a single Pandas Series by passing the series to pd.DataFrame(). The index of the series becomes the index of the dataframe, and pandas automatically sets 0 as the column name of the dataframe:-

import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population = pd.Series(population_dict)
df = pd.DataFrame(population)
print(df)

# Output

                   0
California  38332521
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135

Since we have not passed the columns argument and the Series has no name, the column has been given the default label 0.
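To get a meaningful column label instead of 0, one option is to give the Series a name before building the dataframe; another is Series.to_frame(). A short sketch using a shortened version of the population data:

```python
import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193}

# Option 1: name the Series; the name becomes the column label
named = pd.Series(population_dict, name='population')
df_named = pd.DataFrame(named)
print(df_named.columns)

# Option 2: convert an unnamed Series with to_frame() and pass the label there
df_to_frame = pd.Series(population_dict).to_frame(name='population')
print(df_to_frame.columns)
```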

How to create a Pandas Dataframe from a dictionary of two or more (multiple) Pandas Series?

We can create a Pandas Dataframe from multiple Pandas Series by passing a dictionary of the series to pd.DataFrame() as under. The keys of the dictionary become the columns of the Pandas Dataframe:-

import pandas as pd

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}

area = pd.Series(area_dict)
population = pd.Series(population_dict)
states = pd.DataFrame({'population': population, 'area': area})

print(states)

# Output 

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995

The resulting index is the union of the indexes of the individual Series. Here both Series share the same labels, but if a label were missing from one of them, the corresponding value would be filled with NaN (not a number). You can optionally pass the index (row labels) and columns (column labels) arguments as well. A dict of Series along with a specific index will discard all data not matching the passed index.
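The union-and-NaN behaviour only shows up when the Series do not share the same labels. The following sketch uses two deliberately mismatched Series (a reduced version of the data above) and then repeats the construction with an explicit index:

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662})
population = pd.Series({'Texas': 26448193, 'Florida': 19552860})

# The row index is the union of both indexes; gaps are filled with NaN
states = pd.DataFrame({'population': population, 'area': area})
print(states)

# With an explicit index, only the matching rows are kept
texas_only = pd.DataFrame({'population': population, 'area': area}, index=['Texas'])
print(texas_only)
```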

How to create a Pandas Dataframe from a list of Python Dictionaries?

We can create a Pandas Dataframe from Python dictionaries by passing a list of the dictionaries to pd.DataFrame():-

import pandas as pd
df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df)

# Output
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0

Here, the Pandas Dataframe has been constructed with the union of the keys of the dictionaries as its columns, and each missing value has been filled with NaN.

How to create a Pandas Dataframe from a 2D NumPy array?

A Pandas Dataframe can also be created from a two-dimensional NumPy array using the following code:-

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(3, 2))
print(df)

# Output (the values are random, so yours will differ)

          0         1
0  0.059926  0.119440
1  0.548637  0.232405
2  0.343573  0.809589

Since we have not passed the columns and index, default integer labels have been used for both. Alternatively, we can pass the index and columns to the constructor itself:-

df = pd.DataFrame(np.random.rand(3, 2), index = ['a','b','c'], columns = ['x', 'y'])
print(df)

# Output

          x         y
a  0.854185  0.871370
b  0.419274  0.123717
c  0.989986  0.811176

How to create a Pandas Dataframe from a dictionary of NumPy arrays or lists?

Alternatively, a Pandas Dataframe can also be created from a dictionary of ndarrays or lists. The keys of the dictionary become the columns of the dataframe, and it gets the default integer index if no index is passed:-

import pandas as pd
a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(a_dict)
print(df)

# Output

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

How to create a Pandas Dataframe from a NumPy structured array?

We can create a Pandas Dataframe from a NumPy structured array using the following code:-

import pandas as pd
import numpy as np

# 'S10' is a 10-byte string field (the older 'a10' alias was removed in NumPy 2.0)
data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., 'Hello'), (2, 3., "World")]
df = pd.DataFrame(data)
print(df)

# Output

   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
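The b'...' prefix in column C shows that the values are stored as bytes rather than text. If regular strings are wanted, one possible follow-up step is to decode the column with the .str accessor. This is a sketch that assumes the bytes are UTF-8 (or plain ASCII) encoded:

```python
import numpy as np
import pandas as pd

data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
df = pd.DataFrame(data)

# Convert the bytes values in column C into ordinary Python strings
df['C'] = df['C'].str.decode('utf-8')
print(df)
```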

How to check the index and columns of a Pandas Dataframe?

You can get the index and columns of a Pandas Dataframe using the following code (states is the dataframe built from the two Series above):-

print(states.index)
print(states.columns)

# Output

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
Index(['population', 'area'], dtype='object')
