Udit Vashisht
Author: Udit Vashisht


Python Pandas Objects - Pandas Series and Pandas Dataframe


In the last post, we discussed the introduction to and installation of Python Pandas. In this post, we will learn about pandas' data structures (objects). Pandas provides two types of data structures:-

  • Pandas Series
  • Pandas Dataframe

Pandas Series

A Pandas Series is a one-dimensional indexed array that can hold data types such as integer, string, boolean, float, Python object, etc. A Pandas Series holds a single data type at a time; values of mixed types are stored under the generic object dtype. The axis labels of the data are collectively called the index of the series. The labels need not be unique but must be of a hashable type. The index of the series can consist of integers, strings, or even time-series data. In general, a Pandas Series can be thought of as a column of an Excel sheet, with the row index being the index of the series.
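The two properties described above (labels may repeat, and one dtype per Series) can be checked with a minimal sketch. The variable names and values here are made up for illustration:

```python
import pandas as pd

# Labels may repeat; selecting a repeated label returns every matching entry.
s = pd.Series([1.5, 2.5, 3.5], index=['x', 'y', 'x'])
print(s.index.is_unique)   # False
dup = s['x']               # a Series holding both rows labelled 'x'
print(dup)

# Mixing integers, strings and floats falls back to the generic object dtype.
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)         # object
```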

Constructing a Pandas Series

A pandas series can be constructed using the following constructor :-

pandas.Series([data, index, dtype, name, copy, …])

The parameters for the constructor of a Python Pandas Series are detailed as under:-

  • data : array-like, Iterable, dict, or scalar value. Contains data stored in Series. Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.
  • index : array-like or Index (1d). Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
  • dtype : str, numpy.dtype, or ExtensionDtype, optional. Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.
  • copy : bool, default False. Copy input data.
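As a rough illustration of how these parameters combine, the sketch below passes index, dtype and name in a single call. The data and the name 'price' are hypothetical:

```python
import pandas as pd

# dtype forces the element type instead of letting pandas infer it;
# name attaches a label to the Series itself.
prices = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64', name='price')
print(prices)
print(prices.name)   # price
```

Without the dtype argument, pandas would infer an integer type from the data.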

How to create an empty Pandas Series?

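You can create an empty Pandas Series using pandas.Series(). It is advisable to pass dtype explicitly, because the default dtype of an empty Series changed from float64 to object in pandas 2.0:-

import pandas as pd 
empty_series = pd.Series(dtype='float64')
print(empty_series)

# Output

Series([], dtype: float64)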

How to construct a Pandas Series from a list?

You can create a Pandas Series from a Python list by passing the list to pandas.Series() as under:-

import pandas as pd
data = pd.Series(['a', 'b', 'c', 'd'])
print(data)

# Output

0    a
1    b
2    c
3    d
dtype: object

Values and index of a Pandas Series

A Pandas Series consists of two parts: an index and values. You can check the values and index of a Pandas Series using Series.values and Series.index as under:-

print(data.values)
print(data.index)

# Output
['a' 'b' 'c' 'd']
RangeIndex(start=0, stop=4, step=1)

Setting an explicit index of a Pandas Series

In the above example, we did not specify any index for our Pandas Series, so a default index ranging from 0 to n-1 (n being the length of the data) was created. You can also explicitly define the index of a Pandas Series as under:-

import pandas as pd
data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])
print(data_2)

# Output
a      One
b      Two
c    Three
d     Four
dtype: object
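Once an explicit index is set, values can be looked up either by label or by position. A minimal sketch, reusing the same data as above:

```python
import pandas as pd

data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])

first = data_2.loc['a']        # label-based lookup
last = data_2.iloc[-1]         # position-based lookup
middle = data_2.loc['b':'c']   # label slices include both endpoints
print(first, last)
print(middle)
```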

Constructing a Pandas Series from a NumPy array

You can also create a Pandas Series from a NumPy array by passing the array to pandas.Series() as under:-

import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
a_series = pd.Series(data)
print(a_series)

# Output

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

Creating a Pandas Series from a Python Dictionary

You can create a Pandas Series from a dictionary by passing the dictionary to pandas.Series(). In this case, the index of the Pandas Series will be the keys of the dictionary and the values will be the values of the dictionary. In this sense, a Pandas Series is like a specialisation of a Python dictionary: a dictionary maps arbitrary keys to arbitrary values, whereas a Pandas Series maps typed keys to typed values:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict)
print(a_series)

# Output

one      1
two      2
three    3
four     4
dtype: int64
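The dictionary analogy carries over to how a Series is used. The following sketch, assuming the same keys and values as above, shows a few dict-style operations:

```python
import pandas as pd

a_series = pd.Series({'one': 1, 'two': 2, 'three': 3, 'four': 4})

print(a_series['two'])          # lookup by key, like a dict
print('three' in a_series)      # membership tests the index, not the values
a_series['five'] = 5            # assigning to a new label extends the Series
print(list(a_series.keys()))    # keys() returns the index
```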

You can also create a Pandas Series from only the desired keys of the Python dictionary by passing those keys as the index argument to pd.Series() as under:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict, index = ['one', 'three'])
print(a_series)

# Output

one      1
three    3
dtype: int64

In the above example, although we passed the whole dictionary to pd.Series(), the resulting Pandas Series ignores the key/value pairs whose keys are missing from the index argument.
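The reverse case is worth a quick look as well: a label in the index argument that is not a key of the dictionary. In this sketch the label 'five' is deliberately absent from the dictionary:

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# 'five' is not a key of a_dict, so its value is filled with NaN.
a_series = pd.Series(a_dict, index=['one', 'five'])
print(a_series)
print(a_series.dtype)   # float64, because NaN is a floating-point value
```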

Constructing a Pandas Series from scalar data

If you pass a single scalar value together with an index, the value is repeated for every label in the index:-

import pandas as pd

a_series = pd.Series(5, index=[100, 200, 300])
print(a_series)

# Output

100    5
200    5
300    5
dtype: int64

Pandas dataframe

The Pandas dataframe is the primary data structure of pandas. It is a two-dimensional, size-mutable tabular structure with flexible row indices and flexible column names. In general, it is just like an Excel sheet or an SQL table. It can also be seen as a dict-like container for Series objects.

Constructing a pandas dataframe

A pandas dataframe can be constructed using the following constructor:-

pd.DataFrame([data, index, columns, dtype, copy])

The parameters for the constructor are detailed as under:-

  • data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame. Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: If data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order for Python 3.6 and later.
  • index : Index or array-like. Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.
  • columns : Index or array-like. Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
  • dtype : dtype, default None. Data type to force. Only a single dtype is allowed. If None, the type is inferred.
  • copy : bool, default False. Copy data from inputs. Only affects DataFrame / 2d ndarray input.
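A small sketch of index, columns and dtype working together may help here. The dictionary raw and the row labels 'r1' and 'r2' are invented for illustration:

```python
import pandas as pd

raw = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6]}

# columns picks (and orders) the dict keys to keep; index labels the rows;
# dtype forces a single type on every column.
df = pd.DataFrame(raw, index=['r1', 'r2'], columns=['c', 'a'], dtype='float64')
print(df)
```

Note that the key 'b' is dropped because it is not listed in columns.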

A dataframe can be constructed from:-

  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • A Series
  • Another DataFrame

Constructing a pandas dataframe from a single series object

A single series object can also be converted to a dataframe as under:-

import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population = pd.Series(population_dict)
df = pd.DataFrame(population)
print(df)

# Output

                   0
California  38332521
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135

Since we have not passed the columns argument, the single column has been given the default label 0.
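One way to get a meaningful column label instead of 0 is to name the Series before converting it. This is a sketch with a shortened, illustrative dictionary:

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193}, name='population')

df = pd.DataFrame(population)   # the column label is taken from Series.name
same = population.to_frame()    # Series.to_frame() gives the same result
print(df)
```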

Constructing a pandas dataframe from dictionary of series

We can construct a pandas dataframe from a dictionary of series as under :-

import pandas as pd

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}

area = pd.Series(area_dict)
population = pd.Series(population_dict)
states = pd.DataFrame({'population': population, 'area': area})

print(states)

# Output 

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995

In this example both series share the same labels, so every cell is filled. When the labels differ, the resulting index is the union of the keys of the series, and any missing value is replaced by NaN (not a number). You can optionally pass the index (row labels) and columns (column labels) arguments as well. A dict of series along with a specific index will discard all data not matching the passed index.
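The union behaviour is easier to see with two series whose labels only partly overlap. The sketch below uses a trimmed-down, illustrative subset of the data:

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662})
population = pd.Series({'Texas': 26448193, 'New York': 19651127})

# The row index is the union of the labels; cells with no data become NaN.
states = pd.DataFrame({'population': population, 'area': area})
print(states)

# An explicit index keeps only the rows that match it.
texas_only = pd.DataFrame({'population': population, 'area': area}, index=['Texas'])
print(texas_only)
```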

Index and columns of the pandas dataframe

You can fetch the index and columns of a pandas dataframe using the following code:-

print(states.index)
print(states.columns)

# Output

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
Index(['population', 'area'], dtype='object')
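Since a dataframe behaves like a dict of Series objects, selecting a column by its label gives back a Series that shares the dataframe's row index. A minimal sketch with a two-state subset of the data:

```python
import pandas as pd

states = pd.DataFrame({
    'population': pd.Series({'California': 38332521, 'Texas': 26448193}),
    'area': pd.Series({'California': 423967, 'Texas': 695662}),
})

area = states['area']                     # a column comes back as a Series
print(type(area).__name__)                # Series
print(area.index.equals(states.index))    # True
```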

Constructing a pandas dataframe from list of dictionaries

import pandas as pd
df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df)

# Output
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0

Here, the dataframe has been constructed with its columns being the union of the keys of the dictionaries, and each missing value has been filled in as NaN.
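The NaN entries also explain why columns a and c print as 1.0 and 4.0: NaN is a floating-point value, so those columns are stored as floats. The sketch below shows one possible way to restore integers; filling the gaps with 0 is an assumption that may not suit your data:

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df.dtypes)   # 'a' and 'c' are float64 because they contain NaN

# Replace the gaps with 0, then convert every column back to integers.
filled = df.fillna(0).astype('int64')
print(filled)
```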

Constructing a pandas dataframe from 2D numpy array

A pandas dataframe can also be constructed from a 2-dimensional NumPy array.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(3, 2))
print(df)

# Output

          0         1
0  0.059926  0.119440
1  0.548637  0.232405
2  0.343573  0.809589

Since we have not passed the columns and index arguments, default integer labels have been used for both. Alternatively, we can pass the columns and index in the constructor itself:-

df = pd.DataFrame(np.random.rand(3, 2), index = ['a','b','c'], columns = ['x', 'y'])
print(df)

# Output

          x         y
a  0.854185  0.871370
b  0.419274  0.123717
c  0.989986  0.811176
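Because np.random.rand() returns new numbers on every run, the values printed above will differ on your machine. A minimal sketch, assuming you want repeatable output, seeds NumPy's generator first. The seed value 42 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)   # arbitrary seed, chosen only for illustration
df = pd.DataFrame(rng.random((3, 2)), index=['a', 'b', 'c'], columns=['x', 'y'])
print(df)
```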

Constructing a pandas dataframe from a dictionary of ndarrays or lists

Alternatively, a pandas dataframe can also be constructed from a dictionary of ndarrays or lists. The keys of the dictionary become the columns of the dataframe, and it will have the default integer index if no index is passed.

import pandas as pd
a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(a_dict)
print(df)

# Output

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

Constructing a pandas dataframe from a NumPy structured array

import pandas as pd
import numpy as np

# 'S10' is a fixed-width byte-string field of up to 10 bytes
data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
df = pd.DataFrame(data)
print(df)

# Output

   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
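The C column prints as b'Hello' because the field stores bytes rather than text. If ordinary strings are wanted, one option is to decode the column after building the dataframe. This sketch assumes the bytes are UTF-8 encoded:

```python
import numpy as np
import pandas as pd

# 'S10' is a fixed-width byte-string field of up to 10 bytes.
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., b'Hello'), (2, 3., b'World')]
df = pd.DataFrame(data)

# .str.decode turns the bytes into ordinary Python strings.
df['C'] = df['C'].str.decode('utf-8')
print(df)
```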
