Author: Udit Vashisht

Pandas

Python Pandas Objects - Pandas Series and Pandas Dataframe

In the last post, we discussed the introduction to and installation of Python Pandas. In this post, we will learn about pandas' data structures (objects). Pandas provides two types of data structures:-

Pandas Series
Pandas Dataframe

Pandas Series

A Pandas Series is a one-dimensional labelled array that can hold data types such as integer, string, boolean, float, Python object, etc. A Series has a single dtype; if the values are of mixed types, pandas stores them under the generic object dtype. The axis labels of the data are collectively called the index of the Series. The labels need not be unique, but they must be of a hashable type. The index can consist of integers, strings or even time-series data (dates and times). In general, a Pandas Series is like a single column of an Excel sheet, with the row labels being the index of the Series.
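The two properties described above (repeated labels are allowed, and a Series has one dtype) can be sketched as follows. The variable names dup, matches and mixed are illustrative and are not part of the original post:

```python
import pandas as pd

# Index labels may repeat; selecting a repeated label returns every match.
dup = pd.Series([10, 20, 30], index=['x', 'y', 'x'])
matches = dup['x']   # a Series holding both rows labelled 'x'

# Mixing value types does not raise; pandas falls back to the object dtype.
mixed = pd.Series([1, 'two', 3.0])

print(matches.tolist())
print(mixed.dtype)
```

Whether duplicate labels are desirable depends on the data; dup.index.is_unique is a quick way to check.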

Different ways of creating/constructing a Pandas Series

We can create a Pandas Series by using the following pandas.Series() constructor:-

pandas.Series([data, index, dtype, name, copy, …])

The parameters for the constructor of a Python Pandas Series are detailed as under:-

Parameters	Remarks
data : array-like, Iterable, dict, or scalar value	Contains data stored in Series. Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.
index : array-like or Index (1d)	Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
dtype : str, numpy.dtype, or ExtensionDtype, optional	Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.
copy : bool, default False	Copy input data.
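As a rough illustration of how these parameters combine, the sketch below passes data, index, dtype and name together. The Series temps and its values are made up for the example:

```python
import pandas as pd

# data is a plain list, index supplies the labels,
# dtype forces float storage, and name labels the Series itself.
temps = pd.Series(
    [21, 23, 19],
    index=['mon', 'tue', 'wed'],
    dtype='float64',
    name='temperature',
)

print(temps)
print(temps.name, temps.dtype)
```

Because dtype is given explicitly, the integers in the list are stored as floats rather than being inferred as int64.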

How to create an empty Pandas Series?

You can create an empty Pandas Series using pandas.Series() as under:-

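import pandas as pd

# Pass dtype explicitly: the default dtype of an empty Series differs between pandas versions
empty_series = pd.Series(dtype='float64')
print(empty_series)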

# Output

Series([], dtype: float64)

How to create a Pandas Series from a list?

You can create a Pandas Series from a Python list by passing the list to pandas.Series() as under. In this case, pandas will set the default index of the Series:-

import pandas as pd
data = pd.Series(['a', 'b', 'c', 'd'])
print(data)

# Output

0    a
1    b
2    c
3    d
dtype: object

Values and index of a Pandas Series

A Pandas Series consists of two parts, an index and values. You can check them using Series.values and Series.index as under:-

print(data.values)
print(data.index)

# Output
['a' 'b' 'c' 'd']
RangeIndex(start=0, stop=4, step=1)

Setting an explicit index of a Pandas Series

In the above example, we did not specify any index for our Pandas Series, so a default index ranging from 0 to n-1 (n being the length of the data) was created. You can also explicitly define the index of a Pandas Series as under:-

import pandas as pd
data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])
print(data_2)

# Output
a      One
b      Two
c    Three
d     Four
dtype: object
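Once an explicit index is set, values can be looked up either by label or by position. The following is a small sketch that reuses the data_2 Series from above; the names by_label, by_position and label_slice are only for illustration:

```python
import pandas as pd

data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])

by_label = data_2.loc['b']          # label-based lookup
by_position = data_2.iloc[1]        # position-based lookup (0-indexed)
label_slice = data_2.loc['b':'c']   # label slices include the end label

print(by_label, by_position)
print(label_slice.tolist())
```

Using .loc and .iloc makes it explicit which kind of lookup is intended, which matters most when the index itself is made of integers.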

How to create a Pandas Series from Numpy array?

You can also create a Pandas Series from a NumPy array by passing the array to pandas.Series() as under:-

import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
a_series = pd.Series(data)
print(a_series)

# Output

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

How to create a Pandas Series from a Python Dictionary?

A Pandas Series can be seen as a specialisation of a Python dictionary: a dictionary is a structure that maps arbitrary keys to a set of arbitrary values, whereas a Pandas Series maps typed keys to a set of typed values. You can create a Pandas Series from a dictionary by passing the dictionary to pandas.Series() as under. In this case, the index of the Series will be the keys of the dictionary and the values will be the values of the dictionary:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict)
print(a_series)

# Output

one      1
two      2
three    3
four     4
dtype: int64
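The dictionary analogy can be made concrete with a few dictionary-style operations. This is a sketch only; the names has_two, value, keys and doubled are introduced here for illustration:

```python
import pandas as pd

a_series = pd.Series({'one': 1, 'two': 2, 'three': 3, 'four': 4})

has_two = 'two' in a_series     # membership tests the index, like dict keys
value = a_series['three']       # lookup by key
keys = list(a_series.keys())    # keys() returns the index

# Unlike a dict, a Series supports vectorised arithmetic on its values.
doubled = a_series * 2

print(has_two, value, keys)
print(doubled['four'])
```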

You can also create a Pandas Series from only selected keys of a Python dictionary by passing the desired keys as the index argument to pd.Series() as under:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict, index = ['one', 'three'])
print(a_series)

# Output

one      1
three    3
dtype: int64

In the above example, though we passed the whole dictionary to pd.Series(), the resulting Series ignored the key/value pairs whose keys are missing from the index argument.
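The reverse case is also worth a quick sketch: a label in the index argument that is not a key of the dictionary. The label 'five' and the name partial are made up for this example:

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# 'five' is not a key of a_dict, so its value becomes NaN.
partial = pd.Series(a_dict, index=['one', 'five'])

print(partial)
print(partial.isna().tolist())
```

Since NaN is a floating-point value, the resulting Series is stored as float64 rather than int64.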

How to create a Pandas Series from scalar data?

You can also create a Pandas Series from scalar data. If you pass a single value with multiple index labels, the value will be repeated for every label:-

import pandas as pd

a_series = pd.Series(5, index=[100, 200, 300])
print(a_series)

# Output

100    5
200    5
300    5
dtype: int64

Pandas Dataframe

The Pandas Dataframe is the primary data structure of pandas. It is a two-dimensional, size-mutable table with both flexible row indices and flexible column names. In general, it is just like an Excel sheet or a SQL table. It can also be seen as a Python dict-like container for Series objects.
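The "dict-like container for Series" view can be sketched as follows. The frame df and its columns x, y and z are invented for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

col = df['x']                 # selecting a column by key returns a Series
has_x = 'x' in df             # membership tests the column labels

# Size-mutable: a new column can be added by assigning to a new key.
df['z'] = df['x'] + df['y']

print(type(col).__name__)
print(list(df.columns))
```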

Different ways of creating a Pandas Dataframe

A Pandas Dataframe can be created/constructed using the following pandas.DataFrame() constructor:-

pd.DataFrame([data, index, columns, dtype, copy])

A Pandas Dataframe can be created from:-

Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame

The parameters for the constructor of a Pandas Dataframe are detailed as under:-

Parameters	Remarks
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame	Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: If data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order for Python 3.6 and later.
index : Index or array-like	Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.
columns : Index or array-like	Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dtype : dtype, default None	Data type to force. Only a single dtype is allowed. If None, the dtype is inferred.
copy : bool, default False	Copy data from inputs. Only affects DataFrame / 2d ndarray input

How to create an empty Pandas Dataframe in Python?

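You can create an empty Pandas Dataframe using pandas.DataFrame() and add columns and rows to it later. Note that assigning df.columns = [list of column names] to a frame that has no columns raises a ValueError (length mismatch); instead, pass the names through the columns argument of the constructor, or add columns one at a time with df['name'] = values.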

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>>
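A minimal sketch of filling a Dataframe that starts out empty is given below. The column names name and score and the row values are made up; assigning to df.loc with a new label is one of several ways to add a row, and building all rows first and constructing the Dataframe once is usually faster for larger data:

```python
import pandas as pd

# Start with named columns but no rows.
df = pd.DataFrame(columns=['name', 'score'])

# Assigning to a label that does not exist yet enlarges the frame by one row.
df.loc[len(df)] = ['Ann', 90]
df.loc[len(df)] = ['Bob', 85]

print(df)
```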

How to create a Pandas Dataframe from a single Series object?

We can create a Pandas Dataframe from a single Pandas Series by passing the Series to pd.DataFrame(). The index of the Series will become the index of the Dataframe, and pandas will automatically set 0 as the column name:-

import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population = pd.Series(population_dict)
df = pd.DataFrame(population)
print(df)

# Output

                   0
California  38332521
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135

Since the Series has no name and we have not passed the columns argument, the column has been given the default label 0.
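If a more meaningful column label is wanted, one option is to give the Series a name before converting it. This sketch uses a shortened version of the population data; the name 'population' for the Series is an assumption made for the example:

```python
import pandas as pd

population = pd.Series(
    {'California': 38332521, 'Texas': 26448193},
    name='population',   # the Series name becomes the column label
)

df = pd.DataFrame(population)
same = population.to_frame()   # equivalent shortcut on the Series itself

print(df)
```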

How to create a Pandas Dataframe from a dictionary of two or more (multiple) Pandas Series?

We can create a Pandas Dataframe from multiple Pandas Series by passing a dictionary of the Series to pd.DataFrame() as under. The keys of the dictionary will become the columns of the Pandas Dataframe:-

import pandas as pd

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}

area = pd.Series(area_dict)
population = pd.Series(population_dict)
states = pd.DataFrame({'population': population, 'area': area})

print(states)

# Output 

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995

Here both Series share the same labels, so no values are missing. In general, the resulting index is the union of the indexes of the Series, and any missing value is replaced by NaN (not a number). You can optionally pass index (row labels) and columns (column labels) arguments as well. A dict of Series along with a specific index will discard all data not matching the passed index.
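To see the union and the NaN filling in action, the sketch below uses two Series whose labels only partly overlap. The trimmed-down data and the names partial_states and subset are assumptions made for the example:

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662})
population = pd.Series({'Texas': 26448193, 'Florida': 19552860})

# The row index is the union of both indexes; gaps are filled with NaN.
partial_states = pd.DataFrame({'population': population, 'area': area})

# Passing index keeps only the matching rows and discards the rest.
subset = pd.DataFrame({'population': population, 'area': area}, index=['Texas'])

print(partial_states)
print(subset)
```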

How to create a Pandas Dataframe from a list of Python Dictionaries?

We can create a Pandas Dataframe from Python dictionaries by passing a list of the dictionaries to pd.DataFrame():-

import pandas as pd
df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df)

# Output
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0

Here, the Pandas Dataframe has been constructed with columns as the union of the keys of the dictionaries, and the missing values have been filled with NaN. Columns a and c are shown as floats because NaN is a floating-point value.

How to create a Pandas Dataframe from 2D Numpy array?

A Pandas Dataframe can also be created from a two-dimensional NumPy array by using the following code. Since the values are random, your output will differ:-

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(3, 2))
print(df)

# Output

          0         1
0  0.059926  0.119440
1  0.548637  0.232405
2  0.343573  0.809589

Since we have not passed the columns and index arguments, default integer labels have been used for both. Alternatively, we can pass the columns and index in the constructor itself:-

df = pd.DataFrame(np.random.rand(3, 2), index = ['a','b','c'], columns = ['x', 'y'])
print(df)

# Output

          x         y
a  0.854185  0.871370
b  0.419274  0.123717
c  0.989986  0.811176

How to create a Pandas Dataframe from a Dictionary of Numpy arrays or list?

Alternatively, a Pandas Dataframe can also be created from a dictionary of ndarrays or lists, all of which must have the same length. The keys of the dictionary will be the columns of the Dataframe, and it will have the default integer index if no index is passed:-

import pandas as pd
a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(a_dict)
print(df)

# Output

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

How to create Pandas Dataframe from a Numpy structured array?

We can create a Pandas Dataframe from a NumPy structured array using the following code:-

import pandas as pd
import numpy as np

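# 'S10' is a fixed-width 10-byte string field (the older 'a10' spelling is deprecated in recent NumPy)
data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])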
data[:] = [(1, 2., 'Hello'), (2, 3., "World")]
df = pd.DataFrame(data)
print(df)

# Output

   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
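Column C above holds bytes objects (note the b prefix) because the field was declared as a byte string. If ordinary text is wanted instead, one option is to declare the field as a Unicode string. The 'U10' field type below is a substitution made for this sketch, not what the example above uses:

```python
import numpy as np
import pandas as pd

# 'U10' declares a Unicode string field of up to 10 characters.
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'U10')])
data[:] = [(1, 2.0, 'Hello'), (2, 3.0, 'World')]

df = pd.DataFrame(data)
print(df)
```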

How to check the Index and columns of a Pandas Dataframe?

You can get the index and columns of a Pandas Dataframe using the following code. Here, states is the Dataframe built from the two Series earlier:-

print(states.index)
print(states.columns)

# Output

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
Index(['population', 'area'], dtype='object')
