Author: Udit Vashisht


Pandas Objects - Series and Dataframe


In the last post, we discussed the introduction and installation of pandas. In this post, we will learn about pandas’ data structures (objects). Pandas provides two types of data structures:-

  • Series
  • Dataframe

Pandas series

A pandas series is a one-dimensional, indexed array of data that can hold integers, strings, booleans, floats, Python objects, etc. Note that a series has a single data type (dtype) at a time; values of mixed types are stored under the generic object dtype. The axis labels of the data are collectively called the index of the series. The labels need not be unique, but they must be of a hashable type. The index can consist of integers, strings or even timestamps. Loosely speaking, a series is like a single column of an Excel sheet, with the row labels acting as the index of the series.
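
The points above about a single dtype and non-unique labels can be checked with a small sketch. The labels and values here are made up purely for illustration:

```python
import pandas as pd

# Repeated labels are allowed in the index.
s = pd.Series([10, 20, 30], index=['x', 'y', 'x'])
print(s.dtype)    # one dtype for the whole series
print(s['x'])     # selecting a repeated label returns every matching row

# Values of mixed types fall back to the generic object dtype.
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)
```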

Constructing a pandas series

A pandas series can be constructed using the following constructor :-

pandas.Series([data, index, dtype, name, copy, …])

The parameters for the constructor are detailed as under:-

Parameters
    Remarks

data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.

index : array-like or Index (1d)
    Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.

dtype : str, numpy.dtype, or ExtensionDtype, optional
    Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.

copy : bool, default False
    Copy input data.
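
As a rough illustration of these parameters, the sketch below passes dtype and name explicitly. The name 'scores' and the values are arbitrary choices made for this example:

```python
import pandas as pd

# Integers are stored as floats because dtype is forced to float64.
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64', name='scores')
print(s)
print(s.name)
print(s.dtype)
```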

Empty pandas series

You can create an empty pandas series as under:-

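import pandas as pd

# Pass dtype explicitly: the default dtype of an empty series differs between pandas versions
empty_series = pd.Series(dtype='float64')
print(empty_series)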

# Output

Series([], dtype: float64)

Constructing pandas series from a list

A pandas series can be constructed from a list as under:-

import pandas as pd
data = pd.Series(['a', 'b', 'c', 'd'])
print(data)

# Output

0    a
1    b
2    c
3    d
dtype: object

Values and index of a pandas series

The above series consists of two parts, an index and values. You can check them as under:-

print(data.values)
print(data.index)

# Output
['a' 'b' 'c' 'd']
RangeIndex(start=0, stop=4, step=1)

Setting explicit index for a pandas series

In the above example, we did not specify any index for our series, so a default index ranging from 0 to n-1 (n being the length of the data) was created. You can also explicitly define the index as under:-

import pandas as pd
data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])
print(data_2)

# Output
a      One
b      Two
c    Three
d     Four
dtype: object

Constructing a pandas series from numpy array

You can also create a pandas series from a numpy array using the following code:-

import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
a_series = pd.Series(data)
print(a_series)

# Output

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

Creating a pandas series from a python dictionary

We can also create a pandas series from a dictionary as under:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict)
print(a_series)

# Output

one      1
two      2
three    3
four     4
dtype: int64

In this case, the index of the series will be the keys of dictionary and the values will be the values of the dictionary. It can be inferred that a pandas series is like a specialisation of a Python dictionary. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
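
The dictionary analogy can be sketched as follows. This is only an illustration that reuses the same keys as the example above:

```python
import pandas as pd

s = pd.Series({'one': 1, 'two': 2, 'three': 3, 'four': 4})

print('two' in s)        # membership is tested against the index, like dict keys
print(s['two'])          # label lookup, like dict[key]
print(list(s.keys()))    # keys() returns the index

# Unlike a dict, a series also supports slicing by label (the end label is included).
print(s['two':'three'])
```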

You can explicitly pass only the desired index labels to create the series as under:-

import pandas as pd

a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict, index = ['one', 'three'])
print(a_series)

# Output

one      1
three    3
dtype: int64

In the above example, although we passed the whole dictionary, the pandas series ignored the key/value pairs whose keys are missing from the index argument.
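
The reverse case is worth a quick sketch too: a label listed in index that is not a key of the dictionary gets NaN, and the values become floats to accommodate it. The label 'five' is a made-up example:

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2}

# 'five' is not a key of a_dict, so its value is NaN.
s = pd.Series(a_dict, index=['one', 'five'])
print(s)
print(s.isna())
```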

Constructing a pandas series from scalar data

If you pass a single scalar value along with an index, the value is repeated for every label in the index:-

a_series = pd.Series(5, index=[100, 200, 300])
print(a_series)

# Output

100    5
200    5
300    5
dtype: int64

Pandas dataframe

The dataframe is the primary data structure of pandas. A pandas dataframe is a two-dimensional, size-mutable table with both flexible row indices and flexible column names. In general, it is just like an Excel sheet or an SQL table. It can also be seen as a Python dict-like container for series objects.
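
A minimal sketch of the "dict-like container of series" view, using made-up column names: each column comes back as a series, and columns can be added or removed like dictionary entries.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

col = df['a']                # looking up a column returns a series
print(type(col))
print(df.dtypes)             # each column keeps its own dtype

df['c'] = df['a'] * 2        # size-mutable: add a column
del df['b']                  # ...or remove one
print(df)
```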

Constructing a pandas dataframe

A pandas dataframe can be constructed using the following constructor:-

pd.DataFrame([data, index, columns, dtype, copy])

The parameters for the constructor are detailed as under:-

Parameters
    Remarks

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: If data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order for Python 3.6 and later.

index : Index or array-like
    Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.

columns : Index or array-like
    Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer.

copy : bool, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input.

A dataframe can be constructed from:-

  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • A Series
  • Another DataFrame
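
Most of these inputs are covered in the sections that follow. The last one, another DataFrame, is sketched here under the assumption of a small made-up frame called base:

```python
import pandas as pd

base = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# copy=True gives an independent dataframe, so editing it leaves base untouched.
clone = pd.DataFrame(base, copy=True)
clone.loc[0, 'x'] = 99
print(base.loc[0, 'x'])

# Passing columns selects a subset of the existing columns.
subset = pd.DataFrame(base, columns=['y'])
print(subset)
```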

Constructing a pandas dataframe from a single series object

A single series object can also be converted to a dataframe as under:-

population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population = pd.Series(population_dict)
df = pd.DataFrame(population)
print(df)

# Output

                   0
California  38332521
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135

Since we have not passed the columns argument, the single column has been given the default label 0.

Constructing a pandas dataframe from dictionary of series

We can construct a pandas dataframe from a dictionary of series as under :-

import pandas as pd

area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}

area = pd.Series(area_dict)
population = pd.Series(population_dict)
states = pd.DataFrame({'population': population, 'area': area})

print(states)

# Output 

            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995

In general, the resulting index is the union of the indexes of the individual series, and any value missing from a series is replaced by NaN (not a number). In this example both series share the same labels, so no NaN appears. You can optionally pass the index (row labels) and columns (column labels) arguments as well. A dict of series along with a specific index will discard all data not matching the passed index.
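
To see the union and the NaN filling in action, here is a small sketch in which the two series deliberately do not share all their labels. The trimmed dictionaries are an assumption made for illustration:

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662})
population = pd.Series({'Texas': 26448193, 'New York': 19651127})

# The row index is the union of both indexes; missing entries become NaN.
states = pd.DataFrame({'population': population, 'area': area})
print(states)

# Passing index keeps only the matching rows and discards the rest.
texas_only = pd.DataFrame({'population': population, 'area': area}, index=['Texas'])
print(texas_only)
```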

Index and columns of the pandas dataframe

You can fetch the index and columns of a pandas dataframe using the following code:-

print(states.index)
print(states.columns)

# Output

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
Index(['population', 'area'], dtype='object')

Constructing a pandas dataframe from list of dictionaries

import pandas as pd
df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df)

# Output
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0

Here, the dataframe has been constructed with columns that are the union of the keys of the dictionaries, and the missing values have been filled in as ‘NaN’.

Constructing a pandas dataframe from 2D numpy array

A pandas dataframe can also be constructed from a two-dimensional numpy array.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(3, 2))
print(df)

# Output

          0         1
0  0.059926  0.119440
1  0.548637  0.232405
2  0.343573  0.809589

Since we have not passed the columns and index, default integer labels have been used for both. Alternatively, we can pass the columns and index in the constructor itself:-

df = pd.DataFrame(np.random.rand(3, 2), index = ['a','b','c'], columns = ['x', 'y'])
print(df)

# Output

          x         y
a  0.854185  0.871370
b  0.419274  0.123717
c  0.989986  0.811176

Constructing a pandas dataframe from a dictionary of ndarrays or lists

Alternatively, a pandas dataframe can be constructed from a dictionary of ndarrays or lists, all of which must have the same length. The keys of the dictionary become the columns of the dataframe, and a default integer index is used if no index is passed.

import pandas as pd
a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(a_dict)
print(df)

# Output

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
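
If an index is passed, it replaces the default integer index. A minimal sketch with the same dictionary; the labels 'a' to 'd' are arbitrary, and there must be exactly one label per row:

```python
import pandas as pd

a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}

# The index must have the same length as the lists in the dictionary.
df = pd.DataFrame(a_dict, index=['a', 'b', 'c', 'd'])
print(df)
print(df.loc['b', 'two'])
```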

Constructing a pandas dataframe from numpy structured array

import pandas as pd
import numpy as np

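# 'S10' is a 10-byte string field; it is equivalent to the older 'a10' spelling, which recent NumPy versions deprecate
data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])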
data[:] = [(1, 2., 'Hello'), (2, 3., "World")]
df = pd.DataFrame(data)
print(df)

# Output

   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
