# Pandas Objects - Series and Dataframe

In the last post, we discussed the introduction to and installation of pandas. In this post, we will learn about the data structures (objects) that pandas provides. Pandas provides two types of data structures:-

- Series
- Dataframe

## Pandas series

A pandas series is a one-dimensional, indexed array that can hold data of types such as integer, string, boolean, float, or arbitrary Python objects. Note that a series has a single data type (dtype) at a time; values of mixed types are stored under the generic object dtype. The axis labels of the data are collectively called the index of the series. The labels need not be unique, but they must be of a hashable type. The index can consist of integers, strings, or even timestamps. In general, a series is like a single column of an Excel sheet, with the row labels acting as the index of the series.
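As a small illustration of these points, the sketch below builds a series with a repeated label and another with mixed value types. The variable names are illustrative and not part of the original post.

```python
import pandas as pd

# Index labels may repeat; selecting a repeated label returns every match.
scores = pd.Series([10, 20, 30], index=['a', 'b', 'a'])
print(scores['a'])     # a smaller series holding both rows labelled 'a'
print(scores.dtype)    # an integer dtype

# A series has one dtype; mixed values fall back to the generic object dtype.
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)     # object
```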

### Constructing a pandas series

A pandas series can be constructed using the following constructor :-

`pandas.Series([data, index, dtype, name, copy, …])`

The parameters for the constructor are detailed as under:-

Parameters | Remarks |
---|---|
data : array-like, Iterable, dict, or scalar value | Contains data stored in Series. Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later. |
index : array-like or Index (1d) | Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict. |
dtype : str, numpy.dtype, or ExtensionDtype, optional | Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages. |
copy : bool, default False | Copy input data. |
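A minimal sketch of how these parameters combine is shown below. The data and the name `price` are made up for illustration; `name` appears in the constructor signature above and simply labels the series.

```python
import pandas as pd

# dtype forces the element type; name labels the series itself.
prices = pd.Series([1, 2, 3], index=['x', 'y', 'z'], dtype='float64', name='price')
print(prices)
print(prices.name)    # price
print(prices.dtype)   # float64
```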

### Empty pandas series

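You can create an empty pandas series as under. The dtype is passed explicitly because the default dtype of an empty series differs between pandas versions:-

```
import pandas as pd
empty_series = pd.Series(dtype='float64')
print(empty_series)
# Output
Series([], dtype: float64)
```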

### Constructing pandas series from a list

A pandas series can be constructed from a list as under:-

```
import pandas as pd
data = pd.Series(['a', 'b', 'c', 'd'])
print(data)
# Output
0    a
1    b
2    c
3    d
dtype: object
```

### Values and index of a pandas series

The above series consists of two parts, an index and values. You can check them as under:-

```
print(data.values)
print(data.index)
# Output
['a' 'b' 'c' 'd']
RangeIndex(start=0, stop=4, step=1)
```

### Setting explicit index for a pandas series

In the above example, we did not specify an index for our series, so a default index ranging from 0 to n-1 (n being the length of the data) was created. You can also explicitly define the index as under:-

```
import pandas as pd
data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])
print(data_2)
# Output
a      One
b      Two
c    Three
d     Four
dtype: object
```
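Once an explicit index is set, values can be looked up by label as well as by position. The following is a short sketch that reuses the `data_2` series from above:

```python
import pandas as pd

data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])

print(data_2['b'])       # Two (lookup by label)
print(data_2.loc['b'])   # Two (explicit label-based lookup)
print(data_2.iloc[1])    # Two (explicit position-based lookup)
print(data_2['b':'c'])   # label slices include both endpoints
```

Using `.loc` and `.iloc` makes it explicit whether a label or a position is meant, which helps when the index itself consists of integers.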

### Constructing a pandas series from numpy array

You can also create a pandas series from a numpy array using the following code:-

```
import numpy as np
import pandas as pd
data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
a_series = pd.Series(data)
print(a_series)
# Output
0    a
1    b
2    c
3    d
4    e
5    f
dtype: object
```

### Creating a pandas series from a python dictionary

We can also create a pandas series from a dictionary as under:-

```
import pandas as pd
a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict)
print(a_series)
# Output
one      1
two      2
three    3
four     4
dtype: int64
```

In this case, the index of the series will be the keys of the dictionary and the values will be the values of the dictionary. A pandas series can therefore be seen as a specialisation of a Python dictionary. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
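The dictionary analogy can be sketched as follows. This is an illustrative example rather than an exhaustive list of the dictionary-like methods a series supports.

```python
import pandas as pd

a_series = pd.Series({'one': 1, 'two': 2, 'three': 3, 'four': 4})

print('two' in a_series)       # True: membership tests check the index labels
print(a_series['three'])       # 3
print(list(a_series.keys()))   # ['one', 'two', 'three', 'four']

# Unlike a dictionary, a series also supports vectorised arithmetic.
doubled = a_series * 2
print(doubled['four'])         # 8
```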

You can also pass only the desired index labels explicitly to create the series as under:-

```
import pandas as pd
a_dict = {'one': 1,
          'two': 2,
          'three': 3,
          'four': 4,
          }
a_series = pd.Series(a_dict, index=['one', 'three'])
print(a_series)
# Output
one      1
three    3
dtype: int64
```

In the above example, although we passed the whole dictionary, the series ignored the key/value pairs whose keys are missing from the index argument.
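The reverse case is also worth noting: a label in the index argument that is not a key of the dictionary gets a missing value. In the sketch below, the label `'five'` is an illustrative addition:

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# 'five' is not a key of a_dict, so its value is filled with NaN.
a_series = pd.Series(a_dict, index=['one', 'three', 'five'])
print(a_series)
print(a_series.dtype)             # float64, because NaN is a float
print(a_series.isna().tolist())   # [False, False, True]
```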

### Constructing a pandas series from scalar data

If you pass a single scalar value along with an index, the value is repeated for every index label:-

```
import pandas as pd
a_series = pd.Series(5, index=[100, 200, 300])
print(a_series)
# Output
100    5
200    5
300    5
dtype: int64
```

## Pandas dataframe

The pandas dataframe is the primary data structure of pandas. It is a two-dimensional, size-mutable table with both flexible row indices and flexible column names. In general, it is just like an Excel sheet or an SQL table. It can also be seen as a Python dict-like container for series objects.
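The "dict-like container for series" view can be sketched as below. The column names `x`, `y` and `z` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

col = df['x']                # selecting a column returns a Series
print(type(col).__name__)    # Series
print(list(df.keys()))       # ['x', 'y']: column labels act like dict keys
print(df.dtypes)             # each column keeps its own dtype

# Size mutable: columns can be added after construction.
df['z'] = df['x'] * df['y']
print(df.shape)              # (3, 3)
```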

### Constructing a pandas dataframe

A pandas dataframe can be constructed using the following constructor:-

`pd.DataFrame([data, index, columns, dtype, copy])`

The parameters for the constructor are detailed as under:-

Parameters | Remarks |
---|---|
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame | Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: If data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order for Python 3.6 and later. |
index : Index or array-like | Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided. |
columns : Index or array-like | Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided. |
dtype : dtype, default None | Data type to force. Only a single dtype is allowed. If None, infer. |
copy : bool, default False | Copy data from inputs. Only affects DataFrame / 2d ndarray input. |

A dataframe can be constructed from:-

- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another DataFrame

### Constructing a pandas dataframe from a single series object

A single series object can also be converted to a dataframe as under:-

```
import pandas as pd
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population = pd.Series(population_dict)
df = pd.DataFrame(population)
print(df)
# Output
                   0
California  38332521
Texas       26448193
New York    19651127
Florida     19552860
Illinois    12882135
```

Since we have not passed the columns argument, the single column has been given the default label 0.
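If a meaningful column label is wanted instead of 0, one way is to name the column at construction time. The sketch below shows two options on a shortened version of the population data; the label `'population'` is an illustrative choice.

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193})

# Option 1: wrap the series in a dict so the key becomes the column label.
df_a = pd.DataFrame({'population': population})

# Option 2: Series.to_frame accepts the column label directly.
df_b = population.to_frame(name='population')

print(df_a.columns.tolist())   # ['population']
print(df_a.equals(df_b))       # True
```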

### Constructing a pandas dataframe from dictionary of series

We can construct a pandas dataframe from a dictionary of series as under :-

```
import pandas as pd
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
area = pd.Series(area_dict)
population = pd.Series(population_dict)
states = pd.DataFrame({'population': population, 'area': area})
print(states)
# Output
            population    area
California    38332521  423967
Texas         26448193  695662
New York      19651127  141297
Florida       19552860  170312
Illinois      12882135  149995
```

The resulting index is the union of the indexes of the individual series. In this example both series share the same labels, so no values are missing; had the labels differed, the missing values would have been filled with NaN (not a number). You can optionally pass the index (row labels) and columns (column labels) arguments as well. A dict of series along with a specific index will discard all data not matching the passed index.
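To make the union and discard behaviour visible, here is a sketch with two series whose labels only partly overlap. The data is a trimmed, illustrative subset of the figures used above.

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662})
population = pd.Series({'Texas': 26448193, 'Florida': 19552860})

# The row index is the union of both series' labels; gaps become NaN.
states = pd.DataFrame({'population': population, 'area': area})
print(states)

# Passing index keeps only the requested rows and discards the rest.
subset = pd.DataFrame({'population': population, 'area': area}, index=['Texas'])
print(subset)
```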

### Index and columns of the pandas dataframe

You can fetch the index and columns of a pandas dataframe using the following code:-

```
print(states.index)
print(states.columns)
# Output
Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
Index(['population', 'area'], dtype='object')
```

### Constructing a pandas dataframe from list of dictionaries

```
import pandas as pd
df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df)
# Output
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
```

Here, the dataframe has been constructed with its columns being the union of the keys of the dictionaries, and the missing values have been filled in as NaN.

### Constructing a pandas dataframe from 2D numpy array

A pandas dataframe can also be constructed from a 2-dimensional numpy array.

```
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 2))
print(df)
# Output
          0         1
0  0.059926  0.119440
1  0.548637  0.232405
2  0.343573  0.809589
```

Since we have not passed the columns and index, default integer labels have been used for both. Alternatively, we can pass the columns and index in the constructor itself:-

```
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3, 2), index=['a', 'b', 'c'], columns=['x', 'y'])
print(df)
# Output
          x         y
a  0.854185  0.871370
b  0.419274  0.123717
c  0.989986  0.811176
```
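The numbers in these outputs will differ on every run because `np.random.rand` draws fresh random values each time. If reproducible values are wanted, one option is a seeded NumPy generator, sketched below. The seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same values on every run.
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.random((3, 2)), index=['a', 'b', 'c'], columns=['x', 'y'])
print(df)

# Individual cells can then be read by row and column label.
print(df.loc['b', 'y'])
```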

### Constructing a pandas dataframe from a dictionary of ndarrays or lists

Alternatively, a pandas dataframe can also be constructed from a dictionary of ndarrays or lists. The keys of the dictionary become the columns of the dataframe, and the dataframe gets the default integer index if no index is passed.

```
import pandas as pd
a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(a_dict)
print(df)
# Output
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
```
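The sketch below passes an explicit index to the same dictionary and also shows what happens when the lists differ in length. The index labels and the shortened lists are illustrative.

```python
import pandas as pd

a_dict = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}

# The index must have as many labels as each list has values.
df = pd.DataFrame(a_dict, index=['a', 'b', 'c', 'd'])
print(df.loc['c', 'two'])   # 2.0

# Lists of unequal length cannot be aligned into columns.
try:
    pd.DataFrame({'one': [1., 2., 3.], 'two': [4., 3.]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)
```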

### Constructing a pandas dataframe from a numpy structured array

```
import pandas as pd
import numpy as np
# 'S10' is a 10-byte string field ('a10' is an older alias for the same type)
data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., 'Hello'), (2, 3., "World")]
df = pd.DataFrame(data)
print(df)
# Output
   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
```
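Column C prints with a `b'...'` prefix because the structured array stores raw bytes in that field. If ordinary text is preferred, two possible approaches are sketched below; UTF-8 is assumed as the encoding.

```python
import numpy as np
import pandas as pd

# 'S10' stores raw bytes, which is why the column prints as b'Hello'.
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
df = pd.DataFrame(data)

# Option 1: decode the bytes column after building the dataframe.
df['C'] = df['C'].str.decode('utf-8')
print(df['C'].tolist())      # ['Hello', 'World']

# Option 2: declare a Unicode field ('U10') in the structured array instead.
data_u = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'U10')])
data_u[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
df_u = pd.DataFrame(data_u)
print(df_u['C'].tolist())    # ['Hello', 'World']
```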
