Pandas DataFrame Essentials: A Code-Driven Tutorial

A pandas dataframe is a two-dimensional data structure used to handle tabular data in Python. This tutorial discusses basic pandas dataframe operations like creating, reading, writing, and updating using code examples.

Create a Pandas Dataframe in Python

We can create a Pandas dataframe using the DataFrame() function. Alternatively, we can use functions like read_csv() and read_excel() to create dataframes from CSV and Excel files. Let us discuss all the approaches to creating pandas dataframes one by one.

Create an Empty Dataframe in Python

In pandas, we can create an empty dataframe using the DataFrame() function as shown below:

import pandas as pd

# Create an empty dataframe
df=pd.DataFrame()
print("The empty dataframe is:")
print(df)

Output:

The empty dataframe is:
Empty DataFrame
Columns: []
Index: []

The above example shows that the DataFrame() function returns an empty dataframe with empty Columns list and Index list. In a dataframe, the Columns list contains the column names for the dataframe. The Index list contains the row identifiers. As the dataframe is empty, both these lists are empty.

You can also create a dataframe with column names but no data. For this, you can use the columns parameter in the DataFrame() function. When we pass a list of column names as an input argument to the columns parameter while executing the DataFrame() function, we get an empty dataframe with desired column names, as shown in the following example:

import pandas as pd

# Define a list of column names
column_names=["Name","Age","Height"]

# Specify column names using the columns parameter
df=pd.DataFrame(columns=column_names)
print("The dataframe with column names is:")
print(df)

Output:

The dataframe with column names is:
Empty DataFrame
Columns: [Name, Age, Height]
Index: []

In this example, we passed the list ["Name", "Age", "Height"] to the DataFrame() function as input to the columns parameter. Hence, the output dataframe contains three columns.

Instead of empty dataframes, we can use dataframes with data given in tabular format. We can use a list of lists or dictionaries, a CSV file, or an Excel file, as discussed in the following subsections.

List of Lists to Pandas Dataframe in Python

To create a pandas dataframe using a list of lists, we pass the list to the DataFrame() function as its first input argument. Each list in the input data constitutes a row in the output dataframe. The number of columns in the dataframe is equal to the length of the longest list in the input list of lists. You can observe this in the following example:

import pandas as pd

# Define a list of lists 
data=[["Aditya", 21, 181],
     ["Joel", 30, 176],
      ["Sam", 23, 156],
     ["Tina", 26, 155],
     ["Chris", 32, 167]]

# Define a list of column names
column_names=["Name","Age","Height"]

# Pass the list of lists to the DataFrame() function
df=pd.DataFrame(data, columns=column_names)
print("The dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167

In this example, we passed a list containing five inner lists. Hence, the output dataframe has five rows. The output dataframe contains three columns, as each list in the input list of lists contains three elements.

If we don’t pass the column names to the DataFrame() function, it automatically assigns numbers starting from 0 as column names. For instance, if the output dataframe contains three columns, they will be named 0, 1, and 2.

import pandas as pd

# Define a list of Lists
data=[["Aditya", 21, 181],
     ["Joel", 30, 176],
      ["Sam", 23, 156],
     ["Tina", 26, 155],
     ["Chris", 32, 167]]
# Create Dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

Output:

The dataframe is:
        0   1    2
0  Aditya  21  181
1    Joel  30  176
2     Sam  23  156
3    Tina  26  155
4   Chris  32  167

In this example, we didn’t specify the column names in the DataFrame() function. Hence, we get 0, 1, and 2 as column names.

While creating a dataframe with column names, the number of column names given to the DataFrame() function must be equal to or greater than the longest list in the input list of lists. If any input list contains more elements than the column names provided to the DataFrame() function, the program will run into a ValueError exception.

import pandas as pd

#Define a list of lists
data=[["Aditya", 21, 181],
     ["Joel", 30, 176],
      ["Sam", 23, 156],
     ["Tina", 26, 155],
     ["Chris", 32, 167, 70]]

#Define column names
column_names=["Name","Age","Height"]

# Create dataframe
df=pd.DataFrame(data, columns=column_names)
print("The dataframe is:")
print(df)

Output:

ValueError: 3 columns passed, passed data had 4 columns

If some of the lists in the input data contain fewer elements than the number of columns passed to the DataFrame() function, we get a NaN value in the extra columns for the rows created by such lists.

import pandas as pd

# Define a list of lists
data=[["Aditya", 21, 181],
     ["Joel", 30, 176],
      ["Sam", 23, 156],
     ["Tina", 26, 155],
     ["Chris", 32, 167, 70]]
# Define column names
column_names=["Name","Age","Height", "Weight"]

# Create dataframe
df=pd.DataFrame(data, columns=column_names)
print("The dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181     NaN
1    Joel   30     176     NaN
2     Sam   23     156     NaN
3    Tina   26     155     NaN
4   Chris   32     167    70.0

We passed four column names to the DataFrame() function in this example. Hence, the rows corresponding to the lists with three elements in the input data contain the value NaN in the last column.

List of Dictionaries to Dataframe in Python

To create a dataframe from a list of lists, you must know the number of columns and the column names in the output dataframe. If you don’t know the maximum number of columns in the input data, your program will run into error. Using a list of dictionaries to create a pandas dataframe doesn’t require such information.

When we create a pandas dataframe using a list of dictionaries, the column names in the output dataframe are created using the keys in the dictionaries. The values in each dictionary of the input data constitute a row of the output dataframe. You can observe this in the following example.

import pandas as pd

# Define a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167

In this example, we passed a list containing five dictionaries to the DataFrame() function. Hence, we get a dataframe containing five rows. All the dictionaries in the input data have the keys "Name", "Age", and "Height". Hence, the output dataframe contains three columns, i.e. "Name", "Age", and "Height".

If the dictionaries in the input data contain different keys, the output dataframe contains a union of all the keys as its columns. As the data from each dictionary is converted into a row, the values in a given row column will contain NaN values if the input dictionary for the row doesn’t have the column name as its key. You can observe this in the following example:

import pandas as pd

# Define a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Weight": 75},
     {"Name":"Joel","Height": 176},
     {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":70}]

#Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

Output:

The dataframe is:
     Name   Age  Weight  Height
0  Aditya  21.0    75.0     NaN
1    Joel   NaN     NaN   176.0
2     Sam  23.0     NaN   156.0
3    Tina  26.0     NaN   155.0
4   Chris  32.0    70.0   167.0

In this example, the dictionaries in the input data contain different keys. Hence, the column names in the output dataframe are the union of the keys in all the dictionaries. If a dictionary doesn’t have a key, its corresponding row contains NaN values in the columns for the particular keys.

Import CSV File Into Pandas Dataframe in Python

We can also import data from CSV or Excel files into a pandas dataframe. To create a pandas dataframe from a CSV file, we use the read_csv() function. The read_csv() function takes the CSV file name as its input argument and returns a pandas dataframe containing data in the file.

import pandas as pd

# Read csv file into a dataframe
df=pd.read_csv("input_file.csv")
print("The dataframe is:")
print(df)

In the above code, we import the following CSV file.

The output of the code looks as follows:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72

Due to the nature of data, CSV files don’t always contain a comma as the separator. For example, the following file contains the | character as its separator, whereas the data in the file contains the comma character.

If you read the above file, the program will run into a ParserError exception, as shown below:

import pandas as pd

# Read csv file into a dataframe
df=pd.read_csv("data_with_sep.csv")
print("The dataframe is:")
print(df)

Output:

ParserError: Error tokenizing data. C error: Expected 3 fields in line 6, saw 4

To avoid the error, you can specify the column separator explicitly using the sep parameter in the read_csv() function. After this, you can read a CSV file containing any separator into a pandas dataframe, as shown in the following example.

import pandas as pd
df=pd.read_csv("data_with_sep.csv",sep="|")
print("The dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Roll                         Subjects
0  Aditya    21          Maths,Physics,Chemistry
1    Joel    30        Biology,Physics,Chemistry
2     Sam    23         History,Geography,Civics
3    Tina    26          Maths,Physics,Chemistry
4   Chris    32  Biology,Maths,Physics,Chemistry

In this example, we passed the | character to the sep parameter in the read_csv() method. Hence, the program runs successfully. In the output dataframe, you can observe that the Subjects column contains the comma character in its values.

Import Excel Data Into Pandas Dataframe

To create a pandas dataframe using an Excel file, you can use the read_excel() function, as shown below.

import pandas as pd
df=pd.read_excel("input_file.xlsx")
print("The dataframe is:")
print(df)

In the above code, we used the following Excel file.

The output of the code looks as follows.

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72

The read_excel() function reads only the first sheet of the Excel file into the dataframe. To learn more about how to read Excel files into pandas dataframes, you can read this tutorial on reading Excel data in Python.

Select One or More Rows From a Pandas Dataframe

We can use the iloc attribute and the indexing operator to select a row at a given position. The iloc attribute contains an _iLocIndexer object that you can use to select rows from a pandas dataframe by position, as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Select a row
row=df.iloc[2]
print("The selected row is:")
print(row)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The selected row is:
Name      Sam
Age        23
Height    156
Weight     66
Name: 2, dtype: object

In the above example, we used the iloc attribute to select the row at position 2. Hence, we get the third row from the dataframe in the row variable.

You can also select rows at multiple positions using the iloc attribute. For this, you can pass the list of positions to the indexing operator with the iloc attribute. For example, we can select the rows at positions 1, 2, and 4 from a given dataframe, as shown below.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Select rows at position 1, 2, and 4
rows=df.iloc[[1,2,4]]
print("The selected rows are:")
print(rows)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The selected rows are:
    Name  Age  Height  Weight
1   Joel   30     176      64
2    Sam   23     156      66
4  Chris   32     167      72

Instead of specific positions, you can select rows in a given range using the iloc attribute. For this, you can pass the starting and final positions of the rows that you want to select from the dataframe using the following syntax.

rows=df.iloc[[starting_position:final_position]]


After executing the above statement, the rows from starting_position to final_position-1 position are assigned to the rows variable. You can observe this in the following example.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Select rows from position 1 to 4
rows=df.iloc[1:4]
print("The selected rows are:")
print(rows)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The selected rows are:
   Name  Age  Height  Weight
1  Joel   30     176      64
2   Sam   23     156      66
3  Tina   26     155      56

Select One or More Columns From a Dataframe in Python

To select a column from a dataframe, you can use the indexing operator with the column name as shown below.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Select a column
column=df["Name"]
print("The selected column is:")
print(column)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The selected column is:
0    Aditya
1      Joel
2       Sam
3      Tina
4     Chris
Name: Name, dtype: object

To select multiple columns from a pandas dataframe, you can pass the list of column names to the indexing operator.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Select columns
columns=df[["Name", "Height"]]
print("The selected columns are:")
print(columns)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The selected columns are:
     Name  Height
0  Aditya     181
1    Joel     176
2     Sam     156
3    Tina     155
4   Chris     167

Select a Value From a Pandas Dataframe

To select a value in a specific cell of a given dataframe, you must first select the required column. Then, you can select the value at the given row in the selected column using the iloc attribute.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Select a value
column=df["Name"].iloc[3]
print("The selected value is:")
print(column)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The selected value is:
Tina

Alternatively, you can first select the row from which you want to read the data. Then, you can select the value in a particular column of the select row as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Select a value
column=df.iloc[3]["Name"]
print("The selected value is:")
print(column)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The selected value is:
Tina

Add Rows to a Pandas Dataframe in Python

To add a new row to a pandas dataframe, we use the concat() function defined in the pandas module. The concat() function has the following syntax:

concat(objs,ignore_index=False, axis=0, join='outer')

Here,

  • The objs parameter takes a list of dataframes that we want to concatenate.
  • By default, the indices of the input dataframes are carried into the output dataframe. If you want to reset the indices in the output dataframe, you can set the ignore_index parameter to True.
  • The axis parameter decides whether to concatenate the dataframes horizontally or vertically. By default, the axis parameter is set to 0. Hence, the input dataframes given to the objs parameter are concatenated vertically. To concatenate the dataframes horizontally, you can set the axis parameter to 1.
  • The join parameter is used only when the axis parameter is set to 1, i.e., when we concatenate the dataframes horizontally. Here, the join parameter takes values such as "left", "right", "inner", and "outer" to decide on SQL joins in merging dataframes passed to the objs parameter based on one or more columns.

After execution, the concat() function returns a new dataframe containing data from all the input dataframes.

Add a Single Row to a Pandas Dataframe

To add a new row to a pandas dataframe, we first create a new dataframe using the new data and the DataFrame() function. Next, we use the concat() function to concatenate the existing dataframe and the dataframe containing the new row. After this, we get the concatenated dataframe as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create a dataframe
existing_df=pd.DataFrame(data)
print("The original dataframe is:")
print(existing_df)

# Define a new list of list to create a row
new_row=[["Joe", 80, 180]]

# Create a dataframe with a single row
new_df=pd.DataFrame(new_row, columns=["Name", "Age", "Height"])

# Concatenate the new dataframe to the previous one
df=pd.concat([existing_df,new_df])
print("The updated dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
0     Joe   80     180

In the above example, the last row in the output dataframe contains the index 0. This is because the indices of the input dataframes are also concatenated by the concat() function. To reset the index in the output dataframe, you can set the ignore_index parameter to True, as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create a dataframe
existing_df=pd.DataFrame(data)
print("The original dataframe is:")
print(existing_df)

# Create a list of lists for new row
new_row=[["Joe", 80, 180]]

# Create a dataframe with a single row
new_df=pd.DataFrame(new_row, columns=["Name", "Age", "Height"])

# Concatenate the new dataframe to the original dataframe
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
5     Joe   80     180

Instead of using lists to create the new dataframe for inserting rows, you can also use dictionaries, as shown in the following example:

import pandas as pd
# Create a list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)
# Create a list of dictionaries for new row
new_row=[{"Name":"Joe","Age": 80,"Height": 180}]

# Create a dataframe with a single row
new_df=pd.DataFrame(new_row)

# Concatenate the new and original dataframes
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
5     Joe   80     180

We passed the list [existing_df,new_df] to the concat() function in the previous examples. Hence, the new row is added to the bottom of the existing dataframe.

If you want to add a new row to the top of the existing dataframe, you can reverse the order of the dataframes in the list to [new_df, existing_df] as shown below:

import pandas as pd
# Create a list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)
# Create a list of dictionaries for new row
new_row=[{"Name":"Joe","Age": 80,"Height": 180}]

# Create a dataframe with a single row
new_df=pd.DataFrame(new_row)

# Concatenate the new and original dataframes
df=pd.concat([new_df, existing_df], ignore_index=True)
print("The updated dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
     Name  Age  Height
0     Joe   80     180
1  Aditya   21     181
2    Joel   30     176
3     Sam   23     156
4    Tina   26     155
5   Chris   32     167

Add Multiple Rows to a Pandas Dataframe

To add multiple rows to a dataframe, you can create a dataframe using the new rows and add it to the existing dataframe as shown below:

import pandas as pd

# Create list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)

# Create a list of lists
new_rows=[["Joe", 80, 180],
        ["Samantha", 24, 165]]

# Create a dataframe with multiple rows
new_df=pd.DataFrame(new_rows, columns=["Name", "Age", "Height"])

# Concatenate the new dataframe to the original dataframe
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
       Name  Age  Height
0    Aditya   21     181
1      Joel   30     176
2       Sam   23     156
3      Tina   26     155
4     Chris   32     167
5       Joe   80     180
6  Samantha   24     165

Instead of a list of lists, you can use the list of dictionaries to add multiple rows to an existing pandas dataframe.

import pandas as pd

# Create list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)

# Create a list of dictionaries
new_rows=[{"Name":"Joe","Age": 80,"Height": 180},
     {"Name":"Samantha","Age": 24,"Height": 165}]

# Create a dataframe with multiple rows
new_df=pd.DataFrame(new_rows, columns=["Name", "Age", "Height"])

# Concatenate the new dataframe to the original dataframe
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
       Name  Age  Height
0    Aditya   21     181
1      Joel   30     176
2       Sam   23     156
3      Tina   26     155
4     Chris   32     167
5       Joe   80     180
6  Samantha   24     165

Again, you can reverse the order of values in the list passed to the concat() function to add new rows to the top of the existing dataframe.

import pandas as pd

# Create list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)

# Create a list of dictionaries
new_rows=[{"Name":"Joe","Age": 80,"Height": 180},
     {"Name":"Samantha","Age": 24,"Height": 165}]

# Create a dataframe with multiple rows
new_df=pd.DataFrame(new_rows, columns=["Name", "Age", "Height"])

# Concatenate the new dataframe to the original dataframe
df=pd.concat([new_df, existing_df], ignore_index=True)
print("The updated dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
       Name  Age  Height
0       Joe   80     180
1  Samantha   24     165
2    Aditya   21     181
3      Joel   30     176
4       Sam   23     156
5      Tina   26     155
6     Chris   32     167

Add Columns to a Pandas Dataframe

To add a new column to a dataframe, you can create a list of values in the new column. Then, you can add the list as a column to the dataframe using the indexing operator as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Create a new column using column name and a list
df["Weight"]=[76, 64, 66, 56, 72]
print("The updated dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167
The updated dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72

While adding a list as a new column to a pandas dataframe, you must ensure that the list’s length must equal the number of existing rows in the dataframe. Otherwise, the program will run into a ValueError exception. For instance, consider the following example.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

#Create the original dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Add a new column with fewer values than rows
df["Weight"]=[76, 64, 66, 56]
print("The updated dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167

ValueError: Length of values (4) does not match length of index (5)

In the above example, we tried to assign a list containing four elements to a dataframe containing five rows. Hence, the program runs into error. Similarly, if we pass a list containing more elements than the number of rows in the dataframe, the program will run into an error.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
     {"Name":"Joel","Age": 30,"Height": 176},
      {"Name":"Sam","Age": 23,"Height": 156},
     {"Name":"Tina","Age": 26,"Height": 155},
    {"Name":"Chris","Age": 32,"Height": 167}]

#Create the original dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Add a new column with more values than rows
df["Weight"]=[76, 64, 66, 56, 72, 85]
print("The updated dataframe is:")
print(df)

Output:

The dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167

ValueError: Length of values (6) does not match length of index (5)

You can also use one or more existing dataframe columns to create a new column. For this, you can perform arithmetic operations on existing columns, as shown in the following example:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Create a column using an existing column
df["Height_in_m"]=df["Height"]/100

#Create a column using two columns
df["BMI"]=df["Weight"]/df["Height_in_m"]**2

print("The updated dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The updated dataframe is:
     Name  Age  Height  Weight  Height_in_m        BMI
0  Aditya   21     181      76         1.81  23.198315
1    Joel   30     176      64         1.76  20.661157
2     Sam   23     156      66         1.56  27.120316
3    Tina   26     155      56         1.55  23.309053
4   Chris   32     167      72         1.67  25.816630

Iterate Through Rows of a Pandas Dataframe

To iterate through the rows of a pandas dataframe, you can use the iterrows() function. The iterrows() function, when invoked on a dataframe, returns an iterator object. The iterator contains tuples containing the index and the corresponding row that we can read using a for loop, as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66}]

# Create the dataframe
df=pd.DataFrame(data)

# Create an iterator
row_iterator=df.iterrows()

# Use for loop to iterate through the rows
for index, row in row_iterator:
    print("The index is:",index)
    print(row)

Output:

The index is: 0
Name      Aditya
Age           21
Height       181
Weight        76
Name: 0, dtype: object
The index is: 1
Name      Joel
Age         30
Height     176
Weight      64
Name: 1, dtype: object
The index is: 2
Name      Sam
Age        23
Height    156
Weight     66
Name: 2, dtype: object

Drop Rows or Columns From a Pandas Dataframe

To drop a row or a column from a pandas dataframe, we use the drop() method. The drop() method has the following syntax.

df.drop(labels, axis=0, inplace=False)


Here,

  • The labels parameter takes a list of indices or column names we want to drop from the dataframe df.
  • We use the axis parameter to decide if we need to drop the rows or columns of the dataframe. By default, the axis parameter is set to 0. Hence, the rows of the dataframe are dropped. To drop dataframe columns, you can set the axis parameter to 1.
  • The inplace parameter is used to decide if we want to modify the existing dataframe or create a new dataframe by dropping a row or column from the existing dataframe. By default, the inplace parameter is set to False. Hence, the original dataframe is not modified, and a new dataframe is created after dropping rows or columns. To alter the existing dataframe, you can set the inplace parameter to True.

After execution, the drop() method returns the modified dataframe if the inplace parameter is False. Otherwise, it returns None.

Drop One or More Rows From a Dataframe in Python

To drop a row from a pandas dataframe, you can use the drop() method and the axis parameter, as shown below.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Drop the row at index 0
new_df=df.drop([0],axis=0)
print("The modified dataframe is:")
print(new_df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The modified dataframe is:
    Name  Age  Height  Weight
1   Joel   30     176      64
2    Sam   23     156      66
3   Tina   26     155      56
4  Chris   32     167      72

In the above example, we passed the list [0] as the first input argument to the drop() method. This axis parameter is also set to 0. Hence, the row at index 0 is removed from the original dataframe, and the drop() method returns a new dataframe. Here, the original dataframe hasn’t been modified.

If you want to modify the input dataframe instead of creating a new dataframe, you can set the inplace parameter to True as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

#Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Drop row from the dataframe inplace
df.drop([0],axis=0, inplace=True)
print("The modified dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The modified dataframe is:
    Name  Age  Height  Weight
1   Joel   30     176      64
2    Sam   23     156      66
3   Tina   26     155      56
4  Chris   32     167      72

To delete multiple rows from a pandas dataframe, you can pass a list containing all the indices you want to drop as input to the drop() method, as shown below.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Drop multiple rows from the dataframe
df.drop([0, 2, 3],axis=0, inplace=True)
print("The modified dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The modified dataframe is:
    Name  Age  Height  Weight
1   Joel   30     176      64
4  Chris   32     167      72

In the above example, we passed the list [0, 2, 3] to the drop() method as input. Hence, the rows at indices 0, 2, and 3 are dropped from the input dataframe.

Drop One or More Columns From a Pandas Dataframe

To drop a column from a pandas dataframe, you can pass the column name in a list to the drop() method as the first input argument and set the axis parameter to 1. After execution, the drop() method returns a new dataframe by removing the specified column from the input dataframe.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Drop a row from the dataframe
df=df.drop(["Weight"],axis=1)
print("The modified dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The modified dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167

To drop a column inplace from a pandas dataframe, you can set the inplace parameter to True as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Drop a column inplace
df.drop(["Weight"],axis=1, inplace=True)
print("The modified dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The modified dataframe is:
     Name  Age  Height
0  Aditya   21     181
1    Joel   30     176
2     Sam   23     156
3    Tina   26     155
4   Chris   32     167

To drop multiple columns from a dataframe, you can pass all the column names you want to drop in an input list to the drop() method as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Drop multiple rows
df.drop(["Weight", "Height"],axis=1, inplace=True)
print("The modified dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The modified dataframe is:
     Name  Age
0  Aditya   21
1    Joel   30
2     Sam   23
3    Tina   26
4   Chris   32

Count Rows and Columns in a Dataframe

To find the number of rows in a pandas dataframe, we use the len() function. The len() function takes the dataframe as its input and returns the number of rows in the dataframe.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Count rows in the dataframe
row_count=len(df)
print("The number of rows in the dataframe is:", row_count)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The number of rows in the dataframe is: 5

To find the number of columns in a pandas dataframe, we use the columns attribute of the dataframe with the len() function. The columns.values attribute of a pandas dataframe contains a list of column names. We can find the length of this list to find the number of columns in a given dataframe as shown in the following example:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Get column names in a list
column_names=df.columns.values

 # Find the number of columns
column_count=len(column_names)
print("The columns in the dataframe are:",column_names)
print("The number of columns in the dataframe is:", column_count)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The columns in the dataframe are: ['Name' 'Age' 'Height' 'Weight']
The number of columns in the dataframe is: 4

Rename a Column in a Pandas Dataframe

To rename a column in a pandas dataframe, you can use the rename() method. The rename() method, when invoked on a pandas dataframe, takes a dictionary as its input argument. The dictionary should contain the existing column names we want to rename as its keys and the corresponding new ones as values. After execution, it returns a new dataframe with columns renamed according to the input dictionary.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Rename the Height column
df=df.rename(columns={"Height":"Height_in_cm"})
print("The updated dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The updated dataframe is:
     Name  Age  Height_in_cm  Weight
0  Aditya   21           181      76
1    Joel   30           176      64
2     Sam   23           156      66
3    Tina   26           155      56
4   Chris   32           167      72

In the above example, we passed the dictionary {"Height": "Height_in_cm"} to the rename() method. Hence, The "Height" column is renamed to "Height_in_cm". To rename multiple columns at once, you can pass the existing and new column names as key-value pairs in the dictionary, as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Rename multiple columns
df=df.rename(columns={"Height":"Height_in_cm","Weight":"Weight_in_kg"})
print("The updated dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The updated dataframe is:
     Name  Age  Height_in_cm  Weight_in_kg
0  Aditya   21           181            76
1    Joel   30           176            64
2     Sam   23           156            66
3    Tina   26           155            56
4   Chris   32           167            72

By default, the rename() method returns a new dataframe after renaming the columns. You can set the inplace parameter to True to modify the original dataframe instead of creating a new one.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Rename columns inplace
df.rename(columns={"Height":"Height_in_cm","Weight":"Weight_in_kg"}, inplace=True)
print("The updated dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The updated dataframe is:
     Name  Age  Height_in_cm  Weight_in_kg
0  Aditya   21           181            76
1    Joel   30           176            64
2     Sam   23           156            66
3    Tina   26           155            56
4   Chris   32           167            72

Sort a Pandas Dataframe in Python

We can sort a pandas dataframe using the sort_values() method. The sort_values() method has the following syntax.

df.sort_values(by, ascending=True, inplace=False)

Here,

  • The by parameter takes a list of column names by which we want to sort the dataframe df.
  • By default, the ascending parameter is set to True. Hence, the sort() method sorts the dataframe by values in a given column in ascending order. To sort the dataframe in descending order, you can set the ascending parameter to False.
  • The sort() method returns a new sorted dataframe instead of modifying the original dataframe.To sort the original dataframe, you can set the inplace parameter to True.

We can sort a pandas dataframe by a given column in ascending order, as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Sort the dataframe by the Age column 
df=df.sort_values(by=["Age"])
print("The sorted dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The sorted dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
2     Sam   23     156      66
3    Tina   26     155      56
1    Joel   30     176      64
4   Chris   32     167      72

In this example, we sorted the dataframe by the "Age" column in ascending order.

To sort the dataframe in descending order, you can set the ascending parameter to False.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

# Sort the dataframe in descending order
df=df.sort_values(by=["Age"], ascending=False)
print("The sorted dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The sorted dataframe is:
     Name  Age  Height  Weight
4   Chris   32     167      72
1    Joel   30     176      64
3    Tina   26     155      56
2     Sam   23     156      66
0  Aditya   21     181      76

To sort the pandas dataframe inplace, set the inplace parameter to True.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)

#Sort the dataframe inplace
df.sort_values(by=["Age"], inplace=True)
print("The sorted dataframe is:")
print(df)

Output:

The original dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The sorted dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
2     Sam   23     156      66
3    Tina   26     155      56
1    Joel   30     176      64
4   Chris   32     167      72

Filter Rows in a Pandas Dataframe by Condition

We use the indexing operator with comparison operators to filter rows in a pandas dataframe by a given condition. You can apply a condition on a given column and filter rows from the dataframe as shown in the following example:

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Filter rows based on condition
filtered_data=df[df["Age"]>25]
print("The filtered dataframe is:")
print(filtered_data)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The filtered dataframe is:
    Name  Age  Height  Weight
1   Joel   30     176      64
3   Tina   26     155      56
4  Chris   32     167      72

To apply multiple conditions while filtering rows in a dataframe, you can use logical operators such as the AND operator or OR operator, as shown below.

import pandas as pd

# Create a list of Dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Filter data using multiple conditions
filtered_data=df[(df["Age"]>25) & (df["Height"]>160)]
print("The filtered dataframe is:")
print(filtered_data)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72
The filtered dataframe is:
    Name  Age  Height  Weight
1   Joel   30     176      64
4  Chris   32     167      72

In the above example, we used the AND operator to filter rows in the dataframe using two conditions. Observe that the individual conditions are put into parentheses while using multiple conditions. If you don’t put the individual conditions inside parentheses and use the & operator, the program will run into a ValueError exception.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)

# Filter rows by multiple conditions
filtered_data=df[df["Age"]>25 & df["Height"]>160]
print("The filtered dataframe is:")
print(filtered_data)

Output:

The dataframe is:
     Name  Age  Height  Weight
0  Aditya   21     181      76
1    Joel   30     176      64
2     Sam   23     156      66
3    Tina   26     155      56
4   Chris   32     167      72

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Export Pandas Dataframe Into a CSV or Excel File

You can export a pandas dataframe to a CSV file using the to_csv() method. The to_csv() method, when invoked on a pandas dataframe, takes the filename of the output file and saves the data in the dataframe to the CSV file.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)

# Export the dataframe to a csv file
df.to_csv("output_file.csv")

Output:

In the above example, the output CSV file contains an unnamed column containing the indices of the dataframe.To avoid the unnamed column in the output file, you can set the index parameter to False in the to_csv() method.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)

# Export the dataframe to a csv file
df.to_csv("output_without_index.csv", index=False)

Output:

By default, the to_csv() method uses the comma character "," as the separator for the values in the output file. If the data in the dataframe contains the comma character, the output CSV file will be corrupted in such a case.

To avoid this, we can specify a different character for the column separator in the CSV file using the sep parameter in the to_csv() method. For instance, we can use the "|" character as the separator in the CSV file while exporting a pandas dataframe as shown below:

import pandas as pd

# Create a list of dictionaries
data=[{'Name': 'Aditya', 'Roll': 21, 'Subjects': 'Maths,Physics,Chemistry'},
 {'Name': 'Joel', 'Roll': 30, 'Subjects': 'Biology,Physics,Chemistry'},
 {'Name': 'Sam', 'Roll': 23, 'Subjects': 'History,Geography,Civics'},
 {'Name': 'Tina', 'Roll': 26, 'Subjects': 'Maths,Physics,Chemistry'},
 {'Name': 'Chris', 'Roll': 32, 'Subjects': 'Biology,Maths,Physics,Chemistry'}]

# Create a dataframe
df=pd.DataFrame(data)

# Export dataframe to csv file with | separator
df.to_csv("output_with_sep.csv",sep="|", index=False)

Output:

You can also export a pandas dataframe to an Excel file. For this, you can use the to_excel() method, as shown below.

import pandas as pd

# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
     {"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
      {"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
     {"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
    {"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]

# Create a dataframe
df=pd.DataFrame(data)

#Export dataframe to Excel file
df.to_excel("output_file.xlsx", index=False)

Output:

Conclusion

In this tutorial, we discussed basic operations on a pandas dataframe in Python, how to create and modify pandas dataframes using various approaches, sorting and filtering dataframes by one or more columns, and exporting pandas dataframes into CSV and Excel files. To learn more about python programming, you can read this article on SimpleHTTPServer in Python. You might also like this article on working with random module in Python.

I hope you enjoyed reading this tutorial. Stay tuned for more informative content.

Happy Learning!

Leave a Reply