A pandas dataframe is a two-dimensional data structure used to handle tabular data in Python. This tutorial discusses basic pandas dataframe operations like creating, reading, writing, and updating using code examples.
- Create a Pandas Dataframe in Python
- Select One or More Rows From a Pandas Dataframe
- Select One or More Columns From a Dataframe in Python
- Select a Value From a Pandas Dataframe
- Add Rows to a Pandas Dataframe in Python
- Add Columns to a Pandas Dataframe
- Iterate Through Rows of a Pandas Dataframe
- Drop Rows or Columns From a Pandas Dataframe
- Count Rows and Columns in a Dataframe
- Rename a Column in a Pandas Dataframe
- Sort a Pandas Dataframe in Python
- Filter Rows in a Pandas Dataframe by Condition
- Export Pandas Dataframe Into a CSV or Excel File
- Conclusion
Create a Pandas Dataframe in Python
We can create a Pandas dataframe using the DataFrame()
function. Alternatively, we can use functions like read_csv()
and read_excel()
to create dataframes from CSV and Excel files. Let us discuss all the approaches to creating pandas dataframes one by one.
Create an Empty Dataframe in Python
In pandas, we can create an empty dataframe using the DataFrame()
function as shown below:
import pandas as pd
# Create an empty dataframe
df=pd.DataFrame()
print("The empty dataframe is:")
print(df)
Output:
The empty dataframe is:
Empty DataFrame
Columns: []
Index: []
The above example shows that the DataFrame()
function returns an empty dataframe with empty Columns
list and Index
list. In a dataframe, the Columns
list contains the column names for the dataframe. The Index
list contains the row identifiers. As the dataframe is empty, both these lists are empty.
You can also create a dataframe with column names but no data. For this, you can use the columns
parameter in the DataFrame()
function. When we pass a list of column names as an input argument to the columns
parameter while executing the DataFrame()
function, we get an empty dataframe with desired column names, as shown in the following example:
import pandas as pd
# Define a list of column names
column_names=["Name","Age","Height"]
# Specify column names using the columns parameter
df=pd.DataFrame(columns=column_names)
print("The dataframe with column names is:")
print(df)
Output:
The dataframe with column names is:
Empty DataFrame
Columns: [Name, Age, Height]
Index: []
In this example, we passed the list ["Name", "Age", "Height"]
to the DataFrame()
function as input to the columns
parameter. Hence, the output dataframe contains three columns.
Instead of empty dataframes, we can use dataframes with data given in tabular format. We can use a list of lists or dictionaries, a CSV file, or an Excel file, as discussed in the following subsections.
List of Lists to Pandas Dataframe in Python
To create a pandas dataframe using a list of lists, we pass the list to the DataFrame()
function as its first input argument. Each list in the input data constitutes a row in the output dataframe. The number of columns in the dataframe is equal to the length of the longest list in the input list of lists. You can observe this in the following example:
import pandas as pd
# Define a list of lists
data=[["Aditya", 21, 181],
["Joel", 30, 176],
["Sam", 23, 156],
["Tina", 26, 155],
["Chris", 32, 167]]
# Define a list of column names
column_names=["Name","Age","Height"]
# Pass the list of lists to the DataFrame() function
df=pd.DataFrame(data, columns=column_names)
print("The dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
In this example, we passed a list containing five inner lists. Hence, the output dataframe has five rows. The output dataframe contains three columns, as each list in the input list of lists contains three elements.
If we don’t pass the column names to the DataFrame()
function, it automatically assigns numbers starting from 0 as column names. For instance, if the output dataframe contains three columns, they will be named 0, 1, and 2.
import pandas as pd
# Define a list of Lists
data=[["Aditya", 21, 181],
["Joel", 30, 176],
["Sam", 23, 156],
["Tina", 26, 155],
["Chris", 32, 167]]
# Create Dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
Output:
The dataframe is:
0 1 2
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
In this example, we didn’t specify the column names in the DataFrame()
function. Hence, we get 0, 1, and 2 as column names.
While creating a dataframe with column names, the number of column names given to the DataFrame()
function must be equal to or greater than the longest list in the input list of lists. If any input list contains more elements than the column names provided to the DataFrame()
function, the program will run into a ValueError exception.
import pandas as pd
#Define a list of lists
data=[["Aditya", 21, 181],
["Joel", 30, 176],
["Sam", 23, 156],
["Tina", 26, 155],
["Chris", 32, 167, 70]]
#Define column names
column_names=["Name","Age","Height"]
# Create dataframe
df=pd.DataFrame(data, columns=column_names)
print("The dataframe is:")
print(df)
Output:
ValueError: 3 columns passed, passed data had 4 columns
If some of the lists in the input data contain fewer elements than the number of columns passed to the DataFrame()
function, we get a NaN value in the extra columns for the rows created by such lists.
import pandas as pd
# Define a list of lists
data=[["Aditya", 21, 181],
["Joel", 30, 176],
["Sam", 23, 156],
["Tina", 26, 155],
["Chris", 32, 167, 70]]
# Define column names
column_names=["Name","Age","Height", "Weight"]
# Create dataframe
df=pd.DataFrame(data, columns=column_names)
print("The dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 NaN
1 Joel 30 176 NaN
2 Sam 23 156 NaN
3 Tina 26 155 NaN
4 Chris 32 167 70.0
We passed four column names to the DataFrame()
function in this example. Hence, the rows corresponding to the lists with three elements in the input data contain the value NaN in the last column.
List of Dictionaries to Dataframe in Python
To create a dataframe from a list of lists, you must know the number of columns and the column names in the output dataframe. If you don’t know the maximum number of columns in the input data, your program will run into error. Using a list of dictionaries to create a pandas dataframe doesn’t require such information.
When we create a pandas dataframe using a list of dictionaries, the column names in the output dataframe are created using the keys in the dictionaries. The values in each dictionary of the input data constitute a row of the output dataframe. You can observe this in the following example.
import pandas as pd
# Define a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
In this example, we passed a list containing five dictionaries to the DataFrame()
function. Hence, we get a dataframe containing five rows. All the dictionaries in the input data have the keys "Name"
, "Age"
, and "Height"
. Hence, the output dataframe contains three columns, i.e. "Name"
, "Age"
, and "Height"
.
If the dictionaries in the input data contain different keys, the output dataframe contains a union of all the keys as its columns. As the data from each dictionary is converted into a row, the values in a given row column will contain NaN values if the input dictionary for the row doesn’t have the column name as its key. You can observe this in the following example:
import pandas as pd
# Define a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Weight": 75},
{"Name":"Joel","Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":70}]
#Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Weight Height
0 Aditya 21.0 75.0 NaN
1 Joel NaN NaN 176.0
2 Sam 23.0 NaN 156.0
3 Tina 26.0 NaN 155.0
4 Chris 32.0 70.0 167.0
In this example, the dictionaries in the input data contain different keys. Hence, the column names in the output dataframe are the union of the keys in all the dictionaries. If a dictionary doesn’t have a key, its corresponding row contains NaN values in the columns for the particular keys.
Import CSV File Into Pandas Dataframe in Python
We can also import data from CSV or Excel files into a pandas dataframe. To create a pandas dataframe from a CSV file, we use the read_csv()
function. The read_csv()
function takes the CSV file name as its input argument and returns a pandas dataframe containing data in the file.
import pandas as pd
# Read csv file into a dataframe
df=pd.read_csv("input_file.csv")
print("The dataframe is:")
print(df)
In the above code, we import the following CSV file.
The output of the code looks as follows:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
Due to the nature of data, CSV files don’t always contain a comma as the separator. For example, the following file contains the | character as its separator, whereas the data in the file contains the comma character.
If you read the above file, the program will run into a ParserError
exception, as shown below:
import pandas as pd
# Read csv file into a dataframe
df=pd.read_csv("data_with_sep.csv")
print("The dataframe is:")
print(df)
Output:
ParserError: Error tokenizing data. C error: Expected 3 fields in line 6, saw 4
To avoid the error, you can specify the column separator explicitly using the sep
parameter in the read_csv()
function. After this, you can read a CSV file containing any separator into a pandas dataframe, as shown in the following example.
import pandas as pd
df=pd.read_csv("data_with_sep.csv",sep="|")
print("The dataframe is:")
print(df)
Output:
The dataframe is:
Name Roll Subjects
0 Aditya 21 Maths,Physics,Chemistry
1 Joel 30 Biology,Physics,Chemistry
2 Sam 23 History,Geography,Civics
3 Tina 26 Maths,Physics,Chemistry
4 Chris 32 Biology,Maths,Physics,Chemistry
In this example, we passed the |
character to the sep
parameter in the read_csv()
method. Hence, the program runs successfully. In the output dataframe, you can observe that the Subjects
column contains the comma character in its values.
Import Excel Data Into Pandas Dataframe
To create a pandas dataframe using an Excel file, you can use the read_excel()
function, as shown below.
import pandas as pd
df=pd.read_excel("input_file.xlsx")
print("The dataframe is:")
print(df)
In the above code, we used the following Excel file.
The output of the code looks as follows.
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The read_excel()
function reads only the first sheet of the Excel file into the dataframe. To learn more about how to read Excel files into pandas dataframes, you can read this tutorial on reading Excel data in Python.
Select One or More Rows From a Pandas Dataframe
We can use the iloc
attribute and the indexing operator to select a row at a given position. The iloc
attribute contains an _iLocIndexer
object that you can use to select rows from a pandas dataframe by position, as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Select a row
row=df.iloc[2]
print("The selected row is:")
print(row)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The selected row is:
Name Sam
Age 23
Height 156
Weight 66
Name: 2, dtype: object
In the above example, we used the iloc
attribute to select the row at position 2. Hence, we get the third row from the dataframe in the row variable.
You can also select rows at multiple positions using the iloc
attribute. For this, you can pass the list of positions to the indexing operator with the iloc
attribute. For example, we can select the rows at positions 1, 2, and 4 from a given dataframe, as shown below.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Select rows at position 1, 2, and 4
rows=df.iloc[[1,2,4]]
print("The selected rows are:")
print(rows)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The selected rows are:
Name Age Height Weight
1 Joel 30 176 64
2 Sam 23 156 66
4 Chris 32 167 72
Instead of specific positions, you can select rows in a given range using the iloc
attribute. For this, you can pass the starting and final positions of the rows that you want to select from the dataframe using the following syntax.
rows=df.iloc[[starting_position:final_position]]
After executing the above statement, the rows from starting_position
to final_position-1
position are assigned to the rows variable. You can observe this in the following example.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Select rows from position 1 to 4
rows=df.iloc[1:4]
print("The selected rows are:")
print(rows)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The selected rows are:
Name Age Height Weight
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
Select One or More Columns From a Dataframe in Python
To select a column from a dataframe, you can use the indexing operator with the column name as shown below.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Select a column
column=df["Name"]
print("The selected column is:")
print(column)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The selected column is:
0 Aditya
1 Joel
2 Sam
3 Tina
4 Chris
Name: Name, dtype: object
To select multiple columns from a pandas dataframe, you can pass the list of column names to the indexing operator.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Select columns
columns=df[["Name", "Height"]]
print("The selected columns are:")
print(columns)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The selected columns are:
Name Height
0 Aditya 181
1 Joel 176
2 Sam 156
3 Tina 155
4 Chris 167
Select a Value From a Pandas Dataframe
To select a value in a specific cell of a given dataframe, you must first select the required column. Then, you can select the value at the given row in the selected column using the iloc
attribute.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Select a value
column=df["Name"].iloc[3]
print("The selected value is:")
print(column)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The selected value is:
Tina
Alternatively, you can first select the row from which you want to read the data. Then, you can select the value in a particular column of the select row as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Select a value
column=df.iloc[3]["Name"]
print("The selected value is:")
print(column)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The selected value is:
Tina
Add Rows to a Pandas Dataframe in Python
To add a new row to a pandas dataframe, we use the concat()
function defined in the pandas module. The concat()
function has the following syntax:
concat(objs,ignore_index=False, axis=0, join='outer')
Here,
- The
objs
parameter takes a list of dataframes that we want to concatenate. - By default, the indices of the input dataframes are carried into the output dataframe. If you want to reset the indices in the output dataframe, you can set the
ignore_index
parameter toTrue
. - The
axis
parameter decides whether to concatenate the dataframes horizontally or vertically. By default, theaxis
parameter is set to 0. Hence, the input dataframes given to theobjs
parameter are concatenated vertically. To concatenate the dataframes horizontally, you can set the axis parameter to 1. - The
join
parameter is used only when theaxis
parameter is set to 1, i.e., when we concatenate the dataframes horizontally. Here, thejoin
parameter takes values such as"left"
,"right"
,"inner"
, and"outer"
to decide on SQL joins in merging dataframes passed to theobjs
parameter based on one or more columns.
After execution, the concat()
function returns a new dataframe containing data from all the input dataframes.
Add a Single Row to a Pandas Dataframe
To add a new row to a pandas dataframe, we first create a new dataframe using the new data and the DataFrame()
function. Next, we use the concat()
function to concatenate the existing dataframe and the dataframe containing the new row. After this, we get the concatenated dataframe as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create a dataframe
existing_df=pd.DataFrame(data)
print("The original dataframe is:")
print(existing_df)
# Define a new list of list to create a row
new_row=[["Joe", 80, 180]]
# Create a dataframe with a single row
new_df=pd.DataFrame(new_row, columns=["Name", "Age", "Height"])
# Concatenate the new dataframe to the previous one
df=pd.concat([existing_df,new_df])
print("The updated dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
0 Joe 80 180
In the above example, the last row in the output dataframe contains the index 0. This is because the indices of the input dataframes are also concatenated by the concat()
function. To reset the index in the output dataframe, you can set the ignore_index
parameter to True
, as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create a dataframe
existing_df=pd.DataFrame(data)
print("The original dataframe is:")
print(existing_df)
# Create a list of lists for new row
new_row=[["Joe", 80, 180]]
# Create a dataframe with a single row
new_df=pd.DataFrame(new_row, columns=["Name", "Age", "Height"])
# Concatenate the new dataframe to the original dataframe
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
5 Joe 80 180
Instead of using lists to create the new dataframe for inserting rows, you can also use dictionaries, as shown in the following example:
import pandas as pd
# Create a list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)
# Create a list of dictionaries for new row
new_row=[{"Name":"Joe","Age": 80,"Height": 180}]
# Create a dataframe with a single row
new_df=pd.DataFrame(new_row)
# Concatenate the new and original dataframes
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
5 Joe 80 180
We passed the list [existing_df,new_df]
to the concat()
function in the previous examples. Hence, the new row is added to the bottom of the existing dataframe.
If you want to add a new row to the top of the existing dataframe, you can reverse the order of the dataframes in the list to [new_df, existing_df]
as shown below:
import pandas as pd
# Create a list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)
# Create a list of dictionaries for new row
new_row=[{"Name":"Joe","Age": 80,"Height": 180}]
# Create a dataframe with a single row
new_df=pd.DataFrame(new_row)
# Concatenate the new and original dataframes
df=pd.concat([new_df, existing_df], ignore_index=True)
print("The updated dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height
0 Joe 80 180
1 Aditya 21 181
2 Joel 30 176
3 Sam 23 156
4 Tina 26 155
5 Chris 32 167
Add Multiple Rows to a Pandas Dataframe
To add multiple rows to a dataframe, you can create a dataframe using the new rows and add it to the existing dataframe as shown below:
import pandas as pd
# Create list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)
# Create a list of lists
new_rows=[["Joe", 80, 180],
["Samantha", 24, 165]]
# Create a dataframe with multiple rows
new_df=pd.DataFrame(new_rows, columns=["Name", "Age", "Height"])
# Concatenate the new dataframe to the original dataframe
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
5 Joe 80 180
6 Samantha 24 165
Instead of a list of lists, you can use the list of dictionaries to add multiple rows to an existing pandas dataframe.
import pandas as pd
# Create list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)
# Create a list of dictionaries
new_rows=[{"Name":"Joe","Age": 80,"Height": 180},
{"Name":"Samantha","Age": 24,"Height": 165}]
# Create a dataframe with multiple rows
new_df=pd.DataFrame(new_rows, columns=["Name", "Age", "Height"])
# Concatenate the new dataframe to the original dataframe
df=pd.concat([existing_df,new_df], ignore_index=True)
print("The updated dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
5 Joe 80 180
6 Samantha 24 165
Again, you can reverse the order of values in the list passed to the concat()
function to add new rows to the top of the existing dataframe.
import pandas as pd
# Create list of dictionaries for original dataframe
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create the original dataframe
existing_df=pd.DataFrame(data)
print("The dataframe is:")
print(existing_df)
# Create a list of dictionaries
new_rows=[{"Name":"Joe","Age": 80,"Height": 180},
{"Name":"Samantha","Age": 24,"Height": 165}]
# Create a dataframe with multiple rows
new_df=pd.DataFrame(new_rows, columns=["Name", "Age", "Height"])
# Concatenate the new dataframe to the original dataframe
df=pd.concat([new_df, existing_df], ignore_index=True)
print("The updated dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height
0 Joe 80 180
1 Samantha 24 165
2 Aditya 21 181
3 Joel 30 176
4 Sam 23 156
5 Tina 26 155
6 Chris 32 167
Add Columns to a Pandas Dataframe
To add a new column to a dataframe, you can create a list of values in the new column. Then, you can add the list as a column to the dataframe using the indexing operator as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Create a new column using column name and a list
df["Weight"]=[76, 64, 66, 56, 72]
print("The updated dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
The updated dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
While adding a list as a new column to a pandas dataframe, you must ensure that the list’s length must equal the number of existing rows in the dataframe. Otherwise, the program will run into a ValueError
exception. For instance, consider the following example.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
#Create the original dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Add a new column with fewer values than rows
df["Weight"]=[76, 64, 66, 56]
print("The updated dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
ValueError: Length of values (4) does not match length of index (5)
In the above example, we tried to assign a list containing four elements to a dataframe containing five rows. Hence, the program runs into error. Similarly, if we pass a list containing more elements than the number of rows in the dataframe, the program will run into an error.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181},
{"Name":"Joel","Age": 30,"Height": 176},
{"Name":"Sam","Age": 23,"Height": 156},
{"Name":"Tina","Age": 26,"Height": 155},
{"Name":"Chris","Age": 32,"Height": 167}]
#Create the original dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Add a new column with more values than rows
df["Weight"]=[76, 64, 66, 56, 72, 85]
print("The updated dataframe is:")
print(df)
Output:
The dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
ValueError: Length of values (6) does not match length of index (5)
You can also use one or more existing dataframe columns to create a new column. For this, you can perform arithmetic operations on existing columns, as shown in the following example:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Create a column using an existing column
df["Height_in_m"]=df["Height"]/100
#Create a column using two columns
df["BMI"]=df["Weight"]/df["Height_in_m"]**2
print("The updated dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The updated dataframe is:
Name Age Height Weight Height_in_m BMI
0 Aditya 21 181 76 1.81 23.198315
1 Joel 30 176 64 1.76 20.661157
2 Sam 23 156 66 1.56 27.120316
3 Tina 26 155 56 1.55 23.309053
4 Chris 32 167 72 1.67 25.816630
Iterate Through Rows of a Pandas Dataframe
To iterate through the rows of a pandas dataframe, you can use the iterrows()
function. The iterrows()
function, when invoked on a dataframe, returns an iterator object. The iterator contains tuples containing the index and the corresponding row that we can read using a for loop, as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66}]
# Create the dataframe
df=pd.DataFrame(data)
# Create an iterator
row_iterator=df.iterrows()
# Use for loop to iterate through the rows
for index, row in row_iterator:
print("The index is:",index)
print(row)
Output:
The index is: 0
Name Aditya
Age 21
Height 181
Weight 76
Name: 0, dtype: object
The index is: 1
Name Joel
Age 30
Height 176
Weight 64
Name: 1, dtype: object
The index is: 2
Name Sam
Age 23
Height 156
Weight 66
Name: 2, dtype: object
Drop Rows or Columns From a Pandas Dataframe
To drop a row or a column from a pandas dataframe, we use the drop()
method. The drop()
method has the following syntax.
df.drop(labels, axis=0, inplace=False)
Here,
- The
labels
parameter takes a list of indices or column names we want to drop from the dataframedf
. - We use the
axis
parameter to decide if we need to drop the rows or columns of the dataframe. By default, theaxis
parameter is set to 0. Hence, the rows of the dataframe are dropped. To drop dataframe columns, you can set theaxis
parameter to 1. - The
inplace
parameter is used to decide if we want to modify the existing dataframe or create a new dataframe by dropping a row or column from the existing dataframe. By default, theinplace
parameter is set toFalse
. Hence, the original dataframe is not modified, and a new dataframe is created after dropping rows or columns. To alter the existing dataframe, you can set theinplace
parameter toTrue
.
After execution, the drop()
method returns the modified dataframe if the inplace
parameter is False
. Otherwise, it returns None
.
Drop One or More Rows From a Dataframe in Python
To drop a row from a pandas dataframe, you can use the drop()
method and the axis
parameter, as shown below.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Drop the row at index 0
new_df=df.drop([0],axis=0)
print("The modified dataframe is:")
print(new_df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The modified dataframe is:
Name Age Height Weight
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
In the above example, we passed the list [0]
as the first input argument to the drop()
method. This axis
parameter is also set to 0. Hence, the row at index 0 is removed from the original dataframe, and the drop()
method returns a new dataframe. Here, the original dataframe hasn’t been modified.
If you want to modify the input dataframe instead of creating a new dataframe, you can set the inplace
parameter to True
as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
#Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Drop row from the dataframe inplace
df.drop([0],axis=0, inplace=True)
print("The modified dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The modified dataframe is:
Name Age Height Weight
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
To delete multiple rows from a pandas dataframe, you can pass a list containing all the indices you want to drop as input to the drop()
method, as shown below.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Drop multiple rows from the dataframe
df.drop([0, 2, 3],axis=0, inplace=True)
print("The modified dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The modified dataframe is:
Name Age Height Weight
1 Joel 30 176 64
4 Chris 32 167 72
In the above example, we passed the list [0, 2, 3]
to the drop()
method as input. Hence, the rows at indices 0, 2, and 3 are dropped from the input dataframe.
Drop One or More Columns From a Pandas Dataframe
To drop a column from a pandas dataframe, you can pass the column name in a list to the drop()
method as the first input argument and set the axis
parameter to 1. After execution, the drop()
method returns a new dataframe by removing the specified column from the input dataframe.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Drop a row from the dataframe
df=df.drop(["Weight"],axis=1)
print("The modified dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The modified dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
To drop a column inplace from a pandas dataframe, you can set the inplace
parameter to True
as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Drop a column inplace
df.drop(["Weight"],axis=1, inplace=True)
print("The modified dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The modified dataframe is:
Name Age Height
0 Aditya 21 181
1 Joel 30 176
2 Sam 23 156
3 Tina 26 155
4 Chris 32 167
To drop multiple columns from a dataframe, you can pass all the column names you want to drop in an input list to the drop()
method as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Drop multiple rows
df.drop(["Weight", "Height"],axis=1, inplace=True)
print("The modified dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The modified dataframe is:
Name Age
0 Aditya 21
1 Joel 30
2 Sam 23
3 Tina 26
4 Chris 32
Count Rows and Columns in a Dataframe
To find the number of rows in a pandas dataframe, we use the len()
function. The len()
function takes the dataframe as its input and returns the number of rows in the dataframe.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Count rows in the dataframe
row_count=len(df)
print("The number of rows in the dataframe is:", row_count)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The number of rows in the dataframe is: 5
To find the number of columns in a pandas dataframe, we use the columns
attribute of the dataframe with the len()
function. The columns.values
attribute of a pandas dataframe contains a list of column names. We can find the length of this list to find the number of columns in a given dataframe as shown in the following example:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Get column names in a list
column_names=df.columns.values
# Find the number of columns
column_count=len(column_names)
print("The columns in the dataframe are:",column_names)
print("The number of columns in the dataframe is:", column_count)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The columns in the dataframe are: ['Name' 'Age' 'Height' 'Weight']
The number of columns in the dataframe is: 4
Rename a Column in a Pandas Dataframe
To rename a column in a pandas dataframe, you can use the rename()
method. The rename()
method, when invoked on a pandas dataframe, takes a dictionary as its input argument. The dictionary should contain the existing column names we want to rename as its keys and the corresponding new ones as values. After execution, it returns a new dataframe with columns renamed according to the input dictionary.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Rename the Height column
df=df.rename(columns={"Height":"Height_in_cm"})
print("The updated dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The updated dataframe is:
Name Age Height_in_cm Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
In the above example, we passed the dictionary {"Height": "Height_in_cm"}
to the rename()
method. Hence, The "Height"
column is renamed to "Height_in_cm"
. To rename multiple columns at once, you can pass the existing and new column names as key-value pairs in the dictionary, as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Rename multiple columns
df=df.rename(columns={"Height":"Height_in_cm","Weight":"Weight_in_kg"})
print("The updated dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The updated dataframe is:
Name Age Height_in_cm Weight_in_kg
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
By default, the rename()
method returns a new dataframe after renaming the columns. You can set the inplace
parameter to True
to modify the original dataframe instead of creating a new one.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Rename columns inplace
df.rename(columns={"Height":"Height_in_cm","Weight":"Weight_in_kg"}, inplace=True)
print("The updated dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The updated dataframe is:
Name Age Height_in_cm Weight_in_kg
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
Sort a Pandas Dataframe in Python
We can sort a pandas dataframe using the sort_values()
method. The sort_values()
method has the following syntax.
df.sort_values(by, ascending=True, inplace=False)
Here,
- The
by
parameter takes a list of column names by which we want to sort the dataframedf
. - By default, the
ascending
parameter is set to True. Hence, thesort()
method sorts the dataframe by values in a given column in ascending order. To sort the dataframe in descending order, you can set theascending
parameter toFalse
. - The
sort()
method returns a new sorted dataframe instead of modifying the original dataframe.To sort the original dataframe, you can set theinplace
parameter to True.
We can sort a pandas dataframe by a given column in ascending order, as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Sort the dataframe by the Age column
df=df.sort_values(by=["Age"])
print("The sorted dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The sorted dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
2 Sam 23 156 66
3 Tina 26 155 56
1 Joel 30 176 64
4 Chris 32 167 72
In this example, we sorted the dataframe by the "Age"
column in ascending order.
To sort the dataframe in descending order, you can set the ascending
parameter to False
.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
# Sort the dataframe in descending order
df=df.sort_values(by=["Age"], ascending=False)
print("The sorted dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The sorted dataframe is:
Name Age Height Weight
4 Chris 32 167 72
1 Joel 30 176 64
3 Tina 26 155 56
2 Sam 23 156 66
0 Aditya 21 181 76
To sort the pandas dataframe inplace, set the inplace
parameter to True
.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the original dataframe
df=pd.DataFrame(data)
print("The original dataframe is:")
print(df)
#Sort the dataframe inplace
df.sort_values(by=["Age"], inplace=True)
print("The sorted dataframe is:")
print(df)
Output:
The original dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The sorted dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
2 Sam 23 156 66
3 Tina 26 155 56
1 Joel 30 176 64
4 Chris 32 167 72
Filter Rows in a Pandas Dataframe by Condition
We use the indexing operator with comparison operators to filter rows in a pandas dataframe by a given condition. You can apply a condition on a given column and filter rows from the dataframe as shown in the following example:
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Filter rows based on condition
filtered_data=df[df["Age"]>25]
print("The filtered dataframe is:")
print(filtered_data)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The filtered dataframe is:
Name Age Height Weight
1 Joel 30 176 64
3 Tina 26 155 56
4 Chris 32 167 72
To apply multiple conditions while filtering rows in a dataframe, you can use logical operators such as the AND operator or OR operator, as shown below.
import pandas as pd
# Create a list of Dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Filter data using multiple conditions
filtered_data=df[(df["Age"]>25) & (df["Height"]>160)]
print("The filtered dataframe is:")
print(filtered_data)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
The filtered dataframe is:
Name Age Height Weight
1 Joel 30 176 64
4 Chris 32 167 72
In the above example, we used the AND operator to filter rows in the dataframe using two conditions. Observe that the individual conditions are put into parentheses while using multiple conditions. If you don’t put the individual conditions inside parentheses and use the & operator, the program will run into a ValueError
exception.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create the dataframe
df=pd.DataFrame(data)
print("The dataframe is:")
print(df)
# Filter rows by multiple conditions
filtered_data=df[df["Age"]>25 & df["Height"]>160]
print("The filtered dataframe is:")
print(filtered_data)
Output:
The dataframe is:
Name Age Height Weight
0 Aditya 21 181 76
1 Joel 30 176 64
2 Sam 23 156 66
3 Tina 26 155 56
4 Chris 32 167 72
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Export Pandas Dataframe Into a CSV or Excel File
You can export a pandas dataframe to a CSV file using the to_csv()
method. The to_csv()
method, when invoked on a pandas dataframe, takes the filename of the output file and saves the data in the dataframe to the CSV file.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
# Export the dataframe to a csv file
df.to_csv("output_file.csv")
Output:
In the above example, the output CSV file contains an unnamed column containing the indices of the dataframe.To avoid the unnamed column in the output file, you can set the index
parameter to False
in the to_csv()
method.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
# Export the dataframe to a csv file
df.to_csv("output_without_index.csv", index=False)
Output:
By default, the to_csv()
method uses the comma character ","
as the separator for the values in the output file. If the data in the dataframe contains the comma character, the output CSV file will be corrupted in such a case.
To avoid this, we can specify a different character for the column separator in the CSV file using the sep
parameter in the to_csv()
method. For instance, we can use the "|"
character as the separator in the CSV file while exporting a pandas dataframe as shown below:
import pandas as pd
# Create a list of dictionaries
data=[{'Name': 'Aditya', 'Roll': 21, 'Subjects': 'Maths,Physics,Chemistry'},
{'Name': 'Joel', 'Roll': 30, 'Subjects': 'Biology,Physics,Chemistry'},
{'Name': 'Sam', 'Roll': 23, 'Subjects': 'History,Geography,Civics'},
{'Name': 'Tina', 'Roll': 26, 'Subjects': 'Maths,Physics,Chemistry'},
{'Name': 'Chris', 'Roll': 32, 'Subjects': 'Biology,Maths,Physics,Chemistry'}]
# Create a dataframe
df=pd.DataFrame(data)
# Export dataframe to csv file with | separator
df.to_csv("output_with_sep.csv",sep="|", index=False)
Output:
You can also export a pandas dataframe to an Excel file. For this, you can use the to_excel()
method, as shown below.
import pandas as pd
# Create a list of dictionaries
data=[{"Name":"Aditya","Age": 21,"Height": 181, "Weight":76},
{"Name":"Joel","Age": 30,"Height": 176, "Weight":64},
{"Name":"Sam","Age": 23,"Height": 156, "Weight":66},
{"Name":"Tina","Age": 26,"Height": 155, "Weight":56},
{"Name":"Chris","Age": 32,"Height": 167, "Weight":72}]
# Create a dataframe
df=pd.DataFrame(data)
#Export dataframe to Excel file
df.to_excel("output_file.xlsx", index=False)
Output:
Conclusion
In this tutorial, we discussed basic operations on a pandas dataframe in Python, how to create and modify pandas dataframes using various approaches, sorting and filtering dataframes by one or more columns, and exporting pandas dataframes into CSV and Excel files. To learn more about python programming, you can read this article on SimpleHTTPServer in Python. You might also like this article on working with random module in Python.
I hope you enjoyed reading this tutorial. Stay tuned for more informative content.
Happy Learning!