Pandas iloc and loc are two of the most widely used functions of the pandas library in Python, which focuses on data wrangling and on working with tabular data. They let you access rows and columns of a table either by integer position (iloc) or by label (loc), so you can pull out exactly the data you want with very short expressions. The syntax is easy to pick up and can be understood by non-programmers too.
In this article, we’ll go over what the iloc and loc functions in pandas offer, what inputs each one accepts, and when to use which, with a wide range of examples.
What is Pandas in Python?
Pandas is one of the most powerful libraries in Python. Think of it as a programmable upgrade of Excel. It is the go-to library for handling datasets in Python: it works with tabular data, can handle very large numbers of rows and columns, and can often manipulate them in seconds.
It is also widely used in data analytics because of its role in preprocessing and transforming data. You can combine pandas with other Python libraries and get the output you need in just a handful of lines of code. You can load data into pandas from many sources, such as CSV and Excel files, and transform it as needed.
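As a minimal sketch of loading data and transforming it, the snippet below reads CSV text with pd.read_csv and adds a derived column. The column names and values are hypothetical, and an in-memory string stands in for a real file path; pd.read_excel works in a similar way for Excel files but needs an extra engine such as openpyxl installed.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory; in practice you would pass a file
# path (for example "sales.csv") to pd.read_csv instead.
csv_text = """region,units,price
north,10,2.5
south,4,3.0
east,7,2.0
"""

# Load the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# A simple transformation: add a column computed from two existing ones.
df["revenue"] = df["units"] * df["price"]

print(df["revenue"].tolist())  # [25.0, 12.0, 14.0]
```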
What is the difference between the pandas iloc and loc functions?
What is Pandas iloc:
The iloc and loc functions in pandas are used to locate elements in a table (a DataFrame or Series), which consists of rows and columns. Strictly speaking, they are indexers used with square brackets, as in df.iloc[0], rather than functions called with parentheses. With iloc, you locate elements by their integer positions, i.e. row numbers and column numbers counted from 0, regardless of how the rows and columns are labeled. iloc therefore works with positions only, whereas loc works with labels.
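To make the position-versus-label distinction concrete, here is a small sketch using a hypothetical DataFrame whose row labels are the integers 10, 20 and 30. Because these labels do not match the positions 0, 1 and 2, iloc and loc give different answers for the same number.

```python
import pandas as pd

# Hypothetical table whose integer row labels do NOT match the row positions.
df = pd.DataFrame(
    {"name": ["Ann", "Ben", "Cy"], "score": [88, 92, 75]},
    index=[10, 20, 30],
)

# iloc: position 0 is the first row, whatever its label is.
first_by_position = df.iloc[0]
# loc: the label 10 happens to be the first row here.
first_by_label = df.loc[10]
print(first_by_position["name"], first_by_label["name"])  # Ann Ann

# loc[0] looks for a row *labeled* 0, which does not exist.
try:
    df.loc[0]
except KeyError:
    print("KeyError: there is no row labeled 0")

# iloc[2] is the third row (label 30); loc[20] is the row labeled 20.
print(df.iloc[2]["name"], df.loc[20]["name"])  # Cy Ben
```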
Restrictions on using Pandas iloc:
The main difference between iloc and loc lies in the inputs they accept. The inputs allowed by .iloc are primarily integer-position based (from 0 to length-1 of the axis), though .iloc can also be used with a boolean array. The allowed inputs are:
- An integer, e.g. 5.
- A list or array of integers, e.g. [4, 3, 0].
- A slice object with ints, e.g. 1:7.
- A boolean array.
- A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
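The following sketch runs each kind of .iloc input listed above against a hypothetical Series with letter labels, plus one callable used inside a method chain. The data and variable names are illustrative only. Note the .to_numpy() call in the chain: .iloc accepts a plain boolean array, but a boolean Series carrying an index is not accepted (pandas raises an error), so the mask is converted first.

```python
import pandas as pd

# Hypothetical Series with letter labels; iloc ignores them and uses positions.
s = pd.Series([5, 10, 15, 20, 25, 30, 35, 40], index=list("abcdefgh"))

an_int = s.iloc[5]                       # integer -> single value (30)
a_list = s.iloc[[4, 3, 0]]               # list of ints -> rows in that order
a_slice = s.iloc[1:7]                    # slice -> positions 1..6, stop excluded
mask = [v % 20 == 0 for v in s]          # boolean list, same length as s
a_mask = s.iloc[mask]
a_callable = s.iloc[lambda x: [0, -1]]   # the callable receives s itself

# Callables are handy in method chains, where the intermediate result has no
# variable name: keep the doubled values that are above their own mean.
chained = s.mul(2).iloc[lambda x: (x > x.mean()).to_numpy()]

print(chained.tolist())  # [50, 60, 70, 80]
```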
What is Pandas loc:
Like iloc, the loc function is used to locate elements in a table, but it does so by label: you refer to rows by their index labels and to columns by their names instead of by integer positions. With loc you can select elements by row labels, column names, or boolean conditions imposed on the columns.
In data analysis, loc is generally the more commonly used of the two: in a table with many columns, it is hard to keep track of which integer position holds which column, whereas column names make the code self-explanatory.
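A small sketch of why positions are fragile in larger tables, using a hypothetical sales DataFrame: once a column is inserted at the front, the same iloc call silently returns a different column, while the loc call keeps returning the column it names.

```python
import pandas as pd

# Hypothetical sales table.
df = pd.DataFrame(
    {"region": ["north", "south"], "units": [10, 4], "price": [2.5, 3.0]}
)

# Right now "units" sits at position 1, so both selections agree.
by_position = df.iloc[:, 1]
by_name = df.loc[:, "units"]

# Later, a new column is inserted at the front of the table.
df.insert(0, "id", [101, 102])

# The positional selection now silently returns a different column...
print(df.iloc[:, 1].name)       # region
# ...while the label-based selection still returns "units".
print(df.loc[:, "units"].name)  # units
```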
Restrictions on using Pandas loc:
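On the other side, the inputs allowed by .loc are primarily label based, though .loc can also be used with a boolean array. The allowed inputs are:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, never as an integer position along the index).
- A list or array of labels, e.g. ['a', 'b', 'c'].
- A slice object with labels, e.g. 'a':'f' (unlike a usual Python slice, both the start and the stop are included).
- A boolean array of the same length as the axis being sliced, e.g. [True, False, True].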
- An alignable boolean Series. The index of the key will be aligned before masking.
- An alignable Index. The Index of the returned selection will be the input.
- A callable function with one argument that returns valid output for indexing.
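One detail that often surprises people is how slices behave. The sketch below, on a hypothetical Series labeled 'a' to 'f', contrasts an inclusive label slice with .loc against an exclusive position slice with .iloc, and also shows a boolean array and an alignable boolean Series.

```python
import pandas as pd

# Hypothetical Series with labels "a" to "f".
s = pd.Series([1, 2, 3, 4, 5, 6], index=list("abcdef"))

# Label slice with loc: BOTH endpoints are included.
print(s.loc["b":"d"].tolist())   # [2, 3, 4]

# Position slice with iloc: the stop position is excluded, as in plain Python.
print(s.iloc[1:3].tolist())      # [2, 3]

# Boolean array with the same length as the axis.
print(s.loc[[True, False, True, False, False, False]].tolist())  # [1, 3]

# Alignable boolean Series: matched to s by label, not by order.
# If a label of s were missing from the mask, pandas would raise an error.
mask = pd.Series([True, False, False, False, False, True], index=list("fedcba"))
print(s.loc[mask].tolist())      # [1, 6]
```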
Examples for pandas iloc vs loc
Let us start with iloc, which selects by integer position, to see how the code works.
Both iloc and loc can be called in many different ways.
The iloc function can be called with any of the following inputs:
- scalar integer like [0]
- list of integers like [[0, 1]]
- slice object like [:3]
- boolean mask like [[True, False, True]]
- lambda functions like [lambda x: x.index % 2 == 0]
- scalar integers for rows and columns like [0, 1]
- lists of integers for rows and columns like [[0, 2], [1, 3]]
- slices for rows and columns like [1:3, 0:3]
- boolean arrays like [:, [True, False, True, False]]
- callable functions expecting a DataFrame like [:, lambda df: [0, 2]]
import pandas as pd

# Run these lines in an interactive session (e.g. Jupyter) to see each result.
# Create a DataFrame for demonstrating pandas iloc vs loc.
# It is named mydict, not dict, so the built-in dict type is not shadowed.
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
df = pd.DataFrame(mydict)
df
#       a     b     c     d
# 0     1     2     3     4
# 1   100   200   300   400
# 2  1000  2000  3000  4000

# Scalar integer: returns a single row as a Series.
type(df.iloc[0])
# <class 'pandas.core.series.Series'>
df.iloc[0]
# a    1
# b    2
# c    3
# d    4
# Name: 0, dtype: int64

# List of integers: returns a DataFrame.
df.iloc[[0]]
#    a  b  c  d
# 0  1  2  3  4
type(df.iloc[[0]])
# <class 'pandas.core.frame.DataFrame'>

# Slice object.
df.iloc[:3]
#       a     b     c     d
# 0     1     2     3     4
# 1   100   200   300   400
# 2  1000  2000  3000  4000

# Callable: keeps the rows whose index value is even.
df.iloc[lambda x: x.index % 2 == 0]
#       a     b     c     d
# 0     1     2     3     4
# 2  1000  2000  3000  4000

# Lists of integers for rows and columns.
df.iloc[[0, 2], [1, 3]]
#       b     d
# 0     2     4
# 2  2000  4000

# All rows, boolean array for the columns.
df.iloc[:, [True, False, True, False]]
#       a     c
# 0     1     3
# 1   100   300
# 2  1000  3000

# All rows, callable for the columns.
df.iloc[:, lambda df: [0, 2]]
#       a     c
# 0     1     3
# 1   100   300
# 2  1000  3000

# Slices for rows and columns.
df.iloc[1:3, 0:3]
#       a     b     c
# 1   100   200   300
# 2  1000  2000  3000
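The list above also mentions one scalar integer per axis ([0, 1]), which the walkthrough does not run. A minimal sketch, rebuilding the same values as the walkthrough, shows that form returning a single value, iloc used on the left-hand side of an assignment, and negative positions.

```python
import pandas as pd

# Same values as the walkthrough.
mydict = [
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 100, "b": 200, "c": 300, "d": 400},
    {"a": 1000, "b": 2000, "c": 3000, "d": 4000},
]
df = pd.DataFrame(mydict)

# One integer per axis returns a single scalar: row position 0, column position 1.
value = df.iloc[0, 1]
print(value)  # 2

# iloc can also overwrite a value by position.
df.iloc[0, 1] = 20
print(df.iloc[0].tolist())  # [1, 20, 3, 4]

# Negative positions count from the end, as with Python lists.
print(df.iloc[-1, -1])  # 4000
```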
You can use the loc function in the following ways (notice how similar they look to the iloc calls above):
- a single label like ['a']
- list of labels like [['a', 'b']]
- single label for rows and columns like ['a', 'b']
- Boolean list with the same length as the row axis
- Alignable boolean Series
- Conditional that returns a boolean Series like [df['a'] > 2]
- lambda functions like lambda df: df['a'] == 0
- Set value for an entire row or column
- Set value for rows matching callable condition
- A number of examples using a DataFrame with a MultiIndex
You can find the full list of examples in the pandas documentation for DataFrame.iloc and DataFrame.loc.
The loc examples below follow the ones in the pandas documentation for loc:
import pandas as pd

# Run these lines in an interactive session (e.g. Jupyter) to see each result.
# Create a DataFrame for demonstrating pandas iloc vs loc.
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])
df
#             max_speed  shield
# cobra               1       2
# viper               4       5
# sidewinder          7       8

# Single label: returns the row as a Series.
df.loc['viper']
# max_speed    4
# shield       5
# Name: viper, dtype: int64

# List of labels: returns a DataFrame.
df.loc[['viper', 'sidewinder']]
#             max_speed  shield
# viper               4       5
# sidewinder          7       8

# Alignable boolean Series: aligned on the index labels, not on order.
df.loc[pd.Series([False, True, False],
                 index=['viper', 'sidewinder', 'cobra'])]
#             max_speed  shield
# sidewinder          7       8

# Boolean condition for the rows, list of labels for the columns.
df.loc[df['shield'] > 6, ['max_speed']]
#             max_speed
# sidewinder          7

# Callable that returns a boolean Series.
df.loc[lambda df: df['shield'] == 8]
#             max_speed  shield
# sidewinder          7       8

# Set a value for the selected rows and columns.
df.loc[['viper', 'sidewinder'], ['shield']] = 50
df
#             max_speed  shield
# cobra               1       2
# viper               4      50
# sidewinder          7      50

# Set a value for an entire row, then for an entire column.
df.loc['cobra'] = 10
df.loc[:, 'max_speed'] = 30

# Set a value for rows matching a condition.
df.loc[df['shield'] > 35] = 0
df
#             max_speed  shield
# cobra              30      10
# viper               0       0
# sidewinder          0       0

# A DataFrame with a MultiIndex.
tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
df
#                      max_speed  shield
# cobra      mark i           12       2
#            mark ii           0       4
# sidewinder mark i           10      20
#            mark ii           1       4
# viper      mark ii           7       1
#            mark iii         16      36
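The MultiIndex DataFrame above is only built, not queried. Here is a sketch of a few loc lookups on it, in the spirit of the pandas documentation examples: an outer label, a full (outer, inner) tuple, and a tuple combined with a column label.

```python
import pandas as pd

# Rebuild the MultiIndex DataFrame from the example above.
tuples = [
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=["max_speed", "shield"], index=index)

# An outer label returns all inner rows for it; the outer level is dropped.
cobra = df.loc["cobra"]
print(cobra.index.tolist())  # ['mark i', 'mark ii']

# A tuple addresses one row across both levels and returns a Series.
row = df.loc[("cobra", "mark ii")]
print(row.tolist())  # [0, 4]

# Adding a column label returns a single value.
print(df.loc[("cobra", "mark ii"), "shield"])  # 4
```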
Conclusion
Understanding the difference between pandas iloc and loc, and knowing when to use each, is essential for writing production-grade code and for precise data wrangling before an analysis. These functions make data easier to identify, organize, and manipulate. They do not only locate values: as the assignment examples above show, changing values is as simple as assigning to a selection.
You can also use iloc and loc to pick out the columns you need to manipulate and then keep transforming the data without adding any other library to your analysis. Methods like drop() and dropna() can be combined with them as required.
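As a sketch of combining these methods, the snippet below uses a hypothetical table of city temperatures: drop() removes a column, dropna() removes a row with a missing value, and then loc and iloc select from what remains.

```python
import numpy as np
import pandas as pd

# Hypothetical raw table with a missing value and a column we do not need.
raw = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune"],
        "temp_c": [4.0, np.nan, 31.0],
        "notes": ["x", "y", "z"],
    }
)

# Drop a column that is not needed for this analysis.
trimmed = raw.drop(columns=["notes"])
# Drop rows that contain missing values (the "Lima" row here).
clean = trimmed.dropna()

# Select by label: cities where the temperature exceeds 20 degrees C.
hot = clean.loc[clean["temp_c"] > 20, "city"]
print(hot.tolist())    # ['Pune']

# Select by position: the first remaining row.
first = clean.iloc[0]
print(first["city"])   # Oslo
```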
Learning how to manipulate and wrangle data with pandas is probably the best place to start with Python for data science. Try the code out yourself and tell us which of the two you prefer in your data analyses.
For more such content, check out our website -> Buggy Programmer
An eternal learner, I believe Data is the panacea to the world's problems. I enjoy Data Science and all things related to data. Let's unravel this mystery about what Data Science really is, together. With over 33 certifications, I enjoy writing about Data Science to make it simpler for everyone to understand. Happy reading and do connect with me on my LinkedIn to know more!