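Introduction
This article covers how to handle missing values (NaN) in a pandas DataFrame, namely how to drop them and how to replace them with other values.
Walkthrough
Importing the modules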
import pandas as pd
import numpy as np
pd.options.display.notebook_repr_html = False
Creating a DataFrame that contains NaN
# Create the DataFrame (values are random, so your output will differ)
data = pd.DataFrame(np.random.randint(1,10,(3,3)), index=['C','A','C'],columns=['T','U','S'] )
data
'''
T U S
C 3 2 4
A 4 8 5
C 4 6 5
'''
data.iloc[-1,-1] = np.nan
data
'''
T U S
C 3 2 4.0
A 4 8 5.0
C 4 6 NaN
'''
isnull(): True where the value is NaN
data.isnull()
'''
T U S
C False False False
A False False False
C False False True
'''
isnull() returns a boolean DataFrame of the same shape, with True wherever the value is NaN.
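Because True counts as 1, summing the mask is a common way to count missing values. The following is a minimal sketch of that idea; the frame `df` and its values are hand-written for illustration and are not the random data used above.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a single NaN in column 'S'
df = pd.DataFrame(
    {"T": [3, 4, 4], "U": [2, 8, 6], "S": [4.0, 5.0, np.nan]},
    index=["x", "y", "z"],
)

mask = df.isnull()                  # boolean DataFrame, True where NaN
nan_per_column = mask.sum()         # True counts as 1: NaN count per column
total_nan = int(nan_per_column.sum())

print(nan_per_column)
print(total_nan)  # 1
```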
notnull(): False where the value is NaN
data.notnull()
'''
T U S
C True True True
A True True True
C True True False
'''
notnull() is the inverse of isnull(): it returns a boolean DataFrame with False wherever the value is NaN.
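One typical use of such a mask is selecting the rows where a particular column has a value. A small sketch under the assumption of a hand-written frame `df` (illustrative values, unique index labels):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"T": [3, 4, 4], "U": [2, 8, 6], "S": [4.0, 5.0, np.nan]},
    index=["x", "y", "z"],
)

# Boolean Series: True where column 'S' holds a value
has_s = df["S"].notnull()

# Keep only the rows where 'S' is not NaN
kept = df[has_s]
print(kept)
```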
dropna(): dropping NaN
data.dropna()
'''
T U S
C 3 2 4.0
A 4 8 5.0
'''
dropna() returns a new DataFrame with every row that contains a NaN removed.
data.dropna(axis=1)
'''
T U
C 3 2
A 4 8
C 4 6
'''
data.dropna(axis='columns')
'''
T U
C 3 2
A 4 8
C 4 6
'''
Passing axis=1 or axis='columns' to dropna() drops the columns that contain NaN instead of the rows.
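The row and column variants can be compared side by side. This is a minimal sketch on an illustrative frame `df` (not the random data above); it also shows that dropna() leaves the original object untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"T": [3, 4, 4], "U": [2, 8, 6], "S": [4.0, 5.0, np.nan]},
    index=["x", "y", "z"],
)

rows_dropped = df.dropna()         # default axis=0: drop rows containing NaN
cols_dropped = df.dropna(axis=1)   # drop columns containing NaN

print(rows_dropped.shape)  # (2, 3)
print(cols_dropped.shape)  # (3, 2)
print(df.shape)            # (3, 3): df itself is unchanged
```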
data.iloc[2,:] = np.nan
data
'''
T U S
C 3.0 2.0 4.0
A 4.0 8.0 5.0
C NaN NaN NaN
'''
data.dropna(how='all')
'''
T U S
C 3.0 2.0 4.0
A 4.0 8.0 5.0
'''
With how='all', only the rows in which every value is NaN are dropped.
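The difference from the default how='any' is easiest to see on a frame that has both a partially missing row and a fully missing row. A sketch with illustrative data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "T": [3.0, np.nan, np.nan],
        "U": [2.0, 8.0, np.nan],
        "S": [4.0, 5.0, np.nan],
    },
    index=["x", "y", "z"],  # y is partially NaN, z is entirely NaN
)

any_dropped = df.dropna(how="any")  # default: drop rows with at least one NaN
all_dropped = df.dropna(how="all")  # drop only rows where every value is NaN

print(list(any_dropped.index))  # ['x']
print(list(all_dropped.index))  # ['x', 'y']
```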
fillna(): replacing NaN with a value
data.fillna(0)
'''
T U S
C 3.0 2.0 4.0
A 4.0 8.0 5.0
C 0.0 0.0 0.0
'''
Passing a number to fillna() returns a new DataFrame in which every NaN is replaced by that number.
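Besides a single scalar, fillna() also accepts a dict that maps column names to fill values, which is handy when each column needs a different replacement. This goes slightly beyond the example above; the frame and the fill values below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"T": [3.0, np.nan], "U": [np.nan, 8.0]},
    index=["x", "y"],
)

# Fill NaN in 'T' with 0 and NaN in 'U' with -1
filled = df.fillna({"T": 0, "U": -1})
print(filled)
```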
ffill: filling with the previous element
# fillna(method='ffill') is deprecated since pandas 2.1; call ffill() directly
data.ffill()
'''
T U S
C 3.0 2.0 4.0
A 4.0 8.0 5.0
C 4.0 8.0 5.0
'''
ffill is short for forward fill: each NaN is replaced by the last non-NaN value before it (by default, the value from the row above).
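When several NaN values follow each other, the same earlier value is carried forward through all of them. A minimal sketch using the `DataFrame.ffill()` method on an illustrative one-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"T": [1.0, np.nan, np.nan, 4.0]})

# Each NaN takes the most recent non-NaN value above it
filled = df.ffill()
print(filled["T"].tolist())  # [1.0, 1.0, 1.0, 4.0]
```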
bfill: filling with the next element
#fillna bfill
data = data.fillna(5)
data
'''
T U S
C 3.0 2.0 4.0
A 4.0 8.0 5.0
C 5.0 5.0 5.0
'''
data.iloc[1,:] = np.nan
data
'''
T U S
C 3.0 2.0 4.0
A NaN NaN NaN
C 5.0 5.0 5.0
'''
# fillna(method='bfill') is deprecated since pandas 2.1; call bfill() directly
data.bfill()
'''
T U S
C 3.0 2.0 4.0
A 5.0 5.0 5.0
C 5.0 5.0 5.0
'''
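bfill is short for backward fill: each NaN is replaced by the next non-NaN value after it (by default, the value from the row below).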
data = data.fillna(7)
data
'''
T U S
C 3.0 2.0 4.0
A 7.0 7.0 7.0
C 5.0 5.0 5.0
'''
data.iloc[:,1] = np.nan
data
'''
T U S
C 3.0 NaN 4.0
A 7.0 NaN 7.0
C 5.0 NaN 5.0
'''
data.bfill(axis=1)
'''
T U S
C 3.0 4.0 4.0
A 7.0 7.0 7.0
C 5.0 5.0 5.0
'''
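With axis=1 the fill runs across the columns, so each NaN takes the value from the next column to the right in the same row.
With both ffill and bfill, a NaN stays NaN when there is no non-NaN value before it (ffill) or after it (bfill) along the fill axis.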
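The edge behaviour of ffill and bfill can be checked with a short sketch: a NaN with no valid value before it survives ffill, and one with no valid value after it survives bfill. Chaining the two, shown last, is one possible way to cover both ends; it is an illustrative choice here, not something the steps above prescribe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"T": [np.nan, 2.0, np.nan]})

forward = df.ffill()        # first row has nothing before it: stays NaN
backward = df.bfill()       # last row has nothing after it: stays NaN
both = df.ffill().bfill()   # forward fill, then back-fill what is left

print(forward["T"].tolist())   # [nan, 2.0, 2.0]
print(backward["T"].tolist())  # [2.0, 2.0, nan]
print(both["T"].tolist())      # [2.0, 2.0, 2.0]
```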
References
pandas.DataFrame.fillna — pandas 2.2.3 documentation
pandas.DataFrame.dropna — pandas 2.2.3 documentation