Introduction
sklearn's datasets.make_regression generates random data for regression problems. This article explains how each of its parameters affects the generated data.
Walkthrough
Importing modules
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
plt.rcParams['scatter.edgecolors'] = "gray"
Versions
#version
import matplotlib
print(matplotlib.__version__)  # 3.3.3
import sklearn
print(sklearn.__version__)  # 0.24.0
n_samples
#n_samples
X1,Y1 = make_regression(n_samples=10,n_features=1,n_informative=1)
X2,Y2 = make_regression(n_samples=20,n_features=1,n_informative=1)
X3,Y3 = make_regression(n_samples=50,n_features=1,n_informative=1)
X4,Y4 = make_regression(n_samples=100,n_features=1,n_informative=1)
fig, ax = plt.subplots(nrows=2,ncols=2,sharex=True,sharey=True,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],Y1,s=20)
ax[1].scatter(X2[:,0],Y2,s=20)
ax[2].scatter(X3[:,0],Y3,s=20)
ax[3].scatter(X4[:,0],Y4,s=20)
ax[0].set_title("n_samples=10")
ax[1].set_title("n_samples=20")
ax[2].set_title("n_samples=50")
ax[3].set_title("n_samples=100")
plt.tight_layout()
plt.savefig("n_samples.png",dpi=120)
plt.show()
Changing n_samples changes the number of samples (rows) that are generated.

n_features
#n_features
X1,Y1 = make_regression(n_features=2)
X2,Y2 = make_regression(n_features=3)
X3,Y3 = make_regression(n_features=4)
X4,Y4 = make_regression(n_features=5)
X1.shape,X2.shape,X3.shape,X4.shape
# ((100, 2), (100, 3), (100, 4), (100, 5))
n_features sets the number of columns (features) in X.
n_informative
#n_informative
X1,Y1 = make_regression(n_features=4,n_informative=1)
X2,Y2 = make_regression(n_features=4,n_informative=2)
fig, ax = plt.subplots(nrows=2,ncols=2,sharex=True,sharey=True,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],Y1,s=10)
ax[1].scatter(X1[:,1],Y1,s=10)
ax[2].scatter(X1[:,2],Y1,s=10)
ax[3].scatter(X1[:,3],Y1,s=10)
plt.suptitle("n_informative=1")
plt.tight_layout()
plt.savefig("n_informative1.png",dpi=120)
plt.show()
fig, ax = plt.subplots(nrows=2,ncols=2,sharex=True,sharey=True,dpi=140)
ax =ax.ravel()
ax[0].scatter(X2[:,0],Y2,s=10)
ax[1].scatter(X2[:,1],Y2,s=10)
ax[2].scatter(X2[:,2],Y2,s=10)
ax[3].scatter(X2[:,3],Y2,s=10)
plt.suptitle("n_informative=2")
plt.tight_layout()
plt.savefig("n_informative2.png",dpi=120)
plt.show()
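n_informative sets the number of columns (features) that actually contribute to the linear model used to generate Y. The code above plots each column of X against Y for n_informative=1 and n_informative=2. With n_informative=2, the top-left and bottom-right panels appear to follow a linear trend.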
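Reading the informative columns off a scatter plot is somewhat subjective. As a minimal sketch, assuming coef=True (covered later in this article) and an arbitrarily chosen random_state=0, the non-zero coefficients identify the informative columns directly:

```python
import numpy as np
from sklearn.datasets import make_regression

# 4 features, of which only 2 are used to generate y
X, y, coef = make_regression(n_samples=100, n_features=4, n_informative=2,
                             coef=True, random_state=0)

# Columns with a non-zero coefficient are the informative ones
informative_columns = np.flatnonzero(coef)
print(coef)
print(informative_columns)
```

The remaining columns have a coefficient of exactly 0, which is why their scatter plots show no trend.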


n_targets
#n_targets
X1,Y1 = make_regression(n_features=1,n_informative=1,n_targets=1)
X2,Y2 = make_regression(n_features=1,n_informative=1,n_targets=2)
X3,Y3 = make_regression(n_features=1,n_informative=1,n_targets=3)
X4,Y4 = make_regression(n_features=1,n_informative=1,n_targets=4)
Y1.shape,Y2.shape,Y3.shape,Y4.shape
# ((100,), (100, 2), (100, 3), (100, 4))
fig, ax = plt.subplots(nrows=2,ncols=2,sharex=True,sharey=True,dpi=140)
ax =ax.ravel()
ax[0].scatter(X4[:,0],Y4[:,0],s=10)
ax[1].scatter(X4[:,0],Y4[:,1],s=10)
ax[2].scatter(X4[:,0],Y4[:,2],s=10)
ax[3].scatter(X4[:,0],Y4[:,3],s=10)
plt.suptitle("n_targets=4")
plt.tight_layout()
plt.savefig("n_targets.png",dpi=120)
plt.show()
n_targets is the number of output values Y. With n_targets=4, the code above plots each of the four columns of Y against X.
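Each target has its own set of coefficients. The following sketch, assuming coef=True and illustrative values n_features=3 and n_targets=2, checks the shapes involved and that Y is a plain matrix product when noise and bias are left at their defaults of 0:

```python
import numpy as np
from sklearn.datasets import make_regression

X, Y, coef = make_regression(n_samples=100, n_features=3, n_informative=3,
                             n_targets=2, coef=True, random_state=0)

# X: (n_samples, n_features), Y: (n_samples, n_targets),
# coef: (n_features, n_targets)
print(X.shape, Y.shape, coef.shape)

# With noise=0 and bias=0 (the defaults), Y is exactly X @ coef
print(np.allclose(Y, X @ coef))
```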

bias
#bias
X1,Y1 = make_regression(n_features=1,n_informative=1,bias=100)
X2,Y2 = make_regression(n_features=1,n_informative=1,bias=0)
X3,Y3 = make_regression(n_features=1,n_informative=1,bias=-100)
X4,Y4 = make_regression(n_features=1,n_informative=1,bias=-200)
fig, ax = plt.subplots(nrows=2,ncols=2,sharex=True,sharey=True,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],Y1,s=10)
ax[1].scatter(X2[:,0],Y2,s=10)
ax[2].scatter(X3[:,0],Y3,s=10)
ax[3].scatter(X4[:,0],Y4,s=10)
ax[0].plot(0,100,"ro")
ax[1].plot(0,0,"ro")
ax[2].plot(0,-100,"ro")
ax[3].plot(0,-200,"ro")
ax[0].set_title("bias=100")
ax[1].set_title("bias=0")
ax[2].set_title("bias=-100")
ax[3].set_title("bias=-200")
plt.tight_layout()
plt.savefig("bias.png",dpi=120)
plt.show()
bias is the intercept of the underlying linear model. The red dot in each panel marks the point (0, bias).
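One way to confirm this numerically is to fit a linear model and inspect its intercept. This is a sketch that uses sklearn.linear_model.LinearRegression, which does not appear in the original article, and relies on the default noise=0 so that the fit is exact:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

bias = 100
X, y = make_regression(n_samples=100, n_features=1, n_informative=1,
                       bias=bias, random_state=0)

# Ordinary least squares on noise-free data recovers the intercept
model = LinearRegression().fit(X, y)
print(model.intercept_)
```

The fitted intercept should match the bias argument up to floating-point error.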

noise
#noise
X1,Y1 = make_regression(n_features=1,n_informative=1,noise=10)
X2,Y2 = make_regression(n_features=1,n_informative=1,noise=20)
X3,Y3 = make_regression(n_features=1,n_informative=1,noise=50)
X4,Y4 = make_regression(n_features=1,n_informative=1,noise=100)
fig, ax = plt.subplots(nrows=2,ncols=2,sharex=True,sharey=True,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],Y1,s=10)
ax[1].scatter(X2[:,0],Y2,s=10)
ax[2].scatter(X3[:,0],Y3,s=10)
ax[3].scatter(X4[:,0],Y4,s=10)
ax[0].set_title("noise=10")
ax[1].set_title("noise=20")
ax[2].set_title("noise=50")
ax[3].set_title("noise=100")
plt.tight_layout()
plt.savefig("noise.png",dpi=120)
plt.show()
noise adds scatter to the output: it is the standard deviation of the Gaussian noise added to Y.
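To check the scale of the scatter, the residuals around the true line can be measured. The sketch below assumes coef=True to obtain the true slope and uses a large n_samples (10000, chosen here only for illustration) so that the estimate is stable:

```python
from sklearn.datasets import make_regression

noise_level = 20.0
X, y, coef = make_regression(n_samples=10000, n_features=1, n_informative=1,
                             noise=noise_level, coef=True, random_state=0)

# With one feature and one target, coef is a 0-d array,
# so multiply element-wise instead of using @
residuals = y - X[:, 0] * coef

# The standard deviation of the residuals should be close to noise_level
print(residuals.std())
```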

random_state
#random_state
X1,Y1 = make_regression(n_features=1,n_informative=1,random_state=1)
X2,Y2 = make_regression(n_features=1,n_informative=1,random_state=2)
X3,Y3 = make_regression(n_features=1,n_informative=1,random_state=3)
X4,Y4 = make_regression(n_features=1,n_informative=1,random_state=4)
fig, ax = plt.subplots(nrows=2,ncols=2,sharex=True,sharey=True,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],Y1,s=10)
ax[1].scatter(X2[:,0],Y2,s=10)
ax[2].scatter(X3[:,0],Y3,s=10)
ax[3].scatter(X4[:,0],Y4,s=10)
ax[0].set_title("random_state=1")
ax[1].set_title("random_state=2")
ax[2].set_title("random_state=3")
ax[3].set_title("random_state=4")
plt.tight_layout()
plt.savefig("random_state.png",dpi=120)
plt.show()
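random_state seeds the random number generator. Fixing it to an integer makes the generated data reproducible, and different values give different datasets.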
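Reproducibility can be verified by comparing the arrays directly. A minimal sketch, with the variable names chosen here for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression

Xa, ya = make_regression(n_features=1, n_informative=1, random_state=1)
Xb, yb = make_regression(n_features=1, n_informative=1, random_state=1)
Xc, yc = make_regression(n_features=1, n_informative=1, random_state=2)

# Same seed: identical data
print(np.array_equal(Xa, Xb), np.array_equal(ya, yb))
# Different seed: different data
print(np.array_equal(Xa, Xc))
```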

coef
#coef
X1,Y1,coef = make_regression(n_features=1,n_informative=1,coef=True)
coef
# array(22.1601716)
X2,Y2,coef = make_regression(n_features=4,n_informative=2,coef=True)
coef
# array([ 0. , 1.5863515 , 0. , 41.66548154])
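Setting coef=True additionally returns the coefficients of the underlying linear model. The default is False.
With n_features=4, four coefficients are returned, and those of the non-informative columns are 0.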
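The returned coefficients, together with bias, fully describe how Y was generated when noise is 0. The following sketch reconstructs Y from X under that assumption; the bias value of 5.0 is arbitrary:

```python
import numpy as np
from sklearn.datasets import make_regression

bias = 5.0
X, y, coef = make_regression(n_samples=100, n_features=4, n_informative=2,
                             bias=bias, coef=True, random_state=0)

# Rebuild the target from the features, coefficients, and intercept
y_reconstructed = X @ coef + bias
print(np.allclose(y, y_reconstructed))
```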
References

make_regression (scikit-learn documentation)