Introduction
datasets.make_moons in sklearn generates crescent-shaped data (two interleaving half circles) for clustering and classification. This post explains how each parameter affects the generated data.
Explanation
Importing the modules
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
Versions
# Versions used in this post (the printed output follows each call)
import matplotlib
print(matplotlib.__version__)
3.3.3
import sklearn
print(sklearn.__version__)
0.24.0
n_samples
# n_samples: vary the total number of generated points
X1,Y1 = make_moons(n_samples=10)
X2,Y2 = make_moons(n_samples=20)
X3,Y3 = make_moons(n_samples=50)
X4,Y4 = make_moons(n_samples=100)
fig, ax = plt.subplots(figsize=(5,5),nrows=2,ncols=2,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],X1[:,1],s=10,c=Y1,cmap="coolwarm")
ax[1].scatter(X2[:,0],X2[:,1],s=10,c=Y2,cmap="coolwarm")
ax[2].scatter(X3[:,0],X3[:,1],s=10,c=Y3,cmap="coolwarm")
ax[3].scatter(X4[:,0],X4[:,1],s=10,c=Y4,cmap="coolwarm")
ax[0].set_title("n_samples=10")
ax[1].set_title("n_samples=20")
ax[2].set_title("n_samples=50")
ax[3].set_title("n_samples=100")
for a in ax:
    a.axis("equal")  # same scale on both axes so the moons are not distorted
plt.tight_layout()
plt.savefig("n_samples.png",dpi=120)
plt.show()
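n_samples sets the number of generated samples. When an integer is given, it is the total number of points, split between the two moons (default: 100).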
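As a quick check of what n_samples controls, the sketch below inspects the returned arrays instead of plotting them. It assumes a scikit-learn version in which n_samples also accepts a two-element tuple (documented from 0.23 onward); the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_moons

# An integer n_samples is the total number of points, split between the two moons.
X, y = make_moons(n_samples=100, random_state=0)
print(X.shape)         # (100, 2): one row per point, two coordinates
print(np.bincount(y))  # [50 50]: 50 points per moon

# A two-element tuple sets the size of each moon separately
# (assumption: scikit-learn >= 0.23).
X_imb, y_imb = make_moons(n_samples=(30, 70), random_state=0)
counts_imb = np.bincount(y_imb)
print(counts_imb)      # [30 70]
```

The tuple form can be handy when an imbalanced toy dataset is needed for a classifier.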

noise
# noise: vary the standard deviation of the Gaussian noise added to the points
X1,Y1 = make_moons(noise=None)
X2,Y2 = make_moons(noise=.1)
X3,Y3 = make_moons(noise=.2)
X4,Y4 = make_moons(noise=.3)
fig, ax = plt.subplots(figsize=(5,5),nrows=2,ncols=2,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],X1[:,1],s=10,c=Y1,cmap="coolwarm")
ax[1].scatter(X2[:,0],X2[:,1],s=10,c=Y2,cmap="coolwarm")
ax[2].scatter(X3[:,0],X3[:,1],s=10,c=Y3,cmap="coolwarm")
ax[3].scatter(X4[:,0],X4[:,1],s=10,c=Y4,cmap="coolwarm")
ax[0].set_title("noise=None")
ax[1].set_title("noise=0.1")
ax[2].set_title("noise=0.2")
ax[3].set_title("noise=0.3")
for a in ax:
    a.axis("equal")  # same scale on both axes so the moons are not distorted
plt.tight_layout()
plt.savefig("noise.png",dpi=120)
plt.show()
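noise adds scatter to the data. The value is the standard deviation of the Gaussian noise added to the coordinates; with noise=None (the default) the points lie exactly on the two arcs.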
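The role of noise as a standard deviation can be checked numerically. The following is a minimal sketch, assuming that with shuffle=False the noise-free coordinates are deterministic, so subtracting them from a noisy draw isolates the noise term. The sample size of 10000 and the variable names are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons

# shuffle=False keeps the point order fixed, so the two arrays line up row by row.
X_clean, y_clean = make_moons(n_samples=10000, shuffle=False, noise=None)
X_noisy, _ = make_moons(n_samples=10000, shuffle=False, noise=0.2, random_state=0)

# Without noise, the points labelled 0 lie on a unit circle centred at the origin.
radius = np.hypot(X_clean[y_clean == 0, 0], X_clean[y_clean == 0, 1])
print(np.allclose(radius, 1.0))  # True

# The spread of the difference should be close to the requested noise level (0.2).
noise_std = (X_noisy - X_clean).std()
print(round(float(noise_std), 3))
```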

random_state
# random_state: same noise level, different seeds
X1,Y1 = make_moons(noise=.1,random_state=1)
X2,Y2 = make_moons(noise=.1,random_state=2)
X3,Y3 = make_moons(noise=.1,random_state=3)
X4,Y4 = make_moons(noise=.1,random_state=4)
fig, ax = plt.subplots(figsize=(5,5),nrows=2,ncols=2,dpi=140)
ax =ax.ravel()
ax[0].scatter(X1[:,0],X1[:,1],s=10,c=Y1,cmap="coolwarm")
ax[1].scatter(X2[:,0],X2[:,1],s=10,c=Y2,cmap="coolwarm")
ax[2].scatter(X3[:,0],X3[:,1],s=10,c=Y3,cmap="coolwarm")
ax[3].scatter(X4[:,0],X4[:,1],s=10,c=Y4,cmap="coolwarm")
ax[0].set_title("random_state=1")
ax[1].set_title("random_state=2")
ax[2].set_title("random_state=3")
ax[3].set_title("random_state=4")
for a in ax:
    a.axis("equal")  # same scale on both axes so the moons are not distorted
plt.tight_layout()
plt.savefig("random_state.png",dpi=120)
plt.show()
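random_state seeds the random number generator used for shuffling and noise. Passing the same integer reproduces the same data; different integers give different noise realizations, as in the four calls above.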
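Reproducibility is easy to confirm by comparing arrays directly. This is only a small sketch; the seeds 1 and 2 and the variable names are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_moons

# The same seed reproduces exactly the same points and labels.
X_a, y_a = make_moons(noise=0.1, random_state=1)
X_b, y_b = make_moons(noise=0.1, random_state=1)
same_seed_equal = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)
print(same_seed_equal)   # True

# A different seed gives a different shuffle and a different noise realization.
X_c, y_c = make_moons(noise=0.1, random_state=2)
diff_seed_equal = np.array_equal(X_a, X_c)
print(diff_seed_equal)   # False
```

Fixing random_state in this way is useful when a figure or an experiment has to be regenerated later with identical data.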

shuffle
# shuffle: compare the label order with and without shuffling
X1,Y1 = make_moons(shuffle=True)
X2,Y2 = make_moons(shuffle=False)
Y1  # shuffle=True
array([1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1,
1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1,
0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1,
0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0])
Y2  # shuffle=False
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Setting shuffle to False returns the data with the labels in sorted order (all 0s first, then all 1s). The default is True.
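The effect of shuffle can also be checked on a small dataset. The sketch below assumes that shuffling only permutes the rows, so with noise left at None both calls should contain the same set of points. The sample size of 10 and the variable names are chosen purely for readability.

```python
import numpy as np
from sklearn.datasets import make_moons

X_sorted, y_sorted = make_moons(n_samples=10, shuffle=False)
X_mixed, y_mixed = make_moons(n_samples=10, shuffle=True, random_state=0)

# With shuffle=False all points labelled 0 come first, then all points labelled 1.
print(y_sorted)  # [0 0 0 0 0 1 1 1 1 1]

# Shuffling only permutes the rows: without noise, both calls hold the same points.
rows_sorted = sorted(map(tuple, X_sorted.tolist()))
rows_mixed = sorted(map(tuple, X_mixed.tolist()))
same_points = np.allclose(rows_sorted, rows_mixed)
print(same_points)  # True
```

Keeping shuffle=True is usually the safer choice before a train/test split that slices the arrays by position, since sorted labels would otherwise put one class in each part.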
References

make_moons