Statistics and ML

Data Processing and Performance Evaluation with Scikit-learn

칼쵸쵸 2022. 5. 13. 00:24

1. Separating the target value after preprocessing (pandas DataFrame)

import numpy as np
import pandas as pd

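# Name of the target column
target = 'Target_YVALUE'

# Split into features (x) and target (y)
# `data` is assumed to be an already preprocessed pandas DataFrame
x = data.drop(target, axis=1)
y = data[target]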
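To make this step concrete, here is a small sketch on a made-up DataFrame. The columns `feature_a` and `feature_b` and all of the values are assumptions for illustration only; just the `Target_YVALUE` name comes from the snippet above.

```python
import pandas as pd

# Hypothetical toy data standing in for the preprocessed `data` DataFrame
data = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [10.0, 20.0, 30.0, 40.0],
    "Target_YVALUE": [1.5, 2.5, 3.5, 4.5],
})

target = "Target_YVALUE"

# drop() returns a new DataFrame; `data` itself is left unchanged
x = data.drop(target, axis=1)
y = data[target]

print(x.columns.tolist())  # feature columns only
print(type(y).__name__)    # y is a pandas Series
print(x.shape, y.shape)
```

Because `drop` does not modify `data` in place here, the original table still contains the target column and can be reused later.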

2. Splitting into training and test data

# Split the data with scikit-learn
from sklearn.model_selection import train_test_split

# Split 80:20 (train:test); random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2022)
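As a quick check of what the split produces, the following sketch runs `train_test_split` on hypothetical arrays (100 samples, 3 features, invented for this example) and confirms that the same `random_state` yields the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples with 3 features each
x = np.arange(300).reshape(100, 3)
y = np.arange(100)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=2022
)
print(x_train.shape, x_test.shape)  # 80 training rows, 20 test rows

# Repeating the call with the same random_state gives the same split
x_train2, x_test2, _, _ = train_test_split(
    x, y, test_size=0.2, random_state=2022
)
same_split = np.array_equal(x_test, x_test2)
print(same_split)
```

Rows of `x` and entries of `y` are shuffled together, so each feature row stays paired with its own target value.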

 

3. Building a simple regression model with the training set

from sklearn.linear_model import LinearRegression

# Fit ordinary least squares on the training set only
model = LinearRegression()
model.fit(x_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(x_test)
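To see what the fitted model actually learns, here is a minimal sketch on synthetic data. The linear relationship (`y = 3*x0 - 2*x1 + 5` plus noise) is an assumption made up for this example, chosen so the recovered coefficients can be compared with known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
# Assumed ground truth: y = 3*x0 - 2*x1 + 5, plus small Gaussian noise
y = 3.0 * x[:, 0] - 2.0 * x[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=2022
)

model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

print(model.coef_)       # should be close to [3, -2]
print(model.intercept_)  # should be close to 5
```

On real data the true coefficients are unknown, which is why the evaluation metrics on the test set matter.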
 

4. Various performance metrics

from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import r2_score

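## Performance evaluation

# MAE (mean absolute error)
print('MAE :', mean_absolute_error(y_test, y_pred))

# MSE (mean squared error)
print('MSE :', mean_squared_error(y_test, y_pred))

# RMSE (square root of MSE)
print('RMSE:', sqrt(mean_squared_error(y_test, y_pred)))

# MAPE (returned as a fraction, e.g. 0.1 means 10%)
print('MAPE:', mean_absolute_percentage_error(y_test, y_pred))

# R2 (coefficient of determination)
print('R2  :', r2_score(y_test, y_pred))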
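For reference, each of these metrics can be written out by hand with NumPy. The sketch below uses made-up values (`y_true` and `y_hat` are hypothetical) and compares the manual results with the scikit-learn functions; it assumes no target value is zero, since MAPE divides by the true values.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
    r2_score,
)

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

err = y_true - y_hat

mae = np.mean(np.abs(err))                    # mean of absolute errors
mse = np.mean(err ** 2)                       # mean of squared errors
rmse = np.sqrt(mse)                           # same units as the target
mape = np.mean(np.abs(err) / np.abs(y_true))  # a fraction, not a percentage
# R2 = 1 - (residual sum of squares) / (total sum of squares)
r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, mean_absolute_error(y_true, y_hat))
print(mse, mean_squared_error(y_true, y_hat))
print(rmse)
print(mape, mean_absolute_percentage_error(y_true, y_hat))
print(r2, r2_score(y_true, y_hat))
```

MAE and RMSE are in the same units as the target, while MAPE and R2 are unitless, which makes them easier to compare across datasets.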