SK Networks Family AI bootcamp lecture notes / Kaggle

Titanic dataset: analysis and training blueprint

HyunJung_Jo 2025. 2. 11. 23:27

Workflow

  1. EDA
    1. pivot_table
    2. groupby
    3. Statistical analysis
    4. Correlation
    5. Cross-tabulation
  2. Data Preprocessing
    1. Impute or drop missing values
    2. data encoding (one-hot encoding, mean-encoding, label-encoding)
    3. data scaling (standard scaler, minmax scaler, max abs scaler, robust scaler)
    4. train_test_split()
  3. Train
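    1. Classification models: logistic regression, linear SVM
    2. Classification and regression: Decision Tree, K-NN
    3. Ensemble: LightGBM
    4. Fix random_state=42 for reproducibility
    5. Hyperparameter optimization (HPO): Bayesian search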
  4. Evaluation
    1. Regression: R², RMSE
    2. Regularization (a training-time penalty, not a metric): Ridge (L2), Lasso (L1)
    3. Classification: Recall, Precision, AUROC (binary classification)
  5. Prediction
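
The workflow above can be sketched end to end in a few lines. This is a minimal illustration rather than the course's actual notebook. It assumes pandas and scikit-learn are installed, and it uses a small synthetic table in place of the real Kaggle CSV, so the column names follow the Titanic schema but every value is made up. Logistic regression stands in for the full model list, and the Bayesian search step is left out. The split happens before the imputer and scaler are fitted, so no statistics from the test set leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the Titanic table (synthetic values, NOT the real Kaggle data).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.normal(30, 12, n).clip(1, 80),
    "Fare": rng.gamma(2.0, 15.0, n),
})
# Make survival loosely depend on Sex and Pclass so the model has some signal.
p = 0.2 + 0.5 * (df["Sex"] == "female") + 0.1 * (df["Pclass"] == 1)
df["Survived"] = (rng.random(n) < p).astype(int)
df.loc[rng.choice(n, 20, replace=False), "Age"] = np.nan  # inject missing ages

# 1. EDA
pivot = df.pivot_table(index="Sex", columns="Pclass", values="Survived", aggfunc="mean")
by_sex = df.groupby("Sex")["Survived"].mean()   # survival rate per group
summary = df.describe()                         # basic statistics
corr = df.select_dtypes("number").corr()        # correlation of numeric columns
cross = pd.crosstab(df["Sex"], df["Survived"])  # cross-tabulation (counts)

# 2. Preprocessing: split first, then fit imputer/scaler/encoder on train only.
X = df.drop(columns="Survived")
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
numeric = ["Age", "Fare"]
categorical = ["Pclass", "Sex"]
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# 3. Train
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000, random_state=42)),
])
model.fit(X_train, y_train)

# 4. Evaluation (binary classification metrics)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
metrics = {
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "auroc": roc_auc_score(y_test, proba),
}
print(metrics)

# 5. Prediction on a new, hypothetical passenger
new_passenger = pd.DataFrame(
    {"Pclass": [3], "Sex": ["female"], "Age": [np.nan], "Fare": [8.0]}
)
new_pred = model.predict(new_passenger)
```

Because preprocessing and the classifier live in one Pipeline, the same object handles missing ages and encoding at prediction time. Swapping in another estimator from the list (for example a decision tree) only changes the "clf" step.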