Day 27 [ earlystopping ]

SK Networks Family AI bootcamp lecture notes

HyunJung_Jo 2025. 2. 21. 17:29

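It's Friday.
The morning was a bit rough. Tonight I really need to go to bed early; the goal is to be asleep by midnight.
And... I shouldn't wear myself out having fun for no good reason.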

Whatever I learn, I need to understand it well enough to explain it myself, and well enough to read and interpret other people's code.

Class materials

Materials I wrote

https://colab.research.google.com/drive/1qNZoVRQwamzY-nD3Z_ctezQ1mzM9zID7#scrollTo=PJX6dW9EHQRq

About optimizers

An optimizer is an algorithm that plays a central role in training machine learning models, especially neural networks.

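Put simply, the optimizer decides how the model's weights are updated. The model takes input data and outputs predictions, and the weights have to be adjusted to shrink the error between those predictions and the true values. The optimizer is responsible for making that adjustment efficiently.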

The role of an optimizer

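Using the gradient of the loss function: the gradient tells how the loss changes as each weight changes, i.e. in which direction and by how much the weights should move to reduce the loss. The gradient itself is computed by backpropagation (loss.backward() in PyTorch); the optimizer consumes it.
Updating the weights: the optimizer uses the gradient to update the model's weights, moving them in the direction opposite to the gradient, which is the direction in which the loss decreases.
Learning rate: the learning rate is a hyperparameter that sets the size of each update. If it is too large, updates can overshoot the minimum and diverge; if it is too small, training is slow. Plain SGD applies the learning rate as given, while adaptive optimizers such as Adam and RMSprop additionally scale the step for each parameter; changing the learning rate itself over the course of training is usually the job of a learning-rate scheduler.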
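To make the learning-rate trade-off concrete, here is a minimal sketch in plain Python (no PyTorch) of gradient descent on a toy loss f(w) = (w - 3)^2, whose minimum is at w = 3. The loss, the starting point w = 0, the 20 steps and the three learning rates are all illustrative assumptions.

```python
def grad(w):
    # derivative of the toy loss f(w) = (w - 3)**2
    return 2 * (w - 3)


def gradient_descent(lr, steps=20, w=0.0):
    for _ in range(steps):
        # step against the gradient, scaled by the learning rate
        w = w - lr * grad(w)
    return w


for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.3f}")
```

With these numbers, lr=0.01 is still far from 3 after 20 steps (too slow), lr=0.1 gets close, and lr=1.1 overshoots further on every step and diverges.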

Types of optimizers

Many optimizers exist, and each has its own strengths and weaknesses depending on how it updates the weights. Some representative ones:

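SGD (Stochastic Gradient Descent): the most basic optimizer; it moves the weights against the gradient, scaled by the learning rate.
Adam (Adaptive Moment Estimation): designed to address SGD's weaknesses; it combines momentum (a moving average of past gradients) with per-parameter step sizes derived from a moving average of squared gradients, which often speeds up training.
RMSprop (Root Mean Square Propagation): divides each parameter's step by the square root of an exponential moving average of its squared gradients. Adam can be seen as RMSprop with momentum added on top.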
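The three update rules can be written out for a single scalar weight. This is a hand-written sketch of the standard formulas, not torch.optim's actual code (which works on tensors and supports extras such as weight decay). The defaults of rmsprop_step and adam_step mirror torch.optim's documented defaults; sgd_step's lr, the function names and the dict-based state are illustrative assumptions.

```python
import math


def sgd_step(w, g, lr=0.01):
    # plain SGD: the step is proportional to the raw gradient
    return w - lr * g


def rmsprop_step(w, g, state, lr=0.01, alpha=0.99, eps=1e-8):
    # exponential moving average of squared gradients
    state["v"] = alpha * state.get("v", 0.0) + (1 - alpha) * g * g
    # dividing by its square root shrinks steps where gradients have been large
    return w - lr * g / (math.sqrt(state["v"]) + eps)


def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    t = state["t"] = state.get("t", 0) + 1
    # first moment (momentum): moving average of gradients
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    # second moment (the RMSprop part): moving average of squared gradients
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    # bias correction, because m and v start at zero
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


g = 50.0  # one large gradient
print(sgd_step(1.0, g))          # 1.0 - 0.01 * 50 = 0.5
print(rmsprop_step(1.0, g, {}))  # first step is about 10 * lr
print(adam_step(1.0, g, {}))     # first step is about lr
```

Feeding one large gradient (g = 50) to each rule shows the difference: SGD's step grows with the gradient, whereas Adam's first step is about lr regardless of the gradient's size, because it divides by a running estimate of the gradient's magnitude.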

Choosing an optimizer

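The choice of optimizer can have a large effect on model performance. Adaptive optimizers such as Adam and RMSprop often train faster and more stably than SGD with little tuning, but on some problems SGD ends up performing better. So it is worth trying several optimizers and picking the one that works best for the problem at hand.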
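One way to run that comparison is to train the same model on the same data with each optimizer and compare the losses. A minimal sketch, assuming a hypothetical toy linear-regression problem, 200 steps and a shared learning rate of 0.01; conclusions from a toy problem like this do not necessarily carry over to a real model.

```python
import torch
import torch.nn as nn
import torch.optim as optim

OPTIMIZERS = {
    "sgd": lambda params: optim.SGD(params, lr=0.01),
    "rmsprop": lambda params: optim.RMSprop(params, lr=0.01),
    "adam": lambda params: optim.Adam(params, lr=0.01),
}


def train(name, steps=200):
    torch.manual_seed(0)  # identical data and initial weights for every optimizer
    X = torch.randn(64, 3)
    y = X @ torch.tensor([[2.0], [-1.0], [0.5]]) + 0.1 * torch.randn(64, 1)
    model = nn.Linear(3, 1)
    optimizer = OPTIMIZERS[name](model.parameters())
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]  # (first loss, last loss)


for name in OPTIMIZERS:
    first, last = train(name)
    print(f"{name}: loss {first:.4f} -> {last:.4f}")
```

Seeding inside train() keeps the comparison fair: only the optimizer changes between runs.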

Code example

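import torch
import torch.nn as nn
import torch.optim as optim

# Create the model (a small linear layer stands in for MyModel here)
model = nn.Linear(4, 2)
loss_fn = nn.CrossEntropyLoss()

# Create the optimizer (Adam)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Use the optimizer in the training loop (one step on random dummy data)
X, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss = loss_fn(model(X), y)
optimizer.zero_grad()  # reset gradients left over from the previous step
loss.backward()        # compute gradients (backpropagation)
optimizer.step()       # update the weights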

This code creates an Adam optimizer from the torch.optim module and passes it the model's parameters() so that it knows which weights to update. lr is the learning rate.
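The training step calls optimizer.zero_grad() before loss.backward() because backward() adds new gradients to whatever is already stored in each parameter's .grad instead of overwriting it. A small sketch with a single scalar weight w (a hypothetical stand-in for the model's parameters) and plain SGD shows the effect.

```python
import torch
import torch.optim as optim

w = torch.tensor(1.0, requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

# two backward passes without zero_grad(): the gradients are summed
(3 * w).backward()
(3 * w).backward()
print(w.grad)  # tensor(6.)

# clear them; a single backward then gives the true gradient d(3w)/dw = 3
optimizer.zero_grad()
(3 * w).backward()
print(w.grad)  # tensor(3.)

optimizer.step()  # w <- w - lr * grad = 1.0 - 0.1 * 3, i.e. about 0.7
print(w.item())
```

In recent PyTorch versions, zero_grad() resets the gradients to None rather than filling them with zeros; either way, the next backward() starts from a clean slate.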

The optimizer is an essential part of training a machine learning model, and choosing and configuring it well can noticeably improve the model's performance.

I hope this helps.

From now on, design will matter more than coding. For the coding itself, GPT can serve as a coding assistant. I'll do my design thinking while taking walks.

Okay, let's focus on the dim parameter in this line:

pred_inx = nn.Softmax(dim=1)(pred).argmax(dim=1)

Understanding dim

The dim parameter in both nn.Softmax and argmax is crucial. It specifies the dimension along which the operation is performed.

Think of it like this: Your prediction pred is likely a tensor with a shape of [1, 10] where:

1: Represents the batch size (since you're predicting on a single image)
10: Represents the number of classes in the FashionMNIST dataset.

nn.Softmax(dim=1)

Here, dim=1 tells the Softmax function to calculate probabilities across the second dimension (index 1) of the pred tensor. This means it will calculate the probability distribution across the 10 classes for the single image in the batch. The result will still be a tensor of shape [1, 10] but now containing probabilities instead of raw logits.
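A quick check using the logits that appear later in this note (copied as printed, shape (1, 10)): softmax along dim=1 keeps the shape and turns the row into non-negative probabilities that sum to 1.

```python
import torch
import torch.nn as nn

# logits for one image, shape (1, 10)
pred = torch.tensor([[2.6516, -2.0811, 1.0573, 3.9493, 1.1335,
                      4.5684, -0.2976, -6.9771, 10.9653, -9.9250]])

probs = nn.Softmax(dim=1)(pred)
print(probs.shape)       # torch.Size([1, 10]): same shape as pred
print(probs.sum(dim=1))  # one sum per image, 1 up to rounding
```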

argmax(dim=1)

Similarly, in argmax(dim=1), dim=1 instructs the function to find the index of the maximum value along the second dimension (the 10 classes). It essentially finds the class with the highest probability. The output here will be a tensor of shape [1] containing the predicted class index.
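Continuing with the same logits, argmax(dim=1) returns one class index per image. Because softmax is monotonic (it never changes which value in a row is the largest), taking argmax of the raw logits gives the same answer, so the Softmax call is optional when only the predicted class is needed.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.6516, -2.0811, 1.0573, 3.9493, 1.1335,
                      4.5684, -0.2976, -6.9771, 10.9653, -9.9250]])

pred_inx = nn.Softmax(dim=1)(pred).argmax(dim=1)
print(pred_inx)        # tensor([8]): position of the largest logit, 10.9653
print(pred_inx.shape)  # torch.Size([1])

# softmax preserves the ordering within a row, so argmax of the logits matches
print(torch.equal(pred_inx, pred.argmax(dim=1)))  # True
```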

Why not dim=0?

If you were to use dim=0, the operations would be performed along the first dimension, which represents the batch size. This would lead to incorrect results:

nn.Softmax(dim=0) would normalize across the batch dimension. With a single image, every entry would simply become 1.0, which says nothing about the classes.
argmax(dim=0) would find, for each class, which image in the batch has the largest value, comparing different images instead of different classes. With a batch of one, it just returns a tensor of 10 zeros.
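Both points can be checked directly. With the same (1, 10) logits, dim=0 gives a tensor of ones from softmax and ten zeros from argmax. With a hypothetical batch of two images (random logits), argmax(dim=0) gives, for each class, the index of the image with the larger score rather than a class prediction.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.6516, -2.0811, 1.0573, 3.9493, 1.1335,
                      4.5684, -0.2976, -6.9771, 10.9653, -9.9250]])

print(nn.Softmax(dim=0)(pred))  # all 1.0: each column holds a single value
print(pred.argmax(dim=0))       # ten zeros, shape (10,)

torch.manual_seed(0)
batch = torch.randn(2, 10)      # hypothetical logits for 2 images
print(batch.argmax(dim=0))      # 10 entries, each 0 or 1 (an image index)
print(batch.argmax(dim=1))      # 2 entries, each 0 to 9 (a class index)
```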

In essence, dim=1 ensures that you are working with the dimension representing the different classes, which is what you want for prediction.

I hope this makes things clearer! Let me know if you have any more questions.

[[ 2.6516, -2.0811, 1.0573, 3.9493, 1.1335, 4.5684, -0.2976, -6.9771, 10.9653, -9.9250]] Is this (1, 10)?

Yes. The shape is (1, 10), as explained above; the double brackets just show that the values are nested one level deep, like a list of lists:

[[ 2.6516, -2.0811,  1.0573,  3.9493,  1.1335,  4.5684, -0.2976, -6.9771, 10.9653, -9.9250]]

This is how a (1, 10) PyTorch tensor looks when converted to a Python list with .tolist(); printing the tensor directly shows the same nested brackets wrapped in tensor(...).

Let's break it down:

Outer List: The outer brackets [ ] represent the main list, in this case, containing just one element. This corresponds to the batch size of 1.
Inner List: The inner brackets [ ] represent a single row within the batch. This row holds 10 values, corresponding to the 10 output classes of your model. These are the raw predictions (logits) for each class.

So, while technically it's a list containing a list with 10 numbers, it does represent the data of a (1, 10) tensor.
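Rather than counting brackets, the shape can be read off the tensor itself. A small sketch with the same values:

```python
import torch

pred = torch.tensor([[2.6516, -2.0811, 1.0573, 3.9493, 1.1335,
                      4.5684, -0.2976, -6.9771, 10.9653, -9.9250]])

print(pred.shape)  # torch.Size([1, 10])
print(pred.ndim)   # 2: one level of brackets per dimension

rows = pred.tolist()  # nested Python list: one inner list of 10 floats
print(len(rows), len(rows[0]))  # 1 10
```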

To understand it better:

If you were to run this prediction on a batch of, say, 32 images, pred would look like:

[[...10 values for image 1...],
 [...10 values for image 2...],
 ...
 [...10 values for image 32...]]

When working with PyTorch, you'll almost always interact with pred as a tensor. So, operations like nn.Softmax and argmax work directly on this tensor structure.
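The same line of code works unchanged for a full batch. A sketch with hypothetical random logits standing in for a batch of 32 FashionMNIST predictions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(32, 10)  # hypothetical logits: 32 images x 10 classes

probs = nn.Softmax(dim=1)(pred)
pred_inx = probs.argmax(dim=1)
print(probs.shape)     # torch.Size([32, 10])
print(pred_inx.shape)  # torch.Size([32]): one predicted class per image
```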

Key takeaway: nested brackets are simply how PyTorch prints multi-dimensional tensors. The number of bracket levels equals the number of dimensions, so [[ ...10 values... ]] is the (1, 10) tensor described earlier, and pred.shape confirms it.