Day 28 [ Torchvision / Image Preprocessing, CNN ]

SK Networks Family AI bootcamp lecture notes

HyunJung_Jo 2025. 2. 24. 12:46

Image Preprocessing class materials

https://colab.research.google.com/drive/14bX4CaRRphMNHjzcMpEcgMnetsO2yOy8

Image Preprocessing notebook I typed along with in class

https://colab.research.google.com/drive/1iq8x-O7DGu1dVCRTzdRjM6oCZc5sZcXq

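Image preprocessing techniques

Classic image preprocessing
1. Morphological operations (dilation/erosion: a kernel slides over the image and processes each neighborhood)
2. Gaussian function (blurring)
3. Edge detection (finding boundaries by detecting discontinuities in brightness)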
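
A minimal NumPy sketch of the three classic operations above, assuming a single-channel float image. The helper names (neighbourhoods, dilate, erode, gaussian_kernel, convolve, sobel_edges) are illustrative, not from the lecture, and the Sobel operator is just one possible edge detector; the lecture does not say which one it used. In practice a library such as OpenCV (cv2.dilate, cv2.erode, cv2.GaussianBlur, cv2.Canny) would usually be called instead.

```python
import numpy as np

def neighbourhoods(img, k):
    """Return the k x k window around every pixel, shape (H, W, k, k).
    Borders are padded by repeating edge pixels; k must be odd."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, (k, k))

def dilate(img, k=3):
    # Morphological dilation: each pixel takes the max of its neighbourhood (bright areas grow)
    return neighbourhoods(img, k).max(axis=(-2, -1))

def erode(img, k=3):
    # Morphological erosion: each pixel takes the min of its neighbourhood (bright areas shrink)
    return neighbourhoods(img, k).min(axis=(-2, -1))

def gaussian_kernel(size=5, sigma=1.0):
    # 2-D Gaussian weights, normalised to sum to 1 so overall brightness is preserved
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def convolve(img, kernel):
    # Weighted sum over each neighbourhood (cross-correlation; identical to
    # convolution for symmetric kernels such as the Gaussian)
    windows = neighbourhoods(img.astype(float), kernel.shape[0])
    return (windows * kernel).sum(axis=(-2, -1))

def sobel_edges(img):
    # Sobel gradients in x and y; the magnitude is large where brightness jumps
    gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    return np.hypot(convolve(img, gx), convolve(img, gx.T))

img = np.zeros((7, 7))
img[2:5, 2:5] = 1.0                       # a 3x3 bright square in a dark image

print(dilate(img).sum())                  # 25.0 (grown to 5x5)
print(erode(img).sum())                   # 1.0  (shrunk to the centre pixel)
blurred = convolve(img, gaussian_kernel(5, 1.0))
edges = sobel_edges(img)
print(edges[3, 3], edges[2, 2] > 0)       # 0.0 True (flat inside, strong on the border)
```

Dilation grows the square and erosion shrinks it; the blur spreads the same total brightness over a wider area, and the edge map is zero inside flat regions and non-zero along the square's border.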
JPG vs PNG differences
1. JPG (JPEG)
  - Uses lossy compression, discarding some image data to reduce file size.
  - Generally smaller file size, ideal for web use and storage.
  - Best for photographs, realistic images with smooth color transitions.
  - Does not support transparency.
2. PNG
  - Uses lossless compression, preserving all image data.
  - Generally larger file size due to lossless compression.
  - Best for images with sharp lines, text, logos, and graphics.
  - Supports transparency, making it suitable for graphic design.
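
The lossy/lossless and transparency points can be checked directly with Pillow. This sketch encodes a synthetic random-noise image (an arbitrary test input, chosen because noise makes JPEG's losses obvious) to PNG and JPEG in memory and decodes it again; roundtrip is a hypothetical helper name.

```python
import io

import numpy as np
from PIL import Image

def roundtrip(image, fmt, **save_kwargs):
    """Encode `image` in format `fmt` into memory and decode it again."""
    buf = io.BytesIO()
    image.save(buf, format=fmt, **save_kwargs)
    buf.seek(0)
    decoded = Image.open(buf)
    decoded.load()  # read the pixel data now, while the buffer is still open
    return decoded

# Random noise is a worst case for JPEG, so its losses are easy to see
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
img = Image.fromarray(pixels)  # RGB image

png = roundtrip(img, "PNG")
jpg = roundtrip(img, "JPEG", quality=75)
print(np.array_equal(np.asarray(png), pixels))  # True  (lossless)
print(np.array_equal(np.asarray(jpg), pixels))  # False (lossy)

# Transparency: PNG keeps the alpha channel, JPEG refuses to write it
rgba = img.convert("RGBA")
print(roundtrip(rgba, "PNG").mode)  # RGBA
try:
    roundtrip(rgba, "JPEG")
except OSError as err:
    print("JPEG:", err)
```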
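TorchVision vs PIL
1. torchvision transforms: random preprocessing applied on the fly each time an image is loaded; the original image is not changed.
2. PIL: can physically increase the number of images (augmented copies are saved as new files).
3. torchvision.transforms:
  - Main focus: image preprocessing for PyTorch models
  - Data type: works on PyTorch tensors and PIL Images (most transforms accept either)
  - Features: common transforms for model training (resizing, cropping, normalization, data augmentation, etc.)
  - Integration: integrates smoothly with PyTorch data loaders (Dataset/DataLoader)
  - Performance: optimized for tensor operations; tensor inputs can also be processed on the GPU
  - Flexibility: can be limited for highly specialized image-processing tasks
4. PIL (Pillow):
  - Main focus: general-purpose image processing
  - Data type: works on PIL Image objects
  - Features: a broad set of functions for image editing, manipulation, and analysis
  - Integration: usable with PyTorch, but images must be converted to tensors (e.g. with T.ToTensor())
  - Performance: can be slow for large-scale workloads
  - Flexibility: very flexible and customizable for a wide range of editing needs
5. Summary:
  - torchvision.transforms and PIL (Pillow) are both widely used for image processing in Python, but their purposes and strengths differ slightly.
  - If the main goal is preparing images for a PyTorch deep learning model, torchvision.transforms is the better fit: seamless integration, a focus on data augmentation, and optimized performance.
  - If you need broader image-processing functionality beyond model training, or highly customized image manipulation, PIL (Pillow) is the better fit: it offers versatility and flexibility.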
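
The difference between points 1 and 2 can be sketched with PIL alone. random_augment and save_augmented_copies are hypothetical helpers: the first mimics on-the-fly augmentation (a new random variant per call, nothing stored), which in PyTorch is usually done by passing a transforms pipeline to a dataset such as torchvision.datasets.ImageFolder(root, transform=...); the second writes augmented files to disk, physically growing the dataset. The flip/rotation choices are arbitrary.

```python
import random
import tempfile
from pathlib import Path

from PIL import Image, ImageOps

def random_augment(img, rng):
    """On-the-fly style: return a NEW randomly flipped/rotated image.
    PIL operations return new images, so `img` itself is never modified."""
    out = ImageOps.mirror(img) if rng.random() < 0.5 else img.copy()
    return out.rotate(rng.uniform(-30, 30))

def save_augmented_copies(img, n, out_dir, rng):
    """Offline style: physically write n augmented files to disk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n):
        path = out_dir / f"aug_{i:03d}.png"
        random_augment(img, rng).save(path)
        paths.append(path)
    return paths

rng = random.Random(42)
original = Image.new("RGB", (64, 64), color=(200, 30, 30))
original.paste((0, 0, 255), (0, 0, 32, 64))  # left half blue so flips are visible
before = original.tobytes()

# On the fly: a different variant each epoch, nothing is stored
epoch_views = [random_augment(original, rng) for _ in range(3)]

# Offline: the dataset on disk grows by n files
with tempfile.TemporaryDirectory() as tmp:
    files = save_augmented_copies(original, 5, tmp, rng)
    print(len(list(Path(tmp).glob("*.png"))))  # 5

print(original.tobytes() == before)  # True: the original was never modified
```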

# prompt: a function that applies various image preprocessing steps with torchvision transforms

from PIL import Image
import torchvision.transforms as T

def transform_image(image_path, transforms_list):
    """
    Applies each torchvision transform in a list to an image.

    Args:
        image_path: Path to the image file.
        transforms_list: A list of torchvision transforms.

    Returns:
        (orig_img, transformed_images): the original PIL image and a list with
        one row per transform, each row holding 3 outputs of that transform
        (random transforms give a different result on each call).
        Returns (None, None) if an error occurs.
    """
    try:
        orig_img = Image.open(image_path).convert('RGB')
        transformed_images = []
        for transform in transforms_list:
            # Apply each transform 3 times to a copy; the original image is never modified
            transformed_images.append([transform(orig_img.copy()) for _ in range(3)])
        return orig_img, transformed_images
    except FileNotFoundError:
        print(f"Error: Image file not found at {image_path}")
        return None, None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None, None

# Example usage
img_path = "/content/drive/MyDrive/Colab Notebooks/SKNetworks_10기/강의/2. DL/2.Vision/1.Image Preprocessing/MILO.jpg"

# Define a list of transformations
# (ColorJitter, GaussianBlur with a sigma range, and the Random* transforms differ on every call)
transforms_to_apply = [
    T.Pad(padding=10, fill=0),                 # add a 10-pixel black border
    T.Resize(size=128),                        # resize so the shorter side is 128
    T.CenterCrop(size=64),                     # keep the central 64x64 region
    T.Grayscale(num_output_channels=3),        # grayscale, kept as 3 channels
    T.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
    T.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 2.0)),  # sigma sampled from [0.1, 2.0]
    T.RandomPerspective(distortion_scale=0.6, p=1.0),
    T.RandomRotation(degrees=(-45, 70)),
    T.RandomAffine(degrees=(-45, 45), translate=(0.1, 0.1), scale=(0.8, 1.2)),
]

# Apply transforms and plot results
orig_img, transformed_imgs = transform_image(img_path, transforms_to_apply)

if orig_img is not None and transformed_imgs:
    # `plot` is a separate plotting helper that must be defined before this cell runs
    plot(orig_img, transformed_imgs, row_title=[transform.__class__.__name__ for transform in transforms_to_apply])
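
The plot(...) calls rely on a helper that these notes call but never define. Below is a minimal hypothetical version with matplotlib: it puts the original image in the first column and the transformed versions in the remaining columns, and it accepts both the list-of-rows returned by transform_image and a flat list (one image per row), as used in the PIL example. The layout choices are assumptions, not the lecture's helper.

```python
import matplotlib.pyplot as plt

def plot(orig_img, imgs, row_title=None):
    """Hypothetical helper: original image in column 0, transformed images after it."""
    # Accept a flat list (one image per row) or a list of rows (lists of images)
    rows = [r if isinstance(r, (list, tuple)) else [r] for r in imgs]
    n_rows = len(rows)
    n_cols = 1 + max(len(r) for r in rows)
    fig, axs = plt.subplots(n_rows, n_cols, squeeze=False,
                            figsize=(2 * n_cols, 2 * n_rows))
    for r, row in enumerate(rows):
        for c, img in enumerate([orig_img] + list(row)):
            ax = axs[r, c]
            # Single-channel PIL images (mode 'L') need a gray colormap
            cmap = "gray" if getattr(img, "mode", None) == "L" else None
            ax.imshow(img, cmap=cmap)
            ax.set(xticks=[], yticks=[])
        if row_title is not None:
            axs[r, 0].set_ylabel(row_title[r])
        # Hide unused cells when rows have different lengths
        for c in range(len(row) + 1, n_cols):
            axs[r, c].axis("off")
    axs[0, 0].set_title("Original")
    fig.tight_layout()
    return fig
```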

# prompt: # PIL PREPROCESSING IN ONE PROCESS

from PIL import Image, ImageEnhance, ImageFilter

def transform_image_pil(image_path):
    """
    Applies a series of PIL transformations to an image.

    Args:
        image_path: Path to the image file.

    Returns:
        (orig_img, transformed_images): the original image and a list of 6
        transformed images (cropped, rotated, pasted, grayscale, sharpened,
        blurred), or (None, None) if an error occurs.
    """
    try:
        orig_img = Image.open(image_path).convert('RGB')
        transformed_images = []

        # Cropping: box is (left, upper, right, lower); assumes the image is at least 400x400
        box = (100, 100, 400, 400)
        cropped_img = orig_img.crop(box)
        transformed_images.append(cropped_img)

        # Rotation: 45 degrees counter-clockwise; the canvas size is kept, so corners are cut off
        rotated_img = orig_img.rotate(45)
        transformed_images.append(rotated_img)

        # Pasting a logo image onto a copy of the original
        logo = Image.open("/content/drive/MyDrive/Colab Notebooks/SKNetworks_10기/강의/2. DL/2.Vision/1.Image Preprocessing/logo_pillow.png").convert('RGBA')
        img_copy = orig_img.copy()
        position = (40, 350)
        # Use the logo's alpha channel as the mask so transparent pixels are not pasted
        img_copy.paste(logo, position, mask=logo)
        transformed_images.append(img_copy)

        # Grayscale conversion (single-channel 'L' mode)
        gray_img = orig_img.convert('L')
        transformed_images.append(gray_img)

        # Image enhancement (example: sharpness); factor 1.0 returns the original, 10 sharpens strongly
        enhancer = ImageEnhance.Sharpness(orig_img)
        sharpened_img = enhancer.enhance(10)
        transformed_images.append(sharpened_img)

        # Image filtering (example: blur)
        blurred_img = orig_img.filter(ImageFilter.BLUR)
        transformed_images.append(blurred_img)

        return orig_img, transformed_images

    except FileNotFoundError as e:
        # Either the photo or the logo file may be missing
        print(f"Error: Image file not found: {e.filename}")
        return None, None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None, None


img_path = "/content/drive/MyDrive/Colab Notebooks/SKNetworks_10기/강의/2. DL/2.Vision/1.Image Preprocessing/MILO.jpg"
orig_img, transformed_imgs = transform_image_pil(img_path)


if orig_img is not None and transformed_imgs:
    # Uses the same `plot` helper as the torchvision example;
    # replace it with another plotting method if necessary.
    plot(orig_img, transformed_imgs, row_title=["Cropped", "Rotated", "Pasted", "Grayscale", "Sharpened", "Blurred"])
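
A few details of the PIL calls above that are easy to get wrong, checked on a synthetic 500x500 image. The image size, colors, and the two-color logo are arbitrary stand-ins for MILO.jpg and logo_pillow.png.

```python
from PIL import Image

img = Image.new("RGB", (500, 500), color=(120, 160, 200))

# crop() takes a (left, upper, right, lower) box; right and lower are exclusive
cropped = img.crop((100, 100, 400, 400))
print(cropped.size)                              # (300, 300)

# rotate() turns counter-clockwise and keeps the canvas size by default,
# so the corners are cut off and filled with black
rotated = img.rotate(45)
print(rotated.size, rotated.getpixel((0, 0)))    # (500, 500) (0, 0, 0)
# expand=True enlarges the canvas so the whole rotated image fits
expanded = img.rotate(45, expand=True)
print(expanded.size)                             # larger than (500, 500)

# paste() with mask=<RGBA image> uses its alpha channel: transparent pixels are skipped
logo = Image.new("RGBA", (40, 40), (255, 0, 0, 0))  # fully transparent
logo.paste((255, 0, 0, 255), (0, 0, 20, 40))         # left half opaque red
canvas = img.copy()
canvas.paste(logo, (10, 10), mask=logo)
print(canvas.getpixel((15, 15)))                 # (255, 0, 0): opaque part pasted
print(canvas.getpixel((45, 15)))                 # (120, 160, 200): background kept

# convert('L') gives a single-channel 8-bit grayscale image
print(img.convert("L").mode)                     # L
```

Without the mask argument, paste() would copy the logo's RGB values everywhere, so its transparent area would show up as solid color.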

CNN

https://colab.research.google.com/drive/1M-lJ5Mw6OxE0CFeuWtCpqCPPcPzeAw2N#scrollTo=WbRa7CIV6jeu

Question

I set the default value of hidden_units to 32, so why does the model summary show 3 channels in the layer inputs? And why does the channel count change to 6 in the output shape?

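# TODO: check how this differs from the architecture diagram
import torch
from torch import nn

class CNNModel(nn.Module):
  def __init__(self, color_size, target_size, hidden_units=32, image_size=64) -> None:
    # hidden_units is a key hyperparameter that controls the model's complexity and
    # expressiveness: it sets how many features (feature maps) the conv layers extract
    super().__init__()

    self.block_1 = nn.Sequential(
        nn.Conv2d(in_channels=color_size,
                  out_channels=hidden_units,
                  kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )
    self.block_2 = nn.Sequential(
        nn.Conv2d(
            in_channels=hidden_units,
            out_channels=hidden_units*2,
            kernel_size=3,
            stride=1,
            padding=1
        ),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

    # Infer the flattened feature size with a dummy forward pass.
    # torch.randn() does not accept -1 as a size, so a concrete image size is required;
    # image_size=64 matches the [32, 3, 64, 64] input used in the summary.
    with torch.no_grad():
      dummy_input = torch.randn(1, color_size, image_size, image_size)
      out = self.block_1(dummy_input)
      out = self.block_2(out)
    in_features = out.shape[1] * out.shape[2] * out.shape[3]

    self.FC = nn.Sequential(
        nn.Flatten(),
        # in_features x in_features weights: with hidden_units=32 and 64x64 inputs
        # this layer alone has about 268M parameters
        nn.Linear(in_features=in_features,
                  out_features=in_features),
        nn.ReLU(),
        nn.Linear(in_features=in_features,
                  out_features=target_size),
        # A ReLU on the final outputs clips negative logits to 0; with
        # nn.CrossEntropyLoss the last layer usually returns raw logits instead.
        nn.ReLU()
    )
    # Hard-coded equivalent for 64x64 inputs:
    # self.FC = nn.Sequential(
    #     nn.Flatten(),
    #     nn.Linear(in_features=hidden_units*2*16*16,
    #               out_features=hidden_units*2*16*16),
    #     nn.ReLU(),
    #     nn.Linear(in_features=hidden_units*2*16*16,
    #               out_features=target_size),
    #     nn.ReLU()
    # )

  def forward(self, x):
    out = self.block_1(x)
    out = self.block_2(out)
    return self.FC(out)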

===================================================================================================================
Layer (type:depth-idx)                   Kernel Shape              Input Shape               Output Shape
===================================================================================================================
CNNModel                                 --                        [32, 3, 64, 64]           [32, 3]
├─Sequential: 1-1                        --                        [32, 3, 64, 64]           [32, 3, 32, 32]
│    └─Conv2d: 2-1                       [3, 3]                    [32, 3, 64, 64]           [32, 3, 64, 64]
│    └─ReLU: 2-2                         --                        [32, 3, 64, 64]           [32, 3, 64, 64]
│    └─MaxPool2d: 2-3                    2                         [32, 3, 64, 64]           [32, 3, 32, 32]
├─Sequential: 1-2                        --                        [32, 3, 32, 32]           [32, 6, 16, 16]
│    └─Conv2d: 2-4                       [3, 3]                    [32, 3, 32, 32]           [32, 6, 32, 32]
│    └─ReLU: 2-5                         --                        [32, 6, 32, 32]           [32, 6, 32, 32]
│    └─MaxPool2d: 2-6                    2                         [32, 6, 32, 32]           [32, 6, 16, 16]
├─Sequential: 1-3                        --                        [32, 6, 16, 16]           [32, 3]
│    └─Flatten: 2-7                      --                        [32, 6, 16, 16]           [32, 1536]
│    └─Linear: 2-8                       --                        [32, 1536]                [32, 1536]
│    └─ReLU: 2-9                         --                        [32, 1536]                [32, 1536]
│    └─Linear: 2-10                      --                        [32, 1536]                [32, 3]
│    └─ReLU: 2-11                        --                        [32, 3]                   [32, 3]
===================================================================================================================
Total params: 2,365,695
Trainable params: 2,365,695
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 92.21
===================================================================================================================
Input size (MB): 1.57
Forward/backward pass size (MB): 5.11
Params size (MB): 9.46
Estimated Total Size (MB): 16.15
============================================================================================================
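
This summary answers the question above. In every shape the leading 32 is the batch size, not hidden_units; the 3 input channels are color_size (RGB), and the channel counts 3 and 6 after the convolutions are hidden_units and hidden_units * 2. That means the summary was produced with hidden_units=3 (and 3 target classes), not the default of 32. As a check, here is a hand count of the parameters (weights + biases); cnn_param_count is a hypothetical helper that assumes the two-Linear FC head and two 2x2 max-pools shown in the code.

```python
def cnn_param_count(color_size, hidden_units, target_size, image_size=64):
    """Hand-count the parameters of CNNModel (weights + biases)."""
    k = 3 * 3                                    # 3x3 kernels
    conv1 = color_size * hidden_units * k + hidden_units
    conv2 = hidden_units * (2 * hidden_units) * k + 2 * hidden_units
    # Two 2x2 max-pools halve the spatial size twice: 64 -> 32 -> 16
    in_features = 2 * hidden_units * (image_size // 4) ** 2
    fc1 = in_features * in_features + in_features
    fc2 = in_features * target_size + target_size
    return conv1 + conv2 + fc1 + fc2

print(cnn_param_count(3, hidden_units=3, target_size=3))   # 2365695, matches the summary
print(cnn_param_count(3, hidden_units=32, target_size=3))  # 268520387 with the default of 32
```

Almost all of the parameters sit in the first nn.Linear layer, whose size grows with the square of hidden_units * 2 * 16 * 16.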

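hidden_units is the number of features (channels) in the CNN's hidden layers. Put simply, you can think of it as the number of intermediate filters the CNN uses to extract and analyze information from an image.

In more detail:

1. It is used as the out_channels value of the convolutional layers (nn.Conv2d).
  - The first convolutional block (block_1) extracts hidden_units features from the input image, i.e. it learns hidden_units filters, each producing one feature map.
  - The second convolutional block (block_2) takes the features from the first block as input and extracts twice as many, hidden_units * 2. This helps the network learn more complex and abstract features.
2. It determines the input and output sizes of the fully connected layers (nn.Linear).
  - After nn.Flatten(), the first nn.Linear layer takes an input of size hidden_units * 2 * 16 * 16 and produces an output of the same size. Here 16x16 is the feature-map size after two max-pooling steps, assuming 64x64 input images.
  - The second nn.Linear layer takes the previous layer's output and produces the final output of size target_size (the number of classes to predict).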
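
The 16x16 figure follows from the output-size formula documented for nn.Conv2d and nn.MaxPool2d (with dilation=1 and ceil_mode=False): out = floor((size + 2*padding - kernel) / stride) + 1. A short sketch tracing a 64x64 input through both blocks; conv_out is a hypothetical helper name.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension for Conv2d / MaxPool2d (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64
for name, k, s, p in [("block_1 conv", 3, 1, 1), ("block_1 pool", 2, 2, 0),
                      ("block_2 conv", 3, 1, 1), ("block_2 pool", 2, 2, 0)]:
    size = conv_out(size, k, s, p)
    print(f"{name}: {size}x{size}")
# block_1 conv: 64x64, block_1 pool: 32x32, block_2 conv: 32x32, block_2 pool: 16x16

hidden_units = 32
print(hidden_units * 2 * size * size)  # 16384 inputs to the first nn.Linear
```

The 3x3 convolutions with padding=1 keep the spatial size, so only the two pooling layers shrink it.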

In short, hidden_units is a key hyperparameter that controls the CNN's complexity and expressiveness: it decides how many, and how varied, the features extracted from an image are. A larger value lets the model learn more features but increases computation and the risk of overfitting, so finding an appropriate hidden_units value has a significant effect on model performance.

The provided code defines a CNN model (CNNModel) that takes RGB images (3 color channels) as input and aims to classify them into a specified number of target categories.

The code does the following:

Device Selection: It checks if a CUDA-enabled GPU is available and assigns the model to the appropriate device ('cuda' or 'cpu').
Model Creation: It creates an instance of the CNNModel with parameters for color channels (3 for RGB) and the number of target classes.
Input Size: It defines the shape of the input data, including batch size, channels, height, and width.
Model Summary: It uses torchinfo.summary to print an overview of the model's architecture, including layer types, kernel sizes, input and output shapes, and the total number of parameters.

Key Points

Output Channels Change: The first convolutional layer maps the 3 input channels (RGB) to hidden_units output channels, because each of its filters produces one feature map; block_2 then doubles this to hidden_units * 2. In the summary above the channels go 3, 3, 6 because the model was instantiated with hidden_units=3 (the reported 2,365,695 parameters match that setting), not the default of 32; the leading 32 in every shape is the batch size.
Why Change Channels: Increasing the number of channels allows the network to learn more complex and abstract representations of the input data, beyond basic color information.
Model Complexity: The hidden_units parameter controls the model's complexity. More hidden units generally mean a more complex model with potentially higher accuracy but also a higher risk of overfitting.

In essence, the code sets up a CNN model, defines its input, and provides a summary of its structure to help understand how it processes data and extracts features for classification.

