Sung Kim (김성훈) Deep Learning 7 - Learning rate, Overfitting, Regularization


하늘이푸른오늘 2017. 11. 19. 00:30

Lec 07-1 Learning rate, Overfitting, Regularization

https://www.youtube.com/watch?v=1jPjVoDV_uo

Learning rate: so far we have simply used arbitrary values.

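If the value is too large, the cost can oscillate or diverge (overshooting).
If the value is very small, training takes far too long and can stop early, for example at a local minimum.
There is no special rule for picking a good value. Start with 0.01, then decrease or increase it depending on how the resulting cost behaves.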
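To see both failure modes concretely, here is a minimal sketch (not from the lecture) that runs plain gradient descent on an assumed toy cost $cost(w) = w^2$, whose gradient is $2w$. The hypothetical helper `gradient_descent` starts at w = 5; the learning rates 1.5 and 1e-10 mirror the values tried in the lab further down, and 0.1 is a moderate value for comparison.

```python
def gradient_descent(learning_rate, steps=50, w0=5.0):
    """Minimize the toy cost w**2 (gradient 2*w) with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of w**2
        w = w - learning_rate * grad   # standard update: w <- w - alpha * grad
    return w

for lr in (1.5, 0.1, 1e-10):
    w = gradient_descent(lr)
    print(f"learning_rate={lr:g}: w={w:.6g}, cost={w * w:.6g}")
```

Each step multiplies w by (1 - 2 * learning_rate): with 1.5 that factor is -2, so |w| explodes; with 1e-10, w barely moves away from 5; with 0.1, w shrinks by a factor of 0.8 per step toward the minimum at 0.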

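Preprocessing the data (X) for gradient descent

If the ranges of x1 and x2 differ greatly, as below, the cost surface becomes distorted (stretched along one axis) and gradient descent has a hard time.

In this case, as below, shift the data so that its center sits at the origin (zero-centered), and/or rescale each variable so that all variables span similar ranges (normalized).

Standardization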

$$ x_j^\prime = \frac{x_j - \mu_j}{\sigma_j} $$

Python code: X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
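A dependency-free sketch of the same formula, applied to every column of a small hypothetical matrix X. It uses the population standard deviation (`statistics.pstdev`), which matches NumPy's default `.std()` (ddof=0); a column with zero spread would cause a division by zero and would need special handling.

```python
import statistics

def standardize_column(rows, j):
    """Return column j of rows as (x - mean) / std, using the population std."""
    col = [row[j] for row in rows]
    mu = statistics.fmean(col)
    sigma = statistics.pstdev(col)  # ddof=0, like NumPy's default .std()
    return [(x - mu) / sigma for x in col]

# Hypothetical data: the second column has a much larger range than the first
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]

X_std = [row[:] for row in X]  # copy so X itself is left unchanged
for j in range(len(X[0])):
    for i, value in enumerate(standardize_column(X, j)):
        X_std[i][j] = value

print(X_std)  # each column now has mean ~0 and standard deviation ~1
```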

Overfitting

The model fits the training data very well but fits real data poorly. For example, in the figure below, model 2 is 100% correct on the training data but may run into problems on real data.

Ways to reduce overfitting

Get more training data
Reduce the number of features
Regularization

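Avoid large weight values (don't let the model bend too much). Add $ \lambda \sum W^2 $ to the cost function so that the cost shrinks as the weights get smaller. Here $ \lambda $ is called the regularization strength: a small value means regularization is treated as unimportant, while a larger value penalizes large weights more strongly.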

Add l2reg = 0.001 * tf.reduce_sum(tf.square(W)) to the cost function.
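A framework-free sketch of the same idea: the hypothetical `regularized_cost` adds $\lambda \sum W^2$ (with $\lambda$ = 0.001, as in the TensorFlow line above) to an assumed data cost of 0.5, once for a small and once for a large weight matrix. Only the penalty term differs, so the larger weights end up with the larger total cost.

```python
def l2_penalty(W, reg_strength=0.001):
    """lambda * sum of squared weights over a 2-D weight matrix."""
    return reg_strength * sum(w * w for row in W for w in row)

def regularized_cost(data_cost, W, reg_strength=0.001):
    """Original cost plus the L2 penalty term."""
    return data_cost + l2_penalty(W, reg_strength)

# Hypothetical weight matrices
small_W = [[0.1, -0.2], [0.3, 0.0]]
large_W = [[10.0, -20.0], [30.0, 0.0]]

print(regularized_cost(0.5, small_W))
print(regularized_cost(0.5, large_W))
```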

Lec 07-2 Training/Testing datasets

https://www.youtube.com/watch?v=KVv1nMSlPzY

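Evaluating machine learning performance

Evaluate on the training set? The accuracy will be close to 100%, because the model has already seen that data, so this says little about how it handles new data.
Instead, train the model on 70% of the data (the training set) and evaluate it on the remaining 30% (the test set).
In some cases the dataset is split into Training/Validation/Testing sets.

The validation set is used to tune $\alpha$ (the learning rate) and $\lambda$ (the regularization strength).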
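A minimal sketch of the 70/30 split, plus an optional validation slice, using only the standard library. The helper name `split_dataset`, the fixed seed, and the 60/20/20 proportions in the second call are assumptions for illustration; shuffling first avoids a split that simply follows the original ordering of the data.

```python
import random

def split_dataset(data, train_frac=0.7, val_frac=0.0, seed=0):
    """Shuffle a copy of data and split it into train/validation/test lists."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # everything left over is the test set
    return train, val, test

data = list(range(100))                               # hypothetical dataset
train, val, test = split_dataset(data)                # 70 / 0 / 30
train2, val2, test2 = split_dataset(data, 0.6, 0.2)   # 60 / 20 / 20
print(len(train), len(val), len(test))
print(len(train2), len(val2), len(test2))
```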

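Online Learning: used when the training set is too large to process all at once.

Cut the large dataset into chunks of a fixed size and train on them one after another.
Advantage: when new data is added later, the model only needs to be updated rather than retrained from scratch.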
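A minimal sketch of the idea, assuming a one-parameter model y = w * x trained with mean-squared-error gradient descent on hypothetical data that follows y = 3x. The helpers `chunks` and `update` are hypothetical names. The model sees one fixed-size chunk at a time, and when `new_data` arrives later it simply continues from the current `w`.

```python
def chunks(data, size):
    """Yield consecutive fixed-size pieces of data (the last may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

def update(w, chunk, learning_rate=0.05):
    """One gradient-descent step on the mean squared error of y = w * x for a chunk."""
    grad = sum(2 * (w * x - y) * x for x, y in chunk) / len(chunk)
    return w - learning_rate * grad

# Hypothetical "large" dataset, processed 10 examples at a time
old_data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0] * 25]
w = 0.0
for chunk in chunks(old_data, 10):
    w = update(w, chunk)

# Later, new data arrives: keep updating the same w instead of starting over
new_data = [(x, 3.0 * x) for x in [1.5, 2.5, 3.5] * 10]
for chunk in chunks(new_data, 10):
    w = update(w, chunk)

print(w)  # close to 3.0
```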

Measuring accuracy

Compare the actual values $Y$ with the predicted values $ \bar Y$.
In image recognition, accuracy reaches about 95%-99%.
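For classification with one-hot labels, comparing $Y$ with the prediction usually means checking whether the index of the largest predicted probability matches the index of the 1 in the label, which is what `tf.argmax` and `tf.equal` do in the lab code below. A pure-Python sketch with hypothetical labels and predictions:

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true_onehot, y_pred_probs):
    """Fraction of rows whose predicted class matches the true class."""
    correct = sum(argmax(t) == argmax(p) for t, p in zip(y_true_onehot, y_pred_probs))
    return correct / len(y_true_onehot)

# Hypothetical one-hot labels and predicted probabilities
Y = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
Y_pred = [[0.1, 0.2, 0.7], [0.3, 0.6, 0.1], [0.2, 0.5, 0.3], [0.0, 0.1, 0.9]]

print(accuracy(Y, Y_pred))  # 0.75: the third row predicts class 1 instead of 0
```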

Lab 07-1 training/test dataset, learning rate, normalization

https://www.youtube.com/watch?v=oSJfejG2C3w

Source code: https://github.com/hunkim/DeepLearningZeroToAl

import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TF 2.x)

x_data = [[1, 2, 1], [1, 3, 2], [1, 3, 4], [1, 5, 5], [1, 7, 5], [1, 2, 5], [1, 6, 6], [1, 7, 7]]
y_data = [[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

# Keep the training data and the test data separate.
x_test = [[2, 1, 1], [3, 1, 2], [3, 3, 4]]
y_test = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]

X = tf.placeholder(tf.float32, [None, 3])
Y = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.random_normal([3, 3]))
b = tf.Variable(tf.random_normal([3]))

# Softmax hypothesis and cross-entropy cost
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# Correct prediction: test the model
prediction = tf.argmax(hypothesis, 1)
is_correct = tf.equal(prediction, tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Launch graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Train only on the training set
    for step in range(201):
        cost_val, W_val, _ = sess.run([cost, W, optimizer], feed_dict={X: x_data, Y: y_data})
        print(step, cost_val, W_val)
    # Evaluate on the test set, which was not used for training
    print("Prediction : ", sess.run(prediction, feed_dict={X: x_test}))
    print("Accuracy : ", sess.run(accuracy, feed_dict={X: x_test, Y: y_test}))
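To make the `hypothesis` and `cost` lines above concrete, here is a plain-Python sketch of softmax and the cross-entropy $-\sum Y \log(\text{hypothesis})$ for a single example. The scores stand in for one row of `tf.matmul(X, W) + b` and are hypothetical; subtracting the maximum score before exponentiating is a standard stability trick that does not change the result.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_onehot, probs):
    """-sum(Y * log(p)) for one example; terms with y = 0 contribute nothing, so skip them."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs) if y > 0)

scores = [2.0, 1.0, 0.1]  # hypothetical values of XW + b for one example
probs = softmax(scores)
print(probs)
print(cross_entropy([1, 0, 0], probs))  # small: the true class has the highest probability
print(cross_entropy([0, 0, 1], probs))  # larger: the true class has a low probability
```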

Learning rate problems

In the program above, setting learning_rate to 1.5 makes training diverge.
In the program above, setting learning_rate to 1e-10 makes training stall: the cost barely changes.

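Non-normalized inputs

In the figure below, the third column of xy is more than 100 times larger than the other columns. As a result, the distribution is squashed to one side, as shown beneath it.
If you train on such data as-is, the model will very likely fail to learn, because gradient descent easily overshoots and flies off.

Normalized inputs

Normalizing the data with something like MinMaxScaler, as below, puts the columns on a similar scale, and the results come out well.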
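MinMaxScaler rescales each column to the [0, 1] range with $(x - \min) / (\max - \min)$; scikit-learn's `sklearn.preprocessing.MinMaxScaler` does this by default. Below is a dependency-free sketch with a hypothetical helper `min_max_scale` and made-up data whose third column is far larger than the others; the small `eps` (an assumption here) avoids dividing by zero when a column is constant.

```python
def min_max_scale(rows, eps=1e-7):
    """Scale every column of rows into [0, 1] via (x - min) / (max - min + eps)."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo + eps) for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Hypothetical data: the third column is hundreds of thousands of times larger
xy = [[1.0, 2.0, 900000.0],
      [2.0, 3.0, 1200000.0],
      [3.0, 1.0, 1500000.0]]

scaled = min_max_scale(xy)
for row in scaled:
    print(row)
```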

Lab 07-2 Meet MNIST Dataset

https://www.youtube.com/watch?v=ktd5yrki_KA

Source code: https://github.com/hunkim/DeepLearningZeroToAll

MNIST dataset: a collection of handwritten digits, made for automatically processing postal codes.

Each image is 28*28 = 784 pixels.
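The lab feeds each image as a flat row of 784 numbers (`[None, 784]`) and later calls `.reshape(28, 28)` to draw it. A plain-Python sketch of that round trip, with hypothetical pixel values and hypothetical helpers `flatten`/`unflatten`, assuming row-major order (row by row), which is the order NumPy's default `reshape` uses:

```python
ROWS, COLS = 28, 28

def flatten(image):
    """28x28 nested list -> flat list of 784 pixels, row by row."""
    return [pixel for row in image for pixel in row]

def unflatten(pixels, rows=ROWS, cols=COLS):
    """Flat list of 784 pixels -> 28x28 nested list, like .reshape(28, 28)."""
    return [pixels[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypothetical grayscale image with pixel values in [0, 1]
image = [[(r * COLS + c) / 783 for c in range(COLS)] for r in range(ROWS)]

flat = flatten(image)
print(len(flat))  # 784
```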

import tensorflow as tf  # TensorFlow 1.x API
import matplotlib.pyplot as plt
import random

# Helper library that ships with TensorFlow 1.x (removed in TF 2.x)
# For details, see https://www.tensorflow.org/tutorials/layers
from tensorflow.examples.tutorials.mnist import input_data

# On the first run, the data is downloaded into the given folder
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

nb_classes = 10

X = tf.placeholder(tf.float32, [None, 784])  # any number of flattened 28*28 images
Y = tf.placeholder(tf.float32, [None, nb_classes])  # one-hot labels, 10 classes (nb_classes)
W = tf.Variable(tf.random_normal([784, nb_classes]))
b = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis: softmax
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

# Cross-entropy cost
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

# Accuracy test: does the predicted class match the one-hot label?
is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# Training epochs/batches
# An epoch is one full pass over the entire training dataset.
# The batch size is the number of examples trained on at once; larger batches need more memory.
# E.g., with 1000 training examples and a batch size of 500, one epoch takes 2 iterations.
# Training as below is the usual procedure.
training_epochs = 15
batch_size = 100

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):  # 15 epochs
        avg_cost = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # read 100 examples at a time
            c, _ = sess.run([cost, optimizer], feed_dict={X: batch_xs, Y: batch_ys})
            avg_cost += c / total_batch
        print('Epoch : ', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

    # Evaluate on the test set, which was not used for training
    print("Accuracy: ", accuracy.eval(session=sess,
          feed_dict={X: mnist.test.images, Y: mnist.test.labels}))

    # Pick one random test image and predict it
    r = random.randint(0, mnist.test.num_examples - 1)
    print("Label:", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
    print("Prediction : ", sess.run(tf.argmax(hypothesis, 1), feed_dict={X: mnist.test.images[r:r + 1]}))

    # Draw the image
    plt.imshow(mnist.test.images[r:r + 1].reshape(28, 28), cmap='Greys', interpolation='nearest')
    plt.show()
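The epoch/batch arithmetic from the comments in the code above, as a tiny sketch. `iterations_per_epoch` and `total_updates` are hypothetical helpers; the 55,000 figure assumes the TensorFlow helper's default split of MNIST into 55,000 training and 5,000 validation images.

```python
def iterations_per_epoch(num_examples, batch_size):
    """Full batches per epoch; same as int(num_examples / batch_size) for positive sizes."""
    return num_examples // batch_size

def total_updates(num_examples, batch_size, epochs):
    """Total number of gradient-descent steps over the whole training run."""
    return iterations_per_epoch(num_examples, batch_size) * epochs

print(iterations_per_epoch(1000, 500))  # 2
print(total_updates(55000, 100, 15))    # 8250
```

Because the loop uses `int(...)`, any remainder smaller than one batch is skipped in that epoch; for example, 1050 examples with a batch size of 100 still give 10 iterations.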

Output

Epoch : 0001 cost = 2.890159705
Epoch : 0002 cost = 1.087193211
Epoch : 0003 cost = 0.857237154
Epoch : 0004 cost = 0.751001975
----
Epoch : 0014 cost = 0.475028615
Epoch : 0015 cost = 0.465113584
Accuracy: 0.8902
