김성훈 딥러닝 6 - Softmax Regression

기타/WWW

김성훈 딥러닝 6 - Softmax Regression

하늘이푸른오늘 2017. 11. 17. 17:11

Lec 6-1 Softmax Regression 기본개념

https://www.youtube.com/watch?v=MFAnsx1y9ZI

복습

H(x) = WX 와 같이 Linear Regression으로부터 출발한다. 그러나, 이런 $WX$ 형태의 단점은, 출력이 $-\infty \lt H_L(x) \lt~\infty$ 이므로, 0이냐 1이냐를 고르는 문제에서는 적합하지 않다.
그래서 $z = H_L (X)$라고 놓고, 이 값을 0부터 1로 압축할 수 있는 $g(z)$ 함수를 사용하여 해결한다.

이에 가장 적합한 $g(x)$는 sigmoid라고 하는 $g(z) = \frac {1}{1+e^{-z}}$ 이다.
이를 적용했을 때의 Hypothesis는 $H_R (X) = g(H_L (X))$ 가 된다.
수식이 많아지므로, 이를 그림으로 나타낸다. 박스에서 S로 그린 것은 Sigmoid 함수를 통과시킨다는 의미이다.

Logistic Regression의 의미는 변수가 $x_1, x_2$ 두개가 있다고 할 때, 아래와 같이 분리시키는 적절한 선을 찾는 것이다.

이와 비슷한 아이디어를 Multimomial Classification에도 적용할 수 있다. 즉 아래와 같이 3개로 분류를 할때, A or not, B or not, C or not 과 같이 3개의 Logistic Regression을 적용하면 된다.

가설은 기본적으로 $ \begin{bmatrix}w_1 &w_2 &w_3 \end{bmatrix} {\begin{bmatrix}x_1 &x_2 &x_3 \end{bmatrix} }^T = \begin{bmatrix}w_1 x_1 &w_2 x_2 &w_3 x_3 \end{bmatrix} $ 이다. Multinomial Classification은 세개의 Logistic Classification 의 결합이므로, 3개의 식이 있으면 된다.

이를 하나의 식으로 합치면 아래와 같다.

$$ \begin{bmatrix}w_A1 &w_A2 &w_A3 \\ w_B1 &w_B2 &w_B3 \\ w_C1 &w_C2 &w_C3 \end{bmatrix} {\begin{bmatrix}x_1 \\ x_2 \\ x_3 \end{bmatrix} } = \begin{bmatrix}w_A1 x_1 &w_A2 x_2 &w_A3 x_3 \\ w_B1 x_1 &w_B2 x_2 &w_B3 x_3 \\ w_C1 x_1 &w_C2 x_2 &w_C3 x_3 \end{bmatrix} = \begin{bmatrix} \bar{y_A} = H_A (X) \\ \bar{y_B} = H_B (X)\\ \bar{y_C} = H_C (X) \end{bmatrix} $$

이제 $\bar{Y_A}$ $\bar{Y_B}$ $\bar{Y_C}$ 에 각각 sigmoid 함수를 적용하면 되지만...

Lec 6-2 Softmax Classifier의 cost 함수

https://www.youtube.com/watch?v=jMU9G5WEtBc

Logistic Classifier를 사용하면, 출력이 $ {\begin{bmatrix} 2.0 & 1.0 & 0.1 \end{bmatrix}}^T $ 와 같은 형태가 되는데, 이것보다 $ 0 \lt \bar{y} \lt 1 $ 이고, 그 값의 합이 1이 되는 (확률처럼) 되는 것이 좋다. 이것이 Softmax 함수이다.

Softmax 함수 : $$ S(y_i ) = \frac {e^{y_i}}{\displaystyle \sum_{j}^{} e^{y_j}} $$
적용방법 : $ \bar y $에 소프트맥스를 적용한 $ S(y) $ 하면 확률이 되는데, 여기에 "One-Hot encoding"을 적용하여 (Tensorflow에서는 argmax) 하나를 선택한다.

Cost 함수...는 Cross-Entropy 함수 : $ \displaystyle - \sum_{i} L_i \log(S_i ) $를 사용함.

이 함수가 적절한 함수인가? $\displaystyle -\sum_{i} L_i \log(S_i ) = \sum_{i} (L_i ) * ( -\log (\bar {y_i}) ) $ 에서 맨 오른쪽은 예전에 본 것처럼, $\bar {y_i}$가 1일때는 0이 되고, $\bar {y_i}$가 0이면 매우 큰수가 된다.
이야기를 간단하기 위해, $Y=L= (\begin{bmatrix} 0, 1 \end{bmatrix})^T $ 와 같이 A,B 둘중 어느것인지를 고르는 문제라고 하자. (이 경우엔 실측이 B임)

$\bar{Y}= (\begin{bmatrix} 0, 1 \end{bmatrix})^T $ 로 예측된 경우(예측이 맞음)

$\begin{bmatrix} 0\\ 1 \end{bmatrix} \bigodot -\log \begin{bmatrix} 0\\ 1 \end{bmatrix} = \begin{bmatrix} 0\\ 1 \end{bmatrix} \bigodot \begin{bmatrix} \infty\\ 0 \end{bmatrix} = 0$ 이 된다. cost가 작으므로 OK.

$\bar{Y}= (\begin{bmatrix} 1, 0 \end{bmatrix})^T $ 로 예측된 경우(예측이 틀림)

$\begin{bmatrix} 0\\ 1 \end{bmatrix} \bigodot -\log \begin{bmatrix} 1\\ 0 \end{bmatrix} = \begin{bmatrix} 0\\ 1 \end{bmatrix} \bigodot \begin{bmatrix} 0 \\ \infty \end{bmatrix} = \infty $ 이 된다. cost가 크므로 OK.

$Y=L= (\begin{bmatrix} 1, 0 \end{bmatrix})^T $ 일 경우에도 마찬가지임.

Losistic의 cost 함수와 Cross-entropy 함수가 모양으로는 달라보이지만, 실제로는 동등함.

여러개의 training set이 있을 때 cost 함수는...전체의 거리를 구한뒤 합하여 평균하면 된다.

$$ L = {1 \over N} \displaystyle \sum_{i} D( S (W X_i + b) , L_i ) $$

Gradient descent 를 사용함. (cost 함수가 concave 하므로) 미분 $ - \alpha \Delta L (w_i , w_w )$ 은 구하지 않는다.

Lab 06-1 TensorFlow로 Softmax Classification 구현하기

https://www.youtube.com/watch?v=VRnubDzIy3A

소스코드 : https://github.com/hunkim/DeepLearningZeroToAll
Softmax를 Tensorflow로 구현하려면... tenforflow 함수인 softmax를 쓰면 됨.

Cost function 을 구현하려면.. $ Y \log( \bar Y ) $를 사용하면 됨.
Cost 최소화는 동일.

import tensorflow as tf

x_data = [[1,2,1,1], [2,1,3,2], [3,1,3,4], [4,1,5,5], [1,7,5,5], [1,2,5,6], [1,6,6,6], [1,7,7,7]]

y_data = [[0,0,1], [0,0,1], [0,0,1], [0,1,0], [0,1,0], [0,1,0], [1,0,0], [1,0,0]]

X= tf.placeholder("float", [None, 4])
Y= tf.placeholder("float", [None, 3])
nb_classes=3

W = tf.Variable(tf.random_normal([4, nb_classes]), name = 'weight')
b = tf.Variable(tf.random_normal([nb_classes]), name = 'bias')

# tf.nn.softmax computes softmax activations
# softmax = exp(Logits) / reduce_sum(exp(Logits), dim)
hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)

# cross entropy cost
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(hypothesis), axis=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for step in range(2001):
      sess.run(optimizer, feed_dict = {X:x_data, Y:y_data})
      if step % 200 ==0:
print(step, sess.run(cost, feed_dict = {X:x_data, Y:y_data}))
    a = sess.run(hypothesis, feed_dict={X:[[1,11,7,9], [1,3,4,3], [1,1,0,1]] })
print(a, sess.run(tf.argmax(a,1))) #arg_max is deprecated.

#[[ 4.16852683e-02 9.58304107e-01 1.06291027e-05]
# [ 6.28286779e-01 3.29489440e-01 4.22237590e-02]
# [ 1.50573225e-08 3.95991723e-04 9.99604046e-01]] [1 0 2]

Lab 06-2 TensorFlow로 Fancy Softmax Classification 구현하기

https://www.youtube.com/watch?v=E-io76NlsqA

소스코드 : https://github.com/hunkim/DeepLearningZeroToAll

복습 : Softmax 구현방법

다음과 같은 형태로 적용하였음

hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(Y* tf.log(hypothesis), axis =1))

softmax_cross_entropy_with_logits 을 사용하는 방법

cost = tf.reduce_mean(-tf.reduce_sum(Y* tf.log(hypothesis), axis =1)) 대신으로 간단하게 사용하여 깔끔하게 다음처럼 사용 (여기에서 Y는 one_hot이어야 함)
logits = tf.matmul(X,W) +b 이라고 정의하고 이를 사용하여 처리함.
hypothesis = tf.nn.softmax(logits)

cost_i = tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = Y_one_hot)
cost = tf.reduce_mean(cost_i)

예제 : 동물의 분류

표에서 마지막 열은 분류(0-6), 나머지는 특성(다리의 수, 털의 여부, 알을 낳는지 여부 등등..)

읽어보는 방법은 동일

xy = np.loadtxt('data-04-zoo.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, {-1]]

Y data에 대해서 알아보기

Y= tf.placeholder("tf.int32", [None, 1]) # Y의 column 수는 1임. 그런데, 파일에서 읽은 데이터는 $ {\begin{bmatrix} 0, 3, 0, 0, ..., 5 \end{bmatrix}}^T $ 으로서, 우리가 원하는 one_hot 데이터가 아님.
Y_one_hot = tf.one_hot(Y, nb_classes) # nb_classes=7 (0부터 6까지). 그런데 이렇게 실행시키면 에러가 발생. tf.one_hot은 입력의 rank가 N 이면, 출력의 rank가 N+1이 된다.

예를 들어 data가 $ \begin{bmatrix} 0, 3 \end{bmatrix} $ 이라고 하면(shape는 none,1) ... one_hot의 출력은 $ \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} 1 0 0 0 0 0 0 \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} 0 0 0 1 0 0 0 \end{bmatrix} \end{bmatrix} \end{bmatrix} $ (Shape는 [none, 1, 7]) 이 된다.
그래서 reshape를 통해 Shape가 [none, 7]인 형태, 즉, $ \begin{bmatrix} \begin{bmatrix} 1 0 0 0 0 0 0 \end{bmatrix} \begin{bmatrix} 0 0 0 1 0 0 0 \end{bmatrix} \end{bmatrix} $ 로 바꿔주어야 한다.

Y_one_hot = tf.reshape(Y_one_hot, [-1, nb_classes])

import tensorflow as tf
import numpy as np

xy = np.loadtxt('data-04-zoo.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

nb_classes =7
X= tf.placeholder(tf.float32, [None, 16])
Y= tf.placeholder(tf.int32, [None, 1])
Y_one_hot = tf.one_hot(Y, nb_classes) # one_hot 으로 바꿔줌.
Y_one_hot = tf.reshape(Y_one_hot, [-1, nb_classes])

W = tf.Variable(tf.random_normal([16, nb_classes]), name = 'weight')
b = tf.Variable(tf.random_normal([nb_classes]), name = 'bias')

logits = tf.matmul(X, W) +b
hypothesis = tf.nn.softmax(logits)

cost_i = tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = Y_one_hot)
cost = tf.reduce_mean(cost_i)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

prediction = tf.argmax(hypothesis, 1) # 예측한 값
correct_prediction = tf.equal(prediction, tf.argmax(Y_one_hot, 1)) #원래의 값
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for step in range(2001):
      sess.run(optimizer, feed_dict = {X:x_data, Y:y_data})
      if step % 200 ==0:
          loss, acc = sess.run([cost, accuracy], feed_dict = {X:x_data, Y:y_data})
print("Step: {:5}\tLostt: {:.3f}\tAcc: {:.2%}".format(step, loss, acc))
pred = sess.run(prediction, feed_dict= {X: x_data})
for p, y in zip(pred, y_data.flatten()) :
print("[{}] Prediction : {} True Y: {}".format(p == int(y), p, int(y)))

현재글김성훈 딥러닝 6 - Softmax Regression

구글어스, 위성영상, 이미지 생성 AI, GPS, Google Earth, google, 인공지능 이미지, 스트릿뷰, 3D 빌딩, 드론, street view, 3D City, 지오캐싱, Geocaching, Quadcopter, 3D모델, Stable Diffusion, 구글, 스테이블 디퓨전, Drone,

Today :
Yesterday :

공간정보와 인터넷지도