  • [๊ธฐ๊ณ„ํ•™์Šต] 12. Dimensionality Reduction
    ๐ŸณDev/Machine Learning 2022. 1. 1. 14:12
    ์ถฉ๋‚จ๋Œ€ํ•™๊ต์˜ ๊น€๋™์ผ ๊ต์ˆ˜๋‹˜์˜ ๊ธฐ๊ณ„ํ•™์Šต ์ˆ˜์—…์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ •๋ฆฌํ–ˆ์Šต๋‹ˆ๋‹ค.

     

    ์˜ค๋Š˜์€ ์ฐจ์› ์ถ•์†Œ๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” Dimensionality Reduction์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด์ž.

     

1. Dimensionality Reduction

1) Curse of Dimensionality

The curse of dimensionality... what does that suddenly mean?

     

    ์ผ๋‹จ dimension์€ ์ผ๋ฐ˜์ ์œผ๋กœ feature(input)์˜ ์ˆ˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค.

    ๋™์‹œ์— ๋†’์€ ์ฐจ์›์˜ ์˜์—ญ์„ ์„ค๋ช…ํ•˜๋Š” ๊ฒƒ์€ ์ง€์ˆ˜์ ์œผ๋กœ ๋งŽ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

     

    ์˜ˆ๋ฅผ ๋“ค์–ด 3์ฐจ์› ๊ณต๊ฐ„์„ ์ตœ์†Œ 8(2^3)๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ณจ๊ณ ๋ฃจ ์ปค๋ฒ„ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ ์ƒ๊ฐํ•˜๋ฉด 8๊ฐœ ๋ฐ–์— ์—†๋Š” ๋ฐ์ดํ„ฐ๋ฅผ 100์ฐจ์›์—์„œ ๋‹ค๋ฃฌ๋‹ค๋Š” ๊ฒƒ์€, ์˜์—ญ์„ ๊ณจ๊ณ ๋ฃจ ์‚ฌ์šฉํ•˜์ง€ ๋ชปํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค. 8๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ 100์ฐจ์›์—์„œ ๋งค์šฐ ๊ตญ์†Œํ•œ ์˜์—ญ์„ ์ฐจ์ง€ํ•˜๋ฉฐ, ๊ทธ ๋ง์€ 100๊ฐœ์˜ ๋ณ€์ˆ˜๊ฐ€ ์ „๋ถ€ ๋ฐ์ดํ„ฐ์— ์˜ํ–ฅ์„ ์ฃผ๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ๋Š” ๊ฒƒ์ด๋‹ค.

    ๊ทธ๋Ÿฌ๋‹ˆ ์˜ํ–ฅ์„ ์ฃผ๋Š” ๋ณ€์ˆ˜๋งŒ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๋„๋ก, ๋ณ€์ˆ˜์ด์ž ์ฐจ์›์„ ์ตœ๋Œ€ํ•œ ์ถ•์†Œํ•˜์—ฌ ์ตœ๋Œ€ํ•œ compactํ•˜๊ณ  denseํ•œ ํ™˜๊ฒฝ์—์„œ ๋ชจ๋ธ๋ง์„ ํ•ด์•ผ, ์ „์ฒด ์˜์—ญ์— ๋Œ€ํ•ด ๋ชจ๋ธ์ด ๊ณจ๊ณ ๋ฃจ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋‹ค.

     

    ๋ณ€์ˆ˜๊ฐ€ ๋งŽ์„ ๋•Œ(์ฐจ์›์ด ๋†’์„ ๋•Œ) ์–ด๋–ค ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๋Š”์ง€ ์•Œ์•„๋ณด์ž.

    • ๋ถˆํ•„์š”ํ•œ ๋ณ€์ˆ˜๊ฐ€ ์กด์žฌ
    • ์ค‘๋ณต๋˜๋Š” ๋ณ€์ˆ˜๊ฐ€ ์กด์žฌ
    • Overfitting : ์˜๋ฏธ์—†๋Š” ๋ณ€์ˆ˜์— ์˜๋ฏธ ๋ถ€์—ฌ
    • Computational Cost

    ์ฆ‰, ์ค‘์š”ํ•œ ๋ณ€์ˆ˜๋“ค์€ ๋ชจ๋“  ๋ณ€์ˆ˜์˜ ๋ถ€๋ถ„์ง‘ํ•ฉ์ด ๋  ์ˆ˜ ์žˆ๋‹ค!

     

2) Dimensionality Reduction

The ideal situation:

• More features, better performance (under the assumption of independence)

But in reality:

• Features cannot all be independent of each other
• Noisy features abound (irrelevant ones, ones full of noise...) that are risky to use

So our goal is: "select a small subset of the variables and train on it!"

• Subset size: as small as possible (Occam's Razor)
• Model performance: as good as possible

     

3) Categories of Dimensionality Reduction

Now, let's look at how to reduce dimensionality. There are two approaches.

1. Feature selection method

• Directly select a few of the original variables
• Filter and wrapper methods exist

2. Feature extraction (construction) method

• Construct new variables from the original variables via some function

     

4) Categories of Feature Selection

Feature selection methods come in two flavors: the Filter approach and the Wrapper approach.

Both end up training on a subset of the original variables; the difference lies in how that subset is selected.

• Filter approach
  Dimensionality reduction as a single feed-forward preprocessing step:
  variables are selected independently of the learning model
• Wrapper approach
  The learning method is involved and feeds back into the dimensionality reduction process:
  variables are selected with the learning model's participation and feedback

Since wrappers perform better, the feature selection algorithms we cover below build on them.

     

     


2. Feature Selection Methods

1) Exhaustive Search

One conceivable method is to check every possible subset of the variables, i.e., a full search.

That, however, means O(2^n − 1) subsets, where n is the number of variables; exponential-time complexity is infeasible in practice, so this is not used.

     

2) Heuristics

Since a full search is impossible, we settle for approximate answers via heuristics.

Approximate methods split into approximation algorithms and heuristics: briefly, the former must prove that their answer lies within some bound of the optimum, while the latter simply produce a somewhat naive approximate answer.

     

    ๋ณ€์ˆ˜ ์„ ํƒ์„ ์œ„ํ•œ Heuristics ์„ธ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด์ž. (์ถ”๊ฐ€์ ์œผ๋กœ ๋ช‡ ๊ฐœ์˜ ๋ฐฉ์‹๋„ ํ•จ๊ป˜!)

    • Forward search
    • Backward search
    • Stepwise search

     

    3) ๋ณ€์ˆ˜์„ ํƒ : Forward Search, ์ „์ง„ ํƒ์ƒ‰ ๊ธฐ๋ฒ•

    ๋ณ€์ˆ˜๋ฅผ ํ•˜๋‚˜์”ฉ ์ถ”๊ฐ€ํ•ด๊ฐ€๋ฉฐ, ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ๋†’์ด๋Š” ๋ณ€์ˆ˜๋ฅผ ์„ ํƒํ•˜๋Š” greedyํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค.

    ๋ณ€์ˆ˜๊ฐ€ ํ•˜๋‚˜์ผ ๋•Œ, ์ „๋ถ€ ๋Œ๋ ค๋ณด๊ณ  ์ตœ์„ ์˜ ์„ ํƒ์ธ x3๋ฅผ ๊ฐ€์ ธ๊ฐ„๋‹ค. ๋‹ค์Œ์€ x3๋ฅผ ํ”ฝ์Šคํ•œ๋’ค ๋˜ ์ „๋ถ€ ๋Œ๋ ค๋ณด๋ฉฐ ๋ณ€์ˆ˜๊ฐ€ ๋‘ ๊ฐœ์ผ ๋•Œ ์ตœ์„ ์˜ ์„ ํƒ์„ ์ฐพ์•„๊ฐ„๋‹ค. ์ด๋ ‡๊ฒŒ ๊ณ„์† ๋ฐ˜๋ณตํ•˜์—ฌ ์ตœ์„ ์˜ ๋ณ€์ˆ˜๋“ค์„ ์ฐพ์•„๋‚ธ๋‹ค.

     

     

    4) ๋ณ€์ˆ˜์„ ํƒ : Backward Elimination, ํ›„์ง„ ์ œ๊ฑฐ ๊ธฐ๋ฒ•

    ์œ„์™€ ๋ฐ˜๋Œ€๋กœ ๋ณ€์ˆ˜๋ฅผ ํ•˜๋‚˜์”ฉ ์ œ๊ฑฐํ•ด๋‚˜๊ฐ€๋ฉฐ ๋ชจ๋ธ ์„ฑ๋Šฅ์— ์˜ํ–ฅ์„ ์•ˆ ์ฃผ๋Š” ๋ณ€์ˆ˜๋ฅผ ์ œ๊ฑฐํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ์œ„์™€ ๋ฐฉ์‹๋„ ๋˜‘๊ฐ™์ด ๋ฐ˜๋ณต์ ์œผ๋กœ ์ง„ํ–‰ํ•œ๋‹ค.

     

    5) ๋ณ€์ˆ˜์„ ํƒ : Stepwise Search

    ํ•˜์ง€๋งŒ ์œ„ ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์˜ ๋‹จ์ ์€ ๋„ˆ๋ฌด greedyํ•˜๋‹ค๋Š” ๊ฒƒ์ด๋‹ค! greedy๋Š” ์ง€์—ญ์ ์ธ ์ตœ์ ์˜ ์ƒํ™ฉ์ด๋ฉฐ, ์ „์ฒด์ ์ธ ์ตœ์ ์˜ ์ƒํ™ฉ์ด ์•„๋‹ ์ˆ˜ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด์™€ ๊ฐ™์€ ๋ฌธ์ œ๋ฅผ ๊ฐœ์„ ํ•œ ๊ฒƒ์ด Stepwise Search์ด๋‹ค.

     

    Stepwise๋Š” ๋‹จ๊ณ„์ ์œผ๋กœ forward์™€ backward๋ฅผ ๊ฐ™์ด ์‚ฌ์šฉํ•œ๋‹ค. ์ดˆ๋ฐ˜์—๋Š” forward๋กœ ๋ณ€์ˆ˜๋ฅผ ์ฑ„์›Œ์ฃผ๊ณ , ์ดํ›„๋ถ€ํ„ฐ๋Š” ๋‘ ๊ฐœ๋ฅผ ๋ฒˆ๊ฐˆ์ด ์‚ฌ์šฉํ•˜๋ฉฐ, ์ถ”๊ฐ€ํ•˜๊ณ  ๋นผ๊ณ ๋ฅผ ๋ฐ˜๋ณตํ•œ๋‹ค. 

     

• Any evaluation criterion that measures the performance of a variable subset can be used

     

    ์•„๋ž˜ ์ง€ํ‘œ๋Š” ํ†ต๊ณ„์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ํ‰๊ฐ€ ์ง€ํ‘œ๋“ค์ด๋‹ค.

    • Akaike Information Criteria (AIC)
      SSE + the number of feature selected
    • Bayesian Information Criteria (BIC)
      SSE + the number of feature selected, std. value from the model trained with all features
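For concreteness, here is a hedged sketch of the usual linear-regression forms of these criteria (the lecture only names their ingredients; the exact constants vary by textbook):

```python
import numpy as np

def aic(sse: float, n: int, k: int) -> float:
    # Gaussian-error regression form: fit term plus a penalty of 2 per feature.
    return n * np.log(sse / n) + 2 * k

def bic(sse: float, n: int, k: int) -> float:
    # Same fit term, but the per-feature penalty grows as log(n).
    return n * np.log(sse / n) + k * np.log(n)

print(aic(sse=120.0, n=100, k=5), bic(sse=120.0, n=100, k=5))
```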

    6) ๋ณ€์ˆ˜ ์„ ํƒ : L1 Parameter Regularization

    ์ถ”๊ฐ€์ ์œผ๋กœ Lasso Regression์œผ๋กœ๋„ ๋ณ€์ˆ˜์„ ํƒ์„ ํ•œ ํšจ๊ณผ๋ฅผ ์ค„ ์ˆ˜ ์žˆ๋‹ค.

    (ํ•˜์ง€๋งŒ ๊ต์ˆ˜๋‹˜์€ All in one์ธ Lasso ๋ฐฉ์‹์„ ๋ณ€์ˆ˜ ์„ ํƒ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ์„ ํ˜ธํ•˜์ง€ ์•Š์œผ์‹ ๋‹ค)

     

    • Weight์˜ ์ ˆ๋Œ€๊ฐ’์„ ์ค„์ด๋ ค๊ณ  ํ•˜๋ฉฐ,
    • Weight๊ฐ€ 0์œผ๋กœ ๊ฐˆ ๋•Œ๊ฐ€ ์ตœ์„ ์˜ ์„ ํƒ
    • Weight=0 ์ธ ๋ณ€์ˆ˜๋“ค์€ ์•Œ์•„์„œ ์˜ํ–ฅ๋ ฅ์„ ์žƒ์–ด ์ œ๊ฑฐ๋จ
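A minimal scikit-learn sketch of that effect (the dataset and alpha are arbitrary stand-ins of mine):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # L1 penalties are scale-sensitive

model = Lasso(alpha=1.0).fit(X, y)
kept = np.flatnonzero(model.coef_)      # features that survived
print("selected feature indices:", kept)
```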

     

    7) ๋ณ€์ˆ˜ ์„ ํƒ : Meta-Heuristic

    • Genetic Algorithm (GA)-based search, ์œ ์ „์ž ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋ผ๊ณ  ๋ถˆ๋ฆฐ๋‹ค
      • Exhaustive search: computational cost (but global optimum)
      • Local search: search space is very limited -> local optimum (but efficient)
    • Main motivation of GA
      • Better efficiency than the exhaustive search
      • Better solution than the local search

    ์ฆ‰, ์ด๋ก ์ ์œผ๋กœ local search๋ณด๋‹ค ์˜ค๋ž˜ ๊ฑธ๋ฆฌ์ง€๋งŒ, global optimum์— ๋„๋‹ฌํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ์‹์ด๋‹ค

     

    • FYI : meta-heuristic
      • Heuristic approaches that can be generally used, Heuristic ๋ฐฉ์‹ ์ค‘ ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๋ฐฉ์‹
      • GA, simulated annealing, particle swarm optimization, etc.

     

8) Genetic Algorithm (GA)

• A branch of evolutionary computing
• A meta-heuristic approach
• Efficiently searches the whole search space
• (Theoretically) can obtain the global optimum
• (Practically) obtains a near-global optimum

Let's look at GA's basic idea. In an ecosystem, a species evolves over the generations through the following process.

     

Evolution Process

1. Selection: select two superior parents (the superior genes get chosen)
2. Cross-over: mix the two parents' genes to create offspring
3. Mutation: mutate each gene with a rare, small probability

This cycle is called the evolution process, and we want to apply it to feature selection.

The overall procedure is below; steps 4 through 7 repeat.

     

1. Chromosome encoding
2. Population setting
3. Fitness evaluation
4. Selection
5. Cross-over
6. Mutation
7. Create next generation

Now let's walk through them one by one.

1. Chromosome encoding

• The number of genes in a chromosome = the number of features, so a chromosome represents a set of variables
• Binary encoding: each gene's value indicates whether the corresponding feature is selected or not

2. Population setting

• A population consists of p chromosomes
• Each chromosome is initialized randomly, with roughly 20% of its genes set to 1
• p = 500 means we use 500 distinct chromosomes
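A numpy sketch of steps 1-2 (d and p are arbitrary; the ~20% initial density follows the note above):

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 13, 500                          # d features, p chromosomes

# Each chromosome is a length-d 0/1 vector; 1 means "feature selected".
population = (rng.random((p, d)) < 0.2).astype(int)
print(population[0])                    # e.g. [0 1 0 0 ...]
```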

     

3. Fitness evaluation

• Fitness = how good the model is; i.e., how superior is this set of genes (variables)?
• Fitness can differ from problem to problem
• Fitness functions for feature selection:
  • AIC, BIC, adjusted R2, etc.
  • Validation error
• Train one model per feature set, then compute the fitness function for each model

     

     

4. Selection

Having measured how good each chromosome is, we now select.

• Uses the fitness value of each chromosome
• Deterministic selection
  • Select the top n% of parents by fitness value
    But what if that traps us in a local optimum!
• Probabilistic selection
  • The fitness value becomes a weight on the chance of being selected,
    so selection is usually made probabilistic:
    like throwing two darts at a roulette wheel whose slices are sized by fitness; the two chromosomes hit get to mate! (A sketch follows below.)

     

5. Cross-over

• Make two children from two parents (two-for-two, to keep the bookkeeping simple)

A simple way to create children is to randomly pick genes from the two parents and exchange them.

• The cross-over point(s) can be chosen freely

The example below fixes a swap probability, draws a random value for every position in the array, and exchanges the values wherever the draw passes the threshold.

     

6. Mutation

Cross-over alone cannot move the search into a genuinely new region, which is why we also use mutation.

• With a very small probability (<= 3%), a gene whose random draw falls under the threshold gets flipped
• This lets the search jump to another part of the search space
• and at the same time escape local optima
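And mutation in the same style:

```python
import numpy as np

rng = np.random.default_rng(0)
child = rng.integers(0, 2, 10)

flip = rng.random(10) < 0.03            # <= 3% chance per gene
child[flip] = 1 - child[flip]           # flip the mutated bits
```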

     

7. Create next generation

• Make p new chromosomes from the p parent chromosomes
• The solution evolves through the generations: the further evolution proceeds, the better the models it produces (a compact end-to-end sketch follows below)
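Putting steps 4-7 together, a compact toy loop; the fitness here is a stand-in that rewards matching a hidden "ideal" subset, purely so the sketch runs on its own (real feature selection would train and validate a model per chromosome):

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 12, 50
ideal = (rng.random(d) < 0.4).astype(int)          # toy "best" subset

def fitness(pop):
    return 1 + (pop == ideal).sum(axis=1)          # strictly positive

pop = (rng.random((p, d)) < 0.2).astype(int)       # initial population
for gen in range(100):
    f = fitness(pop)
    probs = f / f.sum()                            # roulette-wheel weights
    nxt = [pop[f.argmax()].copy()]                 # elitism: keep the best
    while len(nxt) < p:
        mom, dad = pop[rng.choice(p, 2, p=probs)]  # probabilistic selection
        mask = rng.random(d) > 0.5                 # uniform cross-over
        child = np.where(mask, mom, dad)
        flip = rng.random(d) < 0.03                # mutation
        child[flip] = 1 - child[flip]
        nxt.append(child)
    pop = np.array(nxt)

print("best overlap with the ideal subset:", fitness(pop).max() - 1, "of", d)
```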

     

     

    ์•„๋ž˜์™€ ๊ฐ™์ด ์„ฑ๋Šฅ์ด ์šฐ์ˆ˜ํ•ด์ง์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

    ์ฐธ๊ณ ๋กœ ์ผ๋ฐ˜์ ์œผ๋กœ 100๊ฐœ ์ด์ƒ์˜ p๊ฐ€ 300~400๋ฒˆ์˜ ์„ธ๋Œ€๋ฅผ ๊ฑฐ์ณ์•ผ ์ข‹์€ ์„ฑ๋Šฅ์„ ๊ฐ€์ง„๋‹ค.

    ๋˜ํ•œ Crosssover์™€ Mutaion์„ ์ง„ํ–‰ํ• ๋•Œ ํ™•์‹คํ•˜๊ฒŒ ๋ณ€ํ™”๊ฐ€ ์žˆ์–ด์•ผ, generation์„ ๊ฑฐ๋“ญํ• ์ˆ˜๋ก ์—ฌ๋Ÿฌ ์˜์—ญ์„ ๋‘˜๋Ÿฌ๋ณด๋ฉฐ ์ข‹์€ solution์„ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

     

Set of hyper-parameters

• Number of chromosomes in a population: 100 or more
• Number of generations: 300-500 or more (be generous!)
• Selection method: use the probabilistic one!
• Cross-over rate: mix around 50%
• Mutation rate: about 3%
• Termination criterion: a fixed number of generations, or when the average fitness stops changing, etc.

     

GA Tips

• Don't worry about the population size (computing power is cheap)
• The probabilistic selection approach is preferred
• You can even use two or more fitness values at once
• The best chromosome can always remain (like it's invincible): copying it unchanged into the next generation is the professor's tip (keep the school's top student enrolled ^^)

     

Pros/Cons

• Meta-heuristic: so we don't know how close the solution actually is to the global optimum, only that it should be in theory!
• The solution improves through many rounds of evolution
• Theoretically, the solution is the global optimum
• Be patient, patience is a must.. ^^
• Defining the fitness function is crucial, since it can differ by problem
• Recommended to try if you have a lot of (compute) time but don't want to spend your own time thinking
• One of many optimization methods, but honestly the optimization and math folks don't like GA much..

     


3. Feature Extraction

1. Feature Extraction Method

Now for the second way to reduce the number of variables: feature extraction.

It creates new variables from the original ones, and the representative method is PCA.

     

1) Principal Component Analysis (PCA)

PCA is an analysis method heavily used across many fields.

• Principal: primary
• Component: element
• Goal
  • Identify bases (axes) that retain the variance of the original data as much as possible
    (variance maximization is the goal)
  • A variable means a lot when its variance is large;
    that is, the variables change, but the information they carry must be preserved

     

    ์•„๋ž˜ ์ฒซ๋ฒˆ์งธ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋ฉด, x1๊ณผ x2๋ฅผ ํ†ตํ•ด data๋“ค์˜ ๋ถ„์‚ฐ์ด ๊ฐ€์žฅ ํฐ ์ง์„ ์„ ๋งŒ๋“ค์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํ•ด๋‹น ์ง์„ ์„ ๊ธฐ์ค€์œผ๋กœ ํŒŒ๋ž€์„ ์ด decision boundary๊ฐ€ ๋  ์ˆ˜ ์žˆ๋‹ค.

     

    ์ด๋•Œ x1๊ณผ x2์œผ๋กœ ๋งŒ๋“  ๊ทธ๋ž˜ํ”„๊ฐ€ ์•„๋‹Œ ๋ณ€๋Ÿ‰์„ ์ตœ๋Œ€ํ•œ ๋˜‘๊ฐ™์ด ์œ ์ง€ํ•˜๋Š” ๊ฒ€์€ ์ง์„  PC1์„ ๊ธฐ์ค€์œผ๋กœ ํ•˜๋Š” data๋ถ„ํฌ์—์„œ, ์šฐ๋ฆฌ๋Š” ์ƒˆ๋กœ์šด 2์ฐจ์› DB๊ฐ€ ์•„๋‹Œ ๊ฐ„๋‹จํ•˜๊ฒŒ ๋Œ€๋žต 0.7์ด์ƒ์€ ๋นจ๊ฐ• data๊ฐ€, ๊ทธ ์ดํ•˜๋Š” ํŒŒ๋ž‘ data๊ฐ€ ์žˆ์Œ์„ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค!

     

    ์ด๊ฒƒ์ด PCA์ด๋‹ค!

     

    ๊ทธ๋Ÿผ ๊ฒ€์€ ์ง์„ , ์ถ•์€ ์–ด๋–ป๊ฒŒ ๊ตฌํ•˜๋Š”๊ฐ€?

     

Dimensionality reduction

• Mapping to a new variable (PC) while keeping the original variance
  We build new variables out of the original ones, but the original values cannot be recovered from the new values,
  so the model becomes harder to interpret

     

     

    ์ž ์ด์ œ Mapping์„ ํ•˜๊ธฐ ์ „์— ์„ ํ˜•๋Œ€์ˆ˜์—์„œ ๋ฐฐ์› ๋˜ ๋‚ด์šฉ์„ ๋ณต์Šตํ•ด๋ณด์ž.

    ์šฐ๋ฆฌ๋Š” ๋ฒกํ„ฐ b๋ฅผ a์— ๋งคํ•‘ํ•˜๋ ค๊ณ  ํ•œ๋‹ค. ์‹์€ ์•„๋ž˜์™€ ๊ฐ™๊ณ , ๊ฒฐ๊ตญ p๋Š” ๋ฒกํ„ฐ a์™€ b๋ฅผ ๋‚ด์ ํ•œ ๊ฒƒ๊ณผ ๊ฐ™๋‹ค๋Š” ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

    ๋‹ค์Œ์€ Covariance์™€ Eigenvector and eigenvalue๋ฅผ ๋ณต์Šตํ•ด๋ณด์ž.

    ๋‘๊ฐ€์ง€๋ฅผ ์ „์ œ๋กœ ํ•˜์—ฌ, ์•„๋ž˜ ์„ธ๊ฐ€์ง€๋ฅผ ๊ธฐ์–ตํ•˜๊ณ  ๋„˜์–ด๊ฐ€์ž

    • X์˜ ํ‰๊ท ์ด 0์ด๋ฉด Cov๋Š” (X XT)/n
    • Ax = λx

     

Mapping X to w

• Projected points: wᵀX
  We want to find the direction w onto which we project
• Covariance (variance) of the projected points, when X has zero mean:
  Var(wᵀX) = (wᵀX)(wᵀX)ᵀ / n = wᵀ(XXᵀ/n)w = wᵀSw,
  where S is the covariance matrix of X.
  To use the covariance, we multiply wᵀX by its own transpose; since we already know S = XXᵀ/n, it slots straight into the formula.

     

    ์ง€๊ธˆ๊นŒ์ง€ ์šฐ๋ฆฌ๊ฐ€ ์–ป์„ ์ •๋ณด๋ฅผ ์•„๋ž˜ ๋‘๊ฐ€์ง€๋กœ ์ •๋ฆฌํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๊ณต๋ถ„์‚ฐ ์ตœ๋Œ€ํ™”๋ฅผ ์œ„ํ•œ w๋ฅผ ์•Œ์•„๋ณด๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์žŠ์ง€๋ง์ž.

    • Max(wT X XT w = wT S w) 
    • wT w = 1

    ์œ„์˜ ๋‘ ์‹์—์„œ ๋ณ€์ˆ˜๋Š” w๋ฟ์ด๋ฏ€๋กœ ์ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ•œ๋‹ค.

    ๋˜ํ•œ ๋‘๊ฐ€์ง€ ๊ฐ€์ •๋„ ์กด์žฌํ•˜๋Š”๋ฐ

    • w๋Š” unit vector
    • X๋Š” zero mean์œผ๋กœ ์ •๊ทœํ™”๋˜์–ด ์žˆ์Œ

     

    (L์€ ๋ผ๊ทธ๋ž‘์ง€์•ˆ) ์ด๋ฅผ ํ†ตํ•ด ์œ„๋ฅผ ๋งŒ์กฑ์‹œํ‚ค๋Š” w๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๋•Œ eigen vector and eigen value ๋ฌธ์ œ์ž„์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค! 

    ํ•˜์ง€๋งŒ S์˜ ํฌ๊ธฐ์— ๋”ฐ๋ผ ๊ทธ๋งŒํผ์˜ eigen vector and eigen value์˜ ์ˆ˜๊ฐ€ ๋‚˜์˜จ๋‹ค. ๋”ฐ๋ผ์„œ pc๋„ ๊ทธ์— ๋งž๋Š” ๊ฐœ์ˆ˜๋กœ ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค! ๊ทธ๋ฆฌ๊ณ  ์ด ๋ชจ๋“  pc๋Š” ์ „๋ถ€ ๋…๋ฆฝ์ด๋ฉฐ ์ง๊ฐ์ด๋‹ค!

     

    ์•„๋ž˜์—์˜ ์˜ˆ์‹œ์—์„œ๋„ 2์ฐจ์› ์ž„์œผ๋กœ ๋‘ ๊ฐ€์ง€์˜ pc๊ฐ€ ๋‚˜์˜ค๋ฉฐ, ์ด ๋‘๊ฐ€์ง€๊ฐ€ ์ง๊ฐ์ž„์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

    ์˜ˆ์‹œ๋ฅผ ๋ณด์ž.

     

     

     

     

    4๋ฒˆ์˜ ๋ถ„์‚ฐ์„ค๋ช…๋ ฅ์„ ํ†ตํ•ด 5๋ฒˆ ๋ถ„์„์„ ์ง„ํ–‰ํ•œ๋‹ค. ๋” ๋†’์€ ๋น„์œจ์„ ๊ฐ€์ง€๋Š” ์ˆœ์„œ๋Œ€๋กœ ์ „์ฒด ๋ณ€ํ™”๋Ÿ‰์˜ ํ•ด๋‹น eigen vector์˜ ์˜ํ–ฅ๋ ฅ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด๋ฅผ ๊ธฐ์ค€์œผ๋กœ data๋“ค์„ ์˜ฌ๋ ค์ฃผ๋ฉด ๋œ๋‹ค!

     

    ๊ทผ๋ฐ 2๊ฐœ์˜ ๋ณ€์ˆ˜๋กœ 2๊ฐœ์˜ pc๋ฅผ ๋งŒ๋“ค๋ฉด, ๊ฒฐ๊ตญ 2๊ฐœ์˜ ๋ณ€์ˆ˜๊ฐ€ ๋˜๋Š” ๊ฒƒ์ด ์•„๋‹Œ๊ฐ€..?

    ์•„๋ž˜ ์˜ˆ์‹œ๋ฅผ ํ•œ๋ฒˆ ๋ณด์ž! 13๊ฐœ์˜ origin ๋ณ€์ˆ˜๋ฅผ ํ†ตํ•ด์„œ 13๊ฐœ์˜ pc๋ฅผ ๋งŒ๋“ค์—ˆ์ง€๋งŒ, ์ด๋•Œ ์ค‘์š”ํ•œ ๊ฒƒ์€ ๋ˆ„์ ๋ถ„์‚ฐ๋น„์ด๋‹ค.

    ๋ˆ„์  ๋ถ„์‚ฐ๋น„๊ฐ€ 93%์ผ๋•Œ์˜ ์‚ฌ์šฉํ•œ pc ๊ฐœ์ˆ˜๋Š” 7๊ฐœ๋กœ, origin๋ณด๋‹ค ์ ์€ ์ˆ˜๋ฅผ ๊ฐ€์ง„๋‹ค (pc๋ฅผ loading vector๋ผ๊ณ ๋„ ๋ถ€๋ฅธ๋‹ค)

    ๋”ฐ๋ผ์„œ ๊ธฐ์ค€์„ ์ •ํ•˜์—ฌ ์ฐจ์› ์ถ•์†Œ์˜ ๋ฒ”์œ„๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค.

     

     

    ์ฐจ์›์ด ์ž‘์•„์งˆ ์ˆ˜๋ก ์„ค๋ช…๋ ฅ๋„ ์ค„์–ด๋“ค์ง€๋งŒ, loading vector(๋ณ€์ˆ˜๋“ค์˜ ๋น„์œจ)๋ฅผ ํ†ตํ•ด ๊ฐ„์ ‘์ ์œผ๋กœ ์„ค๋ช…์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

     

Number of PCs Selected

• A d-dimensional original space yields d PCs
• We may select enough PCs to retain 80-90% of the original variance,
  i.e., keep adding PCs until the cumulative variance ratio reaches 80-90%

In the example, even around 4 PCs already give decent results. (A sketch follows below.)

     

Variations

• Singular Value Decomposition (SVD):
  PCA works on a square matrix; SVD extends this to rectangular matrices
• Kernel PCA: nonlinear mappings also become possible
• Autoencoder (with linear activation functions):
  a neural network that, through this setup, can perform the same role as PCA

     
