  • [๊ธฐ๊ณ„ํ•™์Šต] 13. Class Imbalanced Problems
    ๐ŸณDev/Machine Learning 2021. 12. 9. 17:09

    ์˜ค๋Š˜์€ ๋ฐ์ดํ„ฐ ๋ถˆ๊ท ํ˜•์˜ ๋ฌธ์ œ์™€ ํ•ด๊ฒฐ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์ด์•ผ๊ธฐ ํ•ด๋ณด๋ ค๊ณ  ํ•œ๋‹ค.

    ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ณต๋ถ€ํ•  ๋•Œ๋Š” ์šฐ๋ฆฌ๊ฐ€ ์ถ”์ถœํ•˜๋ ค๋Š” ๋ฐ์ดํ„ฐ์™€ ์ผ๋ฐ˜์ ์ธ ๋ฐ์ดํ„ฐ๊ฐ€ ๋Œ€๋žต 1 : 1์˜ ๋น„๋ฅผ ๊ฐ€์ง„๋‹ค๊ณ  ๊ฐ€์ •ํ•œ๋‹ค.

    ํ•˜์ง€๋งŒ ์‹ค์ œ ์ƒํ™ฉ์—์„œ๋Š” ์ด ๋‘ ๋ฐ์ดํ„ฐ๊ฐ„์˜ ๋น„์˜ ์ฐจ์ด๊ฐ€ ๋น„์Šทํ•œ ๊ฒฝ์šฐ๋Š” ๊ฑฐ์˜ ์—†๋‹ค.

     

    1. Class-Imbalanced Problems

    ๋ถ„๋ฅ˜ํ•˜๋Š” ์ƒํ™ฉ์—์„œ, ์šฐ๋ฆฌ๊ฐ€ ์ถ”์ถœํ•˜๋ ค๋Š” ๋ฐ์ดํ„ฐ๋Š” positive, ์ผ๋ฐ˜์ ์ธ ๋ฐ์ดํ„ฐ๋Š” negative์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค.

     

    In classification problems,

    • The total number of samples in one class (positive) is far smaller than the total number in the other class (negative)
    • The minority class is a rare event, or the data in the minority class are hard to collect
    • Example – Majority class : Minority class = 9:1
    • Problem – A classifier tends to classify all data into the majority class
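    A quick toy sketch (the 9:1 data here is made up for illustration) shows why this tendency is rewarded by plain accuracy:

```python
import numpy as np

# Hypothetical 9:1 labels: 90 negatives (majority), 10 positives (minority).
y_true = np.array([0] * 90 + [1] * 10)

# A degenerate "classifier" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
recall_minority = (y_pred[y_true == 1] == 1).mean()

print(accuracy)          # 0.9 — looks good on paper
print(recall_minority)   # 0.0 — yet every positive case is missed
```

    So accuracy alone hides the total failure on the minority class, which is exactly the problem described above.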

     

    ๊ทธ๋ฆฌ๊ณ  ์ผ๋ฐ˜์ ์œผ๋กœ ์ถ”์ถœํ•˜๋ ค๋Š” ๋ฐ์ดํ„ฐ๋Š” ์–‘์ด ๋งค์šฐ ์ ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด Fault Detection, Cancer detection ๋“ฑ์„ ์ƒ๊ฐํ•˜๋ฉด, ์šฐ๋ฆฌ๊ฐ€ ์ฐพ๊ณ ์žํ•˜๋Š” ๋ฐ์ดํ„ฐ๋Š” ์ผ๋ฐ˜์ ์ด์ง€ ์•Š๊ณ  ํŠน์ˆ˜ํ•˜๋‹ค.

     

    • What may cause class-imbalanced problems
      • It is hard to collect the minority data
      • Minority data are collected from rare events
    • Minority class is often what we are interested in
      • Minority class: positive
      • Majority class: negative
      • We are interested in prediction of the minority class
      • In real-world cases, we barely deal with problems at 50:50

     

    1) Class-Imbalanced Problems

    • Two perspectives on class-imbalanced problems
      The minority class can generally be viewed as one of two kinds of distributions:
      on the left, the minority samples form a region of their own and can be explicitly defined;
      on the right, they form no region of their own, and anything that is not majority is treated as minority.

     

    • Possible classification results from the conventional 2-class classifiers
      Setting a decision boundary in this situation gives results like the following.

    • Desired classification results
      However, since more data can always be drawn i.i.d. from the same distributions,
      we want a proper decision boundary that keeps its distance from both classes.

     

    ์•„๋ž˜ ์˜ˆ์‹œ๋Š” majority์™€ minority์˜ ๋น„์— ๋”ฐ๋ฅธ  decision boundary์ด๋‹ค.

    ๋น„์˜ ์ฐจ์ด๊ฐ€ ์ปค์งˆ ์ˆ˜๋ก minoiry๊ฐ€ ๋ฌด์‹œ๋˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

     

     


    2. Sampling Method

    ๋”ฐ๋ผ์„œ sampling technique๋ฅผ ํ†ตํ•ด ๋‘ ํด๋ž˜์Šค๊ฐ„์˜ ๊ท ํ˜•์„ ๋งž์ถ”๊ธฐ๋กœ ํ•œ๋‹ค.

    ๊ฐ€์žฅ ๊ฐ„๋‹จํ•˜๊ณ  ์‰ฌ์šด ๋ฐฉ๋ฒ•์€ sampling์ด๋‹ค.

    • Oversampling, minority๋ฅผ ๋Š˜๋ฆฌ๊ธฐ
    • UnderSampling, majority๋ฅผ ์ค„์ด๊ธฐ

     

    1) Oversampling

    Oversample from the minority class

    • Make duplicates: copy existing minority samples as-is
    • Add random noises: add copies perturbed with random noise

    These are simple ways to inflate the minority class,

    but the most widely used oversampling method is SMOTE (Synthetic Minority Oversampling Technique).

     

    ๊ฝค ์˜ค๋ž˜๋œ ๋ฐฉ๋ฒ•์ด๋ฉฐ, ๋‹จ์ ๋„ ์กด์žฌํ•˜์ง€๋งŒ ์—ฌ์ „ํžˆ ๋งŽ์ด ์“ฐ์ด๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.

     

    SMOTE

    SMOTE๋Š” ํŠน์ • minority sample k์™€ k ์ฃผ๋ณ€ minoirty data์™€ ์„ ๋ถ„์„ ํ˜•์„ฑํ•˜๊ณ , ๋žœ๋ค ๋น„์œจ๋กœ ์„ ๋ถ„ ์œ„์˜ ํ•œ ์ง€์ ์— data๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ๋ฐ์ดํ„ฐ๋“ค์˜ ๋‚ด๋ถ€์—๋งŒ ํ˜•์„ฑ๋˜์–ด ๋„“๊ณ  ํ’์„ฑํ•œ ๋ฐ์ดํ„ฐ๋Š” ์ƒ์„ฑํ•˜๊ธฐ ์–ด๋ ค์šฐ๋ฉฐ, ๋งŒ์•ฝ majority๊ฐ€ ์ค‘์‹ฌ์— ์žˆ๊ณ  minority๊ฐ€ ์™ธ๊ฐ์— ์žˆ๋Š” ๋ชจ๋ธ์—์„œ๋Š” ์˜๋ฏธ๊ฐ€ ์—†๋Š” data๊ฐ€ ์ƒ์„ฑ๋  ์ˆ˜ ์žˆ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค. 

     

     

    2) Undersampling

    Undersample from the majority class

    • Random removal: remove majority samples at random
    • Clustering-based removal: cluster the majority class and keep only its core samples

    Random removal can wipe out all the majority samples in some region, producing a strange decision boundary that encroaches on majority territory. Clustering-based removal is therefore recommended.

     


    3. Cost-Sensitive Learning

    1) Cost-sensitive learning

    Set the cost of misclassification to calibrate the ratio between classes:
    give a larger penalty to misclassifying minority data.

    • If Positive : Negative = 1:9
    • Cost of misclassification –Positive : Negative = 9:1
    • Same effect as oversampling, without adding extra noise

    ์‹์€ ๊ฐ„๋‹จํžˆ Losss function์„ ํด๋ž˜์Šค์— ๋”ฐ๋ผ ๋‹ค๋ฅธ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„ํ•œ๋‹ค.

     

    Cost-Sensitive Learning์„ ํ†ตํ•ด ์™ผ์ชฝ ์ƒํ™ฉ์—์„œ๋Š” ๋นจ๊ฐ•์ด๋ฅผ ํ•˜๋‚˜ ์นจ๋ฒ”ํ•˜๊ณ  9์ ์˜ ํŒจ๋„ํ‹ฐ๋ฅผ ์–ป์—ˆ์ง€๋งŒ, ์˜ค๋ฅธ์ชฝ ์ƒํ™ฉ์—์„œ๋Š” ํŒŒ๋ž‘์ด๋ฅผ ๋‘ ๊ฐœ ๋„˜๊ณ ๋„ 2์ ์˜ ํŒจ๋„ํ‹ฐ๋ฅผ ์–ป์€ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ๋ชจ๋ธ์€ ํŒŒ๋ž‘์ด๋ณด๋‹ค ๋นจ๊ฐ•์ด๋ฅผ ๋” ์กฐ์‹ฌํ•  ๊ฒƒ์ด๋‹ค.

     


    4. Ensemble

    • To reduce the variance of sampling methods

    ์•™์ƒ๋ธ”์„ majority๋ฅผ minority์™€ ๋น„์Šทํ•œ ํฌ๊ธฐ๋กœ ๋‚˜๋ˆ„์–ด ์—ฌ๋Ÿฌ๋ฒˆ ํ•™์Šต์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.


    5. Novelty Detection

    ์ด ์™ธ์˜ ๋ฐฉ๋ฒ•์—๋Š” Novelty Detection, ์ด์ƒ์น˜ ์ฐธ์ง€๊ฐ€ ์žˆ๋Š”๋ฐ,  ๊ฐ™์€ ๋ง๋กœ๋Š” 1-Class Classification, Outlier Detection, Anormaly Detection, Abnormality Detection๊ฐ€ ์žˆ๋‹ค.

     

    1) 1 - Class Classification

    minority๊ฐ€ ์—†๋‹ค๊ณ  ์ƒ๊ฐํ•˜๊ณ , majority ํ•˜๋‚˜์˜ ํด๋ž˜์Šค๋กœ ํ•™์Šตํ•˜๊ณ  boundary(์ •์ƒ ๋ฒ”์œ„)๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด์— ํ•ด๋‹นํ•˜์ง€ ์•Š๋Š” data๋Š” ๋‹ค๋ฅธ ํด๋ž˜์Šค(์ด์ƒ)๋ผ๊ณ  ํŒ๋‹จํ•œ๋‹ค.

     

    ๋งจ ์ฒ˜์Œ์— minority๋ฅผ ๋‘ ๊ฐ€์ง€ ๊ฒฝ์šฐ๋กœ ๋ถ„๋ฅ˜ํ–ˆ์„ ๋•Œ์˜ ๋‘ ๋ฒˆ์งธ ๊ฒฝ์šฐ์™€ ๊ฐ™๋‹ค.

     

    • Train a model with only data of the majority class
    • Define a region of majority class
    • Outside the defined region -> minority class
    • Like unsupervised method
    • Highly imbalanced problems: used when the imbalance is extreme
      • Positive : Negative = 1 : 999

     

    Binary Classification๊ณผ ๋น„์Šทํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ๋„ ์žˆ์ง€๋งŒ, ์ด ๋‘˜์€ ์ฒ ํ•™์  ํŠน์ง•์ด ์•„์— ๋‹ค๋ฅด๋‹ค.

    • The number of data belonging to the minority class is too small
    • The definition of minority class is just NOT-majority class
      Binary์—์„œ๋Š” A์™€ B๊ฐ€ Major์ด์ง€๋งŒ, Novelty์—์„œ๋Š” ์•„๋‹ˆ๋‹ค. ์ด ๋‘˜์ด ์–ด๋–ป๊ฒŒ ๋ถ„๋ฅ˜๋˜๋Š”๊ฐ€๊ฐ€ ํฐ ์ฒ ํ•™์  ํŠน์ง•์˜ ์ฐจ์ด
    • Generalization vs. specialization

    binary๋Š” minorty๋ฅผ ํ†ตํ•ด boundary๋ฅผ ์กฐ์ ˆํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•˜์ง€๋งŒ Novelty๋Š” ์™ธ๋ถ€๋กœ๋ถ€ํ„ฐ์˜ ์˜ํ–ฅ์„ ์ „ํ˜€ ๋ฐ›์ง€ ์•Š์œผ๋ฏ€๋กœ ํ˜„์žฌ์˜ boundary๋ณด๋‹ค ๋” ์ปค์ง€๋Š”๊ฒŒ ์ข‹์„์ง€, data์™€ ๋”ฑ๋งž๊ฒŒ ์„ค์ •ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์€์ง€ ๊ฒฐ์ •ํ•  ์ˆ˜ ์—†๋‹ค. ๋ฌผ๋ก  ์š”์ƒˆ๋Š” ํ•˜์ด๋ธŒ๋ฆฌ๋“œ๋กœ ์ ์ ˆํ•œ ํฌ๊ธฐ์˜ boundary๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

     

    2) Clustering-based method

    ํด๋Ÿฌ์Šคํ„ฐ ๋ฐ–์œผ๋กœ ๋ฒ—์–ด๋‚˜๋ฉด ์ด์ƒ์น˜, ์•ˆ์— ์žˆ๋‹ค๋ฉด ์ •์ƒ์„ ์˜๋ฏธํ•œ๋‹ค

    • Clustering with only data belonging to the majority class
    • Set a threshold that decides the novelty
      • Eg. Distance from each center, bottom p% of training data
        ๋…ธ์ด์ฆˆ๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์œผ๋‹ˆ ํด๋Ÿฌ์Šคํ„ฐ์— margin์„ ์กฐ๊ธˆ ๋‘์ž!

     

    3) Density estimation-based method

    ์ด ๋˜ํ•œ ๋ฒ”์œ„ ๋‚ด์— ์žˆ์œผ๋ฉด ์ •์ƒ์น˜, ๋ฐ–์— ์žˆ๋‹ค๋ฉด ์ด์ƒ์น˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค

    • Density estimation with only data belonging to the majority class (eg. GMM ๊ฐ€์šฐ์‹œ์•ˆ ํ˜ผํ•ฉ ๋ชจ๋ธ)
    • Set a threshold that decides the novelty
      • Eg. Probability value, bottom p% of training dat
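    A minimal version with a single Gaussian standing in for a full GMM (toy data; the bottom-5% cutoff is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit a single Gaussian to the majority (normal) data only.
X_train = rng.normal(0.0, 1.0, size=(1000, 2))
mu = X_train.mean(axis=0)
cov = np.cov(X_train.T)
cov_inv = np.linalg.inv(cov)

def log_density(x):
    d = x - mu
    return -0.5 * d @ cov_inv @ d - 0.5 * np.log(np.linalg.det(2 * np.pi * cov))

# Threshold at the bottom 5% of the training log-densities.
train_ld = np.array([log_density(x) for x in X_train])
threshold = np.quantile(train_ld, 0.05)

print(log_density(np.array([0.0, 0.1])) > threshold)    # typical -> normal
print(log_density(np.array([5.0, -5.0])) > threshold)   # rare    -> novelty
```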

     

    4) Reconstruction-based method

    NN์„ ์ด์šฉํ•˜์—ฌ Novelty Detection๋„ ๊ฐ€๋Šฅํ•˜๋‹ค

    ์ฐจ์›์„ ์ถ•์†Œํ•˜๋Š” ๊ณผ์ •์„ Encode, ๋‹ค์‹œ ํ™•์žฅํ•˜๋Š” ๊ณผ์ •์„ Decode๋ผ๊ณ  ํ•˜๋Š”๋ฐ ์ด๋•Œ ํ™•์žฅ์„ ์žฌ๊ตฌ์ถ•์˜ ์˜๋ฏธ๋กœ Reconstruction์ด๋ผ๊ณ ๋„ ํ•œ๋‹ค. ์›๋ณธ๊ณผ ์ƒˆ๋กœ์šด ๊ฐ’์˜ ์ฐจ์ด๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด loss๋ฅผ ์ตœ์†Œํ™” ํ•˜๋Š”๋ฐ, ์ตœ๋Œ€ํ•œ ์˜๋ฏธ์žˆ๋Š” ๋ณ€์ˆ˜๋งŒ์„ Encodeํ•˜๊ธฐ์— ๋ณ€์ˆ˜ ์ถ”์ถœ๊ณผ๋„ ํฐ ์—ฐ๊ด€์ด ์žˆ๋‹ค.

     

    ๋‹ค์‹œ ๋Œ์•„์™€, ์›๋ณธ์„ ๋ณต์›ํ•ด์•ผํ•˜๋Š” ์ด ๋ชจ๋ธ์€ ์›๋ณธ์— ์ง‘์ค‘ํ•˜์—ฌ ํ•™์Šต์„ ํ•˜๊ฒŒ ๋œ๋‹ค. ์•„๋ž˜์˜ ์˜ˆ์‹œ์— ๋”ฐ๋ฅด๋ฉด ์ € ๋ชจ๋ธ์€ ๋ฒ„์„ฏ์— ์ง‘์ค‘ํ•˜์—ฌ ํ•™์Šต์„ ํ•˜๊ธฐ์—, ๋ฒ„์„ฏ์ด ์•„๋‹Œ ๋งˆ๋ฆฌ์˜ค๊ฐ€ input์œผ๋กœ ๋“ค์–ด์˜ค๋ฉด reconsturction error๊ฐ€ ๋งค์šฐ ์ปค์ง€๊ฒŒ ๋œ๋‹ค. 

    ๋ฌผ๋ก  ๋ฒ„์„ฏ๋“ค๋กœ ๋ชจ๋ธ๋ง์„ ํ•ด๋„ ์˜ค์ฐจ๋Š” ๋ถ„๋ช…ํžˆ ์กด์žฌํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ํŠน์ • threshold๋ฅผ ๋„˜์œผ๋ฉด ์ด์ƒ์น˜๋ผ๊ณ  ํŒ๋‹จํ•œ๋‹ค.

     

    • Replicator neural networks (autoencoder)
    • Train a neural network to minimize the reconstruction error
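    As a stand-in for a full autoencoder, one-component PCA behaves like a minimal linear autoencoder, which is enough to sketch the reconstruction-error idea (toy 2-D data of my own making):

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal data lies near a 1-D line inside 2-D space (made-up pattern).
t = rng.normal(size=500)
X_train = np.c_[t, 2 * t] + rng.normal(scale=0.1, size=(500, 2))

mean = X_train.mean(axis=0)
# The principal direction plays the role of the encoder/decoder weights.
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
v = Vt[0]

def reconstruction_error(x):
    code = (x - mean) @ v          # "encode" down to 1 dimension
    recon = mean + code * v        # "decode" back up to 2 dimensions
    return np.linalg.norm(x - recon)

errors = [reconstruction_error(x) for x in X_train]
threshold = np.quantile(errors, 0.95)

print(reconstruction_error(np.array([1.0, 2.0])) <= threshold)   # on-pattern
print(reconstruction_error(np.array([2.0, -4.0])) <= threshold)  # off-pattern
```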

     

    5) Support vector machine-based method

    • 1-class SVM (1-SVM) vs. Support vector domain (data) description (SVDD)

     

     


    6. Summary

    • ๋ฒ”์ฃผ ๋ถˆ๊ท ํ˜• ๋ฌธ์ œ๋Š” ์ผ๋ฐ˜์ ์ธ 2-class ๋ถ„๋ฅ˜๊ธฐ๋กœ๋Š” ํ•ด๊ฒฐํ•˜๊ธฐ ์–ด๋ ค์›€
    • ํ•ด๊ฒฐ์ฑ…
      • Sampling, cost-sensitive ๋“ฑ์˜ ๋ณด์ •์„ ํ•˜๊ณ  2-class ๋ถ„๋ฅ˜๊ธฐ๋กœ ํ•ด๊ฒฐ
      • Novelty detection method ์‚ฌ์šฉ, ์ด์ƒ์น˜ํƒ์ง€

     

     

    ๋งˆ์ง€๋ง‰์œผ๋กœ ML์˜ ๋‘ ๊ฐ€์ง€ ๋ช…์–ธ์„ ์•Œ์•„๋ณด์ž

     

    Occam's Razor: "If there are various logical ways to explain a certain phenomenon, the simplest is the best."
    Wolpert's "No Free Lunch Theorem": an algorithm optimized for one particular problem will not be optimal for other problems.

     

     

    ์ด์ƒ์œผ๋กœ ๋จธ์‹ ๋Ÿฌ๋‹ ์ˆ˜์—… ๋!

     
