* try SGDClassifier as case study
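A minimal setup sketch for such a case study, assuming the MNIST digits fetched via fetch_openml and a binary "is it a 5?" target (the dataset and the names sgd_clf / X_train / y_train_5 are assumptions, reused in the sketches below):

    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import SGDClassifier

    # Assumed dataset: MNIST, with the conventional first-60000 train split
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)
    X_train, y_train = mnist.data[:60000], mnist.target[:60000]

    # Binary target for the case study: "is this digit a 5?"
    y_train_5 = (y_train == '5')  # fetch_openml returns string labels

    sgd_clf = SGDClassifier(random_state=42)
    sgd_clf.fit(X_train, y_train_5)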
Performance Measure
* use cross validation
cons: accuracy is not a good measure for skewed datasets (see the sketch after this list)
* use confusion matrix
cross_val_predict followed by confusion_matrix (see the sketch under the matrix layout below).
* Receiver Operating Characteristic (ROC) Curve
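A sketch of the cross-validation measure from the first bullet, reusing the assumed names above; on a skewed target like y_train_5 (roughly 90% negatives), even a classifier that always predicts "not 5" scores about 90% accuracy, which is the con noted above:

    from sklearn.model_selection import cross_val_score

    # 3-fold cross-validated accuracy; high scores here can be deceptive
    cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")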
Confusion matrix (rows = actual class, columns = predicted class):
True Negative  | False Positive
False Negative | True Positive
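A sketch of producing that matrix, as per the second bullet above (names reused from the assumed setup):

    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    # Out-of-fold predictions: each instance is predicted by a model
    # that never saw it during training
    y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
    confusion_matrix(y_train_5, y_train_pred)  # [[TN, FP], [FN, TP]]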
Precision = TP / (TP + FP). e.g. a kids' video filterer, where false positives are costly.
Recall/TPR = TP / (TP + FN). e.g. detecting a thief on video surveillance, where false negatives are costly.
Precision and recall trade off against each other: based on SGDClassifier, if you move the decision threshold to the right, recall goes down and precision goes up, and vice versa.
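A sketch of both metrics and of moving the threshold by hand; the threshold value 3000 is purely illustrative, not from the notes:

    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import cross_val_predict

    precision_score(y_train_5, y_train_pred)  # TP / (TP + FP)
    recall_score(y_train_5, y_train_pred)     # TP / (TP + FN)

    # Raw decision scores instead of hard predictions
    y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                                 method="decision_function")

    # Raising the threshold above SGDClassifier's default of 0
    # raises precision and lowers recall
    y_pred_high = (y_scores > 3000)  # assumed illustrative threshold
    precision_score(y_train_5, y_pred_high)
    recall_score(y_train_5, y_pred_high)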
ROC
* plots the true positive rate (recall) against the false positive rate; equivalently, recall versus 1 - specificity, where specificity is the true negative rate
* Based on the graph, the higher the recall, the more false positives the classifier produces.
* One way to measure a classifier is the Area Under the Curve (roc_auc_score).
* A perfect classifier has AUC = 1.0, while a purely random one has AUC = 0.5.
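A sketch of the curve and the AUC score, reusing the y_scores computed above:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, roc_auc_score

    fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)
    plt.plot(fpr, tpr)
    plt.plot([0, 1], [0, 1], 'k--')  # random classifier's diagonal (AUC = 0.5)
    plt.xlabel("False Positive Rate (1 - specificity)")
    plt.ylabel("True Positive Rate (recall)")
    plt.show()

    roc_auc_score(y_train_5, y_scores)  # 1.0 = perfect, 0.5 = random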
The Precision/Recall (PR) curve should be used when the positive class is rare, or when false positives matter more than false negatives. Otherwise, use the ROC curve. A PR-curve sketch follows below.
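A matching PR-curve sketch for comparison, again reusing y_scores:

    import matplotlib.pyplot as plt
    from sklearn.metrics import precision_recall_curve

    precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)
    plt.plot(recalls, precisions)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.show()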
Other Types of Classification
* Multiclass classification
extends binary classification to more than two classes, either through One-versus-One (OvO) or One-versus-All (OvA, a.k.a. One-versus-Rest); see the sketch after this list
* Multilabel classification
outputs multiple binary labels per instance because it is trained on multiple labels
* Multioutput classification
a generalisation of multilabel classification where each label can itself be multiclass (take more than two values)
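Sketches of the first two variants, reusing the assumed MNIST names; the "large digit" / "odd digit" label pair is an illustrative assumption, not from the notes above:

    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.neighbors import KNeighborsClassifier

    # Multiclass: wrap a binary learner in OvO
    # (OneVsRestClassifier would give OvA instead)
    ovo_clf = OneVsOneClassifier(SGDClassifier(random_state=42))
    ovo_clf.fit(X_train, y_train)  # full 10-class digit labels

    # Multilabel: two binary labels per image
    y_large = (y_train.astype(int) >= 7)    # "is a large digit"
    y_odd = (y_train.astype(int) % 2 == 1)  # "is an odd digit"
    y_multilabel = np.c_[y_large, y_odd]

    knn_clf = KNeighborsClassifier()
    knn_clf.fit(X_train, y_multilabel)
    knn_clf.predict(X_train[:1])  # one row of [is_large, is_odd]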
Error Analysis
* By plotting the confusion matrix, we can see which errors are most common for our model.
* Raw counts alone are not enough, though, since frequent classes dominate the plot.
* Divide each value in the confusion matrix by the number of images in the corresponding actual class,
* and the plot then shows error rates that are comparable across classes (see the sketch below).
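A sketch of that normalization; ovo_clf from the multiclass sketch above is reused here as the assumed model:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    # Full 10-class confusion matrix from out-of-fold predictions
    y_train_pred = cross_val_predict(ovo_clf, X_train, y_train, cv=3)
    conf_mx = confusion_matrix(y_train, y_train_pred)

    # Normalize each row by the number of images in that actual class
    row_sums = conf_mx.sum(axis=1, keepdims=True)
    norm_conf_mx = conf_mx / row_sums

    # Zero the diagonal to keep only the errors, then plot
    np.fill_diagonal(norm_conf_mx, 0)
    plt.matshow(norm_conf_mx, cmap=plt.cm.gray)
    plt.show()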