Saturday, November 9, 2019

Unsupervised Learning

* Learning without labels: just data, no targets.
* Three main tasks: clustering, anomaly detection, and density estimation.

Clustering
* Separate the data into different clusters.
* Hard clustering: each instance must belong to exactly one cluster.
* Soft clustering: each instance gets a score per cluster, e.g. based on its distance from each centroid, as in the sketch below.
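A minimal sketch of hard vs. soft clustering, assuming scikit-learn's KMeans (covered in the next section; the toy data and parameters are my own illustration): predict() returns one cluster per instance (hard), while transform() returns the distance to every centroid, which can serve as a soft score.

    # Hard vs. soft clustering with scikit-learn's KMeans.
    # The toy data and n_clusters below are illustrative assumptions.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.RandomState(42).rand(100, 2)          # toy 2-D data
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

    hard = kmeans.predict(X)    # hard: one cluster id per instance
    soft = kmeans.transform(X)  # soft: distance from each instance to every centroid
    print(hard[:5], soft[:5], sep="\n")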

K-Means
* If we had the centroids, it would be easy to assign each training instance to the nearest one.
* If we had the labels, it would be easy to compute each cluster's centroid as the mean of its instances.
* No centroids and no labels? Pick k random centroids, label the instances, update the centroids (the mean of each cluster), relabel, and so on until convergence.
* Poor initialisation can make the algorithm converge to a suboptimal local minimum.
* How to select the initial centroids?
** Use another algorithm first to get a rough idea of where they are.
** Or run K-Means multiple times and keep the best solution.
** "Best" is measured by inertia: the sum of squared distances between each instance and its closest centroid.
* K-Means++ improves the initialisation selection:
** after the first centroid is chosen, pick each subsequent centroid with a weighted randomisation that favours instances far away from the centroids already chosen.
* Accelerated K-Means uses the triangle inequality to skip unnecessary distance computations.
* Mini-batch K-Means does not use the whole training set per iteration.
** Slightly worse inertia, but much faster than regular K-Means.
* Deciding the number of clusters:
** Inertia is not a valid criterion here, because it keeps shrinking as the number of clusters grows; minimising it would always favour a large k.
** Reading the "elbow" off an inertia graph works, but it is rather coarse.
** Instead, use the silhouette score: (b - a) / max(a, b), where a is the mean distance to the other instances in the same cluster and b is the mean distance to the instances of the nearest other cluster.
** A score approaching +1 means the instance is well inside its own cluster.
** A score around zero means it sits near a cluster boundary.
** A score approaching -1 means it may have been assigned to the wrong cluster.
** A silhouette diagram (per-instance silhouettes grouped by cluster) gives an even finer view; see the k-selection sketch after this list.
* K-Means has its limits: it struggles with clusters of varying sizes, varying densities, or non-spherical shapes.
* K-Means can be used for image segmentation, dimensionality reduction, and semi-supervised learning.
* Semi-supervised trick: label only the representative instances, i.e. the instance closest to each centroid, then propagate those labels to the rest of the cluster (see the second sketch below).
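Since the list above leans on inertia and the silhouette score, here is a minimal k-selection sketch, assuming scikit-learn (the blob data and the candidate range of k are illustrative assumptions):

    # Compare candidate k values using inertia and silhouette score.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

    for k in range(2, 7):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42).fit(X)
        sil = silhouette_score(X, km.labels_)
        # inertia keeps shrinking as k grows; the silhouette score should peak near the true k
        print(f"k={k}  inertia={km.inertia_:.1f}  silhouette={sil:.3f}")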
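And for the semi-supervised trick, a sketch of extracting the representative instances (assuming scikit-learn and its digits toy dataset; the variable names are mine):

    # Find each cluster's representative instance: the one closest to its centroid.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits

    X, y = load_digits(return_X_y=True)
    k = 10
    dist = KMeans(n_clusters=k, n_init=10, random_state=42).fit_transform(X)
    representative_idx = np.argmin(dist, axis=0)  # per cluster, the closest instance
    X_representative = X[representative_idx]      # label these k instances by hand,
                                                  # then propagate to each whole cluster
    print(representative_idx)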

DBSCAN
* Defines a cluster as a continuous region of high density.
* Works well when the clusters are separated by regions of low density.
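A minimal DBSCAN sketch, assuming scikit-learn and its two-moons toy dataset (the eps and min_samples values are illustrative assumptions); instances labeled -1 are treated as noise:

    # DBSCAN finds non-spherical, density-based clusters where K-Means struggles.
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=500, noise=0.05, random_state=42)
    db = DBSCAN(eps=0.2, min_samples=5).fit(X)
    print(set(db.labels_))  # cluster ids; -1 marks low-density noise points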

Gaussian Mixture Model
* It assumes the instances were generated from a mixture of several Gaussian distributions.
* The number of clusters can be decided using theoretical information criteria:
** Bayesian Information Criterion (BIC)
** Akaike Information Criterion (AIC)
** BIC tends to select simpler models, while AIC tends to fit the data better (sketch at the end of this section).
* The probability function gives the probability of an outcome x given the parameters theta.
* The likelihood function measures how plausible the parameters theta are given an observed outcome x.
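A sketch of picking the number of mixture components with BIC/AIC, assuming scikit-learn's GaussianMixture (the data and the candidate range are illustrative assumptions); for both criteria, lower is better:

    # Choose the number of Gaussian mixture components with BIC/AIC.
    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

    for k in range(1, 6):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=42).fit(X)
        # BIC penalises model complexity harder, so it tends to pick simpler models
        print(f"k={k}  BIC={gm.bic(X):.1f}  AIC={gm.aic(X):.1f}")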
