Saturday, November 2, 2019

Gradient Descent

Linear Regression
y = ax + b
* Trying to find a and b based on the train set.
* Create a cost function, in this case the Mean Squared Error (MSE).
* Minimise the MSE.

Ways to minimise
* Normal equation
theta = inv(X.T.dot(X)).dot(X.T).dot(y)
Computational complexity is roughly O(n^2.4) to O(n^3) in the number of features. A NumPy sketch follows this list.
* Gradient Descent
Start from a random initialisation, then repeatedly tweak the parameters in the direction opposite to the gradient (the partial derivatives of the cost function) until it reaches a minimum.
Important parameter: learning rate.
* Stochastic Gradient Descent
Instead of using the whole train_set at every step, compute the gradient on a single randomly picked instance.
* Mini-batch Gradient Descent
Combines batch Gradient Descent and Stochastic Gradient Descent:
compute the gradient on small random sets of instances called mini-batches. Sketches of all three variants follow this list.
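
A minimal NumPy sketch of the normal equation on toy data (the generating values 4 and 3 and the variable names are illustrative assumptions):

    import numpy as np

    # Toy train set (illustrative): y = 4 + 3x + Gaussian noise.
    np.random.seed(42)
    X = 2 * np.random.rand(100, 1)
    y = 4 + 3 * X + np.random.randn(100, 1)

    # Prepend the bias column x0 = 1 so b is learned alongside a.
    X_b = np.c_[np.ones((100, 1)), X]

    # Normal equation: theta = inv(X^T X) X^T y
    theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
    print(theta_best)  # close to [[4.], [3.]]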
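
And a sketch of the three gradient descent variants on the same toy data; eta, the epoch counts and batch_size are illustrative assumptions, and plain SGD would usually use a gradually decaying learning rate rather than a fixed one:

    import numpy as np

    np.random.seed(42)
    X = 2 * np.random.rand(100, 1)
    y = 4 + 3 * X + np.random.randn(100, 1)
    X_b = np.c_[np.ones((100, 1)), X]  # bias column
    m = len(X_b)

    eta = 0.1  # learning rate

    # Batch Gradient Descent: gradient over the whole train set.
    theta = np.random.randn(2, 1)  # random initialisation
    for iteration in range(1000):
        gradients = 2 / m * X_b.T.dot(X_b.dot(theta) - y)
        theta -= eta * gradients

    # Stochastic Gradient Descent: one random instance per step.
    theta = np.random.randn(2, 1)
    for epoch in range(50):
        for _ in range(m):
            i = np.random.randint(m)
            xi, yi = X_b[i:i + 1], y[i:i + 1]
            gradients = 2 * xi.T.dot(xi.dot(theta) - yi)
            theta -= eta * gradients

    # Mini-batch Gradient Descent: small random sets of instances.
    theta = np.random.randn(2, 1)
    batch_size = 20
    for epoch in range(50):
        shuffled = np.random.permutation(m)
        for start in range(0, m, batch_size):
            idx = shuffled[start:start + batch_size]
            xb, yb = X_b[idx], y[idx]
            gradients = 2 / len(idx) * xb.T.dot(xb.dot(theta) - yb)
            theta -= eta * gradients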

Polynomial Regression
Add polynomial combinations of the features as new features (PolynomialFeatures), then use LinearRegression on the expanded feature set, as in the sketch below.
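
A minimal scikit-learn sketch; the quadratic toy data (y = 0.5x^2 + x + 2 + noise) is an illustrative assumption:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    # Toy quadratic data (illustrative).
    np.random.seed(42)
    X = 6 * np.random.rand(100, 1) - 3
    y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

    # degree=2 adds x^2 as an extra feature next to x.
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly.fit_transform(X)

    lin_reg = LinearRegression().fit(X_poly, y)
    print(lin_reg.intercept_, lin_reg.coef_)  # close to [2.], [[1., 0.5]]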

Learning Curves
* See how the model is learning in terms of errors as the train set grows.
* One way to see whether the model is too simple (underfitting) or too complex (overfitting), besides cross validation. A plotting sketch follows.
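
One common way to draw learning curves, sketched with scikit-learn and matplotlib; the helper name plot_learning_curves, the 80/20 split and the toy data are assumptions, not from the notes:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    def plot_learning_curves(model, X, y):
        # Hold out 20% for validation (an assumed split).
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
        train_errors, val_errors = [], []
        for m in range(1, len(X_train)):
            model.fit(X_train[:m], y_train[:m])
            train_errors.append(
                mean_squared_error(y_train[:m], model.predict(X_train[:m])))
            val_errors.append(mean_squared_error(y_val, model.predict(X_val)))
        plt.plot(np.sqrt(train_errors), "r-+", label="train")
        plt.plot(np.sqrt(val_errors), "b-", label="validation")
        plt.xlabel("train set size")
        plt.ylabel("RMSE")
        plt.legend()
        plt.show()

    # Toy data (illustrative), same shape as the earlier sketches.
    np.random.seed(42)
    X = 2 * np.random.rand(100, 1)
    y = (4 + 3 * X + np.random.randn(100, 1)).ravel()

    plot_learning_curves(LinearRegression(), X, y)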

How to reduce overfitting
* Regularized Linear Models

Multiple ways to regularise a linear model
* Ridge Regression. Adds half of the squared l2 norm of the weight vector (scaled by a hyperparameter alpha) to the MSE.
* Lasso Regression. Adds the l1 norm of the weight vector to the MSE.
* Lasso Regression tends to perform feature selection, driving the weights of the least important features to zero and removing them from the equation.
* Elastic Net. Combines both Ridge Regression and Lasso Regression. A sketch of all three follows.
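
A minimal scikit-learn sketch of the three regularised models; the alpha and l1_ratio values and the toy data are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso, Ridge

    np.random.seed(42)
    X = 2 * np.random.rand(100, 1)
    y = (4 + 3 * X + np.random.randn(100, 1)).ravel()

    # alpha sets the regularisation strength;
    # l1_ratio mixes the l1 and l2 penalties in Elastic Net.
    ridge = Ridge(alpha=1.0).fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)
    elastic = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

    print(ridge.coef_, lasso.coef_, elastic.coef_)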
