{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

# Multi-layer Perceptron

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this mini-project, we will implement a simple one hidden layer neural network from scratch.\n", "Even if you will use deep learning libraries like Pytorch or Tensorflow later, implementing a network from scratch at least once is an extremely useful exercise, essential for designing and optimizing your own models effectively.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install numpy\n", "!pip install sklearn\n", "\n", "# Imports of useful packages\n", "import matplotlib # For the plots\n", "import matplotlib.pyplot as plt \n", "import numpy as np # To perform operations on matrices efficiently\n", "\n", "\n", "# We will use the sklearn library to compare our neural network to that \n", "# of a simpler approach like logistic regression\n", "\n", "import sklearn \n", "import sklearn.datasets\n", "import sklearn.linear_model\n", "\n", "from math import exp,log\n", "\n", "# To display plots inline and adjust the display\n", "\n", "%matplotlib inline\n", "matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating a dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start by generating a dataset that we can play with. The scikit-learn machine learning library has a few useful data generators, saving us the trouble of writing the code ourselves. We will use the make_moons function, which creates a two-class dataset of two-dimensional examples in the shape of two half-moons: each of the half-moons corresponds to a class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(1)\n", "X, y = sklearn.datasets.make_moons(n_samples=300, noise=0.20) # We create a dataset with 300 elements\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 1. Show the coordinates and labels of the first two elements of the dataset.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You can use f-string as proposed below (https://www.geeksforgeeks.org/formatted-string-literals-f-strings-python/)\n", "#print(f\"The first coordinate point {TODO} has a label {TODO}\")\n", "#print(f\"The second coordinate poin {TODO} has a label {TODO}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can display this dataset easily using Matplotlib using colors to make the labels appear $y$ : " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(X[:,0], X[:,1], s=50, c=y+1, cmap=plt.cm.Spectral)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset we generated has two classes, represented by red and blue dots.\n", "\n", "Our goal is to train a classifier that predicts the correct class from point coordinates $x_1$ et $x_2$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Find the best line manually" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this part we will try to find the best line that separates our cloud of points manually." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Question 2 Creates a function that returns 1 if a coordinate point $(x_1,x_2$) is below the line with slope $a$ and the bias $b$." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def pred_linear(a: float, b: float, x1: float, x2: float):\n", " # TODO\n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to be able to evaluate the performance of our approaches using metrics. Here, we will choose the accuracy which is simply the number of well-classified elements divided by the total number of elements. To learn more about classification metrics like precision, recall and their link with accuracy, you can consult the following excellent Wikipedia (if you plan to do Machine Learning later, the notion of precision/recall is a classic) https://en.wikipedia.org/wiki/Precision_and_recall\n", "\n", "Question 3. Complete the following accuracy function. (1 Python line with the comprehension of a list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def accuracy(y_true, y_pred):\n", " \"\"\"\n", " Args:\n", " y_true (list[int]): A list of integers having values in {0,1} that contain the class labels\n", " y_pred (list[int]): A list of integers having values in {0,1} that contain the predictions of the model\n", "\n", " Returns:\n", " float: The Accuracy of the model\n", " \n", " Example:\n", " >>> accuracy([0,0,1], [0,1,1])\n", " 0.666...\n", " \"\"\"\n", " pass #TODO" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#### display function\n", "def plot_decision_boundary(pred_func):\n", " \"\"\"\n", " Shows the decision boundaries of a binary prediction function.\n", " \"\"\"\n", " # Set grid dimensions and give some margin for display\n", " x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n", " y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n", " h = 0.01\n", " # Generate the grid of points with a distance of h between them\n", " xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n", " # Drawing the decision boundary\n", " Z = pred_func(np.c_[xx.ravel(), yy.ravel()])\n", " Z = Z.reshape(xx.shape)\n", " # Show contour and training points\n", " plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)\n", " plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Question 4. Play by hand with parameters $a$ and $b$ to obtain several linear decision boundaries and try to obtain at least 80% accuracy.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 42 #TODO\n", "b = 42 #TODO\n", "def prediction(A, a, b, func):\n", " return np.array([func(a=a, b=b, x1=x[0], x2=x[1]) for x in A])\n", "plot_decision_boundary(lambda x: prediction(x, a, b, pred_linear))\n", "print('le score obtenu est de: ', accuracy(y, prediction(X, a, b, pred_linear)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scikit-learn has models such as logistic regression which can find the optimal parameters a and b:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classifier = sklearn.linear_model.LogisticRegressionCV()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 5. Train this logistic regression model on the dataset (X,y). 
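{ "cell_type": "markdown", "metadata": {}, "source": [ "For reference, all scikit-learn estimators are trained through the same fit(features, labels) API. A minimal, self-contained sketch on toy data (the toy arrays below are made up for illustration; your own call on (X, y) is analogous):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged illustration of the scikit-learn estimator API on made-up toy data\n", "toy_X = [[0.0], [1.0], [2.0], [3.0]]\n", "toy_y = [0, 0, 1, 1]\n", "toy_clf = sklearn.linear_model.LogisticRegression().fit(toy_X, toy_y)\n", "print(toy_clf.predict([[0.5], [2.5]]))  # expected: [0 1]" ] },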
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#TODO (1 line)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the model is trained, it can be used to predict and to draw the decision boundary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_decision_boundary(lambda x: classifier.predict(x))\n", "plt.title(\"Logistic Regression\")\n", "print('The accuracy obtained is: ', accuracy(y, classifier.predict(X)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 6. What do you observe? Was such a result predictable? What can we do to improve our predictions?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bonus Question. Find the coefficients $a, b$ obtained by the Scikit-learn logistic regression." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Networks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will now create a neural network to solve the previous problem." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We will reuse the same dataset as previously; be careful: X and y will be global variables for the rest\n", "# (to be avoided in general, but it simplifies the notation for this mini-project)\n", "np.random.seed(1)\n", "X, y = sklearn.datasets.make_moons(300, noise=0.20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 7. Complete the following variables and functions to code a **two-layer** neural network (1 hidden layer). The hidden layer will currently have **10 neurons** and we will use a learning rate of 3e-2.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# number of examples in the training set\n", "\n", "N = 0 #TODO\n", "\n", "# dimension of the input\n", "d_input = 0 #TODO\n", "\n", "# dimension of the output\n", "d_output = 0 #TODO\n", "\n", "# dimension of the hidden layer, i.e. number of neurons in the hidden layer\n", "d_hidden = 0 #TODO\n", "\n", "\n", "\n", "# learning rate for the gradient descent algorithm\n", "epsilon = 0 #TODO\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 8. Complete the following function to generate the parameters of our neural network. For this, you will use the random library to draw parameters in the interval [-0.5, 0.5] using the random.random() function." ] },
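{ "cell_type": "markdown", "metadata": {}, "source": [ "Note that random.random() draws uniformly from [0, 1); a minimal sketch of shifting such a draw into [-0.5, 0.5) (one common convention for this kind of initialization):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "# Hedged sketch: shift a uniform [0, 1) draw into [-0.5, 0.5)\n", "def rand_weight() -> float:\n", "    return random.random() - 0.5\n", "\n", "print([round(rand_weight(), 3) for _ in range(5)])" ] },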
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def init_model(d_input: int, d_hidden: int, d_output: int):\n", "    \"\"\"\n", "    Args:\n", "        d_input (int): dimension of the input\n", "        d_hidden (int): dimension of the hidden layer\n", "        d_output (int): dimension of the output\n", "\n", "    Returns:\n", "        dict: Dictionary containing 4 keys, the weights/biases (W1,b1) and (W2,b2) of the neural network.\n", "        Each of these weights and biases is a list or a list of lists of floats.\n", "    \"\"\"\n", "    # Initialization of random parameters\n", "    random.seed(0)\n", "    # First layer, of size d_input x d_hidden\n", "    W1 = [] #TODO\n", "    # Bias of the first layer, vector of size d_hidden\n", "    b1 = [] #TODO\n", "    # Second layer, of size d_hidden x d_output\n", "    W2 = [] #TODO\n", "    # Bias of the second layer\n", "    b2 = [] #TODO\n", "    # The model returned at the end is a dictionary of weights and biases\n", "    model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}\n", "    return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 9. Implement the following mathematical functions, which may be useful later. All vectors $v_1, v_2$ are Python lists and the matrices $X$ and $W$ are lists of lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Dot product between two vectors\n", "def dot_product(v1, v2):\n", "    pass #TODO\n", "\n", "# Add two vectors\n", "def add_bias(v1, v2):\n", "    pass #TODO\n", "\n", "# Get column number \"index\" of W\n", "def get_columns(W, index):\n", "    pass #TODO\n", "\n", "# Transpose a matrix\n", "def transpose(W):\n", "    pass #TODO\n", "\n", "# Multiplication between two matrices\n", "def matrix_multiplication(X, W):\n", "    pass #TODO\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 10. Complete the forward_layer function, which computes:\n", " $$ X \\times W + b $$\n", " in which $X$ represents the input, $W$ the weights and $b$ the biases.\n", " Then complete the sigmoid and forward_function functions." ] },
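{ "cell_type": "markdown", "metadata": {}, "source": [ "A hint on shapes: if $X$ is $N \\times d_{in}$, $W$ is $d_{in} \\times d_{out}$ and $b$ has length $d_{out}$, then forward_layer must return an $N \\times d_{out}$ list of lists. A hedged, self-contained sanity check (pure Python, independent of your helpers, with made-up values):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged shape check with N=2 examples, d_in=3 and d_out=2: the output must be 2x2\n", "X_demo = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3\n", "W_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2\n", "b_demo = [0.5, -0.5]                           # length 2\n", "out = [[sum(x * w for x, w in zip(row, col)) + bj\n", "        for col, bj in zip(zip(*W_demo), b_demo)]\n", "       for row in X_demo]\n", "print(out)  # [[4.5, 4.5], [10.5, 10.5]]" ] },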
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def forward_layer(X, W, b):\n", "    pass #TODO" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sigmoid(x):\n", "    \"\"\"\n", "    Args:\n", "        x (float): input\n", "    Returns:\n", "        float: sigmoid(x)\n", "\n", "    \"\"\"\n", "    return 0 # TODO" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def forward_function(X, W1, b1, W2, b2):\n", "    #TODO\n", "    z1 = 0 # Output of the first layer\n", "    a1 = 0 # Sigmoid activation of the first layer\n", "    z2 = 0 # Output of the second layer\n", "    exp_scores = 0 # Compute exp(z2)\n", "    probs = 0 # Apply the softmax activation function on z2\n", "    return probs\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Test your result:\n", "np.random.seed(1)\n", "model_test = init_model(4,3,2)\n", "X_debug = [[random.random() for i in range(4)]] # Test with an example in dimension 4\n", "forward_function(X_debug, model_test['W1'], model_test['b1'], model_test['W2'], model_test['b2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are supposed to find: [[0.48, 0.52]] (if you used the random.random function to initialize the weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 11. We recall the feed-forward equations (everything is in matrix form, so $X\\in\\mathbb{R}^{N\\times d_{input}}$, $W_1\\in\\mathbb{R}^{d_{input}\\times d_{hidden}}$, etc.). Complete the backpropagation equations below, then complete train_model (reusing your sigmoid and forward_function code). Gradient descent on W1 is provided to you.\n", "Reminder on backprop: https://towardsdatascience.com/backpropagation-the-natural-proof-946c5abf63b1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\begin{aligned}\n", "z_1 & = XW_1 + b_1 \\\\ \n", "a_1 & = \\mathrm{sigmoid}(z_1)=\\frac{1}{1+\\exp(-z_1)} \\\\\n", "z_2 & = a_1W_2 + b_2 \\\\\n", "a_2 & = \\hat{y} = \\mathrm{softmax}(z_2)\\\\\n", "L(y,\\hat{y}) & = - \\frac{1}{N} \\sum_{n=1}^{N} \\sum_{i \\in C} y_{n,i} \\log\\hat{y}_{n,i}\n", "\\end{aligned}\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Backpropagation (to complete):\n", "$$\n", "\\begin{aligned}\n", "& \\delta_2 = \\\\\n", "& \\delta_1 = \\\\\n", "& \\frac{\\partial{L}}{\\partial{W_2}} = \\\\\n", "& \\frac{\\partial{L}}{\\partial{b_2}} = \\\\\n", "& \\frac{\\partial{L}}{\\partial{W_1}} = \\\\\n", "& \\frac{\\partial{L}}{\\partial{b_1}} = \n", "\\end{aligned}\n", "$$" ] },
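{ "cell_type": "markdown", "metadata": {}, "source": [ "Before writing train_model, note that a finite-difference check is a classic way to debug backpropagation: for a scalar loss $f$ and a parameter $w$, $\\frac{\\partial f}{\\partial w} \\approx \\frac{f(w+h)-f(w-h)}{2h}$ for small $h$. A minimal, self-contained sketch (generic, not tied to this network):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged sketch: numerical gradient of a scalar function, useful to debug backprop\n", "def numerical_grad(f, w, h=1e-5):\n", "    return (f(w + h) - f(w - h)) / (2 * h)\n", "\n", "# Example on f(w) = w**2, whose exact derivative at w=3 is 6\n", "print(numerical_grad(lambda w: w**2, 3.0))  # ~6.0" ] },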
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_model(model, nn_hdim, num_epochs=1, print_loss=False):\n", "\n", "    W1 = model['W1']\n", "    b1 = model['b1']\n", "    W2 = model['W2']\n", "    b2 = model['b2']\n", "\n", "\n", "    # Gradient descent. For each epoch...\n", "    for i in range(0, num_epochs):\n", "\n", "        # Forward propagation (copy/paste the inside of the previously defined forward_function)\n", "        z1 = 0 # Output of the first layer\n", "        a1 = 0 # Sigmoid activation of the first layer\n", "        z2 = 0 # Output of the second layer\n", "        exp_scores = 0 # Compute exp(z2)\n", "        probs = 0\n", "        # Estimate the loss\n", "        correct_logprobs = 0 # Cross entropy of each example\n", "        data_loss = 1./N * sum(correct_logprobs) # Total loss\n", "\n", "\n", "        # Backpropagation\n", "        #TODO\n", "        delta2 = 0\n", "        dW2 = 0\n", "        db2 = 0\n", "        delta1 = 0\n", "        dW1 = 0\n", "        db1 = 0\n", "\n", "        # Gradient descent step\n", "        W1 = [[w - epsilon * d for d, w in zip(dW1_row, W1_row)] for dW1_row, W1_row in zip(dW1, W1)]\n", "        #TODO\n", "        b1 = 0\n", "        W2 = 0\n", "        b2 = 0\n", "        # Updating weights and biases\n", "        model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}\n", "\n", "        # Loss display\n", "        if print_loss and i % 50 == 0:\n", "            print(\"Loss at epoch %i: %f\" %(i, data_loss))\n", "\n", "    return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will need a prediction function that uses our trained model to return predictions. Unlike the model outputs, which are floats in [0,1] for each class, the model prediction is 1 for the class whose score is maximum and 0 elsewhere. We use numpy's argmax function to do this automatically.\n", "\n", " Question 12. Complete the function predict():" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def predict(model, x):\n", "    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']\n", "    # Forward propagation, as before\n", "    z1 = 0 #TODO\n", "    a1 = 0 #TODO\n", "    z2 = 0 #TODO\n", "    exp_scores = 0 #TODO\n", "    probs = 0 #TODO\n", "    return np.argmax(probs, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 13. Train the model for different numbers of epochs and comment on your results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = init_model(d_input,d_hidden,d_output)\n", "model = train_model(model,d_hidden, num_epochs=200, print_loss=True)\n", "print(\"The final accuracy obtained is:\", accuracy(y, predict(model, X)))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the decision boundary\n", "plot_decision_boundary(lambda x: predict(model, x))\n", "plt.title(\"Decision boundary for hidden layer size %i\" % d_hidden)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Application to a real dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now apply our model to a real dataset well known in the machine learning world: MNIST (https://en.wikipedia.org/wiki/MNIST_database), which is available in scikit-learn (in a reduced 8x8-pixel version via load_digits)." ] },
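{ "cell_type": "markdown", "metadata": {}, "source": [ "For Question 15 below, you will need a held-out validation set. A minimal, self-contained sketch using sklearn.model_selection.train_test_split (the split ratio and seed are arbitrary choices; the commented line shows the analogous call for the digits data defined below):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "# Hedged sketch on toy data: hold out 20% for validation (arbitrary ratio/seed)\n", "toy = list(range(10))\n", "train_part, val_part = train_test_split(toy, test_size=0.2, random_state=0)\n", "print(train_part, val_part)\n", "\n", "# For the digits data defined below, the analogous call would be:\n", "# X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)" ] },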
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "digits = sklearn.datasets.load_digits()\n", "n_samples = len(digits.images)\n", "data = digits.images.reshape((n_samples, -1))\n", "\n", "_, axes = plt.subplots(nrows=1, ncols=10, figsize=(20, 3))\n", "for ax, image, label in zip(axes, digits.images, digits.target):\n", "    ax.set_axis_off()\n", "    ax.imshow(image, cmap=plt.cm.gray_r, interpolation=\"nearest\")\n", "    ax.set_title(\"Training: %i\" % label)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = digits.images.reshape((n_samples, -1)) # We reshape the images into vectors\n", "\n", "y = digits.target\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 14. Complete the input and output dimensions of your network so that it is adapted to the MNIST dataset, then restart the training (be careful, the training will now take a few minutes without additional code optimization).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = len(X)\n", "d_input = 0 #TODO\n", "d_output = 0 #TODO\n", "d_hidden = 20\n", "\n", "# Gradient descent parameter\n", "epsilon = 0.001 # the learning rate must be smaller than before, otherwise training diverges" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = init_model(d_input,d_hidden,d_output)\n", "model = train_model(model,d_hidden, num_epochs=100, print_loss=True)\n", "print(\"The final accuracy obtained is:\", accuracy(y, predict(model, X)))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 15:\n", "Divide the MNIST dataset into training and validation datasets.\n", "Find good hyper-parameters for your model. Why create a validation dataset?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Question 15 Bonus.\n", "How can you make your accuracy on the validation set significantly different from that on the training set? Find such hyperparameters and plot the accuracy on the training set and on the validation set during the training of your network (as a function of the epochs). What do you observe?" ] },
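{ "cell_type": "markdown", "metadata": {}, "source": [ "A hedged sketch for the bonus plot, assuming you record two lists, train_acc and val_acc, with one accuracy value per epoch inside your training loop (both names and values below are placeholders to fill in yourself):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged sketch: plot per-epoch accuracies (train_acc / val_acc are\n", "# hypothetical lists you would collect inside your training loop)\n", "train_acc = [0.5, 0.7, 0.8, 0.9]   # placeholder values\n", "val_acc = [0.5, 0.65, 0.7, 0.72]   # placeholder values\n", "plt.plot(train_acc, label='train accuracy')\n", "plt.plot(val_acc, label='validation accuracy')\n", "plt.xlabel('epoch')\n", "plt.ylabel('accuracy')\n", "plt.legend()\n", "plt.show()" ] },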
{ "cell_type": "markdown", "metadata": {}, "source": [ " Question 16 (Bonus). There are many ways to make your neural network more efficient. Look into the following possible improvements, explain their usefulness, and implement the ones you want, analyzing the new results obtained (on the dataset of your choice). Don’t hesitate to be curious and look for good resources to help you!\n", "\n", "* Use the Numpy library to handle matrix operations, rather than Python lists of lists (this will simplify your code and should reduce training times by several orders of magnitude depending on the size of your network, allowing you to train larger networks and better optimize hyperparameters on the MNIST dataset)\n", "* Add weight decay (https://fr.wikipedia.org/wiki/Weight_decay)\n", "* Use stochastic or batch gradient descent (https://fr.wikipedia.org/wiki/Algorithme_du_gradient_stochastique)\n", "* Add more layers (make the number of layers a model parameter)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }