uploading neural nets tutorial #29

Merged
merged 10 commits on Aug 21, 2024
8 changes: 8 additions & 0 deletions book/_toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,14 @@ parts:
title: Albedo
sections:
- file: tutorials/albedo/aviris-ng-data
- file: tutorials/NN_with_Pytorch/intro
title: Neural Networks with PyTorch
sections:
- file: tutorials/NN_with_Pytorch/00_ML_refresher
- file: tutorials/NN_with_Pytorch/01_Neural_Networks_Basics
- file: tutorials/NN_with_Pytorch/02_Pytorch_Basics
- file: tutorials/NN_with_Pytorch/03_Data_Download_And_Cleaning
- file: tutorials/NN_with_Pytorch/04_Building_And_Training_FFN
- caption: Projects
chapters:
- file: projects/index
168 changes: 168 additions & 0 deletions book/tutorials/NN_with_Pytorch/00_ML_refresher.ipynb
@@ -0,0 +1,168 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Refresher (5 mins)\n",
"\n",
"Before we dive into neural networks, let's take a moment to understand the broader context in which they operate: Machine Learning (ML).\n",
"\n",
"## What is Machine Learning?\n",
"\n",
"Machine Learning (ML) is a field of artificial intelligence (AI) that focuses on developing algorithms or computer models using data. The goal is to use these “trained” computer models to make decisions. Unlike traditional programming, where we write explicit rules for every situation, ML models learn patterns from data to perform tasks. Here is a more general definition:\n",
"\n",
"```{epigraph}\n",
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. \n",
"**— Arthur Samuel, 1959**\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Machine Learning Vs. Traditional Programming\n",
"\n",
"Machine Learning (ML) and traditional programming are two different paradigms used to solve problems and create intelligent systems. The primary difference lies in how they are instructed to solve a problem.\n",
"\n",
"## A Simple Task\n",
"\n",
"Suppose we want to build an intelligent system that identifies whether a given number is even or odd. This intelligent system can be represented mathematically as:\n",
"\n",
"$$\n",
"y=f(x)\n",
"$$\n",
"\n",
"Where:\n",
"\n",
"- $x \\to$ the number entered also called a feature.\n",
"- $y \\to$ the outcome we want to predict.\n",
"- $f \\to$ the model that gets the job done.\n",
"\n",
"\n",
"### Traditional Programming Approach\n",
"\n",
"In traditional programming, the programmer writes explicit rules (code) for the program to follow. The system follows these instructions exactly to produce a solution.\n",
"\n",
"```\n",
"def check_even_odd(number):\n",
" if number % 2 == 0:\n",
" return \"Even\"\n",
" else:\n",
" return \"Odd\"\n",
"\n",
"# Usage\n",
"result = check_even_odd(4) # Output: Even\n",
"```\n",
"\n",
"```{figure} ./images/traditional_programming_flowchart.jpeg\n",
"---\n",
"alt: A flowchart showing the traditional programming approach\n",
"width: 400px\n",
"---\n",
"Tradition Programming Flowchart\n",
"```\n",
"\n",
"### Machine Learning Approach\n",
"\n",
"In Machine Learning, instead of writing explicit instructions, we provide a model with data and let it learn the patterns. The model, after training, can then make predictions or decisions based on what it has learned. \n",
"\n",
"```{figure} ./images/ML_flowchart.jpeg\n",
"---\n",
"alt: A flowchart showing the machine learning process\n",
"width: 400px\n",
"name: ml-flow\n",
"---\n",
"Machine Learning Flowchart\n",
"```\n",
"\n",
"\n",
"```{important}\n",
"Machine Learning is useful when the function ($f$) cannot be explicitly programmed or when the relationship between the feature(s) and outcome is unknown.\n",
"```"
]
},
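The contrast can be made concrete with a small sketch (illustrative only, not part of the tutorial): instead of hand-coding the `number % 2` rule, we give a tiny perceptron labeled examples, represented by their binary digits, and let it learn the rule from the data alone.

```python
# A minimal "learn even vs. odd from data" sketch: a perceptron trained on
# binary-digit features, with no explicit divisibility rule anywhere.
def bits(n, width=8):
    """Represent n as a list of its `width` lowest binary digits."""
    return [(n >> i) & 1 for i in range(width)]

# Labeled examples: 1 = odd, 0 = even (the "data" the model learns from)
data = [(bits(n), n % 2) for n in range(100)]

w = [0.0] * 8   # weights, one per binary digit
b = 0.0         # bias term
for _ in range(50):                                   # epochs of the perceptron rule
    for x, y in data:
        pred = 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        if pred != y:                                 # update only on mistakes
            w = [wi + (y - pred) * xi for wi, xi in zip(w, x)]
            b += y - pred

def check_even_odd_learned(number):
    score = b + sum(wi * xi for wi, xi in zip(w, bits(number)))
    return "Odd" if score > 0 else "Even"

print(check_even_odd_learned(4))  # Even, learned from examples rather than coded
```

The function names and the 8-bit feature encoding are choices made for this sketch; the point is only that the mapping from input to output is estimated from data rather than written down.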
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How Do We Estimate $f$?\n",
"\n",
"We must think about ML **only** if it is suitable for our work flow! \n",
"\n",
"### Machine Learning Algorithms\n",
"\n",
"Machine learning algorithms can be categorized based on different criteria, each suited to different kinds of problems:\n",
"\n",
"- **Supervised Learning**: The most common type of ML algorithm, where the model is trained on labeled data. For example, given past snow data, we can train a model to predict future snow water equivalent. Supervised learning is typically divided into two main types:\n",
" * **Classification**: In classification tasks, the model predicts a discrete label or category. For example, a model might be trained to classify different types of snow conditions (e.g., powder, packed, ice) based on weather data.\n",
" * **Regression**: In regression tasks, the model predicts a continuous value. For instance, predicting snow density using weather data.\n",
"- **Unsupervised Learning**: Here, the model works with unlabeled data, finding patterns or groupings in the data without explicit instructions. Examples include clustering or anomaly detection For example, grouping SNOTEL sites based on similar snow accumulation trends or temperature profiles.\n",
"- **Reinforcement Learning**: A more advanced type of ML where the model learns by making decisions and receiving feedback (rewards or penalties). It's commonly used in robotics, game playing, and optimization problems.\n",
"\n",
"```{note}\n",
"Most Machine Learning techniques can be characterised as either *parametric* or *non-parametric*. \n",
"\n",
"- **Parametric:** The parametric approach simplifies the problem of estimating $f$ to a parameter estimation problem. The disadvantage is that we assume a particular shape of $f$ that may not match the true shape of $f$. A common parametric method is the Linear Regression.\n",
"\n",
"- **Non-parametric:** Non-parametric approaches do not assume any shape for $f$. Instead, they try to estimate $f$ that gets as close to the data points as possible. The disadvantage of non-parametric approaches is that they may require extensive training observations to estimate $f$ accurately. Common examples of non-parametric methods are the tree models, neural networks, etc.\n",
"```\n",
"\n",
"::::{dropdown} 🏋️ Exercise: Where Do Neural Networks Fit In in this Tutorial?\n",
"::::{tip}\n",
"We're predicting snow density - a continuous value.\n",
"::::\n",
"::::"
]
},
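The two supervised flavors can be illustrated on the same invented snow-style data (the numbers, the linear relationship, and the 330 kg/m³ cutoff below are all made up for this sketch): regression predicts a continuous density, and classification derives a discrete label from it.

```python
import numpy as np

# Regression vs. classification on the same invented snow-style data.
rng = np.random.default_rng(1)
depth = rng.uniform(10, 200, 120)                       # feature: snow depth (cm)
density = 200 + 1.2 * depth + rng.normal(0, 15, 120)    # continuous target (kg/m^3)
label = (density > 330).astype(int)                     # discrete target: 1 = "packed"

# Regression: least-squares fit of density on depth (continuous prediction)
A = np.column_stack([np.ones_like(depth), depth])
(intercept, slope), *_ = np.linalg.lstsq(A, density, rcond=None)

# Classification: a discrete label from thresholding the continuous estimate
predicted_label = ((intercept + slope * depth) > 330).astype(int)
accuracy = (predicted_label == label).mean()
print(round(slope, 2), round(accuracy, 2))
```

Least squares stands in here for any supervised learner; the takeaway is only the shape of the target (continuous vs. discrete), not the particular model.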
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Machine Learning Workflow\n",
"\n",
"To wrap up our introduction to Machine Learning, here’s a typical workflow that you would follow when developing a machine learning model:\n",
"\n",
"1. **Problem Definition**\n",
" - Clearly define the problem you’re trying to solve. For example, predicting snow density from weather data.\n",
" \n",
"2. **Data Collection**\n",
" - Gather relevant data. In our case, we would collect data from SNOTEL sites, including snow depth, temperature, and precipitation measurements.\n",
"\n",
"3. **Data Preprocessing**\n",
" - Clean and preprocess the data to make it suitable for modeling. This might include handling missing values, normalizing data, or encoding categorical variables.\n",
" \n",
"4. **Feature Engineering**\n",
" - Select or create features (input variables) that will be used by the model to make predictions. For instance, we might use features like average temperature and total precipitation over a certain period.\n",
"\n",
"5. **Model Selection**\n",
" - Choose the type of model to use. For example, a neural network.\n",
"\n",
"6. **Model Training**\n",
" - Train the model using the training data. This involves feeding the data into the model and adjusting the model’s parameters to minimize prediction errors.\n",
"\n",
"7. **Model Evaluation**\n",
" - Evaluate the model’s performance using validation data. This helps ensure that the model generalizes well to unseen data.\n",
"\n",
"8. **Model Tuning**\n",
" - Fine-tune the model by adjusting hyperparameters or using techniques like cross-validation to improve performance.\n",
"\n",
"9. **Model Deployment**\n",
" - Deploy the model to a production environment where it can make real-time predictions.\n",
"\n",
"10. **Monitoring and Maintenance**\n",
" - Continuously monitor the model’s performance and update it as necessary to ensure it remains accurate and relevant.\n",
"\n",
"This workflow (same as {numref}`ml-flow`) provides a high-level overview of the steps involved in a typical machine learning project. In the rest of this tutorial, we'll be applying parts of this workflow to predict snow density using neural networks and PyTorch.\n"
]
}
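The middle steps of this workflow (data collection through evaluation) can be sketched in miniature. The data below is synthetic and ordinary least squares stands in for the neural network; all column names and values are invented for illustration.

```python
import numpy as np

# Steps 2-7 in miniature: collect (synthetic) data, split it, train a model,
# and evaluate on held-out validation data. All values are invented.
rng = np.random.default_rng(42)
temp = rng.uniform(-10, 5, 200)                  # mean temperature (deg C)
precip = rng.uniform(0, 30, 200)                 # total precipitation (mm)
density = 250 + 4 * precip - 3 * temp + rng.normal(0, 10, 200)  # target (kg/m^3)

X = np.column_stack([temp, precip])
split = 160                                      # 80/20 train/validation split
X_tr, X_val = X[:split], X[split:]
y_tr, y_val = density[:split], density[split:]

# "Model training": ordinary least squares standing in for a neural network
A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

# "Model evaluation": RMSE on data the model never saw
pred = np.column_stack([np.ones(len(X_val)), X_val]) @ coef
rmse = float(np.sqrt(np.mean((pred - y_val) ** 2)))
print(round(rmse, 1))  # should sit near the noise level of 10
```

Later notebooks in this tutorial replace the least-squares model with a trained neural network, but the surrounding split-train-evaluate scaffolding stays the same.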
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
149 changes: 149 additions & 0 deletions book/tutorials/NN_with_Pytorch/01_Neural_Networks_Basics.ipynb
@@ -0,0 +1,149 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Neural Networks\n",
"\n",
"## What is a Neural Network?\n",
"\n",
"A neural network is a computational model that was inspired by the functionality of biological neurons. It consists of layers of interconnected units called **neurons**, which process input data to produce an output. Neural networks (NN) are a type of non-parametric method. Non-parametric approaches do not assume any shape for $f$. Instead, the try to estimate $f$ (using $\\hat{f}$) that gets as close to the data points as possible. \n",
"\n",
"```{important}\n",
"Neural networks are not models of the brain as there is no proven evidence that the brain operates the same way neural networks learn representations. However, some of the concepts were inspired by understanding the brain.\n",
"```\n",
"\n",
"### Basic Structure of a Neural Network\n",
"\n",
"Every NN consists of three distinct layers;\n",
"\n",
"- **Input Layer**: The layer that receives the input data (e.g., features like temperature and snow depth).\n",
"- **Hidden Layers**: Layers where the data is processed. Each layer extracts different features or patterns from the data.\n",
"- **Output Layer**: The final layer that produces the prediction or classification (e.g., predicted snow density).\n",
"\n",
"Neural networks can vary in complexity, from simple single-layer networks to deep networks with many hidden layers. Here is an image of a sample neural network:\n",
"\n",
"```{figure} ./images/feedforward.png\n",
"---\n",
"alt: A Feed-forward Neural Network\n",
"width: 400px\n",
"name: ffn\n",
"---\n",
"A Simple Feed-forward Neural Network\n",
"```\n",
"\n",
"\n",
"### A Perceptron or Single Neuron\n",
"\n",
"To understand the functioning of these layers, we begin by exploring the building blocks of neural networks; a single neuron or perceptron. Each layer in a NN consists of small individual units called neurons (usually represented with a circle). A neuron receives inputs from other neurons, performs some mathematical operations, and then produces an output. Each neuron in the input layer represents a feature. In essence, the number of neurons in the input layer equals the number of features. Each neuron in the input layer is connected to some or every neuron in the hidden layer. The number of neurons in the hidden layer is not fixed, it is problem dependent and it is often determined via hyperparameter optimization (more on this later). Every inter-neuron connection has an associated weight, these weights are what the neural network learns during the training process.\n",
"\n",
"```{figure} ./images/perceptron.png\n",
"---\n",
"alt: A Perceptron\n",
"width: 400px\n",
"name: perceptron\n",
"---\n",
"A Single Neuron\n",
"```\n",
"\n",
"#### How the perceptron works\n",
"\n",
"Consider a dataset $\\mathcal{D}_n = \\left\\lbrace (\\textbf{x}_1, y_1), (\\textbf{x}_2, y_2), \\cdots, (\\textbf{x}_n, y_n) \\right\\rbrace$ where $\\textbf{x}_i^\\top \\equiv ({x}_{i1}, {x}_{i2}, \\cdots, {x}_{ik})$ denotes the $k$-dimensional vector of features, and $y_i$ represents the corresponding outcome. Given a set of input fed into the network through the input layer, the output of a neuron in the hidden layer is\n",
"\n",
"$$\n",
" z = f(\\textbf{x}_i;\\textbf{w}) = g(w_0 + \\textbf{w}^\\top \\textbf{x}_i),\n",
"$$\n",
"\n",
"where $\\textbf{w} = (w_1, w_2, \\cdots, w_k)^\\top$ is a vector of weights and $w_0$ is the bias term associated with the neuron. The weights can be thought of as the slopes in a linear regression and the bias as the intercept. The function $g(\\cdot)$ is known as the activation function and it is used to introduce non-linearity into the network. There exists a number of activation functions in practice (see them [here](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)), the commonly used ones are:\n",
"\n",
"- **ReLU (Rectified Linear Unit)**: Outputs the input directly if positive, otherwise, it outputs zero. It’s simple and effective, making it the default choice for most hidden layers.\n",
"- **Sigmoid**: Squashes the input to a range between 0 and 1, often used in the output layer of a binary classification problem and in hidden layers.\n",
"- **Tanh**: Similar to Sigmoid but outputs between -1 and 1, often used in hidden layers."
]
},
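A quick numeric sketch of the formula above may help: one neuron computing $z = g(w_0 + \textbf{w}^\top \textbf{x})$ with a ReLU activation. The feature values, weights, and bias below are arbitrary illustrative numbers.

```python
import numpy as np

# One neuron: z = g(w0 + w . x) with a ReLU activation g.
x = np.array([1.5, -2.0, 0.5])    # k = 3 input features
w = np.array([0.4, 0.1, -0.3])    # weights (the "slopes")
w0 = 0.2                          # bias (the "intercept")

def relu(t):
    """ReLU: pass positive inputs through, clamp negatives to zero."""
    return np.maximum(0.0, t)

z = relu(w0 + w @ x)              # 0.2 + 0.6 - 0.2 - 0.15 = 0.45
print(z)                          # ~ 0.45
```

Had the weighted sum come out negative, the ReLU would have clamped the neuron's output to zero.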
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from utils import plot_activations # Import the function to plot the activations\n",
"\n",
"plot_activations()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Common Neural Network Architectures\n",
"\n",
"- **Feedforward Networks**: Often used for structural data, these networks consist of layers of neurons where the data moves forward from the input to the output without cycles.\n",
"\n",
"- **Convolutional Neural Networks (CNNs)**: The gold standard for image analysis, including, image classification, object detection, and image segmentation. 1D CNN can also be used for sequence data, such as text and time series.\n",
"\n",
"- **Recurrent Neural Networks (RNNs)**: Well-suited for sequence data such as texts, time series, and even tasks like drawing generation. Variants include LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Units) which address the vanishing gradient problem.\n",
"\n",
"- **Encoder-Decoders**: Commonly used for tasks like machine translation, where the input data is encoded into a context vector and then decoded into the target language or format.\n",
"\n",
"- **Generative Adversarial Networks (GANs)**: Used for generating realistic data, including 3D modeling for video games, animation, and generating high-quality images. GANs consist of two networks (a generator and a discriminator) that compete against each other.\n",
"\n",
"- **Graph Neural Networks (GNNs)**: These networks are designed to work with graph data structures, making them suitable for tasks like social network analysis, molecular structure analysis, and recommendation systems.\n",
"\n",
"- **Transformers**: A revolutionary architecture primarily used in NLP but increasingly in other areas like vision (Vision Transformers). Transformers rely on self-attention mechanisms to process input data in parallel, making them highly efficient for tasks like language modeling, translation, and more. Transformers are the backbone of models like OPenAI's GPT, Google's BERT, Google's Gemini, and others."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training a Feedforward Neural Network\n",
"\n",
"Training a neural network involves adjusting the weights based on the errors made on the training data.\n",
"\n",
"### Forward Propagation\n",
"In forward propagation, the input data passes through the network, layer by layer, to produce an output. The output is then compared to the actual target using a **loss function**.\n",
"\n",
"### Loss Function\n",
"The loss function quantifies how far the network's predictions are from the actual targets. For regression tasks, a common loss function is Mean Squared Error (MSE). See others [here](https://pytorch.org/docs/stable/nn.html#loss-functions).\n",
"\n",
"### Backward Propagation and Gradient Descent\n",
"- **Backward Propagation**: Computes the gradient of the loss function with respect to each weight using the chain rule. This gradient indicates the direction to adjust the weights to minimize the loss.\n",
"- **Gradient Descent**: An optimization algorithm that adjusts the weights in the direction that minimizes the loss function. This process is repeated over many iterations to improve the model's accuracy. The algorithm is as follows:\n",
" 1. Initialize weights and biases with random numbers.\n",
" 2. Loop until convergence:\n",
" 1. Compute the gradient using backpropagation ($\\widehat{\\mathcal{L}}_n(\\textbf{w})$ is the loss function); $\\frac{\\partial \\widehat{\\mathcal{R}}_n(\\textbf{w})}{\\partial \\textbf{w}}$\n",
" 2. Updated weights; $\\textbf{w} \\leftarrow \\textbf{w} - \\eta \\frac{\\partial \\widehat{\\mathcal{R}}_n(\\textbf{w})}{\\partial \\textbf{w}}$\n",
" 3. Return the weights\n",
"\n",
"```{note}\n",
"In practice we do not go through the entire dataset before updating the weight as this might be computationally expensive. So, we update the weight in batches. This is called mini-batch gradient descent. When the batch size is 1, it is called stochastic gradient descent. Additionally, we often use optimizers to extend the ideas of gradient descent to improve the efficiency, stability, and convergence of the training process.\n",
"\n",
"You can think of an optimizer as gradient descent plus some additional optimization techniques that improve or modify the basic gradient descent process.\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}