diff --git a/README.md b/README.md
index dd3b879..4d67343 100644
--- a/README.md
+++ b/README.md
@@ -15,14 +15,12 @@ This implementation runs on the system's __CPU__ in parallel. The model was test
 * Create the dataset directory using `mkdir data`
 * Extract the CSV files found on [Kaggle](https://www.kaggle.com/zalando-research/fashionmnist/data) in `data` directory created before
 * Execute the project:
-  * To execute the project using the default settings, use `make run`
-  * To execute the project using custom model architecture:
-    * Change directory using `cd build`
-    * Use `./neural-network -i <input neurons> -h <hidden neurons> [-h <hidden neurons> ...] -o <output neurons>`
+  * Change directory using `cd build`
+  * Use `./Neural-Network -i <input neurons> -h <hidden neurons> [-h <hidden neurons> ...] -o <output neurons>`
 
-To rebuild the project, use `make clean` first and then execute `make` and `make run`.
+    For example: `./Neural-Network -i 784 -h 150 -h 100 -h 50 -o 10`
 
-To compile using the Intel Compiler in a Windows environment, use: `icl Accuracy.cpp Activation.cpp Dataset.cpp Driver.cpp Export.cpp Fit.cpp Forward.cpp Interface.cpp Loss.cpp Optimize.cpp Parser.cpp Utilities.cpp /Qopenmp /O3 /Ot /GT /Ob2 /Oi /GA /fp:precise /QxHost /Qstd:c++17 /FeNeural-Network.exe`
+To compile using the Intel Compiler in a Windows environment, use: `icl Accuracy.cpp Activation.cpp Dataset.cpp Driver.cpp Export.cpp Fit.cpp Forward.cpp Interface.cpp Loss.cpp Optimize.cpp Parser.cpp Utilities.cpp /Qopenmp /Qunroll /Qipo /O3 /Ot /GT /Ob2 /Oi /GA /fp:precise /QxHost /Qstd:c++17 /FeNeural-Network.exe`
 
 ## Model Settings
 
@@ -32,19 +30,19 @@ The model's settings are:
 * Loss function: **MSE**
 * Learning rate: **0.1**
 
-With these settings, the training is expected to last around *25 minutes* running on a medium to high end machine.
+With these settings, the training is expected to last around *25 minutes* running on a medium to high-end machine.
 
 ## Fine tuning
 
-In `Common.hpp` there are parameters that can be tuned for better results. For example, there is a variable called `N_THREADS` that holds the nubmer of threads to request from the OS. This number is recommended to be equal to the number of the system's *logical cores*. Furthermore, in this file the user can edit the number of *epochs* of training for faster results and the model's *learning rate*.
+In `Common.hpp` there are parameters that can be tuned for better results. For example, there is a variable called `N_THREADS` that holds the number of threads to request from the OS. This number is recommended to be equal to the number of the system's *logical cores*. Furthermore, in this file the user can edit the number of *epochs* of training for faster results and the model's *learning rate*.
 
 ## Results
 
 The effective core utilization percentage was around 97 %.
 An example of execution is:
-![Expected Output](expected-output.png)
+![Expected Output](expected-output.PNG)
 
-Below, there are multiple model architecures compared for research purposes using the fashion MNIST dataset:
+Below, there are multiple model architectures compared for research purposes using the fashion MNIST dataset:
 
 | Model ID | First Hidden Layer | Second Hidden Layer | Third Hidden Layer | Activation Function | Epochs | Learning Rate | Accuracy | loss |
 |:---------:|:------------------:|:-------------------:|:------------------:|:-------------------:|:------:|:-------------:|:--------:|:-------:|
@@ -56,7 +54,9 @@ Below, there are multiple model architecures compared for research purposes usin
 | 5 | 150 | 100 | 50 | Sigmoid | 10 | 0.3 | 86.43 | 0.09927 |
 | 6 | 150 | 100 | 50 | Sigmoid | 100 | 0.1 | 88.05 | 0.09404 |
 | 7 | 150 | 100 | 50 | Sigmoid | 100 | 0.01 | 88.18 | 0.09303 |
-| 8 | 150 | 100 | 50 | Sigmoid | 1000 | 0.01 | 88.18 | 0.09303 |
+| 8 | 150 | 100 | 50 | Sigmoid | 1000 | 0.01 | 88.33 | 0.10240 |
+
+It is worth mentioning that for the eighth model, the training accuracy was around 59200 out of 60000 and the training loss was equal to 0.00967. This suggests that, for a plain feed-forward model, a test accuracy of about 90 % on the test subset of the fashion MNIST dataset is a ceiling.
 
 ## Notes
diff --git a/expected-output.PNG b/expected-output.PNG
new file mode 100644
index 0000000..ada7f7f
Binary files /dev/null and b/expected-output.PNG differ
diff --git a/expected-output.png b/expected-output.png
deleted file mode 100644
index 4257d84..0000000
Binary files a/expected-output.png and /dev/null differ
diff --git a/neural-network/Activation.hpp b/neural-network/Activation.hpp
index 6a2e868..47be18e 100644
--- a/neural-network/Activation.hpp
+++ b/neural-network/Activation.hpp
@@ -2,7 +2,7 @@
  * Activation.hpp
  *
  * In this header file, we define
- * all neuron activtion functions.
+ * all neuron activation functions.
  * Specifically, there is an
  * implementation of the sigmoid
  * and the ReLU activation function.
diff --git a/neural-network/Dataset.cpp b/neural-network/Dataset.cpp
index 7c3916e..5d52d07 100644
--- a/neural-network/Dataset.cpp
+++ b/neural-network/Dataset.cpp
@@ -155,7 +155,7 @@ void dataset::read_csv(const char* filename, int dataset_flag, double x_max)
 
 		if (sscanf(token, "%d", &intval) != 1)              /// Masks invalid data error
 		{
-			fprintf(stderr, "error - not an integer");      /// Expecrting integer value type data
+			fprintf(stderr, "error - not an integer");      /// Expecting integer value type data
 		}
 
 		for (y_idx = 0; y_idx < classes; y_idx += 1)        /// Convert integer to `Y` value for the model depending on the number of classes
diff --git a/neural-network/Driver.cpp b/neural-network/Driver.cpp
index d555ea7..9545810 100644
--- a/neural-network/Driver.cpp
+++ b/neural-network/Driver.cpp
@@ -11,7 +11,7 @@
  *
  * @note For the driver to work properly, adjust the project settings found at the `Common.h` file.
  * One such adjustment is to define the filepaths of the training and the evaluation subsets.
- * Another stronlgy recommended change is the number of threads requested by the OS. This number
+ * Another strongly recommended change is the number of threads requested by the OS. This number
  * is recommended to be equal to the number of the hosts's Logical Processors. This will very
  * possibly optimize execution time and therefore increase performance.
  */
diff --git a/neural-network/Export.cpp b/neural-network/Export.cpp
index 3f6a80c..a3c584c 100644
--- a/neural-network/Export.cpp
+++ b/neural-network/Export.cpp
@@ -9,7 +9,7 @@ void nn::export_weights(std::string filename)
 {
 	std::ofstream export_stream;                            /// Defines an output file stream
-	export_stream.open("./data/" + filename + ".csv");      /// Associates `export_stream` with a CSV file named after the `filename` variable
+	export_stream.open("../data/" + filename + ".csv");     /// Associates `export_stream` with a CSV file named after the `filename` variable
 
 	for (int i = 1; i < layers.size() - 1; i += 1)          /// Loops through model's hidden layers
 	{
 		for (int j = 0; j < layers[i] - 1; j += 1)          /// Loops through layer's synapses
diff --git a/neural-network/Fit.cpp b/neural-network/Fit.cpp
index 12883df..a9a4b50 100644
--- a/neural-network/Fit.cpp
+++ b/neural-network/Fit.cpp
@@ -12,7 +12,7 @@
  */
 void nn::fit(dataset(&TRAIN))
 {
-	int shuffled_idx;                                       /// Decalres dample "pointer"
+	int shuffled_idx;                                       /// Declares sample "pointer"
 	double start, end;                                      /// Declares epoch benchmark checkpoints
 	std::array<double, EPOCHS> loss;                        /// Declares container for training loss
 	std::array<double, EPOCHS> validity;                    /// Declares container for training accuracy
@@ -20,7 +20,7 @@ void nn::fit(dataset(&TRAIN))
 	std::random_device rd;                                  /// Initializes non-deterministic random generator
 	std::mt19937 gen(rd());                                 /// Seeds mersenne twister
 	std::uniform_int_distribution<> dist(0, TRAIN.samples - 1);    /// Distribute results between 0 and sample count exclusive
-	/// Change this depending on the ammount of loaded datasets
+	/// Change this depending on the amount of loaded datasets
 	for (int epoch = 0; epoch < EPOCHS; epoch += 1)         /// Trains model
 	{
 		loss[epoch] = 0.0;                                  /// Initializes epoch's training loss
@@ -29,7 +29,7 @@ void nn::fit(dataset(&TRAIN))
 		start = omp_get_wtime();                            /// Benchmarks epoch
 		for (int sample = 0; sample < TRAIN.samples; sample += 1)  /// Iterates through all examples of the training dataset
 		{
-			shuffled_idx = dist(gen);                       /// Selects a random example to avoid unshuffled dataset event
+			shuffled_idx = dist(gen);                       /// Selects a random example to avoid un-shuffled dataset event
 			zero_grad(TRAIN.X[shuffled_idx]);               /// Resets the neurons of the neural network
 			forward();                                      /// Feeds forward the selected input
 			back_propagation(TRAIN.Y[shuffled_idx]);        /// Computes the error for every neuron in the network
diff --git a/neural-network/Interface.cpp b/neural-network/Interface.cpp
index 68563eb..4322e8a 100644
--- a/neural-network/Interface.cpp
+++ b/neural-network/Interface.cpp
@@ -303,7 +303,7 @@ void progress_bar::indicate_progress(double checkpoint)
 
 /**
  * Prints epoch stats. More specifically, it prints the epoch's number
- * along with the model's acuracy and loss. It also prints the epoch's benchmark.
+ * along with the model's accuracy and loss. It also prints the epoch's benchmark.
  *
  * @param[in] epoch the epoch's number
  * @param[in] epoch_loss the model's loss during a certain epoch of training or evaluation
diff --git a/neural-network/Utilities.cpp b/neural-network/Utilities.cpp
index d722b01..5f5d367 100644
--- a/neural-network/Utilities.cpp
+++ b/neural-network/Utilities.cpp
@@ -87,7 +87,7 @@ void nn::set_z(const std::vector& l)
  * @note The `a` container for each neuron `i` in layer `l` holds the sum given by the
  * formula:
  * f{(z_i)}, \forall i \in `l`, where f is the chosen activation function for every
- * neuron i nthe model.
+ * neuron i in the model.
  */
 void nn::set_a(const std::vector& l)
 {
@@ -128,10 +128,10 @@ void nn::set_weights(const std::vector& l, const double min, const double m
 	weights = new double** [l.size() - 1];                  /// Allocates memory for the weights container
 	for (int i = 1; i < l.size() - 1; i += 1)
 	{
-		weights[i - 1] = new double* [l[i] - 1];            /// Allocates memory for the weigts of a layer in a neural network
+		weights[i - 1] = new double* [l[i] - 1];            /// Allocates memory for the weights of a layer in a neural network
 		for (int j = 0; j < l[i] - 1; j += 1)
 		{
-			weights[i - 1][j] = new double[l[i - 1]];       /// Allocates memory for the weigths of each neuron in a layer
+			weights[i - 1][j] = new double[l[i - 1]];       /// Allocates memory for the weights of each neuron in a layer
 			for (int k = 0; k < l[i - 1]; k += 1)
 			{
 				weights[i - 1][j][k] = dist(gen);           /// Uses random generator to initialize synapse
 			}
 		}
 	weights[l.size() - 2] = new double* [l[l.size() - 1]];  /// Initializes weights in the output layer
 	for (int j = 0; j < l[l.size() - 1]; j += 1)            /// There is no bias in the output layer
 	{
-		weights[l.size() - 2][j] = new double[l[l.size() - 2]];    /// Allocates memory for the weigths of each neuron in the output layer
+		weights[l.size() - 2][j] = new double[l[l.size() - 2]];    /// Allocates memory for the weights of each neuron in the output layer
 		for (int k = 0; k < l[l.size() - 2]; k += 1)
 		{
 			weights[l.size() - 2][j][k] = dist(gen);        /// Uses random generator to initialize synapse
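
For readers skimming the `set_weights` hunk above, the indexing convention it allocates is easy to miss. Below is a minimal, self-contained sketch of that layout. It is not code from the repository: the layer sizes are taken from the README example (784-150-100-50-10), the initialization range is an assumption, and a `std::vector`-based container is used in place of the raw-pointer allocation; only the `weights[layer][neuron][incoming]` shape and the "no bias in the output layer" rule are taken from the patch itself.

```cpp
#include <cstddef>
#include <random>
#include <vector>

int main()
{
    // Assumed layer sizes, mirroring the README example (784-150-100-50-10).
    const std::vector<int> l = {784, 150, 100, 50, 10};

    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> dist(-1.0, 1.0);   // assumed init range

    // weights[i - 1][j][k]: synapse from neuron k of layer i - 1 to neuron j of layer i.
    std::vector<std::vector<std::vector<double>>> weights(l.size() - 1);

    for (std::size_t i = 1; i < l.size(); i += 1)
    {
        // Hidden layers appear to reserve one slot per layer for a bias neuron,
        // so only l[i] - 1 neurons receive incoming weights; per the original
        // comment, the output layer has no bias and gets all l[i] rows.
        const int rows = (i + 1 < l.size()) ? l[i] - 1 : l[i];

        weights[i - 1].assign(rows, std::vector<double>(l[i - 1]));
        for (auto& neuron : weights[i - 1])
            for (double& synapse : neuron)
                synapse = dist(gen);                           // random synapse initialization
    }
    return 0;
}
```

The nested-`vector` form trades the patch's manual `new`/`delete` bookkeeping for automatic cleanup, but the indexing is identical, which is what matters when reading `set_weights`.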