This is the first assignment of the course CS698R. It consists of two questions.
- The first question implements several action-selection strategies (pure exploration, pure exploitation, epsilon-greedy, decaying epsilon-greedy, softmax, and UCB) to compare how agents perform in a stochastic environment.
- The second question uses Monte Carlo and temporal-difference (TD) estimates and compares their relative performance on a symmetric random walk environment.
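
To make the first question concrete, here is a minimal sketch of two of the listed strategies, epsilon-greedy and UCB, on a Bernoulli bandit. It is not the notebook's code: the function names (`run_bandit`, `epsilon_greedy`, `ucb`), the Bernoulli reward model, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def run_bandit(probs, select_action, n_steps=1000, seed=0):
    """Play a Bernoulli bandit (assumed reward model) for n_steps pulls."""
    rng = np.random.default_rng(seed)
    n_arms = len(probs)
    q = np.zeros(n_arms)       # running mean reward per arm
    counts = np.zeros(n_arms)  # number of pulls per arm
    total = 0.0
    for t in range(1, n_steps + 1):
        a = select_action(q, counts, t, rng)
        r = float(rng.random() < probs[a])  # reward 1 with probability probs[a]
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]      # incremental mean update
        total += r
    return q, counts, total

def epsilon_greedy(epsilon):
    """With probability epsilon pick a random arm, otherwise the best estimate."""
    def select(q, counts, t, rng):
        if rng.random() < epsilon:
            return int(rng.integers(len(q)))  # explore
        return int(np.argmax(q))              # exploit
    return select

def ucb(c):
    """Pick the arm with the highest estimate plus exploration bonus."""
    def select(q, counts, t, rng):
        untried = np.flatnonzero(counts == 0)
        if untried.size > 0:
            return int(untried[0])            # pull every arm once first
        bonus = c * np.sqrt(np.log(t) / counts)
        return int(np.argmax(q + bonus))
    return select

# Hypothetical two-armed bandit with success probabilities 0.2 and 0.8.
q_eps, counts_eps, total_eps = run_bandit([0.2, 0.8], epsilon_greedy(0.1), n_steps=2000)
q_ucb, counts_ucb, total_ucb = run_bandit([0.2, 0.8], ucb(2 ** 0.5), n_steps=2000)
```

Passing the strategy as a function keeps the bandit loop identical across strategies, so differences in total reward come only from the action-selection rule.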
The code is in the notebook main.ipynb, which is documented to explain the logic behind each implementation.
The environments are in the directory myenv/myenv/env/.
There are three environments, one for each of the following:
- Two-armed bandit problem
- Ten-armed bandit problem
- Random Walk
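
For the random walk, the sketch below shows how Monte Carlo and TD(0) value estimates could be computed. It is not the code in myenv: it assumes a seven-state layout (states 0 to 6, both ends terminal, start in the middle, reward 1 only on reaching the right end), and the names `generate_episode`, `mc_every_visit`, and `td0` are hypothetical.

```python
import numpy as np

N_STATES = 7   # assumed layout: states 0..6, with 0 and 6 terminal
START = 3      # assumed start state (middle of the chain)

def generate_episode(rng):
    """Return one walk as a list of (state, reward, next_state) transitions."""
    s, episode = START, []
    while s not in (0, N_STATES - 1):
        s_next = s + (1 if rng.random() < 0.5 else -1)  # symmetric step
        r = 1.0 if s_next == N_STATES - 1 else 0.0      # reward only at right end
        episode.append((s, r, s_next))
        s = s_next
    return episode

def mc_every_visit(n_episodes, alpha, gamma=1.0, seed=0):
    """Constant-step-size, every-visit Monte Carlo value estimate."""
    rng = np.random.default_rng(seed)
    v = np.zeros(N_STATES)
    for _ in range(n_episodes):
        g = 0.0
        # Walk the episode backwards so g is the return from each visited state.
        for s, r, _s_next in reversed(generate_episode(rng)):
            g = r + gamma * g
            v[s] += alpha * (g - v[s])
    return v

def td0(n_episodes, alpha, gamma=1.0, seed=0):
    """TD(0) value estimate: bootstrap from the next state's current value."""
    rng = np.random.default_rng(seed)
    v = np.zeros(N_STATES)
    for _ in range(n_episodes):
        for s, r, s_next in generate_episode(rng):
            v[s] += alpha * (r + gamma * v[s_next] - v[s])
    return v

v_mc = mc_every_visit(n_episodes=5000, alpha=0.01)
v_td = td0(n_episodes=2000, alpha=0.05)
# Under the assumed layout with gamma = 1, the true value of state i is i / 6.
true_v = np.arange(N_STATES) / (N_STATES - 1)
```

Under these assumptions the true values are known in closed form, so the error of each estimate against `true_v[1:6]` gives a direct way to compare the two methods.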
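
To install the environments, clone the repository and install the package in editable mode:

    git clone git@github.com:neel-singhania/CS698R-NEELABH-SINGHANIA-190538-ASSIGN-1.git
    cd CS698R-NEELABH-SINGHANIA-190538-ASSIGN-1/myenv
    pip install -e .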