The purpose of this project is to cluster songs by a set of variables and to look for relationships between them. The project has 3 parts: Info, Feature Extraction, and Clustering.
Before you start:
- Add a folder named "data_set" to the project
- Download the data from Kaggle
- Move the data into the "data_set" folder
- Info: display information about the data and do some light preprocessing
- Feature Extraction: using a tokenizer, a stemmer, stop-word removal, TF-IDF, and LDA, we create the vectors that are fed to the clustering algorithm
- Clustering: using k-means, we try to cluster the data
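
The Feature Extraction and Clustering parts can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`tokenize`, `cluster_lyrics`), the sample lyrics, the choice of 2 topics and 2 clusters, and the decision to run LDA on the TF-IDF matrix are all assumptions made for the example. It uses scikit-learn's built-in English stop-word list and, if NLTK is installed, NLTK's `PorterStemmer`.

```python
import re

from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

try:
    from nltk.stem import PorterStemmer  # needs no extra NLTK data downloads
    _stem = PorterStemmer().stem
except ImportError:
    # Fallback so the sketch still runs without NLTK: no stemming at all.
    def _stem(word):
        return word


def tokenize(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [_stem(w) for w in words if w not in ENGLISH_STOP_WORDS]


def cluster_lyrics(lyrics, n_topics=2, n_clusters=2, seed=0):
    """Return one k-means cluster label per lyric string (hypothetical helper)."""
    # TF-IDF weights over the custom tokens; token_pattern=None because
    # the custom tokenizer replaces the default pattern.
    tfidf = TfidfVectorizer(tokenizer=tokenize, token_pattern=None)
    weights = tfidf.fit_transform(lyrics)

    # Assumption: LDA is applied to the TF-IDF matrix to get topic mixtures.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topics = lda.fit_transform(weights)

    # k-means on the per-song topic distributions.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(topics)


# Made-up sample lyrics, only for demonstration.
sample_lyrics = [
    "love you baby tonight my heart is burning",
    "my heart loves you, kiss me tonight baby",
    "baby I love the way you kiss my heart",
    "driving down the highway with the engine roaring",
    "engine roaring on the road, truck driving all night",
    "the road is long, keep driving that old truck",
]
labels = cluster_lyrics(sample_lyrics)
print(labels)
```

With real data, the lyrics column loaded from the "data_set" folder would take the place of `sample_lyrics`, and the number of topics and clusters would need tuning.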
- NLTK
- pandas
- numpy
- sklearn
- More tests
- PCA can be added
- The current error rate is 14%. This can be improved.
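
As a rough idea of how the PCA item could be approached, the sketch below reduces feature vectors to 2 components before running k-means. The random `features` array is a hypothetical stand-in for the project's real vectors, and the number of components and clusters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real feature vectors:
# two well-separated groups of 20 points each in 10 dimensions.
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 10)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 10)),
])

# Project onto the 2 directions of highest variance.
pca = PCA(n_components=2, random_state=0)
reduced = pca.fit_transform(features)

# Cluster in the reduced space instead of the original one.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

print(reduced.shape)
print(pca.explained_variance_ratio_.sum())
```

Checking `explained_variance_ratio_` is one way to judge how many components are worth keeping before clustering.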