project_plan.md

Project Plan A

Title: TED Talk Rating Analysis

A brief summary.

This project's main goal is to build a classifier that predicts the rating of a TED talk from its transcript. The project also aims to find out whether other elements, such as a talk's duration or its number of comments, affect its rating.
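To make the classifier idea concrete, here is a minimal sketch of a bag-of-words classifier using only the Python standard library. The plan does not name a model yet, so the choice of multinomial Naive Bayes is an assumption for illustration, as are the names `tokenize` and `NaiveBayesRatingClassifier` and the idea that each talk carries a single rating label.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesRatingClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing.

    Hypothetical helper: the plan does not commit to this model.
    """

    def fit(self, transcripts, ratings):
        self.word_counts = defaultdict(Counter)  # rating -> word -> count
        self.class_counts = Counter(ratings)     # rating -> number of talks
        self.vocabulary = set()
        for text, rating in zip(transcripts, ratings):
            tokens = tokenize(text)
            self.word_counts[rating].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, transcript):
        # Words never seen during training carry no evidence, so skip them.
        tokens = [t for t in tokenize(transcript) if t in self.vocabulary]
        total_talks = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_rating, best_score = None, -math.inf
        for rating, n_talks in self.class_counts.items():
            counts = self.word_counts[rating]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_talks / total_talks)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_rating, best_score = rating, score
        return best_rating
```

In practice the model would be trained on one part of the transcripts and evaluated on a held-out part, since accuracy on the training talks says little about how well it predicts new ones.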

The DATA portion.

There are quite a few TED talk datasets on Kaggle. I plan to look at talks that are less than 20 years old, and since no single file contains all the information for the past 20 years, I will use several files. All the files that I have found are in .csv format, which is fortunate. Most files are already well organized, so cleaning the data should not take much effort. One problem is that the file is quite big (13 MB), so it might be necessary to narrow down the period. The source of the data is below:

TED Talks dataset
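Combining several .csv files and keeping only talks from the last 20 years could look roughly like the sketch below. The function name `load_recent_talks`, the column name `published_date`, and the `YYYY-MM-DD` date format are assumptions; the actual Kaggle files may use different column names and date encodings.

```python
import csv
from datetime import date, datetime


def load_recent_talks(paths, today, max_age_years=20, date_column="published_date"):
    """Merge rows from several CSV files, keeping talks newer than the cutoff.

    Assumes every file has a `date_column` holding YYYY-MM-DD strings
    (a hypothetical layout, not confirmed against the real dataset).
    """
    try:
        cutoff = today.replace(year=today.year - max_age_years)
    except ValueError:
        # today is Feb 29 and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - max_age_years, day=28)

    talks = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                published = datetime.strptime(row[date_column], "%Y-%m-%d").date()
                if published >= cutoff:
                    talks.append(row)
    return talks
```

Lowering `max_age_years` is one way to narrow down the period if the combined data turns out to be too large to work with comfortably.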

The ANALYSIS portion.

The end goal of this project is to make a classifier that predicts the rating of a TED talk based on its transcript. Along with this, I also plan to apply statistical methods to other aspects of the talks, such as duration, number of comments, and tags. I do not have a firm hypothesis at this point, but I believe that certain types of words in a talk's description draw more attention to the talk or collect more positive ratings. I also think that the duration of a talk is inversely related to its rating, but I am not very sure about this.
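One simple way to check the duration guess is a Pearson correlation coefficient between duration and rating. The sketch below assumes that each talk's rating has been reduced to a single numeric score, which is an assumption about preprocessing and not something the dataset is known to provide directly; `pearson_correlation` is a hypothetical helper name.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient close to -1 would be consistent with longer talks receiving lower ratings, while a value near 0 would suggest no linear relationship. Pearson correlation only captures linear trends, so a scatter plot of the two variables is a useful companion.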

The PRESENTATION portion.

Mostly tons of plots and tables.