GitHub - alejandrod/content-filter: Content filtering service using machine learning as engine

forked from causania/content-filter

ABSTRACT

Content moderation is widely used today. It's not a simple task, and there are several third-party
companies that provide this kind of service.

Trying to automate this is a tricky task.
The idea of this project is to use Machine Learning (ML) techniques to classify content.
Some ML methods (like neural networks and logistic regression) are capable of "learning" new patterns.

That's exactly what this problem requires. Imagine trying to block a word like "badword".
A dictionary of regexes solves only the trivial case: people can start replacing vowels
with numbers, etc.
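To make the limitation concrete, here is a minimal sketch in Python (standard library only). It is not code from this project: the names BLOCKLIST, SUBSTITUTIONS, is_blocked and normalize are hypothetical, and the substitution table is an assumed, deliberately tiny sample. It shows a regex dictionary catching the trivial case, missing a digits-for-vowels variant, and a hand-written normalization step that fixes one evasion but not the next.

```python
import re

# Hypothetical blocklist: one regex per banned term.
BLOCKLIST = [re.compile(r"badword", re.IGNORECASE)]

# Assumed character substitutions; a real table would be much longer
# and would still lag behind what people invent.
SUBSTITUTIONS = str.maketrans({"4": "a", "3": "e", "1": "i", "0": "o", "@": "a", "$": "s"})


def is_blocked(text: str) -> bool:
    """Return True if any blocklist pattern matches the text."""
    return any(pattern.search(text) for pattern in BLOCKLIST)


def normalize(text: str) -> str:
    """Lowercase the text and undo the known character substitutions."""
    return text.lower().translate(SUBSTITUTIONS)


print(is_blocked("badword"))                   # True: the trivial case
print(is_blocked("b4dw0rd"))                   # False: digits replace the vowels
print(is_blocked(normalize("b4dw0rd")))        # True: normalization catches this variant
print(is_blocked(normalize("b.a.d.w.o.r.d")))  # False: a new trick needs a new rule
```

Every new evasion needs another hand-written rule, which is the maintenance problem a learned classifier is meant to reduce.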


So, ML sounds promising. An API can be created to receive input from external systems
and use it to keep improving the classification algorithm automatically.
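As an illustration of that feedback loop, the following is a minimal Python sketch (standard library only) of a logistic regression classifier over character trigrams that is updated one labeled example at a time. It is not this project's implementation: the class name OnlineTextClassifier, its methods, the trigram features and the learning rate are all assumptions chosen for brevity. In a real service, learn() would be called from whatever API endpoint receives moderation feedback.

```python
import math
from collections import defaultdict


class OnlineTextClassifier:
    """Logistic regression over character n-grams, trained one example at a time."""

    def __init__(self, ngram: int = 3, learning_rate: float = 0.1):
        self.ngram = ngram
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # one weight per n-gram seen so far
        self.bias = 0.0

    def _features(self, text: str) -> set:
        # Pad with spaces so that word boundaries become part of the n-grams.
        padded = f" {text.lower()} "
        return {padded[i:i + self.ngram] for i in range(len(padded) - self.ngram + 1)}

    def predict_proba(self, text: str) -> float:
        """Estimated probability that the text should be blocked."""
        z = self.bias + sum(self.weights.get(g, 0.0) for g in self._features(text))
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, text: str, is_bad: bool) -> None:
        """One stochastic gradient step on the log loss for a labeled example."""
        error = self.predict_proba(text) - (1.0 if is_bad else 0.0)
        step = self.learning_rate * error
        self.bias -= step
        for gram in self._features(text):
            self.weights[gram] -= step


# Hypothetical feedback, as it might arrive from external systems.
feedback = [
    ("badword", True),
    ("b4dword", True),
    ("hello world", False),
    ("good morning", False),
]

clf = OnlineTextClassifier()
for _ in range(30):
    for text, label in feedback:
        clf.learn(text, label)

# "badw0rd" was never seen, but it shares trigrams with the blocked examples.
print(clf.predict_proba("badw0rd") > clf.predict_proba("hello there"))
```

Character n-grams are used here because an obfuscated spelling still shares most of its n-grams with the original word, so the model can generalize to variants it has not seen.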


Of course, such a solution is not easy, but it's worth trying.



IMPORTANT

This is an exploratory project and it's far from being production ready!




RELATED STUFF


http://www.cs.cmu.edu/~biglou/resources/

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5617090

http://wiki.apache.org/jakarta-lucene/SpellChecker