
A Challenge for Network Traffic Analytics: machine learning methods including random forest, SVM and MLP


adodangeh/Malware_Detection_ML


NetML: A Challenge for Network Traffic Analytics

Classifying network traffic is the basis for important network applications. Prior research in this area has faced challenges with the availability of representative datasets, and many of the results cannot be readily reproduced. The problem is exacerbated by emerging data-driven, machine-learning-based approaches. To address this issue, we provide three open datasets containing almost 1.3M labeled flows in total, with flow features and anonymized raw packets, for the research community. We focus on broad aspects of network traffic analysis, including both malware detection and application classification. We release the datasets in the form of an open challenge called NetML and implement several machine learning methods, including random forest, SVM and MLP. As we continue to grow NetML, we expect the datasets to serve as a common platform for AI-driven, reproducible research on network flow analytics.

CICIDS2017

Raw traffic is obtained from https://www.unb.ca/cic/datasets/ids-2017.html. Attack flows are extracted by filtering each workday's PCAP files by the time intervals and IP addresses described on that webpage. The extracted dataset contains 7 types of malware attacks plus normal traffic flows.
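The time-and-IP filtering described above can be sketched as a labeling rule over flow records. This is a minimal illustration, not the NetML extraction code: the `Flow` and `AttackWindow` records, the `label_flow` helper, and the rule that both endpoints must appear in a window's IP set are all hypothetical assumptions. The IPs and times in the usage example are placeholders from documentation address ranges, not the real CICIDS2017 attack schedule.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Flow:
    # Hypothetical minimal flow record; real NetML flow features are much richer.
    src_ip: str
    dst_ip: str
    start: datetime


@dataclass(frozen=True)
class AttackWindow:
    # Hypothetical description of one attack: label, time interval, involved IPs.
    label: str
    begin: datetime
    end: datetime
    ips: frozenset


def label_flow(flow, windows, default="normal"):
    """Return the label of the first attack window matching the flow, else `default`."""
    for w in windows:
        in_time = w.begin <= flow.start <= w.end
        # Assumption: both endpoints must be among the IPs listed for the attack.
        in_ips = flow.src_ip in w.ips and flow.dst_ip in w.ips
        if in_time and in_ips:
            return w.label
    return default


# Placeholder attack window (illustrative values only).
window = AttackWindow(
    label="dos",
    begin=datetime(2017, 1, 1, 9, 0),
    end=datetime(2017, 1, 1, 10, 0),
    ips=frozenset({"192.0.2.1", "192.0.2.2"}),
)
flow = Flow("192.0.2.1", "192.0.2.2", datetime(2017, 1, 1, 9, 30))
print(label_flow(flow, [window]))  # dos
```

Flows outside every window, or between unlisted hosts, fall through to the `normal` label, mirroring the split into attack and normal traffic described above.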

The total number of flows for different splits:

test-challenge set: 55,128
test-std set: 55,128
training set: 441,116
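As a quick arithmetic check on the counts above, the snippet below (an illustrative helper, not part of the repository) totals the CICIDS2017 splits and prints each split's share. The split sizes work out to roughly 80% training and 10% for each test set.

```python
# Flow counts per split for CICIDS2017, copied from the list above.
splits = {
    "test-challenge": 55_128,
    "test-std": 55_128,
    "training": 441_116,
}


def split_fractions(counts):
    """Return each split's share of the total number of flows."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(splits.values())  # 551,372 flows
for name, frac in split_fractions(splits).items():
    print(f"{name}: {frac:.1%}")
# test-challenge: 10.0%
# test-std: 10.0%
# training: 80.0%
```

The same helper applies unchanged to the non-vpn2016 counts.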

non-vpn2016

PCAP files are downloaded from https://www.unb.ca/cic/datasets/vpn.html. The original dataset has both VPN and non-VPN packet capture files, but we focus only on the non-VPN captures. In the top-level annotation, we categorize the traffic into 7 groups: audio, chat, email, file_transfer, tor, video and P2P. In the mid-level annotation, we group flows into 18 classes according to the application, such as aim_chat, facebook, hangouts, skype and youtube. In the fine-level annotation, we treat each action as a separate category and obtain 31 classes, such as facebook_chat, facebook_video, skype_chat and skype_video.
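Because the three annotation levels are nested, one way to work with them is to keep a single fine-level label per flow and derive the coarser levels from a lookup table. The sketch below assumes such a table; `HIERARCHY` and `relabel` are hypothetical names, and the entries (in particular which top-level group `skype_video` belongs to) are illustrative placeholders rather than the official NetML annotation files.

```python
# Hypothetical fine -> (mid, top) table with a few illustrative entries only.
HIERARCHY = {
    "facebook_chat": ("facebook", "chat"),
    "facebook_video": ("facebook", "video"),
    "skype_chat": ("skype", "chat"),
    "skype_video": ("skype", "video"),
}


def relabel(fine_label, level):
    """Map a fine-level label to the requested annotation level."""
    if level == "fine":
        return fine_label
    mid, top = HIERARCHY[fine_label]
    if level == "mid":
        return mid
    if level == "top":
        return top
    raise ValueError(f"unknown level: {level!r}")


labels = ["facebook_chat", "skype_video", "skype_chat"]
print([relabel(l, "mid") for l in labels])  # ['facebook', 'skype', 'skype']
print([relabel(l, "top") for l in labels])  # ['chat', 'video', 'chat']
```

An explicit table is used instead of splitting names on `_`, since a mid-level class such as aim_chat already contains an underscore.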

The total number of flows for different splits:

test-challenge set: 16,323
test-std set: 16,323
training set: 131,065

dataset: https://drive.google.com/drive/folders/1n3z8oCvTrW0jmbv2NM3cBl4QEy_m9UjD?usp=sharing
