An implementation of COMP90024 Project 1: an HPC program that counts key information in a large file.

pancak3/Hashtag-Extractor

Hashtag Extractor

An OpenMPI & OpenMP solution for extracting and ranking hashtags and languages from tweets.

Dependencies

  • mpiCC
  • make
  • OpenMP

Third Party Dependencies (included)

  • RapidJSON: used to parse a JSON string into a document (DOM).

Usage

Compile and run:

  make && mpirun -np 4 --bind-to none ./tp <tweets.json> lang.csv

NOTE: In <tweets.json>, each line should be a single tweet in the format specified in the Twitter docs, except for the first and last lines, which should not be tweets. (The file is exported from CouchDB with a curl command.)
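The input layout described in the note can be handled by streaming the file and dropping the first and last lines. The sketch below is one possible way to do that, not the repository's code: `for_each_tweet_line` is a hypothetical helper name, and it only selects the lines that should contain tweets without parsing them.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the helper on in-memory text
#include <string>

// Hypothetical helper: calls handle(line) for every line of `in` except the
// first and the last, and returns how many lines were passed on.
// Only one line is buffered at a time, so the input is never loaded in full.
template <typename Fn>
std::size_t for_each_tweet_line(std::istream& in, Fn&& handle) {
    assert(in.good());
    std::string prev, line;
    std::size_t index = 0, emitted = 0;
    bool has_prev = false;
    while (std::getline(in, line)) {
        // A buffered line is known not to be the last one once another
        // line has been read after it, so it can be emitted now.
        if (has_prev) {
            handle(prev);
            ++emitted;
        }
        // The first line (index 0) is not a tweet, so it is never buffered.
        has_prev = (index > 0);
        if (has_prev) prev = line;
        ++index;
    }
    // Whatever is still buffered here is the last line, which is skipped.
    return emitted;
}
```

Each line handed to `handle` would still need to be parsed as JSON, for example with RapidJSON.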

Files

.
├── combine.cpp
│       * Combine results from multiple processes together
├── combine.hpp
├── include
│   └── rapidjson
│       └── rapidjson files
├── job.sh
│       * Invokes job.slurm to submit multiple jobs
├── job.slurm
│       * Slurm script to submit a job to Spartan HPC
├── lang.csv
│       * Mappings between languages and language codes
├── line.cpp
│       * Extracts hashtags and languages from tweets (in JSON form)
├── line.hpp
├── main.cpp
│       * Entry point of the program; divides the input file into sections and assigns them to MPI processes
├── Makefile
│       * Directives for make
├── results
│       * Output files (results) from Spartan
├── threading.cpp
│       * Each process further subdivides its assigned section into chunks and processes them with OpenMP threads
└── threading.hpp
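
The division of work described in the file list (sections per MPI process, chunks per OpenMP thread, then combined results) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: `split_range`, `merge_counts`, and `top_n` are hypothetical helper names, no MPI or OpenMP calls are shown, and the sketch ignores that a real split would presumably have to be shifted to a line boundary so that no tweet is cut in half.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Range = std::pair<std::uint64_t, std::uint64_t>;  // half-open [first, second)
using Counts = std::unordered_map<std::string, std::uint64_t>;

// Hypothetical helper: splits [begin, end) into num_parts contiguous ranges
// whose sizes differ by at most one byte. It can be applied once across
// processes and again, within one section, across threads.
std::vector<Range> split_range(std::uint64_t begin, std::uint64_t end, int num_parts) {
    assert(num_parts > 0 && begin <= end);
    const auto n = static_cast<std::uint64_t>(num_parts);
    const std::uint64_t total = end - begin;
    const std::uint64_t base = total / n;
    const std::uint64_t extra = total % n;  // the first `extra` parts get one more byte
    std::vector<Range> parts;
    std::uint64_t cursor = begin;
    for (std::uint64_t i = 0; i < n; ++i) {
        const std::uint64_t size = base + (i < extra ? 1 : 0);
        parts.emplace_back(cursor, cursor + size);
        cursor += size;
    }
    return parts;
}

// Hypothetical helper: adds every count in `from` to `into`, as a combine
// step would do with the partial results of each thread or process.
void merge_counts(Counts& into, const Counts& from) {
    for (const auto& entry : from) into[entry.first] += entry.second;
}

// Hypothetical helper: returns the n most frequent keys, highest count
// first, with ties broken alphabetically so the order is deterministic.
std::vector<std::pair<std::string, std::uint64_t>> top_n(const Counts& counts, std::size_t n) {
    std::vector<std::pair<std::string, std::uint64_t>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    });
    if (ranked.size() > n) ranked.resize(n);
    return ranked;
}
```

For example, `split_range(0, file_size, 4)` would give one section per process for the `-np 4` run shown under Usage, and calling it again on a single section would give the per-thread chunks.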
