
Bloom Filters & HyperLogLog & Misra-Gries Algorithms


sebSR/stream-processing


Stream Processing

There are three programs to run (sbt):

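  1. Bloom Filters: Each item of the input collection is hashed (MurmurHash3) into a hash table that represents the input data. We can then check whether a given item is in the collection represented by that table. A positive answer means "possibly present" (false positives can occur), while a negative answer is always correct. This saves memory because the items themselves are not stored.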
run BloomFilter <sizeOfHashTable: Int> <"seeds"> <elementsToCheck>
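
The idea above can be sketched as follows. This is a minimal illustration rather than the repository's implementation: the names `BloomSketch`, `add`, and `mightContain` are hypothetical, and it assumes that each seed passed on the command line selects one MurmurHash3 hash function.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch: a table of `size` bits and one MurmurHash3 hash per seed.
final class BloomSketch(size: Int, seeds: Seq[Int]) {
  require(size > 0, "size must be positive")
  private val bits = new Array[Boolean](size)

  // Map an item to a table index for the given seed (non-negative modulo).
  private def index(item: String, seed: Int): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(item, seed), size)

  // Set one bit per seed.
  def add(item: String): Unit =
    seeds.foreach(seed => bits(index(item, seed)) = true)

  // true means "possibly present"; false means "definitely absent".
  def mightContain(item: String): Boolean =
    seeds.forall(seed => bits(index(item, seed)))
}

object BloomSketchDemo {
  def main(args: Array[String]): Unit = {
    val filter = new BloomSketch(size = 1024, seeds = Seq(1, 2, 3))
    Seq("apple", "banana", "cherry").foreach(filter.add)
    println(filter.mightContain("apple"))  // always true for an added item
    println(filter.mightContain("durian")) // usually false, but may be a false positive
  }
}
```

A larger table or a better-matched number of seeds lowers the false-positive rate at the cost of memory or hashing time.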
  1. Misra-Gries Algorithm: A frequency algorithm that finds the elements of the stream that occur more than streamLength/k times, where k is the parameter of the algorithm. It returns at most k-1 candidate elements.
run MirsaGries <k: Int>
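
A minimal sketch of the standard Misra-Gries summary, under the assumption that the stream fits in an `Iterable`. The object name `MisraGriesSketch` and the method `candidates` are hypothetical and not taken from the repository.

```scala
object MisraGriesSketch {
  // Keeps at most k-1 counters. Every element occurring more than
  // streamLength/k times is guaranteed to be among the returned keys;
  // some returned keys may still be below that threshold.
  def candidates[A](stream: Iterable[A], k: Int): Map[A, Int] = {
    require(k >= 2, "k must be at least 2")
    stream.foldLeft(Map.empty[A, Int]) { (counters, x) =>
      if (counters.contains(x))
        counters.updated(x, counters(x) + 1) // known element: increment
      else if (counters.size < k - 1)
        counters.updated(x, 1)               // free slot: start counting
      else
        // No free slot: decrement every counter and drop those reaching zero.
        counters.collect { case (key, c) if c > 1 => key -> (c - 1) }
    }
  }

  def main(args: Array[String]): Unit = {
    val stream = Seq("a", "a", "b", "a", "c", "a", "d", "a")
    println(candidates(stream, k = 3).keySet)
  }
}
```

The stored counts are lower bounds on the true frequencies, so a second pass over the stream is needed if exact counts of the candidates are required.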
  1. HyperLogLog Algorithm: Estimates the number of distinct elements in the stream. (Parameter b is the number of leading hash bits; in standard HyperLogLog these bits select one of 2^b registers.)
run HyperLogLog <b: Int>
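
A minimal sketch of the textbook HyperLogLog estimator, assuming a 32-bit MurmurHash3 string hash and the usual small-range correction. The object name `HyperLogLogSketch` and the method `estimate` are hypothetical, and the large-range correction for 32-bit hashes is left out for brevity.

```scala
import scala.util.hashing.MurmurHash3

object HyperLogLogSketch {
  def estimate(items: Iterable[String], b: Int): Double = {
    require(b >= 4 && b <= 16, "b is assumed to be in [4, 16]")
    val m = 1 << b                        // number of registers
    val registers = new Array[Int](m)

    items.foreach { item =>
      val h = MurmurHash3.stringHash(item) // 32-bit hash
      val idx = h >>> (32 - b)             // first b bits pick the register
      val rest = h << b                    // remaining bits, left-aligned
      // Rank: position of the leftmost 1-bit in the remaining 32-b bits.
      val rank = math.min(Integer.numberOfLeadingZeros(rest) + 1, 32 - b + 1)
      registers(idx) = math.max(registers(idx), rank)
    }

    // Bias-correction constant from the standard algorithm.
    val alpha = m match {
      case 16 => 0.673
      case 32 => 0.697
      case 64 => 0.709
      case _  => 0.7213 / (1 + 1.079 / m)
    }
    val raw = alpha * m * m / registers.map(r => math.pow(2.0, -r)).sum

    // Small-range correction: fall back to linear counting.
    val zeros = registers.count(_ == 0)
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }

  def main(args: Array[String]): Unit = {
    val items = (1 to 10000).map(i => s"item-$i")
    println(estimate(items, b = 10))
  }
}
```

The relative standard error is roughly 1.04 / sqrt(2^b), so increasing b by 2 halves the expected error while quadrupling the number of registers.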
