Skip to content

Implementation of the Count-Min-Sketch datastructure for cardinallity estimation on knowelege Graphs. Including experiments on how to reduce noise and make the estimation better.

License

Notifications You must be signed in to change notification settings

datalexum/knowlege_graphs_cms

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CMS for join cardinality estimation on knowlege graphs

Description

Experiments to explore the use of count min sketches for the cardinallity estimation of joins on knoweledge graphs.

Getting Started

Dependencies

  • All the packages contained in the requirements.txt

Installing

  • Clone the project
  • Install the requirements using pip install -r requirements.txt
  • Get your dataset in HDF format

Executing program

The most important file for starting experiments is the final_experiments.py file in the experiements folder in the src directory. To configure a run you need to fill in two config files in JSON format. An example is given by the file inal_experiment_config.json and final_experiment_testdata_subject.json.

This configuration is for experiments on the WatDiv dataset. In the first file general options are set like the paths, the used hash functions and many more. Just check it out!

The second file contains the joins. For every join only the predicates are given splitted into prefix and ending. Besides that you need to set the type of join.

If everything is configured you can start the experiment:

python src/experiments/final_experiments.py

Example

The following is a plot showing the mean Q-Error over all the queries in the experiments directory. As you can see the best technique for reducing the collision seems to be using addition and subtraction and minimum noise.

Plot for CMS with size 1024

Help

The python HDT library seems to only work on Linux (Mac OS has not been tested but should work). If you want to run this experiments from Windows you should be able to use this with WSL 2.

Authors

Contributors names and contact info

Alexander Schröder - [email protected] - LinkedIn
Lukas Trost - [email protected] - LinkedIn

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

About

Implementation of the Count-Min-Sketch datastructure for cardinallity estimation on knowelege Graphs. Including experiments on how to reduce noise and make the estimation better.

Topics

Resources

License

Stars

Watchers

Forks