
Evidently Hacktoberfest 2022

Thanks for your interest in contributing to Evidently!

This page describes how you can contribute during Hacktoberfest (and beyond!).

If you are new to Evidently

*(Image: Evidently reports)*

Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor the performance of ML models from validation to production.

Evidently evaluates different aspects of data and ML model performance: from data integrity to ML model quality. You can get the results as interactive visual dashboards in a Jupyter notebook or export them as JSON or a Python dictionary.
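
For illustration, here is a minimal sketch of how such a report could be created and exported. It assumes the Report and DataDriftPreset API available in recent Evidently versions and two pandas DataFrames named reference and current; the Getting Started tutorial below covers the exact, up-to-date interface.

```python
import pandas as pd
from sklearn import datasets

# Hypothetical data: any two samples of the same tabular dataset will do.
iris = datasets.load_iris(as_frame=True).frame
reference = iris.iloc[:75]   # e.g. validation data
current = iris.iloc[75:]     # e.g. a recent production batch

# Assumes the Report / DataDriftPreset API from recent Evidently versions.
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

report.show()                     # interactive dashboard in a Jupyter notebook
drift_summary = report.as_dict()  # or export the results as a Python dictionary
report.save_html("drift_report.html")
```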

If you have not used Evidently before, you can go through the Getting Started tutorial. It will take you about 10 minutes to understand the basic functionality.

How to contribute

There are different ways you can contribute to Evidently; start by reading our Contribution Guide.

We welcome all improvements or fixes, even the tiny ones, and non-code contributions. Do you see a typo in the documentation? Don’t be shy, and send us a pull request. No contribution is too small!

In addition, during Hacktoberfest, we invite you to make a specific type of contribution: help us add new statistical tests and metrics to detect data drift.

*(Image: add new drift metric)*

Here is what it means:

  • Evidently helps users detect data drift (to check whether the distributions of the input features remain similar) and prediction drift (to detect when model outputs change).
  • To do this, you typically need to run a statistical test (like Kolmogorov–Smirnov) or calculate a statistical distance using a metric like Wasserstein distance; see the sketch after this list. Evidently already has implementations of several tests and metrics inside the library.
  • We invite you to add more metrics and tests as available drift detection methods.
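
To make this concrete, here is a small standalone illustration (plain SciPy, not Evidently code) of the two kinds of methods mentioned above: a two-sample Kolmogorov–Smirnov test and the Wasserstein distance, applied to a reference and a current sample of a single numeric feature. The 0.05 and 0.1 thresholds are arbitrary example values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1_000)  # e.g. a training data feature
current = rng.normal(loc=0.3, scale=1.0, size=1_000)    # e.g. the same feature in production

# Statistical test: two-sample Kolmogorov-Smirnov.
# A small p-value suggests the two distributions differ, i.e. drift is detected.
ks_statistic, p_value = stats.ks_2samp(reference, current)
drift_detected = p_value < 0.05

# Distance metric: Wasserstein (earth mover's) distance.
# Here drift is flagged when the distance exceeds a chosen threshold.
distance = stats.wasserstein_distance(reference, current)
drift_by_distance = distance > 0.1  # example threshold, not a recommendation
```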

If you want to know more about approaches to data drift detection, here is a blog post.

Why is this useful?

Right now, users can either use the drift detection methods built into Evidently or pass their own custom test. Some users rely on custom tests because they have their own preferences or want to use a method they are familiar with. Adding more drift methods to the “library of statistical tests” will give users more options to choose from and reduce the need for custom implementations.

Which drift detection methods are already there?

You can see them here in the code.

Which drift detection methods should I contribute?

We added several ideas to the issues. They are labeled hacktoberfest or good first issue.

You are welcome to propose your own ideas, too. Is there a popular method we overlooked? Is there something you use in your work to detect drift? Open an issue to let us know that you want to add a different metric and have started working on it!

Instructions to add a new data drift metric

For general instructions (e.g., how to clone the repository), head to the Contribution Guide.

Once you have chosen the drift method you want to implement, take the following steps.
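
To give a feel for the shape of such a contribution, here is a hedged sketch of what a new drift detection function could look like. The function name, the (reference, current, feature_type, threshold) signature, and the chosen test (Cramér–von Mises) are illustrative assumptions, not the library's confirmed interface; the existing implementations linked above and the Contribution Guide define the actual structure a pull request should follow.

```python
from typing import Tuple

import pandas as pd
from scipy import stats


def cramer_von_mises_drift(
    reference_data: pd.Series,
    current_data: pd.Series,
    feature_type: str,
    threshold: float,
) -> Tuple[float, bool]:
    """Hypothetical drift method: two-sample Cramer-von Mises test.

    Returns the p-value and a flag that is True when drift is detected,
    mirroring the (score, drift_detected) pattern of the existing methods.
    """
    result = stats.cramervonmises_2samp(reference_data.values, current_data.values)
    p_value = float(result.pvalue)
    return p_value, p_value < threshold
```

Beyond the function itself, a new method typically also needs to be wired into the library's collection of available drift methods and covered with tests; use the existing implementations as the reference for naming, registration, and test structure.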
