
MaastrichtU-BISS/v6-summary-py

 
 



vantage6

A privacy preserving federated learning solution

Federated Summary

⚠️ priVAcy preserviNg federaTed leArninG infrastructurE for Secure Insight eXchange (VANTAGE6)
This algorithm is part of VANTAGE6. A docker build of this algorithm can be obtained from harbor.vantage6.ai/algorithms/dsummary

An algorithm inspired by the summary function in R. It reports the Min, Q1, Mean, Median, Q3, Max and number of NaN values per column from each node.
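
As an illustration of the statistics listed above, here is a minimal pandas sketch of a per-node summary. It is not taken from this repository: the function name `summarize_numeric` and the sample data are hypothetical, and the quartiles use pandas' default linear interpolation, which may differ from what the algorithm does.

```python
import numpy as np
import pandas as pd


def summarize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return R-style summary statistics for each numeric column.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "min": numeric.min(),
        "q1": numeric.quantile(0.25),
        "mean": numeric.mean(),
        "median": numeric.median(),
        "q3": numeric.quantile(0.75),
        "max": numeric.max(),
        "nan": numeric.isna().sum(),  # number of missing values per column
    })


# Made-up sample data standing in for a node's local dataset
df = pd.DataFrame({
    "age": [20, 30, 40, 50, np.nan],
    "sex": ["f", "m", "f", None, "m"],
})
print(summarize_numeric(df))
```

Each row of the returned frame describes one numeric column; non-numeric columns such as `sex` are left out of this sketch.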

On top of the functionality provided by IKNL's version of this algorithm, it counts the NaN values in non-numeric columns. It also automatically detects which columns are present and whether they are numeric or categorical.
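
One way such detection could work is sketched below with pandas dtype checks. This is an assumption about the approach, not the repository's code: the helper names are hypothetical, and `is_numeric_dtype` also treats boolean columns as numeric, which the real algorithm may handle differently.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def detect_column_types(df: pd.DataFrame) -> dict:
    """Label every column as "numeric" or "categorical" based on its dtype."""
    return {
        col: "numeric" if is_numeric_dtype(df[col]) else "categorical"
        for col in df.columns
    }


def count_missing_categorical(df: pd.DataFrame) -> dict:
    """Count missing values in the columns labelled as categorical."""
    types = detect_column_types(df)
    return {
        col: int(df[col].isna().sum())
        for col, kind in types.items()
        if kind == "categorical"
    }


# Made-up sample data
df = pd.DataFrame({"age": [1, 2, None], "sex": ["f", None, None]})
print(detect_column_types(df))
print(count_missing_categorical(df))
```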

This algorithm can also be used for SPARQL queries. To build that variant, comment out the standard Docker wrapper and uncomment the SPARQL one. The query can then be supplied under kwargs['query'].

Possible Privacy Issues

🚨 Categorical column with only one category
🚨 Min and Max for each column are reported
🚨 Column names are returned

Privacy Protection

✔️ If column names do not match nothing else is reported
✔️ If the dataset has fewer than 10 rows, no statistical analysis is performed
✔️ Only the statistical results Min, Q1, Mean, Median, Q3, Max and number of NaN values per column are reported.
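
The first two protections above can be sketched as simple guards. The function names are hypothetical, and the comparison of column names across nodes (order-insensitive, exact match) is an assumption about how "match" is defined; only the threshold of 10 rows comes from this README.

```python
MIN_ROWS = 10  # threshold stated in this README


def has_enough_rows(n_rows: int, min_rows: int = MIN_ROWS) -> bool:
    """A node only runs the statistical analysis with at least min_rows rows."""
    return n_rows >= min_rows


def columns_match(columns_per_node: list) -> bool:
    """Check that every node reported the same set of column names.

    Assumption: column order does not matter, names must match exactly.
    """
    if not columns_per_node:
        return False
    reference = set(columns_per_node[0])
    return all(set(columns) == reference for columns in columns_per_node[1:])


print(has_enough_rows(9))    # below the threshold
print(has_enough_rows(10))   # exactly at the threshold
print(columns_match([["age", "sex"], ["sex", "age"]]))
print(columns_match([["age", "sex"], ["age", "bmi"]]))
```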

Usage

import time

from vantage6.client import Client

# Create, authenticate and set up the client
client = Client("http://127.0.0.1", 5000, "")
client.authenticate("frank@iknl.nl", "password")
client.setup_encryption(None)

# Define algorithm input
# include the columns you want to summarize, 
# and specify if they are categorical ("category" or "c") or numeric ("numeric" or "n")
input_ = {
    "master": True,
    "method":"master",
    "args": [],
    "kwargs": {
        "query": "SELECT * WHERE {?s ?p ?o}"     
    }
}

# Send the task to the central server
task = client.task.create(name="algo_testing-summary",
                          image="harbor2.vantage6.ai/testing/summary:latest",
                          input=input_,
                          collaboration=1, 
                          organizations=[2],
                          description=""
                          )

# Retrieve the results
print("Waiting for results")
task_id = task.get("id")
task_info = client.task.get(task_id)
while not task_info.get("complete"):
    task_info = client.task.get(task_id, include_results=True)
    print("Waiting for results")
    time.sleep(3)
print("Results are ready!")

result_info = client.result.get(task_info.get("results")[0].get("id"))
result = result_info["result"]
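
The polling loop above waits indefinitely if the task never completes. A possible variation with a timeout is sketched below; `wait_until_complete` is a hypothetical helper, not part of the vantage6 client, and the fake responses only stand in for the server so that the sketch runs on its own.

```python
import time


def wait_until_complete(fetch_task, interval=3.0, timeout=600.0):
    """Call fetch_task() until the returned dict has a truthy "complete" field.

    Raises TimeoutError if the task is still incomplete after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task_info = fetch_task()
        if task_info.get("complete"):
            return task_info
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not complete in time")
        time.sleep(interval)


# Stand-in for the server: incomplete on the first poll, complete on the second
responses = iter([
    {"complete": False},
    {"complete": True, "results": [{"id": 7}]},
])
task_info = wait_until_complete(lambda: next(responses), interval=0.01)
print(task_info["results"][0]["id"])
```

With the client from the usage example, the call would presumably look like `wait_until_complete(lambda: client.task.get(task_id, include_results=True))`.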

Test / Develop

You need to have Docker installed.

To Build (assuming you are in the project-directory):

docker build -t harbor.vantage6.ai/algorithms/summary .

To test or run the algorithm locally, use the local folder included in the repository. The following command mounts its files into the container and sets the DATABASE_URI environment variable.

docker run -e DATABASE_URI=/app/database.csv -v .\local\input.txt:/app/input.txt -v .\local\output.txt:/app/output.txt -v .\local\database.csv:/app/database.csv harbor.vantage6.ai/algorithms/summary
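
The command above uses Windows-style relative paths. As a sketch for other shells, the same command can be assembled in Python with absolute host paths. The helper `build_run_command` is hypothetical; the mounted file names, the DATABASE_URI value and the image name are copied from the command above.

```python
from pathlib import Path


def build_run_command(local_dir="local",
                      image="harbor.vantage6.ai/algorithms/summary"):
    """Build the docker run argument list with absolute host paths."""
    local = Path(local_dir).resolve()
    cmd = ["docker", "run", "-e", "DATABASE_URI=/app/database.csv"]
    for name in ("input.txt", "output.txt", "database.csv"):
        # Mount each local file at the same name under /app in the container
        cmd += ["-v", f"{local / name}:/app/{name}"]
    cmd.append(image)
    return cmd


cmd = build_run_command()
print(" ".join(cmd))
# To actually start the container (requires Docker):
# import subprocess; subprocess.run(cmd, check=True)
```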

