Extend adapter with the functionality to export into cuGraph #64

Merged
merged 38 commits into from
Mar 4, 2022
2b6f919
add cuGraph code
maxkernbach Feb 22, 2022
408772d
fix formatting
maxkernbach Feb 22, 2022
8521308
add rapids logo
maxkernbach Feb 22, 2022
c34bcc3
README.md: add cuGraph
maxkernbach Feb 22, 2022
fd8e9be
format with black
maxkernbach Feb 22, 2022
fcccc8a
rerun black with recent version
maxkernbach Feb 22, 2022
10d742e
fix imports with isort
maxkernbach Feb 22, 2022
013dc88
add conda-incubator/setup-miniconda
maxkernbach Feb 23, 2022
f5f1d77
add conda packages
maxkernbach Feb 23, 2022
d45c358
cudatoolkit installation
maxkernbach Feb 23, 2022
d7e7923
change setup order
maxkernbach Feb 23, 2022
c3c4c10
add default shell
maxkernbach Feb 23, 2022
05851e6
run pytest in conda env
maxkernbach Feb 23, 2022
fad18e9
add driver
maxkernbach Feb 24, 2022
899b2de
run on ubuntu-18.04
maxkernbach Feb 24, 2022
7d9da50
test sdist build
maxkernbach Feb 25, 2022
79c2aac
test self-hosted
maxkernbach Feb 25, 2022
7c87997
run conda
maxkernbach Feb 25, 2022
facf457
fix typo
maxkernbach Feb 25, 2022
0f0e462
activate cugraph conda env
maxkernbach Feb 25, 2022
1d70f08
<xx
maxkernbach Feb 25, 2022
404e55c
init conda
maxkernbach Feb 25, 2022
01bd40d
test conda env
maxkernbach Feb 25, 2022
0e6560d
test gpu build
maxkernbach Feb 25, 2022
5204ce3
fix conda env name
maxkernbach Feb 25, 2022
69d9206
init conda
maxkernbach Feb 25, 2022
7ea930e
run in same step
maxkernbach Feb 25, 2022
4b43e18
run pytest in conda env
maxkernbach Feb 25, 2022
2c9cdf9
use python version matrix
maxkernbach Feb 25, 2022
530c57b
remove duplicate step
maxkernbach Feb 25, 2022
47861b0
run on different conda envs
maxkernbach Feb 28, 2022
e4390d1
pip setup in conda env
maxkernbach Feb 28, 2022
c0dd887
remove old build job
maxkernbach Mar 1, 2022
77997ea
replace build job to run on self-hosted runner
maxkernbach Mar 1, 2022
1886b32
make cuGraph imports optional
maxkernbach Mar 2, 2022
868dcce
add cugraph test coverage
maxkernbach Mar 2, 2022
0b7d6c7
fix formatting
maxkernbach Mar 2, 2022
20fdddd
Merge branch 'master' into add-cugraph
cw00dw0rd Mar 3, 2022
27 changes: 18 additions & 9 deletions .github/workflows/build.yml
@@ -24,21 +24,27 @@ env:
TESTS_DIR: tests
jobs:
build:
runs-on: ubuntu-latest
runs-on: self-hosted
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
python: ["3.6", "3.7", "3.8", "3.9", "3.10"]
name: Python ${{ matrix.python }}
python: ["3.7", "3.8"]
name: gpu
steps:
- uses: actions/checkout@v2
- name: Setup Python ${{ matrix.python }}
- name: Setup python
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python }}
- name: Setup pip
run: python -m pip install --upgrade pip setuptools wheel
- name: Install packages
run: pip install .[dev]
- name: Install dependencies
run: |
source ~/anaconda3/etc/profile.d/conda.sh
conda activate ${{ matrix.python }}
pip install .[dev]
- name: Run black
run: black --check --verbose --diff --color ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
- name: Run flake8
@@ -47,10 +53,13 @@ jobs:
run: isort --check --profile=black ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
- name: Run mypy
run: mypy ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
- name: Run pytest
run: py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
- name: Run pytest in conda env
run: |
source ~/anaconda3/etc/profile.d/conda.sh
conda activate ${{ matrix.python }}
conda run -n ${{ matrix.python }} py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
- name: Publish to coveralls.io
if: matrix.python == '3.8'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: coveralls --service=github
run: coveralls --service=github
25 changes: 17 additions & 8 deletions .github/workflows/release.yml
@@ -8,21 +8,27 @@ env:
TESTS_DIR: tests
jobs:
build:
runs-on: ubuntu-latest
runs-on: self-hosted
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
python: ["3.6", "3.7", "3.8", "3.9", "3.10"]
name: Python ${{ matrix.python }}
python: ["3.7", "3.8"]
name: gpu
steps:
- uses: actions/checkout@v2
- name: Setup Python ${{ matrix.python }}
- name: Setup python
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python }}
- name: Setup pip
run: python -m pip install --upgrade pip setuptools wheel
- name: Install packages
run: pip install .[dev]
- name: Install dependencies
run: |
source ~/anaconda3/etc/profile.d/conda.sh
conda activate ${{ matrix.python }}
pip install .[dev]
- name: Run black
run: black --check --verbose --diff --color ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
- name: Run flake8
@@ -31,8 +37,11 @@ jobs:
run: isort --check --profile=black ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
- name: Run mypy
run: mypy ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
- name: Run pytest
run: py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
- name: Run pytest in conda env
run: |
source ~/anaconda3/etc/profile.d/conda.sh
conda activate ${{ matrix.python }}
conda run -n ${{ matrix.python }} py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
- name: Publish to coveralls.io
if: matrix.python == '3.8'
env:
50 changes: 44 additions & 6 deletions README.md
@@ -1,4 +1,4 @@
# ArangoDB-Networkx Adapter
# ArangoDB-Networkx-cuGraph Adapter
[![build](https://github.com/arangoml/networkx-adapter/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/arangoml/networkx-adapter/actions/workflows/build.yml)
[![CodeQL](https://github.com/arangoml/networkx-adapter/actions/workflows/analyze.yml/badge.svg?branch=master)](https://github.com/arangoml/networkx-adapter/actions/workflows/analyze.yml)
[![Coverage Status](https://coveralls.io/repos/github/arangoml/networkx-adapter/badge.svg?branch=master)](https://coveralls.io/github/arangoml/networkx-adapter)
@@ -12,9 +12,12 @@
[![Downloads](https://img.shields.io/badge/dynamic/json?style=for-the-badge&color=282661&label=Downloads&query=total_downloads&url=https://api.pepy.tech/api/projects/adbnx-adapter)](https://pepy.tech/project/adbnx-adapter)

<a href="https://www.arangodb.com/" rel="arangodb.com">![](./examples/assets/logos/ArangoDB_logo.png)</a>
<a href="https://networkx.org/" rel="networkx.org">![](./examples/assets/logos/networkx_logo.svg)</a>
<a href="https://networkx.org/" rel="networkx.org">![](./examples/assets/logos/networkx_logo.svg)</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="https://github.com/rapidsai/cugraph" rel="github.com/rapidsai/cugraph"><img src="./examples/assets/logos/rapids-logo.png" width=30% height=30%>
</a>

The ArangoDB-Networkx Adapter exports Graphs from ArangoDB, a multi-model Graph Database, into NetworkX, the swiss army knife for graph analysis with python, and vice-versa.

The ArangoDB-Networkx-cuGraph Adapter exports graphs from ArangoDB, a multi-model graph database, into NetworkX, the Swiss Army knife for graph analysis with Python, and vice versa. Additionally, you can export ArangoDB graphs into the RAPIDS cuGraph library, a collection of GPU-accelerated graph algorithms.



@@ -24,13 +27,17 @@ Networkx is a commonly used tool for analysis of network-data. If your analytics

1. An algorithm for your use case is available in Networkx.
2. A library that you want to use for your use case works with Networkx Graphs as input.

## About RAPIDS cuGraph

The RAPIDS cuGraph library offers an API and a set of graph algorithms similar to NetworkX, but runs on the GPU. Especially for large graphs, this can make cuGraph significantly faster than NetworkX. Please note that cuGraph currently does not support storing node attributes. Running cuGraph requires an NVIDIA CUDA-enabled GPU.

## Quickstart
## Quickstart: ArangoDB &rarr; NetworkX

Get Started on Colab: <a href="https://colab.research.google.com/github/arangoml/networkx-adapter/blob/master/examples/ArangoDB_NetworkX_Adapter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

```py
# Import the ArangoDB-NetworkX Adapter
# Import the ArangoDB-NetworkX-cuGraph Adapter
from adbnx_adapter.adapter import ADBNX_Adapter

# Import a sample graph from NetworkX
@@ -84,6 +91,37 @@ adb_grid_edge_definitions = [
]
adb_grid_graph = adbnx_adapter.networkx_to_arangodb("Grid", nx_grid_graph, adb_grid_edge_definitions)
```
## Quickstart: ArangoDB &rarr; cuGraph

```py
# Import the ArangoDB-NetworkX-cuGraph Adapter
from adbnx_adapter.adapter import ADBNX_Adapter

# This is the connection information for your ArangoDB instance
# (Let's assume that the ArangoDB fraud-detection data dump is imported to this endpoint)
con = {
"hostname": "localhost",
"protocol": "http",
"port": 8529,
"username": "root",
"password": "rootpassword",
"dbName": "_system",
}

# This instantiates your ADBNX Adapter with your connection credentials
adbnx_adapter = ADBNX_Adapter(con)

# ArangoDB to cuGraph via Graph
cg_fraud_graph = adbnx_adapter.arangodb_graph_to_cugraph("fraud-detection")

# ArangoDB to cuGraph via Collections
cg_fraud_graph_2 = adbnx_adapter.arangodb_collections_to_cugraph(
    "fraud-detection",
    {"account", "bank", "branch", "Class", "customer"},  # Specify vertex collections
    {"accountHolder", "Relationship", "transaction"}  # Specify edge collections
)
```
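
The collections-based call above is a thin wrapper: in `adapter.py` it turns the two sets into a metagraph with empty attribute sets and hands that to `arangodb_to_cugraph`. The following is a minimal sketch of that step only, not part of the README change itself. The helper name `collections_to_metagraph` is hypothetical and used purely for illustration; the dictionary keys are the ones used in this PR.

```python
from typing import Dict, Set

Metagraph = Dict[str, Dict[str, Set[str]]]


def collections_to_metagraph(v_cols: Set[str], e_cols: Set[str]) -> Metagraph:
    """Build a metagraph that names collections but lists no attributes.

    Hypothetical helper: it mirrors the dict literal built inside
    arangodb_collections_to_cugraph in this PR.
    """
    return {
        "vertexCollections": {col: set() for col in v_cols},
        "edgeCollections": {col: set() for col in e_cols},
    }


metagraph = collections_to_metagraph(
    {"account", "bank", "customer"},
    {"accountHolder", "transaction"},
)
print(sorted(metagraph["vertexCollections"]))
print(sorted(metagraph["edgeCollections"]))
```

To keep specific document attributes instead, a metagraph with non-empty attribute sets can be passed to `arangodb_to_cugraph` directly, as in the docstring example in `adapter.py`.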


## Development & Testing

@@ -94,4 +132,4 @@ Prerequisite: `arangorestore` must be installed
3. `python -m venv .venv`
4. `source .venv/bin/activate` (MacOS) or `.venv/scripts/activate` (Windows)
5. `pip install -e .[dev]`
6. `pytest`
6. `pytest`
34 changes: 34 additions & 0 deletions adbnx_adapter/abc.py
@@ -5,6 +5,13 @@
from typing import Any, List, Set

from arango.graph import Graph as ArangoDBGraph

try:
from cugraph import MultiGraph as cuGraphMultiGraph

cugraph = True
except ImportError:
cugraph = False
from networkx.classes.graph import Graph as NetworkXGraph
from networkx.classes.multidigraph import MultiDiGraph

@@ -33,6 +40,33 @@ def arangodb_collections_to_networkx(
) -> MultiDiGraph:
raise NotImplementedError # pragma: no cover

if cugraph is False:
pass
else:

def arangodb_to_cugraph(
self,
name: str,
metagraph: ArangoMetagraph,
is_keep: bool = True,
**query_options: Any,
) -> cuGraphMultiGraph(directed=True): # type: ignore
raise NotImplementedError # pragma: no cover

def arangodb_collections_to_cugraph(
self,
name: str,
v_cols: Set[str],
e_cols: Set[str],
**query_options: Any,
) -> cuGraphMultiGraph(directed=True): # type: ignore
raise NotImplementedError # pragma: no cover

def arangodb_graph_to_cugraph(
self, name: str, **query_options: Any
) -> cuGraphMultiGraph(directed=True): # type: ignore
raise NotImplementedError # pragma: no cover

def arangodb_graph_to_networkx(
self, name: str, **query_options: Any
) -> MultiDiGraph:
Expand Down
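
The `abc.py` change above guards the cuGraph import with `try`/`except ImportError` and records the result in a module-level flag. As a point of comparison, here is a minimal sketch of a different way to detect an optional dependency, using the standard library's `importlib.util.find_spec`. This is not what the PR does; the names `is_available`, `HAS_CUGRAPH`, `HAS_CUDF`, and `require_cugraph` are hypothetical.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical flags, playing the role of the `cugraph` / `cudf` booleans in the PR.
HAS_CUGRAPH = is_available("cugraph")
HAS_CUDF = is_available("cudf")


def require_cugraph() -> None:
    """Raise at call time, rather than printing at import time, if GPU modules are missing."""
    if not (HAS_CUGRAPH and HAS_CUDF):
        raise ImportError(
            "modules 'cudf' and 'cugraph' are required to perform exports into cuGraph"
        )


print(is_available("json"))
```

One possible reason to prefer a call-time check is that users who only need the NetworkX export see no message at import, while a cuGraph call on a machine without the GPU stack fails with an explicit error.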
144 changes: 144 additions & 0 deletions adbnx_adapter/adapter.py
@@ -8,6 +8,21 @@
from arango.cursor import Cursor
from arango.graph import Graph as ArangoDBGraph
from arango.result import Result

try:
from cudf import DataFrame

cudf = True
except ImportError as e:
print(e)
cudf = False
try:
from cugraph import MultiGraph as cuGraphMultiGraph

cugraph = True
except ImportError as e:
print(e)
cugraph = False
from networkx import MultiDiGraph
from networkx.classes.graph import Graph as NetworkXGraph
from networkx.classes.multidigraph import MultiDiGraph as NetworkXMultiDiGraph
@@ -310,6 +325,135 @@ def networkx_to_arangodb(
print(f"ArangoDB: {name} created")
return adb_graph

if cugraph is False or cudf is False:
print(
"You are currently solely using the NetworkX export functionality.",
"Please note that modules 'cudf' and 'cugraph' are required to perform",
"exports into cuGraph. ",
)
else:

def arangodb_to_cugraph(
self,
name: str,
metagraph: ArangoMetagraph,
is_keep: bool = True,
**query_options: Any,
) -> cuGraphMultiGraph(directed=True): # type: ignore
"""Create a cuGraph graph from graph attributes.

:param name: The cuGraph graph name.
:type name: str
:param metagraph: An object defining vertex & edge collections to import to
cuGraph, along with their associated attributes to keep.
:type metagraph: adbnx_adapter.typings.ArangoMetagraph
:param is_keep: Only keep the document attributes specified in **metagraph**
when importing to cuGraph (is True by default).
:type is_keep: bool
:param query_options: Keyword arguments to specify AQL query options when
fetching documents from the ArangoDB instance.
:type query_options: Any
:return: A Multi-Directed cuGraph Graph.
:rtype: cugraph.structure.graph_classes.MultiDiGraph
:raise ValueError: If missing required keys in metagraph

Here is an example entry for parameter **metagraph**:

.. code-block:: python
{
"vertexCollections": {
"account": {"Balance", "account_type", "customer_id", "rank"},
"bank": {"Country", "Id", "bank_id", "bank_name"},
"customer": {"Name", "Sex", "Ssn", "rank"},
},
"edgeCollections": {
"accountHolder": {},
"transaction": {
"transaction_amt", "receiver_bank_id", "sender_bank_id"
},
},
}
"""
self.__validate_attributes("graph", set(metagraph), self.METAGRAPH_ATRIBS)

# Maps ArangoDB vertex IDs to cuGraph node IDs
adb_map: Dict[str, Dict[str, Union[NxId, str]]] = dict()
cg_edges: List[Tuple[NxId, NxId]] = []

adb_v: Json
for col, atribs in metagraph["vertexCollections"].items():
for adb_v in self.__fetch_adb_docs(col, atribs, is_keep, query_options):
adb_id: str = adb_v["_id"]
nx_id = self.__cntrl._prepare_arangodb_vertex(adb_v, col)
adb_map[adb_id] = {"nx_id": nx_id, "collection": col}

adb_e: Json
for col, atribs in metagraph["edgeCollections"].items():
for adb_e in self.__fetch_adb_docs(col, atribs, is_keep, query_options):
from_node_id: NxId = adb_map[adb_e["_from"]]["nx_id"]
to_node_id: NxId = adb_map[adb_e["_to"]]["nx_id"]
self.__cntrl._prepare_arangodb_edge(adb_e, col)
cg_edges.append((from_node_id, to_node_id))

srcs = [s for (s, _) in cg_edges]
dsts = [d for (_, d) in cg_edges]
cg_graph = cuGraphMultiGraph(directed=True)
cg_graph.from_cudf_edgelist(
DataFrame({"source": srcs, "destination": dsts})
)

print(f"cuGraph: {name} created")
return cg_graph

def arangodb_collections_to_cugraph(
self,
name: str,
v_cols: Set[str],
e_cols: Set[str],
**query_options: Any,
) -> cuGraphMultiGraph(directed=True): # type: ignore
"""Create a cuGraph graph from ArangoDB collections.
:param name: The cuGraph graph name.
:type name: str
:param v_cols: A set of vertex collections to import to cuGraph.
:type v_cols: Set[str]
:param e_cols: A set of edge collections to import to cuGraph.
:type e_cols: Set[str]
:param query_options: Keyword arguments to specify AQL query options when
fetching documents from the ArangoDB instance.
:type query_options: Any
:return: A Multi-Directed cuGraph Graph.
:rtype: cugraph.structure.graph_classes.MultiDiGraph
"""
metagraph: ArangoMetagraph = {
"vertexCollections": {col: set() for col in v_cols},
"edgeCollections": {col: set() for col in e_cols},
}

return self.arangodb_to_cugraph(
name, metagraph, is_keep=True, **query_options
)

def arangodb_graph_to_cugraph(
self, name: str, **query_options: Any
) -> cuGraphMultiGraph(directed=True): # type: ignore
"""Create a cuGraph graph from an ArangoDB graph.
:param name: The ArangoDB graph name.
:type name: str
:param query_options: Keyword arguments to specify AQL query options when
fetching documents from the ArangoDB instance.
:type query_options: Any
:return: A Multi-Directed cuGraph Graph.
:rtype: cugraph.structure.graph_classes.MultiDiGraph
"""
graph = self.__db.graph(name)
v_cols = graph.vertex_collections()
e_cols = {col["edge_collection"] for col in graph.edge_definitions()}

return self.arangodb_collections_to_cugraph(
name, v_cols, e_cols, **query_options
)

def __validate_attributes(
self, type: str, attributes: Set[str], valid_attributes: Set[str]
) -> None:
Binary file added examples/assets/logos/rapids-logo.png
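
Returning to `arangodb_to_cugraph` in `adapter.py`: its core is a two-pass loop that first maps every ArangoDB vertex `_id` to a node ID, then resolves each edge's `_from` and `_to` through that map and splits the result into source and destination columns. The sketch below reproduces only that logic in plain Python so it runs without a GPU. It assumes the node ID is simply the ArangoDB `_id` (in the PR this is decided by the controller's `_prepare_arangodb_vertex` hook), and the function name `build_edge_columns` is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Json = Dict[str, Any]


def build_edge_columns(
    vertices: Iterable[Json], edges: Iterable[Json]
) -> Tuple[List[str], List[str]]:
    """Resolve edge endpoints through a vertex map and return two columns."""
    # Pass 1: map ArangoDB vertex IDs to node IDs.
    # Assumption: the node ID equals the ArangoDB _id.
    adb_map: Dict[str, str] = {v["_id"]: v["_id"] for v in vertices}

    # Pass 2: look up both endpoints of every edge.
    # A KeyError here means an edge points at a vertex that was not fetched.
    srcs: List[str] = []
    dsts: List[str] = []
    for e in edges:
        srcs.append(adb_map[e["_from"]])
        dsts.append(adb_map[e["_to"]])
    return srcs, dsts


vertices = [{"_id": "account/1"}, {"_id": "account/2"}, {"_id": "customer/7"}]
edges = [
    {"_from": "customer/7", "_to": "account/1"},
    {"_from": "account/1", "_to": "account/2"},
]
srcs, dsts = build_edge_columns(vertices, edges)
print(srcs)
print(dsts)
```

In the PR, these two lists become the `"source"` and `"destination"` columns of a `cudf` `DataFrame`, which is then passed to `from_cudf_edgelist` on a `MultiGraph(directed=True)`.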