Extend adapter with the functionality to export into cuGraph (#64)
* add cuGraph code

* fix formatting

* add rapids logo

* README.md: add cuGraph

* format with black

* rerun black with recent version

* fix imports with isort

* add conda-incubator/setup-miniconda

* add conda packages

* cudatoolkit installation

* change setup order

* add default shell

* run pytest in conda env

* add driver

* run on ubuntu-18.04

* test sdist build

* test self-hosted

* run conda

* fix typo

* activate cugraph conda env

* <xx

* init conda

* test conda env

* test gpu build

* fix conda env name

* init conda

* run in same step

* run pytest in conda env

* use python version matrix

* remove duplicate step

* run on different conda envs

* pip setup in conda env

* remove old build job

* replace build job to run on self-hosted runner

* make cuGraph imports optional

* add cugraph test coverage

* fix formatting

Co-authored-by: Chris Woodward <cw00dw0rd@gmail.com>
maxkernbach and cw00dw0rd committed Mar 4, 2022
1 parent d784f93 commit 808d9bf
Showing 7 changed files with 388 additions and 23 deletions.
27 changes: 18 additions & 9 deletions .github/workflows/build.yml
@@ -24,21 +24,27 @@ env:
   TESTS_DIR: tests
 jobs:
   build:
-    runs-on: ubuntu-latest
+    runs-on: self-hosted
+    defaults:
+      run:
+        shell: bash -l {0}
     strategy:
       matrix:
-        python: ["3.6", "3.7", "3.8", "3.9", "3.10"]
-    name: Python ${{ matrix.python }}
+        python: ["3.7", "3.8"]
+    name: gpu
     steps:
       - uses: actions/checkout@v2
-      - name: Setup Python ${{ matrix.python }}
+      - name: Setup python
         uses: actions/setup-python@v2
         with:
           python-version: ${{ matrix.python }}
       - name: Setup pip
         run: python -m pip install --upgrade pip setuptools wheel
-      - name: Install packages
-        run: pip install .[dev]
+      - name: Install dependencies
+        run: |
+          source ~/anaconda3/etc/profile.d/conda.sh
+          conda activate ${{ matrix.python }}
+          pip install .[dev]
       - name: Run black
         run: black --check --verbose --diff --color ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
       - name: Run flake8
@@ -47,10 +53,13 @@ jobs:
         run: isort --check --profile=black ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
       - name: Run mypy
         run: mypy ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
-      - name: Run pytest
-        run: py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
+      - name: Run pytest in conda env
+        run: |
+          source ~/anaconda3/etc/profile.d/conda.sh
+          conda activate ${{ matrix.python }}
+          conda run -n ${{ matrix.python }} py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
       - name: Publish to coveralls.io
         if: matrix.python == '3.8'
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: coveralls --service=github
+        run: coveralls --service=github
25 changes: 17 additions & 8 deletions .github/workflows/release.yml
@@ -8,21 +8,27 @@ env:
   TESTS_DIR: tests
 jobs:
   build:
-    runs-on: ubuntu-latest
+    runs-on: self-hosted
+    defaults:
+      run:
+        shell: bash -l {0}
     strategy:
       matrix:
-        python: ["3.6", "3.7", "3.8", "3.9", "3.10"]
-    name: Python ${{ matrix.python }}
+        python: ["3.7", "3.8"]
+    name: gpu
     steps:
       - uses: actions/checkout@v2
-      - name: Setup Python ${{ matrix.python }}
+      - name: Setup python
         uses: actions/setup-python@v2
         with:
           python-version: ${{ matrix.python }}
       - name: Setup pip
         run: python -m pip install --upgrade pip setuptools wheel
-      - name: Install packages
-        run: pip install .[dev]
+      - name: Install dependencies
+        run: |
+          source ~/anaconda3/etc/profile.d/conda.sh
+          conda activate ${{ matrix.python }}
+          pip install .[dev]
       - name: Run black
         run: black --check --verbose --diff --color ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
       - name: Run flake8
@@ -31,8 +37,11 @@ jobs:
         run: isort --check --profile=black ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
       - name: Run mypy
         run: mypy ${{env.PACKAGE_DIR}} ${{env.TESTS_DIR}}
-      - name: Run pytest
-        run: py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
+      - name: Run pytest in conda env
+        run: |
+          source ~/anaconda3/etc/profile.d/conda.sh
+          conda activate ${{ matrix.python }}
+          conda run -n ${{ matrix.python }} py.test --cov=${{env.PACKAGE_DIR}} --cov-report xml -v --color=yes --no-cov-on-fail --code-highlight=yes
       - name: Publish to coveralls.io
         if: matrix.python == '3.8'
         env:
50 changes: 44 additions & 6 deletions README.md
@@ -1,4 +1,4 @@
-# ArangoDB-Networkx Adapter
+# ArangoDB-Networkx-cuGraph Adapter
 [![build](https://github.com/arangoml/networkx-adapter/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/arangoml/networkx-adapter/actions/workflows/build.yml)
 [![CodeQL](https://github.com/arangoml/networkx-adapter/actions/workflows/analyze.yml/badge.svg?branch=master)](https://github.com/arangoml/networkx-adapter/actions/workflows/analyze.yml)
 [![Coverage Status](https://coveralls.io/repos/github/arangoml/networkx-adapter/badge.svg?branch=master)](https://coveralls.io/github/arangoml/networkx-adapter)
@@ -12,9 +12,12 @@
 [![Downloads](https://img.shields.io/badge/dynamic/json?style=for-the-badge&color=282661&label=Downloads&query=total_downloads&url=https://api.pepy.tech/api/projects/adbnx-adapter)](https://pepy.tech/project/adbnx-adapter)

 <a href="https://www.arangodb.com/" rel="arangodb.com">![](./examples/assets/logos/ArangoDB_logo.png)</a>
-<a href="https://networkx.org/" rel="networkx.org">![](./examples/assets/logos/networkx_logo.svg)</a>
+<a href="https://networkx.org/" rel="networkx.org">![](./examples/assets/logos/networkx_logo.svg)</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
+<a href="https://github.com/rapidsai/cugraph" rel="github.com/rapidsai/cugraph"><img src="./examples/assets/logos/rapids-logo.png" width=30% height=30%>
+</a>

-The ArangoDB-Networkx Adapter exports Graphs from ArangoDB, a multi-model Graph Database, into NetworkX, the swiss army knife for graph analysis with python, and vice-versa.
+
+The ArangoDB-Networkx-cuGraph Adapter exports Graphs from ArangoDB, a multi-model Graph Database, into NetworkX, the swiss army knife for graph analysis with python, and vice-versa. Additionally, you can export ArangoDB graphs into the RAPIDS cuGraph library, a collection of GPU-accelerated graph algorithms.



@@ -24,13 +27,17 @@ Networkx is a commonly used tool for analysis of network-data. If your analytics

 1. An algorithm for your use case is available in Networkx.
 2. A library that you want to use for your use case works with Networkx Graphs as input.

+## About RAPIDS cuGraph
+
+While offering a similar API and set of graph algorithms to NetworkX, the RAPIDS cuGraph library is GPU-based. Especially for large graphs, this gives cuGraph a significant performance advantage over NetworkX. Please note that cuGraph does not currently support storing node attributes. Running cuGraph requires an NVIDIA CUDA-enabled GPU.
+
-## Quickstart
+## Quickstart: ArangoDB &rarr; NetworkX

 Get Started on Colab: <a href="https://colab.research.google.com/github/arangoml/networkx-adapter/blob/master/examples/ArangoDB_NetworkX_Adapter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

 ```py
-# Import the ArangoDB-NetworkX Adapter
+# Import the ArangoDB-NetworkX-cuGraph Adapter
 from adbnx_adapter.adapter import ADBNX_Adapter

 # Import a sample graph from NetworkX
@@ -84,6 +91,37 @@ adb_grid_edge_definitions = [
 ]
 adb_grid_graph = adbnx_adapter.networkx_to_arangodb("Grid", nx_grid_graph, adb_grid_edge_definitions)
 ```
+## Quickstart: ArangoDB &rarr; cuGraph
+
+```py
+# Import the ArangoDB-NetworkX-cuGraph Adapter
+from adbnx_adapter.adapter import ADBNX_Adapter
+
+# This is the connection information for your ArangoDB instance
+# (Let's assume that the ArangoDB fraud-detection data dump is imported to this endpoint)
+con = {
+    "hostname": "localhost",
+    "protocol": "http",
+    "port": 8529,
+    "username": "root",
+    "password": "rootpassword",
+    "dbName": "_system",
+}
+
+# This instantiates your ADBNX Adapter with your connection credentials
+adbnx_adapter = ADBNX_Adapter(con)
+
+# ArangoDB to cuGraph via Graph
+nx_fraud_graph = adbnx_adapter.arangodb_graph_to_cugraph("fraud-detection")
+
+# ArangoDB to cuGraph via Collections
+nx_fraud_graph_2 = adbnx_adapter.arangodb_collections_to_cugraph(
+    "fraud-detection",
+    {"account", "bank", "branch", "Class", "customer"},  # Specify vertex collections
+    {"accountHolder", "Relationship", "transaction"}  # Specify edge collections
+)
+```
+

 ## Development & Testing

@@ -94,4 +132,4 @@ Prerequisite: `arangorestore` must be installed
 3. `python -m venv .venv`
 4. `source .venv/bin/activate` (MacOS) or `.venv/scripts/activate` (Windows)
 5. `pip install -e .[dev]`
-6. `pytest`
+6. `pytest`
34 changes: 34 additions & 0 deletions adbnx_adapter/abc.py
@@ -5,6 +5,13 @@
 from typing import Any, List, Set

 from arango.graph import Graph as ArangoDBGraph
+
+try:
+    from cugraph import MultiGraph as cuGraphMultiGraph
+
+    cugraph = True
+except ImportError:
+    cugraph = False
 from networkx.classes.graph import Graph as NetworkXGraph
 from networkx.classes.multidigraph import MultiDiGraph

@@ -33,6 +40,33 @@ def arangodb_collections_to_networkx(
     ) -> MultiDiGraph:
         raise NotImplementedError  # pragma: no cover

+    if cugraph is False:
+        pass
+    else:
+
+        def arangodb_to_cugraph(
+            self,
+            name: str,
+            metagraph: ArangoMetagraph,
+            is_keep: bool = True,
+            **query_options: Any,
+        ) -> cuGraphMultiGraph(directed=True):  # type: ignore
+            raise NotImplementedError  # pragma: no cover
+
+        def arangodb_collections_to_cugraph(
+            self,
+            name: str,
+            v_cols: Set[str],
+            e_cols: Set[str],
+            **query_options: Any,
+        ) -> cuGraphMultiGraph(directed=True):  # type: ignore
+            raise NotImplementedError  # pragma: no cover
+
+        def arangodb_graph_to_cugraph(
+            self, name: str, **query_options: Any
+        ) -> cuGraphMultiGraph(directed=True):  # type: ignore
+            raise NotImplementedError  # pragma: no cover

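The hunk above defines the cuGraph methods only when `cugraph` can be imported, so callers cannot assume those methods exist on the adapter. A minimal sketch of that pattern, using only the standard library, might look like this. `Exporter`, `has_module`, and the module name `fake_gpu_lib_that_is_missing` are hypothetical names chosen for illustration and are not part of the adapter.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Hypothetical module name, chosen so that the lookup fails on any machine.
GPU_AVAILABLE = has_module("fake_gpu_lib_that_is_missing")
JSON_AVAILABLE = has_module("json")  # standard-library module, always present


class Exporter:
    """Stand-in for an adapter whose methods depend on optional packages."""

    def to_networkx(self) -> str:
        return "networkx export"

    # A method defined inside a class-level `if` exists only when the flag is True.
    if GPU_AVAILABLE:

        def to_cugraph(self) -> str:
            return "cugraph export"

    if JSON_AVAILABLE:

        def to_json(self) -> str:
            return "json export"


exporter = Exporter()
print(hasattr(exporter, "to_cugraph"))  # the optional method was never defined
print(hasattr(exporter, "to_json"))
```

Code that must also run on machines without a GPU could guard its calls in the same way, for example with `hasattr(adapter, "arangodb_graph_to_cugraph")`.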
     def arangodb_graph_to_networkx(
         self, name: str, **query_options: Any
     ) -> MultiDiGraph:
144 changes: 144 additions & 0 deletions adbnx_adapter/adapter.py
@@ -8,6 +8,21 @@
 from arango.cursor import Cursor
 from arango.graph import Graph as ArangoDBGraph
 from arango.result import Result
+
+try:
+    from cudf import DataFrame
+
+    cudf = True
+except ImportError as e:
+    print(e)
+    cudf = False
+try:
+    from cugraph import MultiGraph as cuGraphMultiGraph
+
+    cugraph = True
+except ImportError as e:
+    print(e)
+    cugraph = False
 from networkx import MultiDiGraph
 from networkx.classes.graph import Graph as NetworkXGraph
 from networkx.classes.multidigraph import MultiDiGraph as NetworkXMultiDiGraph
@@ -310,6 +325,135 @@ def networkx_to_arangodb(
         print(f"ArangoDB: {name} created")
         return adb_graph

+    if cugraph is False or cudf is False:
+        print(
+            "You are currently solely using the NetworkX export functionality.",
+            "Please note that modules 'cudf' and 'cugraph' are required to perform",
+            "exports into cuGraph. ",
+        )
+    else:
+
+        def arangodb_to_cugraph(
+            self,
+            name: str,
+            metagraph: ArangoMetagraph,
+            is_keep: bool = True,
+            **query_options: Any,
+        ) -> cuGraphMultiGraph(directed=True):  # type: ignore
+            """Create a cuGraph graph from graph attributes.
+            :param name: The cuGraph graph name.
+            :type name: str
+            :param metagraph: An object defining vertex & edge collections to import to
+                cuGraph, along with their associated attributes to keep.
+            :type metagraph: adbnx_adapter.typings.ArangoMetagraph
+            :param is_keep: Only keep the document attributes specified in **metagraph**
+                when importing to cuGraph (is True by default).
+            :type is_keep: bool
+            :param query_options: Keyword arguments to specify AQL query options when
+                fetching documents from the ArangoDB instance.
+            :type query_options: Any
+            :return: A Multi-Directed cuGraph Graph.
+            :rtype: cugraph.structure.graph_classes.MultiDiGraph
+            :raise ValueError: If missing required keys in metagraph
+            Here is an example entry for parameter **metagraph**:
+            .. code-block:: python
+            {
+                "vertexCollections": {
+                    "account": {"Balance", "account_type", "customer_id", "rank"},
+                    "bank": {"Country", "Id", "bank_id", "bank_name"},
+                    "customer": {"Name", "Sex", "Ssn", "rank"},
+                },
+                "edgeCollections": {
+                    "accountHolder": {},
+                    "transaction": {
+                        "transaction_amt", "receiver_bank_id", "sender_bank_id"
+                    },
+                },
+            }
+            """
+            self.__validate_attributes("graph", set(metagraph), self.METAGRAPH_ATRIBS)
+
+            # Maps ArangoDB vertex IDs to cuGraph node IDs
+            adb_map: Dict[str, Dict[str, Union[NxId, str]]] = dict()
+            cg_edges: List[Tuple[NxId, NxId]] = []
+
+            adb_v: Json
+            for col, atribs in metagraph["vertexCollections"].items():
+                for adb_v in self.__fetch_adb_docs(col, atribs, is_keep, query_options):
+                    adb_id: str = adb_v["_id"]
+                    nx_id = self.__cntrl._prepare_arangodb_vertex(adb_v, col)
+                    adb_map[adb_id] = {"nx_id": nx_id, "collection": col}
+
+            adb_e: Json
+            for col, atribs in metagraph["edgeCollections"].items():
+                for adb_e in self.__fetch_adb_docs(col, atribs, is_keep, query_options):
+                    from_node_id: NxId = adb_map[adb_e["_from"]]["nx_id"]
+                    to_node_id: NxId = adb_map[adb_e["_to"]]["nx_id"]
+                    self.__cntrl._prepare_arangodb_edge(adb_e, col)
+                    cg_edges.append((from_node_id, to_node_id))
+
+            srcs = [s for (s, _) in cg_edges]
+            dsts = [d for (_, d) in cg_edges]
+            cg_graph = cuGraphMultiGraph(directed=True)
+            cg_graph.from_cudf_edgelist(
+                DataFrame({"source": srcs, "destination": dsts})
+            )
+
+            print(f"cuGraph: {name} created")
+            return cg_graph

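The edge handling in `arangodb_to_cugraph` can be sketched without a GPU: map each ArangoDB `_id` to a node ID, resolve `_from` and `_to` through that map, then split the pairs into the two columns that the commit passes to `DataFrame({"source": ..., "destination": ...})`. In the sketch below the documents are made-up sample data, the node ID is assumed to be the `_id` itself (the adapter delegates that choice to its controller), and plain Python lists stand in for cuDF columns.

```python
from typing import Dict, List, Tuple

# Made-up sample documents; in the adapter these come from AQL queries.
vertices = [
    {"_id": "account/1"},
    {"_id": "account/2"},
    {"_id": "customer/9"},
]
edges = [
    {"_from": "customer/9", "_to": "account/1"},
    {"_from": "account/1", "_to": "account/2"},
]

# Map ArangoDB vertex IDs to node IDs (assumption: identity mapping).
adb_map: Dict[str, str] = {v["_id"]: v["_id"] for v in vertices}

# Resolve both endpoints of every edge through the map.
edge_pairs: List[Tuple[str, str]] = [
    (adb_map[e["_from"]], adb_map[e["_to"]]) for e in edges
]

# Split the pairs into parallel source/destination columns.
srcs = [s for s, _ in edge_pairs]
dsts = [d for _, d in edge_pairs]

edge_columns = {"source": srcs, "destination": dsts}
print(edge_columns)
```

An edge whose endpoint was never fetched raises `KeyError` in this sketch, which is also how the dictionary lookup `adb_map[adb_e["_from"]]` in the hunk above behaves.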
+        def arangodb_collections_to_cugraph(
+            self,
+            name: str,
+            v_cols: Set[str],
+            e_cols: Set[str],
+            **query_options: Any,
+        ) -> cuGraphMultiGraph(directed=True):  # type: ignore
+            """Create a cuGraph graph from ArangoDB collections.
+            :param name: The cuGraph graph name.
+            :type name: str
+            :param v_cols: A set of vertex collections to import to cuGraph.
+            :type v_cols: Set[str]
+            :param e_cols: A set of edge collections to import to cuGraph.
+            :type e_cols: Set[str]
+            :param query_options: Keyword arguments to specify AQL query options when
+                fetching documents from the ArangoDB instance.
+            :type query_options: Any
+            :return: A Multi-Directed cuGraph Graph.
+            :rtype: cugraph.structure.graph_classes.MultiDiGraph
+            """
+            metagraph: ArangoMetagraph = {
+                "vertexCollections": {col: set() for col in v_cols},
+                "edgeCollections": {col: set() for col in e_cols},
+            }
+
+            return self.arangodb_to_cugraph(
+                name, metagraph, is_keep=True, **query_options
+            )

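`arangodb_collections_to_cugraph` turns two sets of collection names into a metagraph whose attribute sets are all empty, then delegates to `arangodb_to_cugraph`. A small sketch of that construction follows, using collection names from the README example. `build_metagraph` and the `Metagraph` alias are hypothetical names for illustration, not adapter functions or types.

```python
from typing import Dict, Set

# Hypothetical alias; the adapter uses its own ArangoMetagraph type.
Metagraph = Dict[str, Dict[str, Set[str]]]


def build_metagraph(v_cols: Set[str], e_cols: Set[str]) -> Metagraph:
    """Pair every collection name with an empty attribute set."""
    return {
        "vertexCollections": {col: set() for col in v_cols},
        "edgeCollections": {col: set() for col in e_cols},
    }


metagraph = build_metagraph(
    {"account", "bank", "customer"},
    {"accountHolder", "transaction"},
)
print(sorted(metagraph["vertexCollections"]))
print(sorted(metagraph["edgeCollections"]))
```

Because the attribute sets are empty and the call passes `is_keep=True`, presumably no document attributes are requested, which would be consistent with the README note that cuGraph does not store node attributes.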
+        def arangodb_graph_to_cugraph(
+            self, name: str, **query_options: Any
+        ) -> cuGraphMultiGraph(directed=True):  # type: ignore
+            """Create a cuGraph graph from an ArangoDB graph.
+            :param name: The ArangoDB graph name.
+            :type name: str
+            :param query_options: Keyword arguments to specify AQL query options when
+                fetching documents from the ArangoDB instance.
+            :type query_options: Any
+            :return: A Multi-Directed cuGraph Graph.
+            :rtype: cugraph.structure.graph_classes.MultiDiGraph
+            """
+            graph = self.__db.graph(name)
+            v_cols = graph.vertex_collections()
+            e_cols = {col["edge_collection"] for col in graph.edge_definitions()}
+
+            return self.arangodb_collections_to_cugraph(
+                name, v_cols, e_cols, **query_options
+            )
+
     def __validate_attributes(
         self, type: str, attributes: Set[str], valid_attributes: Set[str]
     ) -> None:
Binary file added examples/assets/logos/rapids-logo.png