Commit
Add integration tests (#7, PR #21)
- Integration tests for the scrapyd-k8s API
- Run against Docker and Kubernetes
- Deploy sample manifest in minikube and run integration tests there

The integration tests can still be improved and expanded.
wvengen committed Feb 27, 2024
1 parent ce13a81 commit 1e0fda4
Showing 11 changed files with 395 additions and 39 deletions.
37 changes: 37 additions & 0 deletions .github/workflows/test-docker.yml
@@ -0,0 +1,37 @@
name: Tests on Docker
on:
  push:
    branches:
      - main
  pull_request:

jobs:
  container:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
        cache: 'pip'

    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install -r requirements-test.txt
    - name: Pull example spider
      run: docker pull ghcr.io/q-m/scrapyd-k8s-spider-example

    - name: Run scrapyd-k8s
      run: |
        cp scrapyd_k8s.sample-docker.conf scrapyd_k8s.conf
        python -m scrapyd_k8s &
        while ! nc -q 1 localhost 6800 </dev/null; do sleep 1; done
        curl http://localhost:6800/daemonstatus.json
    - name: Run tests
      run: pytest -vv test_api.py
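The `nc` loop in the "Run scrapyd-k8s" step polls until the API port accepts connections before running any tests. The same wait can be sketched in Python; the function name `wait_for_port` is illustrative, not part of the repository:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP port accepts connections, like the workflow's nc loop."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(1)  # not listening yet, retry
    return False
```

A test harness could call `wait_for_port("localhost", 6800)` before its first request to `daemonstatus.json`.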
47 changes: 47 additions & 0 deletions .github/workflows/test-k8s.yml
@@ -0,0 +1,47 @@
name: Tests on Kubernetes
on:
  push:
    branches:
      - main
  pull_request:

jobs:
  container:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
        cache: 'pip'

    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install -r requirements-test.txt
    - name: Start minikube
      uses: medyagh/setup-minikube@master

    - name: Prepare Kubernetes environment
      run: |
        kubectl create secret generic example-env-secret --from-literal=FOO_1=bar
        kubectl create configmap example-env-configmap --from-literal=FOO_2=baz
        # already pull image so we don't have to wait for it later
        minikube image pull ghcr.io/q-m/scrapyd-k8s-spider-example:latest
    - name: Run scrapyd-k8s
      run: |
        cp scrapyd_k8s.sample-k8s.conf scrapyd_k8s.conf
        python -m scrapyd_k8s &
        while ! nc -q 1 localhost 6800 </dev/null; do sleep 1; done
        curl http://localhost:6800/daemonstatus.json
    - name: Run tests
      run: |
        TEST_MAX_WAIT=60 \
        TEST_AVAILABLE_VERSIONS=latest,`skopeo list-tags docker://ghcr.io/q-m/scrapyd-k8s-spider-example | jq -r '.Tags | map(select(. != "latest" and (startswith("sha-") | not))) | join(",")'` \
        pytest -vv --color=yes test_api.py
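The `jq` expression in the "Run tests" step keeps every spider image tag that is neither `latest` nor a commit-pinned `sha-*` tag, and joins the survivors with commas. The same filter in Python, shown only for clarity (the helper name is made up):

```python
def available_versions(tags):
    # Mirror the workflow's jq filter: drop "latest" and "sha-*" tags,
    # then join the rest into the comma-separated TEST_AVAILABLE_VERSIONS value.
    kept = [t for t in tags if t != "latest" and not t.startswith("sha-")]
    return ",".join(kept)

print(available_versions(["latest", "sha-1a2b3c", "1.0.0", "1.1.0"]))
# → 1.0.0,1.1.0
```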
61 changes: 61 additions & 0 deletions .github/workflows/test-manifest.yml
@@ -0,0 +1,61 @@
name: Test Kubernetes manifest
on:
  push:
    branches:
      - main
  pull_request:

jobs:
  container:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
        cache: 'pip'

    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install -r requirements-test.txt
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3

    - name: Build container
      uses: docker/build-push-action@v5
      with:
        context: .
        push: false
        load: true
        tags: test:latest
        cache-from: type=gha
        cache-to: type=gha,mode=max

    - name: Start minikube
      uses: medyagh/setup-minikube@master

    - name: Deploy to minikube
      run: |
        minikube image load test:latest
        # already pull image so we don't have to wait for it later
        minikube image pull ghcr.io/q-m/scrapyd-k8s-spider-example:latest
        # load manifest
        sed -i 's/\(imagePullPolicy:\s*\)\w\+/\1Never/' kubernetes.yaml
        sed -i 's/\(image:\s*\)ghcr\.io\/q-m\/scrapyd-k8s:/\1test:/' kubernetes.yaml
        sed -i 's/\(type:\s*\)ClusterIP/\1NodePort/' kubernetes.yaml
        kubectl create -f kubernetes.yaml
        # and wait for scrapyd-k8s to become ready
        kubectl wait --for=condition=Available deploy/scrapyd-k8s --timeout=60s
        curl --retry 10 --retry-delay 2 --retry-all-errors `minikube service scrapyd-k8s --url`/daemonstatus.json
    - name: Run tests
      run: |
        TEST_BASE_URL=`minikube service scrapyd-k8s --url` \
        TEST_MAX_WAIT=60 \
        TEST_AVAILABLE_VERSIONS=latest,`skopeo list-tags docker://ghcr.io/q-m/scrapyd-k8s-spider-example | jq -r '.Tags | map(select(. != "latest" and (startswith("sha-") | not))) | join(",")'` \
        pytest -vv --color=yes test_api.py
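The three `sed` edits in the "Deploy to minikube" step rewrite `kubernetes.yaml` so the deployment uses the locally built image (never pulled from the registry) and exposes the service via NodePort. A roughly equivalent Python sketch, shown only to clarify what the regexes do (the `patch_manifest` helper is not part of the commit):

```python
import re

def patch_manifest(text: str) -> str:
    # Never pull: use the image already loaded into minikube.
    text = re.sub(r"(imagePullPolicy:\s*)\w+", r"\1Never", text)
    # Swap the published image for the locally built one; the tag is kept.
    text = re.sub(r"(image:\s*)ghcr\.io/q-m/scrapyd-k8s:", r"\1test:", text)
    # Make the service reachable from outside the cluster.
    text = re.sub(r"(type:\s*)ClusterIP", r"\1NodePort", text)
    return text

print(patch_manifest("image: ghcr.io/q-m/scrapyd-k8s:latest\ntype: ClusterIP\n"))
```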
7 changes: 3 additions & 4 deletions README.md
@@ -65,9 +65,8 @@ things out.

### Kubernetes

1. Create the spider namespace: `kubectl create namespace scrapyd`
2. Adapt the spider configuration in [`kubernetes.yaml`](./kubernetes.yaml) (`scrapyd_k8s.conf` in configmap)
3. Create the resources: `kubectl create -f kubernetes.yaml`
1. Adapt the spider configuration in [`kubernetes.yaml`](./kubernetes.yaml) (`scrapyd_k8s.conf` in configmap)
2. Create the resources: `kubectl create -f kubernetes.yaml`

You'll be able to talk to the `scrapyd-k8s` service on port `6800`.

@@ -134,7 +133,7 @@ curl 'http://localhost:6800/listspiders.json?project=example&_version=latest'
```
> ```json
> {"spiders":["quotes"],"status":"ok"}
> {"spiders":["quotes","static"],"status":"ok"}
> ```
```sh
19 changes: 13 additions & 6 deletions kubernetes.yaml
@@ -72,12 +72,13 @@ data:
repository = scrapyd_k8s.repository.Remote
launcher = scrapyd_k8s.launcher.K8s
namespace = scrapyd
namespace = default
# This is an example spider that should work out of the box.
# Adapt the spider config to your use-case, with an otherwise unused secret.
# Adapt the spider config to your use-case.
[project.example]
env_secret = spider-example-env
env_config = spider-example-env
repository = ghcr.io/q-m/scrapyd-k8s-spider-example
# It is strongly recommended to set resource requests and limits in production.
@@ -98,6 +99,15 @@ stringData:
FOO_API_KEY: "1234567890abcdef"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: spider-example-env
labels:
app.kubernetes.io/name: spider-example
data:
BAR_VALUE: "baz"
---
apiVersion: v1
kind: Service
metadata:
name: scrapyd-k8s
@@ -117,20 +127,18 @@ apiVersion: v1
kind: ServiceAccount
metadata:
name: scrapyd-k8s
namespace: scrapyd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: scrapyd-k8s
namespace: scrapyd
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
verbs: ["get"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["get", "list", "create", "delete"]
@@ -139,7 +147,6 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: scrapyd-k8s
namespace: scrapyd
subjects:
- kind: ServiceAccount
name: scrapyd-k8s
2 changes: 2 additions & 0 deletions requirements-test.txt
@@ -0,0 +1,2 @@
pytest>=6.0.0
requests>=2.0.0
5 changes: 3 additions & 2 deletions scrapyd_k8s.sample-k8s.conf
@@ -14,9 +14,10 @@ repository = scrapyd_k8s.repository.Remote
launcher = scrapyd_k8s.launcher.K8s

# Namespace to work in (needs to exist).
namespace = scrapyd
# Check RBAC if you run scrapyd-k8s in a different namespace than spiders.
namespace = default
# Optional pull secret, in case you have private spiders.
pull_secret = ghcr-registry
#pull_secret = ghcr-registry

# For each project, define a project section.
# This contains a repository that points to the remote container repository.
4 changes: 2 additions & 2 deletions scrapyd_k8s/api.py
@@ -33,7 +33,7 @@ def api_schedule():
return error('project missing in form parameters', status=400)
project = config.project(project_id)
if not project:
return error('project not found in configuration', status=404)
return error('project not found in configuration', status=400)
spider = request.form.get('spider')
if not spider:
return error('spider not found in form parameters', status=400)
@@ -55,7 +55,7 @@ def api_cancel():
job_id = request.form.get('job')
if not job_id:
return error('job missing in form parameters', status=400)
signal = request.form.get('signal', 'TERM')
signal = request.form.get('signal', 'TERM') # TODO validate signal?
prevstate = launcher.cancel(project_id, job_id, signal)
if not prevstate:
return error('job not found', status=404)
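The diff leaves a `TODO validate signal?` note next to the `signal = request.form.get('signal', 'TERM')` line. One way to validate and resolve the name with the stdlib `signal` module, matching the `Signals['SIG' + signal]` lookup the Kubernetes launcher uses below (the `parse_signal` helper is hypothetical, not part of the commit):

```python
from signal import Signals

def parse_signal(name: str, default: str = "TERM") -> int:
    # Resolve "TERM" -> SIGTERM's number; an unknown name raises KeyError,
    # which the API handler could translate into a 400 response.
    return Signals["SIG" + (name or default)].value

print(parse_signal("TERM"))  # → 15 (SIGTERM on POSIX)
```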
2 changes: 2 additions & 0 deletions scrapyd_k8s/launcher/docker.py
@@ -38,6 +38,8 @@ def schedule(self, project, version, spider, job_id, settings, args):
command=['scrapy', 'crawl', spider, *_args, *_settings],
environment=env,
labels={
self.LABEL_PROJECT: project.id(),
self.LABEL_SPIDER: spider,
self.LABEL_JOB_ID: job_id,
},
name='_'.join(['scrapyd', project.id(), job_id]),
40 changes: 15 additions & 25 deletions scrapyd_k8s/launcher/k8s.py
@@ -1,5 +1,7 @@
import kubernetes
import kubernetes.stream
from signal import Signals
from subprocess import check_output, CalledProcessError

from ..utils import native_stringify_dict

@@ -9,18 +11,6 @@ class K8s:
LABEL_SPIDER = 'org.scrapy.spider'
LABEL_JOB_ID = 'org.scrapy.job_id'

# translates status to scrapyd terminology
STATUS_MAP = {
'Pending': 'pending',
'Waiting': 'pending',
'Running': 'running',
'Succeeded': 'finished',
'Completed': 'finished',
'Terminated': 'finished',
# Failed
# Unknown
}

def __init__(self, config):
self._namespace = config.scrapyd().get('namespace', 'default')
self._pull_secret = config.scrapyd().get('pull_secret')
@@ -29,6 +19,7 @@ def __init__(self, config):
kubernetes.config.load_incluster_config()
except kubernetes.config.config_exception.ConfigException:
kubernetes.config.load_kube_config()

self._k8s = kubernetes.client.CoreV1Api()
self._k8s_batch = kubernetes.client.BatchV1Api()

@@ -79,6 +70,7 @@ def schedule(self, project, version, spider, job_id, settings, args):
metadata=kubernetes.client.V1ObjectMeta(name=job_name, labels=labels),
spec=kubernetes.client.V1PodSpec(
containers=[container],
share_process_namespace=True, # an init process for cancel
restart_policy='Never',
image_pull_secrets=[kubernetes.client.V1LocalObjectReference(s) for s in [self._pull_secret] if s]
)
@@ -108,10 +100,8 @@ def cancel(self, project, job_id, signal):
elif prevstate == 'running':
# kill pod (retry is disabled, so there should be only one pod)
pod = self._get_pod(project, job_id)
if not pod:
# job apparently just ended, fine
return None
self._k8s_kill(pod.metadata.name, signal)
if pod: # if a pod has just ended, we're good already, don't kill
self._k8s_kill(pod.metadata.name, Signals['SIG' + signal].value)
else:
# not started yet, delete job
self._k8s_batch.delete_namespaced_job(
@@ -158,28 +148,28 @@ def _get_pod(self, project, job_id):

return pod

def _k8s_to_scrapyd_status(self, status):
return self.STATUS_MAP.get(status, status.lower())

def _k8s_job_to_scrapyd_status(self, job):
if job.status.ready:
return 'running'
elif job.status.succeeded:
return 'finished'
else: # including failure modes
elif job.status.failed:
return 'finished'
else:
return 'pending'

def _k8s_job_name(self, project, job_id):
return '-'.join(('scrapyd', project, job_id))

def _k8s_kill(self, pod_name, signal):
# exec needs stream, which modified client, so use separate instance
# exec needs stream, which modifies client, so use separate instance
k8s = kubernetes.client.CoreV1Api()
resp = kubernetes.stream(
resp = kubernetes.stream.stream(
k8s.connect_get_namespaced_pod_exec,
pod_name,
'default',
namespace=self._namespace,
# this is a bit blunt, but it works and is usually available
command=['/usr/sbin/killall5', '-' + signal]
command=['/usr/sbin/killall5', '-' + str(signal)],
stderr=True
)
# TODO figure out how to get return value
