Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HowTo guide - one worker for all stores and it's benefits #2011

Merged
merged 13 commits into from
Jul 27, 2023
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
---
title: "HowTo: Reduce Jenkins execution by up to 80% without P&S and Data importers refactoring"
description: Save Jenkins related costs or speedup background jobs processing by implementing a custom single Worker for all stores.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved
last_updated: Jul 15, 2023
template: howto-guide-template
originalLink: ??
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved
originalArticleId: ??
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved
redirect_from:
---


# The Challenge

Our out-of-the-box (OOTB) system requires a specific command (Worker - `queue:worker:start`) to be continuously running for each store to process queues and ensure propagation of information. In addition to this command, there are other commands such as OMS processing, import, export, etc. When these processes are not functioning or running slowly, there is a delay in data changes being reflected on the frontend, causing dissatisfaction among customers and leading to disruption of business processes.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

# Explanation

By default, our system has a limit of two Jenkins executors for each environment. This limit is usually not a problem for single-store setups, but it becomes critical when there are multiple stores. Without increasing this limit, processing becomes slow because only two Workers are scanning queues and running tasks at a time, while other Workers for different stores have to wait. Moreover, even when some stores don't have messages to process, we still need to run a Worker just for scanning purposes, which occupies Jenkins executors, CPU time, and memory.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

Increasing the number of processes per queue can lead to issues such as Jenkins hanging, crashing, or becoming unresponsive. Although memory consumption and CPU utilisation are not generally high (around 20-30%), there can be spikes in memory consumption due to a random combination of several workers processing heavy messages for multiple stores simultaneously.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

There are two potential solutions to address this problem: application optimisation and better background job orchestration.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

# Proposed Solution

![image](https://spryker.s3.eu-central-1.amazonaws.com/docs/scos/dev/tutorials-and-howtos/howtos/howto-reduce-jenkins-execution-cost-without-refactoring/OneWorker-diagram.png)

The solution described here is targeted at small to medium projects but can be improved and applied universally, but it hasn't been tested fully in such conditions yet.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

The proposed solution is to use one Worker (`queue:worker:start`) for all stores, regardless of the number of stores. Instead of executing these steps for one store within one process and having multiple processes for multiple stores, we can have one process that scans all queues for all stores and spawns child processes the same way as the OOTB solution. However, instead of determining the number of processes based on the presence of a single message, we can analyse the total number of messages in the queue to make an informed decision on how many processes should be launched at any given moment.

## The Process Pool

In computer science, a pool refers to a collection of resources that are kept in memory and ready to use. In this context, we have a fixed-sized pool (fixed-size array) where new processes are only run if there is space available among the other running processes. This approach allows us to have better control over the number of processes launched by the OOTB solution, resulting in more predictable memory consumption.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

![image](https://spryker.s3.eu-central-1.amazonaws.com/docs/scos/dev/tutorials-and-howtos/howtos/howto-reduce-jenkins-execution-cost-without-refactoring/NewWorker+Flow.png)

We define the total number of simultaneously running processes for the entire setup on the EC2 instance level. This makes it easier to manage, as we can monitor the average memory consumption for the process pool. If it's too low, we can increase the pool size, and if it's too high, we can decrease it. Additionally, we check the available memory (RAM) and prevent spawning additional processes if it is too low, ensuring system stability. Execution statistics provide valuable insights for decision-making, including adjusting the pool size or scaling up/down the EC2 instance.
ishakuta marked this conversation as resolved.
Show resolved Hide resolved
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

The following params exist:
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

- pool size (default 5-10)
- free mem buffer - min amount of RAM (MB) system should have in order to spawn a new child process (default 750mb)
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

## Worker Statistics and Logs

With the proposed solution, we gather better statistics to understand the "health" of the worker and make informed decisions. We can track the number of tasks executed per queue/store, distribution of error codes, cycles, and various metrics related to skipping cycles, cooldown, available slots, and memory limitations. These statistics help us monitor performance, identify bottlenecks, and optimize resource allocation.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

![image](https://spryker.s3.eu-central-1.amazonaws.com/docs/scos/dev/tutorials-and-howtos/howtos/howto-reduce-jenkins-execution-cost-without-refactoring/stats-log.png)

![image](https://spryker.s3.eu-central-1.amazonaws.com/docs/scos/dev/tutorials-and-howtos/howtos/howto-reduce-jenkins-execution-cost-without-refactoring/stats-summary.png)

## Error Logging

In addition to statistics, we also capture the output of children's processes in the standard output of the main worker process. This simplifies troubleshooting by providing logs with store and queue names.

![image](https://spryker.s3.eu-central-1.amazonaws.com/docs/scos/dev/tutorials-and-howtos/howtos/howto-reduce-jenkins-execution-cost-without-refactoring/stats-error-log.png)

## Edge Cases/Limitation

Child processes are killed at the end of each minute, which means those batches that were in progress - will be abandoned and return to the source queue to be processed during the next run. While we didn’t notice any issues with this approach, please note that this is still experimental approach and may or may not improve in future. The recommendation to mitigate this is to use smaller batches to ensure children processes are running within seconds or up to 10s (rough estimate) - to reduce the number of messages that will be retried.
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

## Implementation


Two ways are possible
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

1. Applying a patch, although it may require conflict resolution, since it is applied on project level and each project may have unique customisations already in place.

```bash
ishakuta marked this conversation as resolved.
Show resolved Hide resolved
git apply one-worker.diff
```

2. Integrating it manually, using the patch as a source.


Please see attached diffs fror an example implementation. [Here's a diff](https://spryker.s3.eu-central-1.amazonaws.com/docs/scos/dev/tutorials-and-howtos/howtos/howto-reduce-jenkins-execution-cost-without-refactoring/one-worker.diff).

In the project the functionality can be activated and deactivated using the following configuration flag:
AlexSlawinski marked this conversation as resolved.
Show resolved Hide resolved

```php
$config[QueueConstants::QUEUE_ONE_WORKER_ALL_STORES] = (bool)getenv('QUEUE_ONE_WORKER_ALL_STORES') ?? false;
ishakuta marked this conversation as resolved.
Show resolved Hide resolved
```

You can set the flag by setting the following environment variable:

```
QUEUE_ONE_WORKER_ALL_STORES
```

# Summary

The proposed solution was developed was tested in a project environment. It has shown positive results, with significant improvements in data-import processing time. While this solution is suitable for small to medium projects, it has the potential to be applied universally. Code Examples can be found in the attached diff files that show the implementation in a project.
Loading