https://github.com/redpanda-data/redpanda/pull/10398 #8

juntao · 2023-04-27T05:08:34Z

No description provided.

juntao · 2023-04-27T05:08:36Z

Hello, I am a code review bot on flows.network. Here are my reviews of changed source code files in this PR.

src/v/kafka/server/handlers/fetch.cc

Upon review, here are a few observations:

Copyright header: Ensure that you have permission to use the code.
Headers and includes: The code seems to import a reasonable set of headers, though there might be some that are unnecessary (e.g., <chrono>). Make sure to keep only the imports that are being utilized and remove the rest.
Usage of likely and unlikely macros is appreciated as it helps in branch prediction.
Exception handling: The code utilizes exception handling, which is good. But consider improving the error handling code by using custom error types instead of assuming std::exception_ptr.
Proper usage of const: The code uses const appropriately in most places, which is a good practice.
Error handling: Many error checks are performed using the if (unlikely(...)) approach, and most of these are followed by error handling code. Ensure that your error cases are covered properly and that the compiler can optimize this code path appropriately.
Code readability: Some parts of the code contain nested lambda functions with large conditional statements. Consider breaking this code down into smaller, more readable functions.
Fill_fetch_responses logic: In fill_fetch_responses function you have an if-else to handle the case when the response has a data or not, you can simplify the code by first initializing resp.records to an empty value and check if the response has data and then change the value of resp.records.
Use of co_return: It seems like the code is using coroutines, which is a modern C++ feature. However, coroutines are only available since C++20, so make sure your build environment supports that.

In general, the code appears to follow good practices, but there are areas that could be improved for better readability and maintainability.

The key changes in this patch are:

Added two new includes for seastar's scheduling functionality:
- #include <seastar/core/scheduling.hh>
- #include <seastar/core/with_scheduling_group.hh>
Modified the fetch_handler::handle function:
- Added a new variable named sg which retrieves the fetch scheduling group from the request_context's connection and server.
- Wrapped the existing code in the function with ss::with_scheduling_group(sg, ...), ensuring that the code inside the lambda will be executed under the defined scheduling group.

These changes improve the code by introducing a proper scheduling group to handle fetch requests, which can potentially enhance the performance and resource usage of the application.

src/v/kafka/server/server.cc

The provided source code seems to be the implementation of a Kafka server based on the Seastar library. From the given source code, I have found the following potential issues:

Missing Comments:
The source code does not contain enough comments that explain the functionality and flow of the code. Adding comments to the code is always a good practice, as it helps in understanding the functionality of each statement better and makes it easier to maintain or modify if needed.
Lack of Error Handling:
Although some error handling is done in certain parts of the code, there are other places where error handling could be improved. For example, when getting the distinguished name for 'dn' in the 'get_mtls_principal_state' function, it can throw exceptions or timeout. These errors should be caught and handled gracefully.
Use of Magic Numbers:
In the 'get_mtls_principal_state' function, a timeout of 5 seconds is hardcoded as 5s. It's better to define constants at the top of the file or in a configuration file for values like these, avoiding magic numbers and making it easier to change the timeout value in the future if needed.
Inconsistent Variable Naming Conventions:
There is inconsistency in variable naming conventions throughout the code. Some variables use snake_case, while others use camelCase. To maintain good readability and consistency, it's better to follow a single naming convention throughout the project.
Throwing Exceptions:
In the 'apply' function, the exception is caught and logged in the 'catch' block, and then it's rethrown using std::rethrow_exception. It's unclear what the purpose of this is since the function returns a ss::future<>, and it might be better to handle the exception without rethrowing it.

Aside from these mentioned issues, the code looks fine in terms of its structure and organization. However, please note that my analysis is limited by the fact that the code is truncated and not all the dependencies are provided.

This patch introduces a new member variable _fetch_scheduling_group that represents the scheduling group for fetch operations. The key changes are as follows:

The server constructor signature has been updated to include a new parameter: ss::scheduling_group fetch_sg. This new parameter is a scheduling group used in the Seastar framework for managing and assigning resources to tasks such as processing fetch requests.
The new _fetch_scheduling_group member variable has been added to the server class to store the provided fetch scheduling group fetch_sg.
The provided fetch_sg is now stored in the _fetch_scheduling_group member variable during the construction of the server class.

These changes suggest that the developer intends to introduce a customization in fetch request handling which would allow more fine-grained control over the resources assigned to those tasks.

src/v/kafka/server/server.h

Based on the given source code, I would like to comment on a few notable points and potential issues:

Proper use of = default and = delete for constructors and assignment operators.
Good use of final for the server class, which prevents further inheritance.
Use of std::string_view for the name() method is a good and efficient choice.
In the server class, many member variables are stored as references to ss::sharded objects. Consider using shared pointers, such as std::shared_ptr, to ensure proper lifetime management and to avoid potential dangling references.
The member variables _is_idempotence_enabled and _are_transactions_enabled are initialized as false. If these variables are not set during the object's construction or use, it might lead to unintended behavior. Consider initializing them properly, either by providing values during the server class's construction or by using a configuration object to set default values.
The server class lacks documentation for its member functions and member variables. Consider adding detailed comments to describe their functionality and purpose.
There is no clear error handling or logging mechanism in place. Consider adding logging statements and error handling to provide better visibility when debugging issues.
If possible, prefer using std::filesystem::path instead of std::string for paths like 'licenses/BSL.md' in the Copyright header. This will improve platform independence.

Please note that, due to the provided source code being potentially truncated, some of the above points may already be addressed in parts of the code that are not visible.

The patch introduces the following key changes to the source code:

Addition of #include <seastar/core/scheduling.hh> header file, which allows the use of ss::scheduling_group class.
Modification of the server class constructor to accept a new ss::scheduling_group parameter.
Addition of a new member function fetch_scheduling_group() that returns the _fetch_scheduling_group member variable. This function is marked as const to ensure that it does not modify the object state.
Addition of a new private member variable _fetch_scheduling_group of type ss::scheduling_group in the server class.

These changes suggest that the server class now supports scheduling groups while handling requests, which can be useful for managing resources and controlling the prioritization of request processing.

src/v/redpanda/application.cc

Based on the provided source code, here are some potential issues I have identified:

There are no comments explaining what application::run and application::initialize do. This might make the code difficult to understand for new developers or anyone trying to make modifications in the future. It would be helpful to add comments outlining the purposes of these methods in the code.
There seem to be many hard-coded values in the code. Examples include crash_reset_duration, which is set to 1 hour, and default memory allocation (1536_MiB, 0.07 * total_mem). It might be better to either move these values to configuration files or set them as constants at the beginning of the file. This would make it easier to update or modify these values in the future.
There are many uses of .get() and .get0() in the code, which can be dangerous as these cause the calling coroutine to block, deteriorating performance. Consider using alternatives such as .then() chained continuations or co_await to handle futures more safely.
Although not necessarily a problem, it is worth noting that there is a heavy use of "shard local" variables, which makes the code more difficult to reason about. This may not need to be changed, but it is essential to be aware of the complications that arise from managing shard local variables.
When handling exceptions, application::run() only logs details about the current exception. However, it might be useful to add more specific error messages, descriptions, or handling for specific exceptions, such as a custom error message for configuration file errors, bad input parameters, or resource limitations.
While initializing zstd compression workspaces, the code assumes that config::shard_local_cfg().zstd_decompress_workspace_bytes() returns a value that is safe to use. There should be a check to ensure that the value returned is within a reasonable range and not a negative or excessively large value. This would help prevent potential issues caused by misconfiguration or invalid values.

The key changes in this patch include:

Renaming the variable _scheduling_groups to sched_groups throughout the code.
Adding a new parameter sched_groups.fetch_sg() to the start method of the Kafka protocol handling service.
Updating all scheduling group references to use the new sched_groups variable.

These changes mainly involve renaming a variable and passing a new fetch scheduling group parameter to the Kafka service instantiation. Overall, the changes don't affect the functionality of the code but improve its readability and organization.

src/v/redpanda/application.h

There are no obvious issues with the provided code snippet. However, here are a few recommendations:

Consider adding comments to describe the purpose and usage of each class, method, and member variable in the code for better maintainability.
Keep in mind, due to the nature of sharding involved across the services, pay special attention to potential concurrency issues that might arise.
While not an issue with the provided code, keep an eye out for any potential circular dependencies, especially with the list of included header files (which are quite extensive). Circular dependencies can cause unexpected behavior and anomalies during the build process.
In crash_tracker_metadata structure, consider adding comment to describe the use of each member variable and their purpose.

Overall, the code seems organized, and the structure of the application class is well defined. Remember to always follow good coding practices, avoid code duplication, and maintain consistency throughout the entire code base.

The key change in the provided patch is the removal of the member variable scheduling_groups _scheduling_groups; from the application class and the addition of the member variable scheduling_groups sched_groups; to the public section of the same class.

This change is moving the scheduling_groups member variable from the private section to the public section of the application class.

src/v/redpanda/tests/fixture.h

Here's the code review with minor suggestions:

In overloaded constructors of redpanda_thread_fixture, you are passing arguments forcibly in each initializer. You can use constructor delegation instead, where you can chain the constructors by delegating the overloads. This may make the code clear, shorter and avoid unnecessary repeated initializations.
Since the source code appears to be modular, add file-level documentation comments to briefly explain the purpose of the class implementations and functions included in this file. This can help other developers to quickly understand without going through every line of the code.
The usage of auto in some cases might make code obscure. For instance, here: auto base_path = std::filesystem::path(data_dir);, it would be better to use std::filesystem::path directly. This enhances readability by explicitly stating the type of variable.
The wait_for_controller_leadership() function mixes both co_await and co_return statements with semicolon-less style and normal function-style syntax having semicolons. It would be more consistent to stick to a single style.
Consider adding documentation at the class-level, before constructors, and before important member functions in multiline comment format. This will help in understanding the reasoning behind specific implementations and improve the maintainability of the code.
In most initializer lists of member variables in constructors, variables are separated with , except sometimes you use :. That code inconsistency should be resolved for better readability.
It is generally a good practice to specify public, private, and protected access and their corresponding members to make the code more modular and easy-to-read. In the current code, public and private members are intermixed, and access specifiers aren't explicitly used.

Overall, the code looks well-structured and the indentation and formatting are consistent. By improving the documentation and following these suggestions, you can make the code even more understandable and easier to maintain.

The key change in this patch is the addition of a new argument in the kafka::server constructor. The new argument being passed is app.sched_groups.fetch_sg(). This change implies that an additional parameter is now used in the kafka::server class initialization, possibly to manage fetch scheduling or request handling.

src/v/resource_mgmt/cpu_scheduling.h

Here are my comments and suggestions for the provided source code:

Missing include:
The header <vector> is missing, which is needed for std::vector.
Namespace encapsulation:
It's good practice to encapsulate the code within a namespace to prevent naming conflicts and improve code organization.
Missing return type for create_groups:
The create_groups function is missing a return type. Since it's an asynchronous function, it should return ss::future<>.
Remove co_return in destroy_groups:
In destroy_groups function, the use of co_return; statement is redundant, and can be removed.
Include guards:
Although the code contains #pragma once, which serves the same purpose, it's a good idea to also provide include guards using the #ifndef, #define, and #endif preprocessor directives for better compatibility with different compilers and systems.

Here's an updated version of the code with the suggested changes:

// missing include
#include <vector>

// namespace encapsulation
namespace redpanda {

class scheduling_groups final {
public:
    // missing return type for create_groups
    ss::future<> create_groups() {
        _admin = co_await ss::create_scheduling_group("admin", 100);
        _raft = co_await ss::create_scheduling_group("raft", 1000);
        _kafka = co_await ss::create_scheduling_group("kafka", 1000);
        _cluster = co_await ss::create_scheduling_group("cluster", 300);
        _coproc = co_await ss::create_scheduling_group("coproc", 100);
        _cache_background_reclaim = co_await ss::create_scheduling_group(
          "cache_background_reclaim", 200);
        _compaction = co_await ss::create_scheduling_group(
          "log_compaction", 100);
        _raft_learner_recovery = co_await ss::create_scheduling_group(
          "raft_learner_recovery", 50);
        _archival_upload = co_await ss::create_scheduling_group(
          "archival_upload", 100);
        _node_status = co_await ss::create_scheduling_group("node_status", 50);
        _self_test = co_await ss::create_scheduling_group("self_test", 100);
        _fetch = co_await ss::create_scheduling_group("fetch", 1000);
        co_return; // can be removed
    }

    // other functions unchanged

private:
    // class members unchanged
};

} // namespace redpanda

The patch makes the following key changes:

Adds a new scheduling group called _fetch in the create_groups() function: This fetch scheduling group is created with a priority of 1000.
Adds a new function fetch_sg() with a brief description: This function returns the _fetch scheduling group. The description mentions that this group is used for consumer fetch processing, and it has the same priority as the default group to prevent non-fetch requests from being significantly delayed when fetch requests use all the CPU.
Adds the _fetch scheduling group to the destroy_groups() function: This ensures that the _fetch group is properly destroyed when required.
Adds the _fetch scheduling group to the all_scheduling_groups() function: This ensures that the _fetch group is included in the vector containing all scheduling groups.
Adds a new class member variable _fetch: This variable is of type ss::scheduling_group and will hold the fetch scheduling group.

cc redpanda-data/redpanda#10398

juntao · 2023-04-27T05:19:16Z

flows summarize

juntao · 2023-04-27T05:19:17Z

Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.

Overall Summary:

The proposed patches introduce changes primarily focused on providing a separate scheduling group for handling consumer fetch processing in the Redpanda application. Key modifications across these patches include making the _scheduling_groups variable public, creating a new scheduling group _fetch, updating associated functions to include the new group, and adding the fetch_scheduling_group to the kafka::server and fetch handler.

Potential issues and errors:

The changes assume that making the _scheduling_groups variable public will not violate any encapsulation principles or cause unintended side effects.
Parts of the code that previously relied on _scheduling_groups should be carefully reviewed to ensure no bugs or inconsistencies are introduced as a result of the variable being made public.
The priority of the new _fetch scheduling group might need further review or configuration options to determine the most appropriate value.
The possibility of other scheduling groups encountering similar issues addressed by the new _fetch group should be analyzed, to determine if additional separate groups are also needed.
Merge conflicts could arise if other features or changes are being developed alongside these patches.
Performance impact due to the new scheduling group must be monitored, tested, and addressed if necessary.
Developers must be made aware of the new scheduling group and its implications on other parts of the code for future maintenance and development.

Important findings:

These patches provide a solution to prevent fetch requests from causing significant delays for non-fetch requests by introducing a new scheduling group specifically for fetching.
Functions related to scheduling groups have been updated to accommodate the newly created _fetch group.
The new fetch_scheduling_group has been added to the kafka::server and fetch handler to handle fetches and appropriately switch the Kafka scheduling group.

Details

Commit d5056932505428a60e81334b72799408490e7f51

This is a patch for the redpanda application, which makes the _scheduling_groups variable public by renaming it to sched_groups. The changes are mainly found in src/v/redpanda/application.cc and src/v/redpanda/application.h. The motivation behind this change is to have access to the scheduling groups outside of application.cc, to pass them to service instantiations in the redpanda fixture.

Key changes:

The private variable _scheduling_groups is renamed to sched_groups and made public in src/v/redpanda/application.h.
All occurrences of _scheduling_groups in src/v/redpanda/application.cc are replaced with sched_groups.

Potential problems:

The patch assumes that making the _scheduling_groups variable public as sched_groups won't violate any encapsulation principles or cause side effects in other parts of the code. It can be crucial to analyze the effects of making this variable public on the overall application's behavior.
It's also important to review how other parts of the code that relied on _scheduling_groups handle the new public variable to ensure this change doesn't introduce any bugs or inconsistencies.

Commit d133843a574bdebdfea715915922bd7e3cd7ff68

Summary of key changes:

A new scheduling group, _fetch, has been created with the same priority as the default group (1000). This group is specifically used for consumer fetch processing.
The _fetch group has been added to prevent non-fetch requests from being significantly delayed when fetch requests use all the CPU.
The functions destroy_groups() and all_scheduling_groups() have been updated to include the _fetch group.
A new member function fetch_sg() has been added to return the _fetch scheduling group.

Potential problems:

It is not immediately clear if the priority of the _fetch group (1000) is the most appropriate value. It might be useful to determine if this value should be configurable or if other priority levels should be considered.
While the change addresses the issue of fetch requests causing delays for non-fetch requests, it might be worthwhile to analyze the possibility of other scheduling groups encountering similar issues and if they require separate groups as well.
No modification has been made to the actual fetch processing code to assign it to the new scheduling group. This could lead to the new scheduling group not being used, and the intended benefits might not be realized.

Commit 9a93a9c22238e145dfa9d7fe297eb494d7c5f0bf

Summary of key changes:

The fetch_scheduling_group is added to the kafka::server and fetch handler. This allows to handle fetches with a new scheduling group.
ss::with_scheduling_group is called with the fetch_scheduling_group to switch the Kafka scheduling group in the fetch handling flow.
src/v/kafka/server/server.h, src/v/kafka/server/server.cc, src/v/kafka/server/handlers/fetch.cc, and src/v/redpanda/tests/fixture.h have been updated with the changes.

Potential problems:

Merge conflicts: The changes might lead to merge conflicts if other features or changes are being developed simultaneously.
Performance impact: The addition of a new scheduling group can introduce bottlenecks or alter the fetch performance. Properly monitoring and testing the performance should be done to ensure it meets expectations.
Code maintenance: As the new scheduling group is introduced, the developers need to be aware of the new group and its impact on other parts of the code.

cc redpanda-data/redpanda#10398

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

https://github.com/redpanda-data/redpanda/pull/10398 #8

https://github.com/redpanda-data/redpanda/pull/10398 #8

juntao commented Apr 27, 2023

juntao commented Apr 27, 2023 •

edited

Loading

juntao commented Apr 27, 2023

juntao commented Apr 27, 2023 •

edited

Loading

https://github.com/redpanda-data/redpanda/pull/10398 #8

https://github.com/redpanda-data/redpanda/pull/10398 #8

Comments

juntao commented Apr 27, 2023

juntao commented Apr 27, 2023 • edited Loading

juntao commented Apr 27, 2023

juntao commented Apr 27, 2023 • edited Loading

Details

juntao commented Apr 27, 2023 •

edited

Loading

juntao commented Apr 27, 2023 •

edited

Loading