Revisit usage of database connection pooling #1935
Dabbled with this further; dumping my progress so far. The secondary connection factory is not only used for schema and value generation, but also for non-transactional operations:
This behavior can be turned off via a DataNucleus configuration property. Most queries performed by DT are not executed in the context of a transaction, and thus qualify as non-transactional. If this weren't the case, we could have made the secondary connection factory non-pooled (I asked the DataNucleus maintainer about it). However, in my testing, the secondary connection factory used for non-transactional operations consistently had significantly more active connections than the primary connection factory used for transactional operations. Here's an example showing active connections after I uploaded 250 BOMs in bulk, using two fixed-size connection pools: Keep in mind that "transactional" is supposed to be the "primary" pool. 🤪
By creating a temporary PMF based on the same properties as the `PersistenceFactoryInitializer`, the `UpgradeInitializer` would create *two* connection pools of `10` (by default) connections each for the duration of its execution. Because upgrades are executed serially, connection pooling is not beneficial here. Creating temporary connection pools is wasteful and should be avoided. Partially addresses DependencyTrack#1935 Signed-off-by: nscuro <nscuro@protonmail.com>
Closing this one, as there is nothing more we can do here. Ideally, even read-only operations should make use of transactions. I wouldn't expect any performance impact from that, as the majority of RDBMSes use transactions behind the scenes anyway. Once no non-transactional operations are performed anymore, we can disable the secondary connection pool. But as of now, the secondary pool is even more actively used than the primary one. Exposing usage metrics (see also #2245) will allow users to tune the pools according to their needs. With #2238, it is possible to (optionally) configure both pools separately. I will add documentation about this on the Database Support page.
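For illustration, separate pool configuration (as made possible by #2238) could look roughly like the sketch below. The property keys are assumptions for demonstration purposes; the authoritative names are documented on the Database Support page.

```properties
# Hypothetical sketch (key names are assumptions, consult the Database
# Support page for the actual ones): sizing the transactional and
# non-transactional pools independently, instead of one shared size.
alpine.database.pool.tx.max.size=10
alpine.database.pool.nontx.max.size=5
```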
By creating a temporary PMF based on the same properties as the `PersistenceFactoryInitializer`, the `UpgradeInitializer` would create *two* connection pools of `10` (by default) connections each for the duration of its execution. Because upgrades are executed serially, connection pooling is not beneficial here. Creating temporary connection pools is wasteful and should be avoided. Partially addresses DependencyTrack#1935 Signed-off-by: nscuro <nscuro@protonmail.com> Signed-off-by: mulder999 <nospam099-github@yahoo.com>
Current Behavior
While experimenting with persistence metrics exposition, I noticed that there are more database connections (and even more connection pools) than I expected:
This was also reflected in the database itself, where I could see 21 active JDBC connections originating from DT.
According to the DataNucleus documentation:
If not specified otherwise, the configured connection pool will thus be created twice: once for DN's primary and once for its secondary connection factory. That means whatever pool size we configure via Alpine will effectively be doubled, which may explain some of the connection-related issues reported to us in the past. This behavior is unexpected from a user's perspective.
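To illustrate the doubling with a minimal sketch (the pool-size key follows DataNucleus' connection pooling property naming; the values are illustrative):

```properties
# A single configured pool size...
datanucleus.connectionPoolingType=HikariCP
datanucleus.connectionPool.maxPoolSize=10
# ...applies to BOTH the primary (transactional) and the secondary
# (non-transactional) connection factory, so up to 2 x 10 = 20
# connections may be opened against the database.
```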
Similarly, notice how the metrics above mention `HikariPool-3` and `HikariPool-4`. This is because two other instances were created temporarily by the upgrade framework: `dependency-track/src/main/java/org/dependencytrack/upgrade/UpgradeInitializer.java`, line 66 in e9304da.
This temporarily spins up two connection pools with (by default) 10 idle connections each. Because upgrades are executed in a serial fashion, a connection pool may be a little overkill. A single connection should probably suffice in this case.
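Since upgrades run serially, the temporary PMF could in principle be created with minimal pooling. A hedged sketch of the idea, using DataNucleus' pool sizing properties (the values are illustrative, not a tested recommendation):

```properties
# Sketch: cap the temporary upgrade PMF at one connection per factory,
# instead of the default pool of 10 idle connections.
datanucleus.connectionPool.maxPoolSize=1
datanucleus.connectionPool.minPoolSize=0
```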
Proposed Behavior
There are not many big OSS projects using DataNucleus, but Apache Hive is one of them. I looked into how they handle the connection pool situation, and they settled on using a smaller, fixed-size connection pool for DN's secondary connection factory: https://github.com/hsnusonic/hive/blob/714c260e4a7c6b147c897718a33e693699267792/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java#L256-L259
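A sketch of what a Hive-style setup could look like for DT. DataNucleus allows setting a separate pooling type for non-transactional connections via a `.nontx`-suffixed property; treat the exact keys and values below as assumptions to be verified against the DataNucleus documentation.

```properties
# Sketch: keep the full HikariCP pool for the primary (transactional)
# factory, but give the secondary (non-transactional) factory a simpler
# pooling strategy to limit the total number of held connections.
datanucleus.connectionPoolingType=HikariCP
# Value assumed for illustration; "None" would disable pooling entirely
# for non-transactional connections.
datanucleus.connectionPoolingType.nontx=None
```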
We should test whether we can do something similar to limit the number of connections we hoard. This will need to be tested in high-load situations, to ensure that it doesn't slow down the system.
Additionally, the upgrade framework should be adjusted to not use a connection pool.