HIVE-27895: Remove dependency jodd-util #4888

Closed
wants to merge 2 commits into from

@pan3793 pan3793 commented Nov 21, 2023

What changes were proposed in this pull request?

Remove dependency jodd-util.

Why are the changes needed?

HIVE-25054 (only present on 4.0.0) fixed the jodd CVE by upgrading it to a new version. While looking to backport this to branch-2.3, I found that Hive only uses a small amount of code from this library, so I think copying those code snippets and removing the dependency is a better approach.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No, but it removes a dependency.

How was this patch tested?

Pass CI

pan3793 commented Nov 21, 2023

cc @sunchao @wangyum
The jodd CVE is listed first in https://issues.apache.org/jira/browse/SPARK-44757, with a score of 9.8. I would like to include this patch in 2.3.10.

* limitations under the License.
*/

// This class is ported from org.jodd:jodd-util:6.0.0 and remove unreferenced code
@pan3793 pan3793 Nov 21, 2023

This documents where the code comes from and preserves the original license header.

@zhangbutao

Sorry, I cannot understand why we would choose this approach (copying source code) to fix a CVE. https://github.com/oblac/jodd-util is still an active repo, so I think we can keep using the dependency and fix the CVE that way.
Thanks.

pan3793 commented Nov 22, 2023

@zhangbutao I am mostly speaking from a downstream project's perspective.

Hive has a low release rate. The latest stable versions (excluding alpha and beta) are

Once CVEs are reported in Hive's transitive dependencies, such a release rate puts downstream projects like Spark in an awkward position.

Spark currently uses Hive 2.3.9, which has many dependencies with CVEs:

  • log4j 2.6.2 - affected by Log4Shell, but fortunately newer log4j versions have good API compatibility, so Spark could upgrade the log4j dependencies directly
  • Guava 14 - EOL with many CVEs; newer versions of Guava have breaking API changes, so we cannot upgrade as we did with log4j. Spark is stuck on Guava 14 because of Hive
  • Jackson 1.x - EOL with many CVEs; Jackson 2.x has breaking API changes, so we cannot upgrade as we did with log4j. Spark must ship those jars, otherwise Hive class invocations may break
  • Thrift 0.9.x - EOL with many CVEs; Thrift 0.13+ has breaking API changes, so we cannot upgrade as we did with log4j. Spark is stuck on Thrift 0.12 because of Hive
  • jodd-core 3.5.2 - has CVE-2018-21234; newer versions of jodd have breaking API changes, so we cannot upgrade as we did with log4j. In fact, Hive does not use the code paths affected by the CVE.

Hive only uses a small amount of code (~200 lines) from jodd-util. Copying that code is a clean and cheap approach: Hive and downstream projects would then not be affected if new CVEs are found in the rest of jodd-util in the future.

* </ul>
*/
public static String text(final CharSequence text) {
Member Author

I just found that it could be replaced with org.apache.commons.lang3.StringEscapeUtils#escapeHtml4
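
For readers unfamiliar with what a `text(CharSequence)` style helper does, here is a minimal, JDK-only sketch of HTML text escaping. The class name `HtmlTextEscaper`, the choice of escaped characters, and the null handling are all assumptions made for illustration; this is neither the jodd-util code copied in the PR nor the exact behavior of `StringEscapeUtils#escapeHtml4`, which may cover a different set of entities.

```java
// Hypothetical helper, for illustration only; not the code from this PR.
public final class HtmlTextEscaper {
    private HtmlTextEscaper() {}

    /**
     * Replaces the characters &, <, > and " with their HTML entities.
     * Returning an empty string for null input is an assumption of this sketch.
     */
    public static String escape(CharSequence text) {
        if (text == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; &quot;c&quot; &gt; d
        System.out.println(escape("a < b && \"c\" > d"));
    }
}
```

A helper of this size is the kind of code the PR weighs against keeping a full library dependency; switching to an existing commons-lang3 method, as suggested above, would avoid both the copy and the extra dependency, provided the escaping rules are compatible with the call sites.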

@zabetak zabetak left a comment

Copy-pasting code from third party projects is not a good idea. There is no reason to do it especially since there is no real issue to address in this case. I don't think we should start doing such changes.

Moreover, the copy of the JulianDate class here has license ambiguities, and this is definitely a reason for -1.

pan3793 commented Nov 22, 2023

> the copy of the JulianDate class here has license ambiguities

@zabetak the jodd-util is under BSD 2-clause, which is listed in CATEGORY A, what's the issue here?

pan3793 commented Nov 22, 2023

> Copy-pasting code from third party projects is not a good idea.

In most cases I agree, but size matters: I don't think it is worth pulling in a dependency for a few lines of referenced code.

> There is no reason to do it especially since there is no real issue to address in this case.

For the master branch, there is no real issue. My real target is branch-2.3; I am just following the common "upstream first" practice of making the change on the master branch and then backporting it. Since you oppose such a change on the master branch, would it be acceptable if I made it only on branch-2.3? @zabetak

sonarcloud bot commented Nov 22, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 1 (rating C)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 22 (rating A)

No coverage information
No duplication information

Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

@aturoczy aturoczy left a comment

Hi @pan3793,
I do not think this is the right approach to fixing a CVE. Yes, releases are not as frequent as they should be, but we as a community are trying to change this. Moreover, moving the code into the repository is only a very short-term gain. Yes, it is not so big and not so complex, but moving the responsibility for the jodd-util code into the Hive repo is not a good idea, IMHO.

-1

zabetak commented Nov 28, 2023

> the jodd-util is under BSD 2-clause, which is listed in CATEGORY A, what's the issue here?

@pan3793 Please check the respective section about how to treat third party works in ASF projects.

pan3793 commented Nov 30, 2023

> Do not add the standard Apache License header to the top of third-party source files.

> the jodd-util is under BSD 2-clause, which is listed in CATEGORY A, what's the issue here?

> @pan3793 Please check the respective section about how to treat third party works in ASF projects.

@zabetak thanks for pointing it out. So copying is legal as long as we follow the ASF guidance.

It seems that reducing dependencies is not a goal of the Hive project, so I am closing this.
