Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mistake in detecting encoding of pom.xml #734

Closed
leveretconey opened this issue Feb 6, 2024 · 3 comments · Fixed by #735
Closed

Mistake in detecting encoding of pom.xml #734

leveretconey opened this issue Feb 6, 2024 · 3 comments · Fixed by #735
Assignees
Labels
bug Something isn't working

Comments

@leveretconey
Copy link

What version of OpenRewrite are you using?

I am using

  • OpenRewrite 8.13.1
  • rewrite-maven-plugin 5.18.0
  • Maven 3.5.0

How are you running OpenRewrite?

pom.xml of the project:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>debug</artifactId>
  <version>1.0-SNAPSHOT</version>
  <properties>
    <!-- UTF-8编码的中文-->
    <!-- ↑ Chinese characters in utf-8-->
    <project.build.sourceEncoding>gbk</project.build.sourceEncoding>
  </properties>
</project>

In a legacy application that java source files are using gbk encoding while pom.xml is using utf-8 encoding, OpenRewrite has recognized the file encoding of pom.xml as gbk. This leads to some problem in our scenerio.

After some debugging, I noticed that OpenRewrite reads project.build.sourceEncoding property from pom and sets it as the encoding of all files in this project. I think this behavior is not correct because this property is only applicable to source code files by Maven's design.

What did you expect to see?

pom.xml should be detected as using utf-8 encoding

What did you see instead?

pom.xml is detected as using gbk encoding

Are you interested in contributing a fix to OpenRewrite?

I won’t contribute to this because the code changes may be huge and involve multiple code repositories.

@timtebeek
Copy link
Contributor

Thanks for calling this out @leveretconey ! Sounds like an oversight; adding a bit of context here already.

We indeed read project.build.sourceEncoding in MavenMojoProjectParser.listSourceFiles and use that to parse source files for the project.

public Stream<SourceFile> listSourceFiles(MavenProject mavenProject, @Nullable Xml.Document maven, List<Marker> projectProvenance, List<NamedStyles> styles,
ExecutionContext ctx) throws DependencyResolutionRequiredException, MojoExecutionException {
Stream<SourceFile> sourceFiles = Stream.empty();
Set<Path> alreadyParsed = new HashSet<>();
if (maven != null) {
sourceFiles = Stream.of(maven);
alreadyParsed.add(baseDir.resolve(maven.getSourcePath()));
}
Object mavenSourceEncoding = mavenProject.getProperties().get("project.build.sourceEncoding");
if (mavenSourceEncoding != null) {
ParsingExecutionContextView.view(ctx).setCharset(Charset.forName(mavenSourceEncoding.toString()));
}
JavaParser.Builder<? extends JavaParser, ?> javaParserBuilder = JavaParser.fromJavaVersion()
.styles(styles)
.logCompilationWarningsAndErrors(false);

At the point where we read project.build.sourceEncoding we have already parsed the Maven files just before. I don't see any reference to sourceEncoding in other projects than rewrite-maven-plugin; that makes me think the pom.xml is parsed without taking sourceEncoding into account. What files are affected by the problem you're reporting? Is it the pom.xml files themselves? Files inside src/? Or files outside of src/?

If the problem you're described only affects files outside src/ then this might be a first step towards seeing that resolved:

Would you be able to give that a go and see if that helps you there?

@timtebeek timtebeek self-assigned this Feb 6, 2024
@leveretconey
Copy link
Author

leveretconey commented Feb 7, 2024

Thanks for your reply @timtebeek !

Let me add how we are currently using the MavenMojoProjectParser: Currently we set skipMavenParsing=true to skip the built-in POM parsing (otherwise sometimes it hangs, and we have never figured out the reason, which might be related to the Great Firewall), then we set parseAdditionalResources=true to parse the pom.xml as a regular XML file and then implement own recipes to modify pom.xml. In addition, we have implemented some recipes to modify other configuration files that are outside of the src/ directory. Currently, it appears that these configuration files may also be affected by project.build.sourceEncoding, which could lead to incorrect encoding.

I have tested #735 and that does solve my issue.

@timtebeek
Copy link
Contributor

Hi @leveretconey ; did you you have a chance to try out the latest snapshot already? I'm hoping this is now fixed for you, but would like to be sure.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants