[Bug]: Out of date archive test distribution may be causing plugin IT test failures #953

Closed · downsrob opened this issue Nov 12, 2021 · 4 comments
Labels: bug (Something isn't working)

Comments

downsrob (Contributor) commented Nov 12, 2021

Describe the bug

There is a parallel issue on core, which has been identified as a release blocker for 1.2: opensearch-project/OpenSearch#1473

Integration tests for the Index Management plugin which extend OpenSearchIntegTestCase are failing during cleanup following the addition of shard indexing pressure stats in this PR, which is causing a serialization mismatch. The new fields are included in serialization when the stream version is on or after 1.2.0, and debug logging has shown that both sides of the stream are on version 1.2.0, yet the fields are written but never read.
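
To illustrate the kind of mismatch described above, here is a minimal, hypothetical sketch of version-gated serialization (the class and field names are illustrative only, not the actual shard indexing pressure code). If the writer emits the new 1.2.0 fields but the reader's code predates them, the extra bytes are left unread and a later read in the same response hits the end of the stream:

import java.io.IOException;

import org.opensearch.Version;
import org.opensearch.common.io.stream.StreamInput;
import org.opensearch.common.io.stream.StreamOutput;
import org.opensearch.common.io.stream.Writeable;

// Hypothetical stats class for illustration, not the actual shard indexing pressure code.
public class ExampleStats implements Writeable {

    private final long existingStat;
    private final long statAddedIn120; // hypothetical field introduced in 1.2.0

    public ExampleStats(long existingStat, long statAddedIn120) {
        this.existingStat = existingStat;
        this.statAddedIn120 = statAddedIn120;
    }

    public ExampleStats(StreamInput in) throws IOException {
        existingStat = in.readVLong();
        // The reader only consumes the new field when it knows about it and the
        // stream version is 1.2.0 or later. If the reader's code predates the field,
        // this branch does not exist, the bytes stay in the stream, and the next
        // read in the enclosing response fails with an EOFException.
        if (in.getVersion().onOrAfter(Version.V_1_2_0)) {
            statAddedIn120 = in.readVLong();
        } else {
            statAddedIn120 = 0L;
        }
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeVLong(existingStat);
        // The writer emits the new field whenever the stream version is 1.2.0 or later.
        if (out.getVersion().onOrAfter(Version.V_1_2_0)) {
            out.writeVLong(statAddedIn120);
        }
    }
}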

This failure can be replicated with an empty test extending OpenSearchIntegTestCase on at least Alerting, Index Management, and Anomaly Detection, if the testDistribution is set to ARCHIVE and the integTest Gradle task is run. These test failures have been isolated to Index Management because this test setup is only used in Index Management.

It appears that when the test distribution is set to archive, this artifact is used: https://artifacts.opensearch.org/snapshots/core/opensearch/1.2.0-SNAPSHOT/opensearch-min-1.2.0-SNAPSHOT-linux-x64-latest.tar.gz
After unpacking the snapshot, you can see that the shard indexing pressure changes are not included in the code, and the state of the snapshot code would cause precisely these errors. Additionally, the equivalent snapshot is not yet available for 1.3.0, so there may be an issue with building these artifacts.

To reproduce

On a plugin, running this empty integration test with the test distribution set to ARCHIVE always fails during cleanup:

import org.opensearch.test.OpenSearchIntegTestCase;

public class EmptyTestsIT extends OpenSearchIntegTestCase {

    public void testEmpty() {
        System.out.println("This does print, we fail in cleanup");
    }
}

Expected behavior

The test setup as described should not fail. This same test setup has been used in Index Management for a long time now, and changes to serialization in the NodesStats API should not cause it to fail.

Screenshots

No response

Host / Environment

I have been able to replicate this locally on my Amazon Linux 2 desktop, and it occurs every time on Index Management's CI run. Here is a failing run.

Additional context

No response

Relevant log output

REPRODUCE WITH: ./gradlew ':integTest' --tests "org.opensearch.indexmanagement.indexstatemanagement.MetadataRegressionIT.test move metadata service" -Dtests.seed=E194FF25655D5338 -Dtests.security.manager=false -Dtests.locale=ga-IE -Dtests.timezone=BET -Druntime.java=14

org.opensearch.indexmanagement.indexstatemanagement.MetadataRegressionIT > test move metadata service FAILED
    TransportSerializationException[Failed to deserialize response from handler [org.opensearch.transport.TransportService$ContextRestoreResponseHandler/org.opensearch.action.ActionListenerResponseHandler@14f300dd/org.opensearch.client.transport.TransportClientNodesService$RetryListener@58571732]]; nested: EOFException;
        at __randomizedtesting.SeedInfo.seed([E194FF25655D5338:43E7FA64742871E1]:0)
        at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:301)
        at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:154)
        at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:108)
        at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:759)
        at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:170)
        at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:145)
        at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:110)
        at org.opensearch.transport.nio.MockNioTransport$MockTcpReadWriteHandler.consumeReads(MockNioTransport.java:341)
        at org.opensearch.nio.SocketChannelContext.handleReadBytes(SocketChannelContext.java:246)
        at org.opensearch.nio.BytesChannelContext.read(BytesChannelContext.java:59)
        at org.opensearch.nio.EventHandler.handleRead(EventHandler.java:152)
        at org.opensearch.transport.nio.TestEventHandler.handleRead(TestEventHandler.java:167)
        at org.opensearch.nio.NioSelector.handleRead(NioSelector.java:438)
        at org.opensearch.nio.NioSelector.processKey(NioSelector.java:264)
        at org.opensearch.nio.NioSelector.singleLoop(NioSelector.java:191)
        at org.opensearch.nio.NioSelector.runLoop(NioSelector.java:148)
        at java.base/java.lang.Thread.run(Thread.java:832)

        Caused by:
        java.io.EOFException
            at org.opensearch.common.io.stream.InputStreamStreamInput.readByte(InputStreamStreamInput.java:71)
            at org.opensearch.common.io.stream.FilterStreamInput.readByte(FilterStreamInput.java:53)
            at org.opensearch.common.io.stream.StreamInput.readVInt(StreamInput.java:223)
            at org.opensearch.common.io.stream.StreamInput.readArraySize(StreamInput.java:1298)
            at org.opensearch.common.io.stream.StreamInput.readCollection(StreamInput.java:1227)
            at org.opensearch.common.io.stream.StreamInput.readList(StreamInput.java:1184)
            at org.opensearch.action.support.nodes.BaseNodesResponse.<init>(BaseNodesResponse.java:58)
            at org.opensearch.action.admin.cluster.node.stats.NodesStatsResponse.<init>(NodesStatsResponse.java:51)
            at org.opensearch.action.ActionListenerResponseHandler.read(ActionListenerResponseHandler.java:82)
            at org.opensearch.action.ActionListenerResponseHandler.read(ActionListenerResponseHandler.java:49)
            at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.read(TransportService.java:1338)
            at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.read(TransportService.java:1325)
            at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:298)
            ... 16 more
downsrob added the bug (Something isn't working) and untriaged (Issues that have not yet been triaged) labels on Nov 12, 2021
peternied (Member) commented:
After looking into the system and the scripts, I don't see an indication of what is or isn't producing these specific artifacts. This might be related to how the distribution downloader operates at runtime rather than to what the snapshot build is producing.

For the time being, let's keep this issue on the books in case the build job is not correct and needs to be updated.

bbarani (Member) commented Nov 12, 2021

The change to add -min to the artifact name was recently merged into the Gradle script in the OpenSearch repo as part of this PR, but the suffix was not removed from the build script (which was already adding -min to the artifact name), so -min is now added twice (-min-min) to the artifact name. The distribution downloader has been downloading the stale artifact (with a single -min) for the integration testing. We are working on fixing this issue now.
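
As a purely illustrative sketch of the naming clash (the actual build and Gradle scripts are not shown here, and the exact doubled filename is an assumption based on the URL above), appending the -min qualifier in both places yields a doubled suffix, while the downloader keeps resolving the older single -min name:

// Illustrative only -- not the actual build or Gradle scripts.
public class ArtifactNameExample {
    public static void main(String[] args) {
        String base = "opensearch";
        String version = "1.2.0-SNAPSHOT";
        String platform = "linux-x64-latest";

        String buildScriptQualifier = "-min";  // qualifier the build script was already adding
        String gradleQualifier = "-min";       // qualifier now also added by the Gradle script

        // Name produced with both qualifiers applied (assumed form):
        String doubled = base + buildScriptQualifier + gradleQualifier + "-" + version + "-" + platform + ".tar.gz";
        // Name the distribution downloader resolves, matching the stale artifact above:
        String single = base + buildScriptQualifier + "-" + version + "-" + platform + ".tar.gz";

        System.out.println(doubled); // opensearch-min-min-1.2.0-SNAPSHOT-linux-x64-latest.tar.gz
        System.out.println(single);  // opensearch-min-1.2.0-SNAPSHOT-linux-x64-latest.tar.gz
    }
}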
