Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Request _snapshot/repo/snapshot/_status returns wrong status #7952

Closed
uzhinskiy opened this issue Jun 7, 2023 · 7 comments · Fixed by #12812
Closed

[BUG] Request _snapshot/repo/snapshot/_status returns wrong status #7952

uzhinskiy opened this issue Jun 7, 2023 · 7 comments · Fixed by #12812
Assignees
Labels
bug Something isn't working distributed framework good first issue Good for newcomers

Comments

@uzhinskiy
Copy link

Describe the bug
The /_snapshot/repo/snapshot/_status returns snapshot.state=SUCCESS for snapshots that have failed shards and are actually in PARTIAL status.

To reproduce

  1. Make sure you have failed snapshots:
    curl https://endpoint:9200/_cat/snapshots/repo/
snap-2023.06.06 SUCCESS 1686109923 03:52:03 1686112172 04:29:32 37.4m 1 15 0 15
snap-2023.06.07 PARTIAL 1686117675 06:01:15 1686119799 06:36:39 35.4m 1 28 1 29
  1. Get common info of the failed snapshot:
    curl -ks https://endpoint:9200/_snapshot/repo/snap-2023.06.07/?pretty
     {
       "snapshot" : "snap-2023.06.07",
       "version" : "2.4.1",
       "indices" : [
         "index-2023.06.06"
       ],
       "include_global_state" : true,
       "state" : "PARTIAL",  - - - !!! true !!!
       "failure" : [
         {
           "index" : "index-2023.06.06",
           "shard_id" : 1,
           "reason" : "node shutdown",
           "node_id" : "123",
           "status" : "INTERNAL_SERVER_ERROR"
         }
       ],
       "shards" : {
         "total" : 29
         "failed" : 1,
         "successful" : 28
       }
     }
  1. Request the status of this snapshot:
    curl -ks https://endpoint:9200/_snapshot/repo/snap-2023.06.07/_status?pretty
   "snapshots" : [
     {
       "snapshot" : "snap-2023.06.07",
       "repository" : "repo",
       "state" : "SUCCESS", - - - !!! false !!!
       "include_global_state" : true,
       "shards_stats" : {
         "initializing" : 0,
         "started" : 0,
         "finalizing" : 0,
         "done" : 28,
         "failed" : 1,
         "total" : 29
       },
       "indices" : {
         "index-2023.06.06" : {
           "shards_stats" : {
             "initializing" : 0,
             "started" : 0,
             "finalizing" : 0,
             "done" : 28,
             "failed" : 1,
             "total" : 29
           },
           "shards" : {
             ...
             "1" : {
               "stage" : "FAILURE",
               "stat" : {
                 "incremental" : {
                   "file_count" : 0,
                   "size_in_bytes" : 0
                 },
                 "total" : {
                   "file_count" : 0,
                   "size_in_bytes" : 0
                 },
                 "start_time_in_millis" : 0,
                 "time_in_millis" : 0
               },
               "reason" : "node shutdown"
             }
           }
         }
       }
     }
   ]

Expected behavior
The request must return the current snapshot status

Host/Environment (please complete the following information):
OS: Amazon Linux 2

  • Version Opensearch 2.4.1
@uzhinskiy uzhinskiy added bug Something isn't working untriaged labels Jun 7, 2023
@uzhinskiy
Copy link
Author

probably here needs to add :

        INIT((byte) 0, false),
        STARTED((byte) 1, false),
        SUCCESS((byte) 2, true),
        FAILED((byte) 3, true),
        ABORTED((byte) 4, false);
        PARTIAL((byte) 5, false);
        .....
        public static State fromValue(byte value) {
            switch (value) {
                case 0:
                    return INIT;
                case 1:
                    return STARTED;
                case 2:
                    return SUCCESS;
                case 3:
                    return FAILED;
                case 4:
                    return ABORTED;
                case 5:
                    return PARTIAL;
                default:
                    throw new IllegalArgumentException("No snapshot state for value [" + value + "]");
            }
        }        

and this part must be re-written as:

                        case SUCCESS:
                            state = SnapshotsInProgress.State.SUCCESS;
                            break;
                        case PARTIAL:
                            state = SnapshotsInProgress.State.PARTIAL;
                            break;

@kotwanikunal
Copy link
Member

Hey @uzhinskiy! Thanks for reporting the bug and the dive.
Since you have done all the groundwork, do you want to contribute the fix with a PR?

@kotwanikunal kotwanikunal added the good first issue Good for newcomers label Jun 8, 2023
@Ayan07
Copy link

Ayan07 commented Jun 20, 2023

@kotwanikunal I'm interested in contributing to this project as a good first issue.
Can you assign this issue to me if this is open?
Thanks!

@kotwanikunal
Copy link
Member

Thanks for the interest @Ayan07! This issue is open and ready to be picked up.
I have assigned it to you.
You can refer to the developer guidelines for following the correct process and environment setup - https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md

Feel free to drop in any concerns/questions on this issue and someone will help you out.

@aggarwalShivani
Copy link
Contributor

Hi @Ayan07, are you still working on this issue? If not, I could raise a PR with the fix. cc: @kotwanikunal

@aggarwalShivani
Copy link
Contributor

Hi @kotwanikunal
I could pick up this issue and raise a PR if its okay :) since we've not had any activity on this since last 7 months. Pls let me know your view.


One query though -> I see this section in the code, where the comment says this behaviour would be changed in next major release. Since the main branch is on 3.x version now, is it okay to make the change now to the main branch?

                        case PARTIAL:
                            // Translating both PARTIAL and SUCCESS to SUCCESS for now
                            // TODO: add the differentiation on the metadata level in the next major release

@kotwanikunal
Copy link
Member

Hey @aggarwalShivani! Thanks for the contribution.
I have assigned the issue to you. You can work on your changes and raise a PR against main.
The linked comment is still a TODO and I don't think it has been worked on yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working distributed framework good first issue Good for newcomers
Projects
None yet
6 participants