This repository has been archived by the owner on Jun 19, 2024. It is now read-only.

Indexer randomly halts #155

Closed
opsecx opened this issue Feb 26, 2024 · 18 comments

@opsecx
Contributor

opsecx commented Feb 26, 2024

The indexer occasionally halts without much warning. It just gets stuck and requires a manual restart, after which it processes the remaining blocks fine. Zenode has this issue too.

@zenodeapp
Contributor

Yes! I've got the same thing happening, and I'm still unsure how it happens. I run make postgres, make run_server and make run_indexer separately, whereas some people use make compose.

It could also be that the indexer is under heavy load from being used on my site, but I don't think that should be the cause.

I checked my logs, and this is the only entry I could find from before a halt that might give some insight:

2024-02-26 09:44:43.854 UTC [484264] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:44:43.854 UTC [484264] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 09:45:01.189 UTC [484267] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:45:01.189 UTC [484267] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 09:45:18.794 UTC [484271] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:45:18.794 UTC [484271] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 09:45:36.433 UTC [484275] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:45:36.433 UTC [484275] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.226 UTC [484354] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.226 UTC [484354] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.296 UTC [484355] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.296 UTC [484355] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.367 UTC [484356] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.367 UTC [484356] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.457 UTC [484357] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.457 UTC [484357] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.556 UTC [484358] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.556 UTC [484358] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.685 UTC [484359] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.685 UTC [484359] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.776 UTC [484360] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.776 UTC [484360] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.931 UTC [484361] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.931 UTC [484361] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.023 UTC [484362] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.023 UTC [484362] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.085 UTC [484363] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.085 UTC [484363] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.180 UTC [484364] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.180 UTC [484364] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.265 UTC [484365] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.265 UTC [484365] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.321 UTC [484366] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.321 UTC [484366] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.392 UTC [484367] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.392 UTC [484367] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.445 UTC [484368] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.445 UTC [484368] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.514 UTC [484369] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.514 UTC [484369] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"

@zenodeapp
Contributor

zenodeapp commented Feb 26, 2024

Lol wait a sec. Are people trying to get in?

I see usernames in the log that don't match my own, and password authentication keeps failing.

Could this cause the postgres <=> indexer connection to get interrupted long enough to time out?

@zenodeapp
Contributor

Lol yeah, I see admin, root, and a bunch of other attempts. Hmm.

@zenodeapp
Contributor

I saw that my server (in the config) was serving on 0.0.0.0:30303 and changed it to 127.0.0.1:30303, though I'm unsure whether this solves this particular issue.

I'm a bit of a noob when it comes to SQL or networking in general. Perhaps you might know better, @opsecx?
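
Roughly, that change amounts to binding the JSON server to loopback instead of all interfaces in Settings.toml. This is a sketch only; the key names below are illustrative and may not match the actual config file:

[server]
serve_at = "127.0.0.1"  # was "0.0.0.0", i.e. listening on all interfaces (illustrative key name)
port = 30303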

@zenodeapp
Contributor

Oh, I also see that this is new in the recent version of namadexer:

cors_allow_origins = []
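
Presumably this is a CORS allow-list for the HTTP server; something like the following, with a hypothetical frontend origin, would restrict which sites can call it from a browser:

cors_allow_origins = ["https://explorer.example.com"]  # hypothetical origin, adjust to your own frontend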

@opsecx
Contributor Author

opsecx commented Feb 27, 2024

> I saw that my server (in the config) was serving on 0.0.0.0:30303 and changed it to 127.0.0.1:30303, though I'm unsure whether this solves this particular issue.
>
> I'm a bit of a noob when it comes to SQL or networking in general. Perhaps you might know better, @opsecx?

The server part has nothing to do with how the indexer runs, afaik.

@zenodeapp
Contributor

zenodeapp commented Feb 27, 2024

>> I saw that my server (in the config) was serving on 0.0.0.0:30303 and changed it to 127.0.0.1:30303, though I'm unsure whether this solves this particular issue.
>>
>> I'm a bit of a noob when it comes to SQL or networking in general. Perhaps you might know better, @opsecx?
>
> The server part has nothing to do with how the indexer runs, afaik.

I've now configured my SQL server to only allow local connections. I can now see which IP address keeps trying to log in. It's coming from China. Sigh.

# TYPE DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     scram-sha-256
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     scram-sha-256
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

I did this in the pg_hba.conf file, plus one extra line below the IPv4 section to allow the namadexer server to contact the postgres docker container (omitted that line here).

Dunno if this is okay; I'm no expert when it comes to this.
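
Purely as an illustration (this is not the omitted line; the database, user and subnet below are hypothetical placeholders), such an extra rule usually scopes access to the container network rather than the whole internet:

# Hypothetical example: allow one database/user pair from the default Docker bridge subnet
host    your_db         your_user       172.17.0.0/16           scram-sha-256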

@zenodeapp
Contributor

zenodeapp commented Feb 27, 2024

One final thing, @rllola: do you reckon people who implement the indexer without taking any security measures could become victims of cybercrime dicktwats? I'm not sure what they could do if they logged into the postgres docker.

Are there things that could be implemented to make the indexer a bit more secure from the get-go? Or is this really in the hands of the person integrating this?

@rllola
Contributor

rllola commented Feb 28, 2024

> people who implement the indexer without taking any security measures

Do you mean 'run' the indexer?

Regarding your database logs: it's because you left your database open to the world, and there are people (good and bad) mass-scanning the internet who will attempt to get into whatever they find (see this defcon talk https://www.youtube.com/watch?v=nX9JXI4l3-E).

If they happen to connect, they can actually read the data or erase it. Blockchain data is public, so not much harm there. If they erase the tables, you will just need to resync... However, they "could" access whatever else is on the server. I believe having it containerized would mitigate that.

But as a best practice, just don't make your database public (or anything else, really). Also, replace the password wow (this one wasn't meant to actually be used).

Regarding the halt: I don't think it is related to what you saw in the postgres logs. My guess is that a JSON-RPC request hangs and never finishes, leaving the indexer waiting for an answer. An easy way to find out would be to change the timeout value to a small one and see if it works.
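
As a minimal sketch of that idea (a hypothetical helper, not namadexer's actual code), a hanging request can be turned into a fast failure by wrapping the RPC future in tokio::time::timeout:

use std::time::Duration;
use tokio::time::timeout;

// Minimal sketch (hypothetical helper, not namadexer's code): wrap a
// potentially hanging JSON-RPC future so a stuck request surfaces as an
// error instead of blocking the indexer forever.
async fn with_timeout<T, E, F>(fut: F, secs: u64) -> Result<T, String>
where
    F: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    match timeout(Duration::from_secs(secs), fut).await {
        Ok(Ok(value)) => Ok(value),                            // request succeeded
        Ok(Err(e)) => Err(format!("rpc error: {e}")),          // request failed fast
        Err(_) => Err(format!("rpc timed out after {secs}s")), // request hung
    }
}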

@opsecx
Contributor Author

opsecx commented Feb 28, 2024

That's the request between the indexer component and the RPC? Could setting a bigger timeout value make it not "halt"?

@zenodeapp
Contributor

>> people who implement the indexer without taking any security measures
>
> Do you mean 'run' the indexer?
>
> Regarding your database logs: it's because you left your database open to the world, and there are people (good and bad) mass-scanning the internet who will attempt to get into whatever they find (see this defcon talk https://www.youtube.com/watch?v=nX9JXI4l3-E).
>
> If they happen to connect, they can actually read the data or erase it. Blockchain data is public, so not much harm there. If they erase the tables, you will just need to resync... However, they "could" access whatever else is on the server. I believe having it containerized would mitigate that.
>
> But as a best practice, just don't make your database public (or anything else, really). Also, replace the password wow (this one wasn't meant to actually be used).

Right! Thank you for the thorough explanation and reference!

What I mean are the ones who simply follow the steps to set up a namadexer but don't consider security measures at all (which is pretty silly). I'm figuring it out now by experiencing it firsthand, though others could perhaps be saved from this by being made aware of it or by being linked to a page explaining how to further secure their postgres db.

Though yes, it's really a lack of experience with postgres that made me stumble into this. I usually like to learn by hands-on experience, so I'm "glad" this happened :)!

> Regarding the halt: I don't think it is related to what you saw in the postgres logs. My guess is that a JSON-RPC request hangs and never finishes, leaving the indexer waiting for an answer. An easy way to find out would be to change the timeout value to a small one and see if it works.

Ah, so the new timeout in the Settings.toml file?

@rllola
Contributor

rllola commented Feb 29, 2024

> That's the request between the indexer component and the RPC? Could setting a bigger timeout value make it not "halt"?

Yes, it is the request between the indexer and the RPC. We actually want it to get unstuck, and to do so it needs to fail fast so we can retry the query. There could also be an issue on the node's side that makes it stuck, or it could be something else entirely. We have been able to reproduce it, so I am confident we can find out what is happening.
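
A rough sketch of that fail-fast-and-retry loop (again hypothetical, not the project's code; fetch_block stands in for whatever RPC call the indexer makes, and the 2-second pause is arbitrary):

use std::time::Duration;

// Hypothetical sketch: retry a block fetch a few times with a short pause
// between attempts, so a failed or timed-out request gets retried instead
// of leaving the indexer stuck.
async fn fetch_with_retry<T, E, Fut>(
    mut fetch_block: impl FnMut() -> Fut,
    attempts: u32,
) -> Result<T, E>
where
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    let mut last_err = None;
    for n in 1..=attempts {
        match fetch_block().await {
            Ok(block) => return Ok(block),
            Err(e) => {
                eprintln!("attempt {n} failed: {e}, retrying");
                last_err = Some(e);
                tokio::time::sleep(Duration::from_secs(2)).await; // fixed 2s backoff
            }
        }
    }
    Err(last_err.expect("attempts must be > 0"))
}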

@rllola
Contributor

rllola commented Feb 29, 2024

> What I mean are the ones who simply follow the steps to set up a namadexer but don't consider security measures at all (which is pretty silly). I'm figuring it out now by experiencing it firsthand, though others could perhaps be saved from this by being made aware of it or by being linked to a page explaining how to further secure their postgres db.

A good first step would indeed be to improve the documentation and highlight what needs to be changed for good practice. Even people with experience will disregard cybersecurity. We could also remove the default values, but that would force people to fill them in themselves, and they might be overwhelmed and discouraged from actually running it.

In my experience, tutorials and guides (videos or blog posts) are the best way to help people get started. So if anyone from the community wants to create one from their perspective, we would greatly appreciate it.

@rllola
Contributor

rllola commented Mar 4, 2024

An update on this issue

Unfortunately, the fix that I thought would take 2 seconds to implement is not that trivial. tendermint-rpc doesn't have an option for changing the timeout, and by default it runs in no-timeout mode (see informalsystems/tendermint-rs#1379). It does, however, validate our hypothesis that it is a stuck request that halts the indexer.

Fortunately, there is an open PR to fix it, but it is not yet merged. Let's see if we can get it merged in the next few weeks; if not, we can just get rid of this lib and write our own request functions.

@rllola rllola added the BLOCKED label Mar 4, 2024
@opsecx
Contributor Author

opsecx commented Mar 4, 2024

> An update on this issue
>
> Unfortunately, the fix that I thought would take 2 seconds to implement is not that trivial. tendermint-rpc doesn't have an option for changing the timeout, and by default it runs in no-timeout mode (see informalsystems/tendermint-rs#1379). It does, however, validate our hypothesis that it is a stuck request that halts the indexer.
>
> Fortunately, there is an open PR to fix it, but it is not yet merged. Let's see if we can get it merged in the next few weeks; if not, we can just get rid of this lib and write our own request functions.

Sounds great! Unfortunately for me, I'm working with the tables that were deleted, which leaves me in a bit of an in-between situation. Any chance of the same data being provided as views or similar?

@ainhoa-a
Member

This has now been merged (informalsystems/tendermint-rs#1379), so we are unblocked.

@ainhoa-a ainhoa-a removed the BLOCKED label Mar 19, 2024
@rllola rllola self-assigned this Mar 20, 2024
@rllola
Contributor

rllola commented Mar 20, 2024

This PR should fix the issue definitively: #168

@opsecx
Contributor Author

opsecx commented Mar 20, 2024

> This PR should fix the issue definitively: #168

Great work!

@rllola rllola closed this as completed Mar 24, 2024