This repository has been archived by the owner on Jun 19, 2024. It is now read-only.

Indexer randomly halts #155

Closed
opsecx opened this issue Feb 26, 2024 · 18 comments

@opsecx
Contributor

opsecx commented Feb 26, 2024

The indexer occasionally halts without much warning. It just gets stuck and requires a manual restart, after which it processes the remaining blocks fine. Zenode has this issue too.

@zenodeapp
Contributor

Yes! I've got the same thing happening, and I'm still unsure how it happens. I run make postgres, make run_server and make run_indexer separately, whereas some people use make compose.

It could also be that the indexer is under heavy load from being used on my site, but I don't think that should be the cause.

I checked my logs, and this is the only entry I could find from before a halt that might give some insight:

2024-02-26 09:44:43.854 UTC [484264] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:44:43.854 UTC [484264] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 09:45:01.189 UTC [484267] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:45:01.189 UTC [484267] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 09:45:18.794 UTC [484271] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:45:18.794 UTC [484271] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 09:45:36.433 UTC [484275] FATAL:  password authentication failed for
user "psql"
2024-02-26 09:45:36.433 UTC [484275] DETAIL:  Role "psql" does not exist.
        Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.226 UTC [484354] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.226 UTC [484354] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.296 UTC [484355] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.296 UTC [484355] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.367 UTC [484356] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.367 UTC [484356] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.457 UTC [484357] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.457 UTC [484357] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.556 UTC [484358] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.556 UTC [484358] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.685 UTC [484359] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.685 UTC [484359] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.776 UTC [484360] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.776 UTC [484360] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:46.931 UTC [484361] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:46.931 UTC [484361] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.023 UTC [484362] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.023 UTC [484362] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.085 UTC [484363] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.085 UTC [484363] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.180 UTC [484364] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.180 UTC [484364] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.265 UTC [484365] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.265 UTC [484365] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.321 UTC [484366] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.321 UTC [484366] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.392 UTC [484367] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.392 UTC [484367] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.445 UTC [484368] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.445 UTC [484368] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"
2024-02-26 10:21:47.514 UTC [484369] FATAL:  password authentication failed for
user "postgres"
2024-02-26 10:21:47.514 UTC [484369] DETAIL:  Connection matched pg_hba.conf line 100: "host all all all scram-sha-256"

@zenodeapp
Contributor

zenodeapp commented Feb 26, 2024

Lol wait a sec. Are people trying to get in?

I see usernames in the log that don't match my own, and password authentication keeps failing.

Could this cause the postgres <=> indexer connection to get interrupted long enough to time out?

@zenodeapp
Contributor

Lol yeah, I see admin, root, and a bunch of other attempts. Hmm.

@zenodeapp
Contributor

I saw that my server (in the config) was serving on 0.0.0.0:30303 and changed it to 127.0.0.1:30303, though I'm unsure whether this solves this particular issue.

I'm a bit of a noob when it comes to SQL or networking in general. Perhaps you might know better, @opsecx?
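
Roughly, that change amounts to binding the JSON server to loopback instead of all interfaces in Settings.toml. This is a sketch only; the key names below are illustrative and may not match the actual config file:

[server]
serve_at = "127.0.0.1"  # was "0.0.0.0", i.e. listening on all interfaces (illustrative key name)
port = 30303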

@zenodeapp
Contributor

Oh, I also see that this is new in the recent version of namadexer:

cors_allow_origins = []
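
Presumably this is a CORS allow-list for the HTTP server; something like the following, with a hypothetical frontend origin, would restrict which sites can call it from a browser:

cors_allow_origins = ["https://explorer.example.com"]  # hypothetical origin, adjust to your own frontend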

@opsecx
Contributor Author

opsecx commented Feb 27, 2024

> I saw that my server (in the config) was serving on 0.0.0.0:30303 and changed it to 127.0.0.1:30303, though I'm unsure whether this solves this particular issue.
>
> I'm a bit of a noob when it comes to SQL or networking in general. Perhaps you might know better, @opsecx?

The server part has nothing to do with how the indexer runs, afaik.

@zenodeapp
Contributor

zenodeapp commented Feb 27, 2024

>> I saw that my server (in the config) was serving on 0.0.0.0:30303 and changed it to 127.0.0.1:30303, though I'm unsure whether this solves this particular issue.
>>
>> I'm a bit of a noob when it comes to SQL or networking in general. Perhaps you might know better, @opsecx?
>
> The server part has nothing to do with how the indexer runs, afaik.

I've now configured my SQL server to only allow local connections. I can now see which IP address keeps trying to log in. It's coming from China. Sigh.

# TYPE DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     scram-sha-256
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     scram-sha-256
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

I did this in the pg_hba.conf file, plus one extra line below the IPv4 section to allow the namadexer server to contact the postgres docker container (omitted that line here).

Dunno if this is okay; I'm no expert when it comes to this.
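
Purely as an illustration (this is not the omitted line; the database, user and subnet below are hypothetical placeholders), such an extra rule usually scopes access to the container network rather than the whole internet:

# Hypothetical example: allow one database/user pair from the default Docker bridge subnet
host    your_db         your_user       172.17.0.0/16           scram-sha-256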

@zenodeapp
Contributor

zenodeapp commented Feb 27, 2024

One final thing, @rllola: do you reckon people who implement the indexer without taking any security measures could become victims of cybercrime dicktwats? I'm not sure what they could do if they logged into the postgres docker.

Are there things that could be implemented to make the indexer a bit more secure from the get-go? Or is this really in the hands of the person integrating this?

@rllola
Contributor

rllola commented Feb 28, 2024

> people who implement the indexer without taking any security measures

Do you mean 'run' the indexer?

Regarding your database logs: it's because you left your database open to the world, and there are people (good and bad) mass-scanning the internet who will attempt to get into whatever they find (see this defcon talk https://www.youtube.com/watch?v=nX9JXI4l3-E).

If they happen to connect, they can actually read the data or erase it. Blockchain data is public, so not much harm there. If they erase the tables, you will just need to resync... However, they "could" access whatever else is on the server. I believe having it containerized would mitigate that.

But as a best practice, just don't make your database public (or anything else, really). Also, replace the password wow (this one wasn't meant to actually be used).

Regarding the halt: I don't think it is related to what you saw in the postgres logs. My guess is that a JSON-RPC request hangs and never finishes, leaving the indexer waiting for an answer. An easy way to find out would be to change the timeout value to a small one and see if it works.
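
As a minimal sketch of that idea (a hypothetical helper, not namadexer's actual code), a hanging request can be turned into a fast failure by wrapping the RPC future in tokio::time::timeout:

use std::time::Duration;
use tokio::time::timeout;

// Minimal sketch (hypothetical helper, not namadexer's code): wrap a
// potentially hanging JSON-RPC future so a stuck request surfaces as an
// error instead of blocking the indexer forever.
async fn with_timeout<T, E, F>(fut: F, secs: u64) -> Result<T, String>
where
    F: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    match timeout(Duration::from_secs(secs), fut).await {
        Ok(Ok(value)) => Ok(value),                            // request succeeded
        Ok(Err(e)) => Err(format!("rpc error: {e}")),          // request failed fast
        Err(_) => Err(format!("rpc timed out after {secs}s")), // request hung
    }
}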

@opsecx
Contributor Author

opsecx commented Feb 28, 2024

That's the request between the indexer component and the RPC? Could setting a bigger timeout value make it not "halt"?

@zenodeapp
Contributor

>> people who implement the indexer without taking any security measures
>
> Do you mean 'run' the indexer?
>
> Regarding your database logs: it's because you left your database open to the world, and there are people (good and bad) mass-scanning the internet who will attempt to get into whatever they find (see this defcon talk https://www.youtube.com/watch?v=nX9JXI4l3-E).
>
> If they happen to connect, they can actually read the data or erase it. Blockchain data is public, so not much harm there. If they erase the tables, you will just need to resync... However, they "could" access whatever else is on the server. I believe having it containerized would mitigate that.
>
> But as a best practice, just don't make your database public (or anything else, really). Also, replace the password wow (this one wasn't meant to actually be used).

Right! Thank you for the thorough explanation and reference!

What I mean are the ones who simply follow the steps to set up a namadexer but don't consider security measures at all (which is pretty silly). I'm figuring it out now by experiencing it firsthand, though others could perhaps be saved from this by being made aware of it or by being linked to a page explaining how to further secure their postgres db.

Though yes, it's really a lack of experience with postgres that made me stumble into this. I usually like to learn by hands-on experience, so I'm "glad" this happened :)!

> Regarding the halt: I don't think it is related to what you saw in the postgres logs. My guess is that a JSON-RPC request hangs and never finishes, leaving the indexer waiting for an answer. An easy way to find out would be to change the timeout value to a small one and see if it works.

Ah, so the new timeout in the Settings.toml file?

@rllola
Contributor

rllola commented Feb 29, 2024

> That's the request between the indexer component and the RPC? Could setting a bigger timeout value make it not "halt"?

Yes, it is the request between the indexer and the RPC. We actually want it to get unstuck, and to do so it needs to fail fast so we can retry the query. There could also be an issue on the node's side that makes it stuck, or it could be something else entirely. We have been able to reproduce it, so I am confident we can find out what is happening.
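
A rough sketch of that fail-fast-and-retry loop (again hypothetical, not the project's code; fetch_block stands in for whatever RPC call the indexer makes, and the 2-second pause is arbitrary):

use std::time::Duration;

// Hypothetical sketch: retry a block fetch a few times with a short pause
// between attempts, so a failed or timed-out request gets retried instead
// of leaving the indexer stuck.
async fn fetch_with_retry<T, E, Fut>(
    mut fetch_block: impl FnMut() -> Fut,
    attempts: u32,
) -> Result<T, E>
where
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    let mut last_err = None;
    for n in 1..=attempts {
        match fetch_block().await {
            Ok(block) => return Ok(block),
            Err(e) => {
                eprintln!("attempt {n} failed: {e}, retrying");
                last_err = Some(e);
                tokio::time::sleep(Duration::from_secs(2)).await; // fixed 2s backoff
            }
        }
    }
    Err(last_err.expect("attempts must be > 0"))
}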

@rllola
Contributor

rllola commented Feb 29, 2024

> What I mean are the ones who simply follow the steps to set up a namadexer but don't consider security measures at all (which is pretty silly). I'm figuring it out now by experiencing it firsthand, though others could perhaps be saved from this by being made aware of it or by being linked to a page explaining how to further secure their postgres db.

A good first step would indeed be to improve the documentation and highlight what needs to be changed for good practice. Even people with experience will disregard cybersecurity. We could also remove the default values, but that would force people to fill them in themselves, and they might be overwhelmed and discouraged from actually running it.

In my experience, tutorials and guides (videos or blog posts) are the best way to help people get started. So if anyone from the community wants to create one from their perspective, we would greatly appreciate it.

@rllola
Contributor

rllola commented Mar 4, 2024

An update on this issue

Unfortunately, the fix that I thought would take 2 seconds to implement is not that trivial. tendermint-rpc doesn't have an option for changing the timeout, and by default it runs in no-timeout mode (see informalsystems/tendermint-rs#1379). It does, however, validate our hypothesis that it is a stuck request that halts the indexer.

Fortunately, there is an open PR to fix it, but it is not yet merged. Let's see if we can get it merged in the next few weeks; if not, we can just get rid of this lib and write our own request functions.

@rllola rllola added the BLOCKED label Mar 4, 2024
@opsecx
Contributor Author

opsecx commented Mar 4, 2024

> An update on this issue
>
> Unfortunately, the fix that I thought would take 2 seconds to implement is not that trivial. tendermint-rpc doesn't have an option for changing the timeout, and by default it runs in no-timeout mode (see informalsystems/tendermint-rs#1379). It does, however, validate our hypothesis that it is a stuck request that halts the indexer.
>
> Fortunately, there is an open PR to fix it, but it is not yet merged. Let's see if we can get it merged in the next few weeks; if not, we can just get rid of this lib and write our own request functions.

Sounds great! Unfortunately for me, I'm working with the tables that were deleted, which leaves me in a bit of an in-between situation. Any chance of the same data being provided as views or similar?

@ainhoa-a
Member

This has now been merged (informalsystems/tendermint-rs#1379), so we are unblocked.

@ainhoa-a ainhoa-a removed the BLOCKED label Mar 19, 2024
@rllola rllola self-assigned this Mar 20, 2024
@rllola
Contributor

rllola commented Mar 20, 2024

This PR should fix the issue definitively: #168

@opsecx
Contributor Author

opsecx commented Mar 20, 2024

> This PR should fix the issue definitively: #168

Great work!

@rllola rllola closed this as completed Mar 24, 2024