postgres-sync-demo

A demo on how to use triggers, queues, etc. to sync the app's data somewhere else.

Context

Imagine you have a system MyMovieDb (MDB).

MyMovieDb uses a postgres database.

Now we want to create the system MyMovieWatchList (WL).

WL wants to use data from MDB.

WL wants to be certain it has gotten all the data from MDB.

WL wants to keep working even if MDB is down (e.g., for maintenance).

So WL can't use a REST API or anything similar.

WL has an unspecified datastore.

Decision

A postgres trigger is used to create a "queue table" entry as part of the same transaction whenever data is changed. Because the trigger runs inside that transaction, the queue entry is only added if the data change actually commits.
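
A minimal sketch of that idea (the table, function, and trigger names here are illustrative, not the ones from postgres-schema.sql; a movie table with an id column is assumed):

```sql
-- Illustrative queue table; the real schema lives in postgres-schema.sql.
CREATE TABLE movie_sync_queue (
    id         bigserial PRIMARY KEY,
    movie_id   bigint      NOT NULL,
    operation  text        NOT NULL, -- 'INSERT', 'UPDATE' or 'DELETE'
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION enqueue_movie_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO movie_sync_queue (movie_id, operation)
        VALUES (OLD.id, TG_OP);
    ELSE
        INSERT INTO movie_sync_queue (movie_id, operation)
        VALUES (NEW.id, TG_OP);
    END IF;
    RETURN NULL; -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

-- The trigger runs inside the same transaction as the data change,
-- so the queue entry only exists if that transaction commits.
CREATE TRIGGER movie_sync_enqueue
AFTER INSERT OR UPDATE OR DELETE ON movie
FOR EACH ROW EXECUTE FUNCTION enqueue_movie_change();
```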

To update the target, select an entry from the queue table and insert/update/delete the corresponding data. The entry is only deleted from the queue if the write to the target succeeded (i.e., the transaction committed).

To allow parallel processing, rows are locked with SELECT ... FOR UPDATE.

To skip rows that another worker has already locked, ... SKIP LOCKED is used.

The entry is claimed and immediately deleted via DELETE FROM ... USING ... RETURNING, as sketched below.
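
A sketch of that claim-and-delete statement against the illustrative queue table above; it has to run in the same transaction as the write to the target, so a rollback puts the entry back:

```sql
BEGIN;

-- Claim and delete the next unlocked entry in one statement.
-- FOR UPDATE locks the row; SKIP LOCKED lets parallel workers
-- skip entries that are already being processed.
DELETE FROM movie_sync_queue q
USING (
    SELECT id
    FROM movie_sync_queue
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
) next
WHERE q.id = next.id
RETURNING q.movie_id, q.operation;

-- ... apply the returned change to the target datastore here ...

COMMIT; -- on ROLLBACK the DELETE is undone and the entry stays queued
```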

To process entries immediately, NOTIFY/LISTEN is used.
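
A sketch of both sides, assuming an illustrative channel name movie_sync:

```sql
-- Added at the end of the trigger function: the notification is
-- only delivered when the enqueuing transaction commits.
NOTIFY movie_sync;

-- Each consumer connection subscribes once and is then woken up
-- whenever a new queue entry becomes visible.
LISTEN movie_sync;
```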

As a fallback, the queue table is checked every 60 seconds for entries whose NOTIFY was missed.

You can find this implemented in postgres-schema.sql and MovieSyncEventRepository.java.

LISTEN is implemented in MovieSyncEventPostgresR2dbcNotificationListener.java.

The scheduled check is implemented in MovieSyncServiceTrigger.java.

To see it in action run MovieSyncServiceTest.java.

Alternatives not chosen

  • Write data to e.g. RabbitMQ or Kafka as part of the transaction or after the commit
    • When the write to RabbitMQ/Kafka fails, there is no way of knowing which data is out of sync anymore
    • This is a two-phase-commit-style problem
  • Debezium could be used but was not chosen because the number of writes is small

Consequences

  • We can't easily add more consumers.
    • We could write the data to e.g. RabbitMQ when we process the queue table entries.
    • This shows that the proposed design is evolvable.
  • The target datastore must be fine with the data being only partially in sync at times.
  • You can't parallelize writes to the target datastore - otherwise a DELETE could be applied before the INSERT that precedes it in the queue.

Performance

All performance numbers were measured in a Docker environment.

Just inserts: 903-912 operations/second

Inserts and the trigger to write to the queue table: 812-833 operations/second

Inserts, the trigger and direct sync to a different table: 185-186 operations/second

Inserts, the trigger and sync via RabbitMQ to a different table: 118-126 operations/second
