
[Feature] Ability to recover / recreate one of the nodes when replication is enabled #406

Open
taxilian opened this issue Feb 22, 2024 · 3 comments
Labels
feature replication Asynchronous replication

Comments

taxilian commented Feb 22, 2024

Is your feature request related to a problem? Please describe.

I had an instance where I lost one of my nodes while using storage mapped to the local node -- that means the pod and its data were effectively gone and the pod had to be recreated.

When the pod came back up, it was not part of the cluster and could not be used.

Describe the solution you'd like
The controller should detect this situation and automatically recreate or reinitialize the pod so that it rejoins the cluster.

Describe alternatives you've considered
I'm sure it's possible to do manually, but I can't find a concise explanation of how. I could also take a backup and create a new cluster from that backup, of course -- but that's not really ideal.
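
For reference, the backup-and-restore path with this operator would look roughly like the following. This is a minimal sketch only: the Backup and Restore resource kinds exist in mariadb-operator, but the exact field layout here is an assumption and should be checked against the CRD schema shipped with v0.0.25.

apiVersion: mariadb.mmontes.io/v1alpha1
kind: Backup
metadata:
  name: backup-wordpress
  namespace: wordpress
spec:
  mariaDbRef:
    name: mariadb-wordpress
  storage:
    # Field layout assumed from the operator's examples; verify before use.
    persistentVolumeClaim:
      resources:
        requests:
          storage: 20Gi
      accessModes:
        - ReadWriteOnce
---
apiVersion: mariadb.mmontes.io/v1alpha1
kind: Restore
metadata:
  name: restore-wordpress
  namespace: wordpress
spec:
  mariaDbRef:
    name: mariadb-wordpress
  backupRef:
    name: backup-wordpress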

Environment details:

  • Kubernetes version: 1.27.11
  • Kubernetes distribution: kubeadm
  • mariadb-operator version: v0.0.25
  • Install method: helm
  • Install flavor: recommended

MariaDB manifest

This is what I'm using:

apiVersion: mariadb.mmontes.io/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-wordpress
  namespace: wordpress
spec:
  rootPasswordSecretKeyRef:
    name: mariadb
    key: root-password

  database: mariadb
  username: mariadb
  passwordSecretKeyRef:
    name: mariadb
    key: password

  image: mariadb:11.0.3

  port: 3306

  replicas: 3

  replication:
    enabled: true
    primary:
      automaticFailover: true
    replica:
      waitPoint: AfterSync
      gtid: CurrentPos
      replPasswordSecretKeyRef:
        name: mariadb
        key: password
      connectionTimeout: 10s
      connectionRetries: 10
      syncTimeout: 10s
    syncBinlog: true

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: "kubernetes.io/hostname"

  podDisruptionBudget:
    maxUnavailable: 1

  updateStrategy:
    type: RollingUpdate

  primaryService:
    type: LoadBalancer

  myCnf: |
    [mariadb]
    bind-address=*
    default_storage_engine=InnoDB
    binlog_format=row
    innodb_autoinc_lock_mode=2
    max_allowed_packet=256M
    innodb_buffer_pool_size=6442450944
    query_cache_size=134217728

  volumeClaimTemplate:
    resources:
      requests:
        storage: 20Gi
    accessModes:
      - ReadWriteOnce
    storageClassName: local-hostpath

  resources:
    requests:
      cpu: 500m
      memory: 8Gi
    limits:
      cpu: 1000m
      memory: 12Gi
  
  env:
  - name: MARIADB_AUTO_UPGRADE
    value: "true"
mmontes11 added the replication (Asynchronous replication) label on Mar 13, 2024
mmontes11 (Member) commented

Hey there! Thanks for reporting.

We need to detect this situation and perform a new replica bootstrap; there is an issue for this on the roadmap.

As of today, you can do this manually by following the steps described in this comment:
#141 (comment)
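
In rough terms, the manual procedure amounts to deleting the failed replica's PVC and Pod so the StatefulSet reprovisions them empty, then pointing the fresh replica at the current primary. The sketch below assumes pod index 2 is the broken replica; the PVC name, internal service hostname, and repl user follow the operator's usual naming but are assumptions here, so the authoritative steps remain the linked comment.

# Delete the broken replica's storage and Pod; the StatefulSet recreates both.
kubectl -n wordpress delete pvc storage-mariadb-wordpress-2
kubectl -n wordpress delete pod mariadb-wordpress-2

# Once the Pod is Ready, open a SQL shell on it...
kubectl -n wordpress exec -it mariadb-wordpress-2 -- sh -c \
  'mariadb -u root -p"${MARIADB_ROOT_PASSWORD}"'

-- ...and configure replication against the current primary (hostname and
-- user are assumptions; substitute your actual primary and repl credentials):
CHANGE MASTER TO
  MASTER_HOST='mariadb-wordpress-0.mariadb-wordpress-internal.wordpress.svc.cluster.local',
  MASTER_USER='repl',
  MASTER_PASSWORD='<value of the repl password Secret key>',
  MASTER_USE_GTID=current_pos;
START REPLICA;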

taxilian (Author) commented

Hmm; I've been trying to do so, but it's not recovering :-( There doesn't seem to be any replication set up on the "new" slave.

It would be helpful to have some docs in the repo with this information and possibly some troubleshooting steps :-/

Currently, when I run mariadb -u root -p"${MARIADB_ROOT_PASSWORD}" -e 'SHOW ALL REPLICAS STATUS \G;' on that server, I get no output at all.
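
Empty output there does mean no replica connection is configured at all on that node -- CHANGE MASTER TO was never run, as opposed to replication being configured but broken. A quick sanity check, assuming direct SQL access to the pod:

-- No rows from this means the node has no replication configuration at all:
SHOW ALL REPLICAS STATUS\G

-- A healthy replica should also be read-only and track a GTID position:
SELECT @@read_only, @@gtid_current_pos;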

taxilian (Author) commented

And both of my replicas in that set (not sure how they are both bad, honestly; one should have been working) say:

Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MariaDB server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
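
That error means at least two servers in the cluster report the same server_id, which replication forbids; each node needs a unique value. A quick way to check all three, assuming the pod names implied by the manifest above:

# Print the server_id of each pod; all three values must differ.
for i in 0 1 2; do
  kubectl -n wordpress exec mariadb-wordpress-$i -- sh -c \
    'mariadb -u root -p"${MARIADB_ROOT_PASSWORD}" -sN -e "SELECT @@server_id;"'
done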
