Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Backup running" notification sometimes remains after backup is complete #694

Open
t-m-w opened this issue Jun 18, 2024 · 6 comments · May be fixed by #695
Open

"Backup running" notification sometimes remains after backup is complete #694

t-m-w opened this issue Jun 18, 2024 · 6 comments · May be fixed by #695
Assignees
Labels
Milestone

Comments

@t-m-w
Copy link
Collaborator

t-m-w commented Jun 18, 2024

The "Backup running" notification should always be removed when a backup is no longer running.

My last backup "failed" and notified me as such, but even now, a half an hour after the failure notification, there is still a notification present that says "Backup running" for Cromite.

I have occasionally seen this behavior (for various apps) even after an overall successful backup, too.

It can be confusing not knowing which to believe! 😄

The relevant lines might be the ones with NotificationBackupObserver?

Logcat snippet
06-18 10:06:52.070 1010168 20318 20332 I NotificationBackupObserver: Completed. Target: org.calyxos.ripple, status: 0
06-18 10:06:52.075 1010168 20318 20332 I FullBackup: Perform full backup for org.cromite.cromite.
06-18 10:06:52.079 1010168 20318 20332 I BackupCoordinator: Get backup quota for org.cromite.cromite. Is full backup: true.
06-18 10:06:52.079 1010168 20318 20332 I BackupCoordinator: Reported quota of 9223372036854775807 bytes.
06-18 10:06:52.538 1010168 20318 20332 I FullBackup: Check full backup size of 26632192 bytes.
06-18 10:06:52.545 1010168 20318 20332 D FullBackup: Initializing OutputStream for org.cromite.cromite.
06-18 10:06:52.549 1010168 20318 21201 I NotificationBackupObserver: Showing progress notification for org.calyxos.ripple 233/253
06-18 10:06:52.550 1010168 20318 21201 I NotificationBackupObserver: 233/253 - Cromite (org.cromite.cromite)
[...]
06-18 10:07:07.122 1010168 20318 20416 E FullBackup: Error handling backup data for org.cromite.cromite: 
06-18 10:07:07.122 1010168 20318 20416 E FullBackup: java.io.IOException: Pipe closed
[...]
06-18 10:11:52.249  1000  1581 14419 D BackupManagerService: Timeout message received for token=5b634c50
06-18 10:11:52.546  1000  1581 14419 D BackupManagerService: Timeout message received for token=432d53c
06-18 10:11:52.546  1000  1581 14419 V LifecycleOperationStorage: [UserID:10] Cancel: token=432d53c
06-18 10:11:52.546  1000  1581 14419 W PFTBT   : Full backup cancel of org.cromite.cromite
06-18 10:11:52.553 1010168 20318 20416 I BackupCoordinator: Cancel full backup of org.cromite.cromite because of UNKNOWN_ERROR
06-18 10:11:52.554 1010331 25347 25347 I Process : Sending signal. PID: 25347 SIG: 9
06-18 10:11:52.627  1000  1581  4506 I ActivityManager: Process org.cromite.cromite (pid 25347) has died: bkup TRNB
06-18 10:11:52.628 shell 10848  1587 D BinderSender: onProcessDied: pid=25347, uid=1010331
06-18 10:11:52.628 shell 10848 13061 D BinderSender: onUidGone: uid=1010331, disabled=false
06-18 10:11:52.628 shell 10848 13061 V BinderSender: Uid 1010331 dead
06-18 10:11:52.629  root   864   864 I Zygote  : Process 25347 exited due to signal 9 (Killed)
06-18 10:11:52.629  1000  1581  1809 I libprocessgroup: Removed cgroup /sys/fs/cgroup/uid_1010331/pid_25347
06-18 10:11:52.629  1000  1581  1791 D DisplayManagerService: Drop pending events for gone uid 1010331
06-18 10:11:53.079  1000  1581 14228 E AppOps  : Operation not found: uid=1041 pkg=audioserver(null) op=WAKE_LOCK
06-18 10:11:53.080  1000  1581 14223 E AppOps  : Operation not found: uid=1041 pkg=audioserver(null) op=WAKE_LOCK
06-18 10:11:53.081  1000  1581  4506 E AppOps  : Operation not found: uid=1041 pkg=audioserver(null) op=WAKE_LOCK
06-18 10:11:53.082  1000  1581  4506 E AppOps  : Operation not found: uid=1041 pkg=audioserver(null) op=WAKE_LOCK
06-18 10:11:53.082  1000  1581  4506 E AppOps  : Operation not found: uid=1041 pkg=audioserver(null) op=WAKE_LOCK
06-18 10:11:53.086  1000  1581  4506 E AppOps  : Operation not found: uid=1041 pkg=audioserver(null) op=WAKE_LOCK
06-18 10:11:54.545  1000  1581 14223 E AppOps  : Operation not found: uid=1041 pkg=audioserver(null) op=WAKE_LOCK
06-18 10:11:55.059  1046  1145  3541 D BufferPoolAccessor2.0: evictor expired: 1, evicted: 0
06-18 10:11:56.060 media  1117  6242 D BufferPoolAccessor2.0: evictor expired: 1, evicted: 0
06-18 10:11:56.907 10167  3490  3503 I ndroid.systemui: Background concurrent mark compact GC freed 92MB AllocSpace bytes, 195(3136KB) LOS objects, 70% free, 39MB/135MB, paused 3.052ms,6.968ms total 714.982ms
06-18 10:11:56.959  1000  1581  2049 D ActivityManager: freezing 21437 com.google.android.gms
06-18 10:11:56.961  1000  1581  2049 D ActivityManager: freezing 24296 org.tasks
06-18 10:11:56.964  1000  1581  2049 D ActivityManager: freezing 24322 com.android.externalstorage
06-18 10:11:56.965  1000  1581  2049 D ActivityManager: freezing 24788 com.android.providers.calendar
06-18 10:11:57.060 media  1117  6242 D BufferPoolAccessor2.0: bufferpool2 0xe47c3748 : 0(0 size) total buffers - 0(0 size) used buffers - 78/86 (recycle/alloc) - 8/166 (fetch/transfer)
06-18 10:11:57.060 media  1117  6242 D BufferPoolAccessor2.0: evictor expired: 1, evicted: 1
06-18 10:11:57.060  1046  1145  3541 D BufferPoolAccessor2.0: bufferpool2 0xb4000072c866aa58 : 0(0 size) total buffers - 0(0 size) used buffers - 77/81 (recycle/alloc) - 4/80 (fetch/transfer)
06-18 10:11:57.060  1046  1145  3541 D BufferPoolAccessor2.0: evictor expired: 1, evicted: 1
06-18 10:11:57.278 10167  3490  3505 W System  : A resource failed to call close. 
06-18 10:11:57.278 10167  3490  3505 W System  : A resource failed to call close. 
06-18 10:12:01.780 1010168 20318 20355 D WebDavStorage: getOutputStream (redacted)
06-18 10:12:01.798 1010168 20318 20416 I FullBackup: Cancel full backup
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: Error cancelling full backup for org.cromite.cromite
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: java.io.IOException: at.bitfire.dav4jvm.exception.NotFoundException: HTTP 404 
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.stevesoltys.seedvault.plugins.webdav.WebDavStoragePlugin.removeData(WebDavStoragePlugin.kt:159)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.stevesoltys.seedvault.transport.backup.FullBackup.clearBackupData(FullBackup.kt:189)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.stevesoltys.seedvault.transport.backup.FullBackup.cancelFullBackup(FullBackup.kt:196)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.stevesoltys.seedvault.transport.backup.BackupCoordinator.cancelFullBackup(BackupCoordinator.kt:307)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.stevesoltys.seedvault.transport.ConfigurableBackupTransport$cancelFullBackup$1.invokeSuspend(ConfigurableBackupTransport.kt:207)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:280)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source:1)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source:1)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.stevesoltys.seedvault.transport.ConfigurableBackupTransport.cancelFullBackup(ConfigurableBackupTransport.kt:206)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at android.app.backup.BackupTransport$TransportImpl.cancelFullBackup(BackupTransport.java:910)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.android.internal.backup.IBackupTransport$Stub.onTransact(IBackupTransport.java:786)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at android.os.Binder.execTransactInternal(Binder.java:1505)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at android.os.Binder.execTransact(Binder.java:1444)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: Caused by: at.bitfire.dav4jvm.exception.NotFoundException: HTTP 404 
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at at.bitfire.dav4jvm.DavResource.checkStatus(DavResource.kt:583)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at at.bitfire.dav4jvm.DavResource.checkStatus(DavResource.kt:565)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at at.bitfire.dav4jvm.DavResource.delete(DavResource.kt:441)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at at.bitfire.dav4jvm.DavResource.delete$default(DavResource.kt:429)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	at com.stevesoltys.seedvault.plugins.webdav.WebDavStoragePlugin.removeData(WebDavStoragePlugin.kt:152)
06-18 10:12:03.024 1010168 20318 20416 W FullBackup: 	... 17 more
06-18 10:12:03.052  1000  1581 24858 W PFTBT   : Error -1000 backing up org.cromite.cromite
06-18 10:12:03.056 1010168 20318 20416 I BackupCoordinator: Request full backup time. Returned 0
06-18 10:12:03.057  1000  1581 24858 I PFTBT   : Transport suggested backoff=0
06-18 10:12:03.057  1000  1581 24858 W PFTBT   : Transport failed; aborting backup: -1000
06-18 10:12:03.057 1010168 20318 20416 I NotificationBackupObserver: Completed. Target: org.cromite.cromite, status: -1000
06-18 10:12:03.058  1000  1581 24858 W ActivityManager: Unbinding backup agent with no active backup
06-18 10:12:03.058  1000  1581 24858 W ActivityManager: Process/uid not found attempting kill of org.cromite.cromite / 1010331
06-18 10:12:03.058  1000  1581 24858 I PFTBT   : Full backup completed with status: -1000
06-18 10:12:03.058 1010168 20318 20416 I NotificationBackupObserver: Backup finished 234/253. Status: -1000
06-18 10:12:03.061  1000  1581 25341 E BackupManagerService: Error backing up org.cromite.cromite: write failed: EPIPE (Broken pipe)
06-18 10:12:03.066 1010168 20318 20318 I BackupStateManager: appBackupRunning: false, filesBackupRunning: false, workInfoState: ENQUEUED
06-18 10:12:03.067 1010168 20318 20318 D ConfigurableBackupTransportService: Service destroyed.
@t-m-w t-m-w added the bug label Jun 18, 2024
@grote
Copy link
Collaborator

grote commented Jun 18, 2024

Looks like there's around 10ms the system gives us between calling our NotificationBackupObserver and killing our service. This doesn't seem to be always sufficient to reach the code path for cancelling the ongoing backup notification.

I only see two options:

  • try to cancel notification as soon as possible in NotificationBackupObserver
  • cancel it when our service gets killed, has disadvantage that notification might flicker for multi-step backups

@grote grote linked a pull request Jun 18, 2024 that will close this issue
@grote grote self-assigned this Jun 18, 2024
@grote grote added this to the 3.x milestone Jun 18, 2024
@grote
Copy link
Collaborator

grote commented Jun 20, 2024

Here we had 5ms:

06-20 11:18:36.755 1010168 30423 31899 I NotificationBackupObserver: Backup finished 253/253. Status: 0
06-20 11:18:36.760 1010168 30423 30423 I BackupStateManager: appBackupRunning: false, filesBackupRunning: false, workInfoState: ENQUEUED
06-20 11:18:36.760  1000  1581 22520 I PFTBT   : Full data backup pass finished.
06-20 11:18:36.760 1010168 30423 30423 I BackupStateManager: appBackupRunning: false, filesBackupRunning: false, workInfoState: ENQUEUED
06-20 11:18:36.760 1010168 30423 30423 D ConfigurableBackupTransportService: Service destroyed.
06-20 11:18:36.761  1000  1581 22520 V BackupManagerService: [UserID:10] Released wakelock:*backup*-10-23616

@grote
Copy link
Collaborator

grote commented Jul 2, 2024

Turns out that my theory that our process simply got killed fast is wrong. We can in fact keep running for quite some time after the backup transport service was destroyed. This makes it even more of a mystery why the notification sometimes stays around.

@grote
Copy link
Collaborator

grote commented Jul 2, 2024

New theory: In the same millisecond (above one ms before) we cancel the notification, we update the same notification with the status of the app backup (NotificationBackupObserver: Completed).

So maybe there's a race-condition between this cancel and the update, if the system NotificationManager is somehow doing these updates async. Need to check that system code to verify.

07-02 16:44:40.768  1320 12557 E BackupManagerService: Error backing up org.fdroid.basic: write failed: EPIPE (Broken pipe)
07-02 16:44:40.768  1320 12557 W ActivityManager: Unbinding backup agent with no active backup
07-02 16:44:40.769  1320 12557 W ActivityManager: Process/uid not found attempting kill of org.fdroid.basic / 10210
07-02 16:44:40.775  1320 12140 I PFTBT   : Full data backup pass finished.
07-02 16:44:40.777  3917  3917 I BackupStateManager: appBackupRunning: false, filesBackupRunning: false, workInfoState: ENQUEUED
07-02 16:44:40.777  3917  7338 I NotificationBackupObserver: Completed. Target: org.fdroid.basic, status: -1000
07-02 16:44:40.777  3917  7338 E NotificationBackupObserver: CANCELLING NOTIFICATION
07-02 16:44:40.777  3917  3917 I BackupStateManager: appBackupRunning: false, filesBackupRunning: false, workInfoState: ENQUEUED
07-02 16:44:40.778  3917  7338 I NotificationBackupObserver: Backup finished 257/270. Status: -1000
07-02 16:44:40.778  3917  3917 D ConfigurableBackupTransportService: Service destroyed.
07-02 16:44:40.779  1320 12140 V BackupManagerService: [UserID:0] Released wakelock:*backup*-0-2657

@grote
Copy link
Collaborator

grote commented Jul 3, 2024

Comment from the code:

    // In enqueueNotificationInternal notifications are added by scheduling the
    // work on the worker handler. Hence, we also schedule the cancel on this
    // handler to avoid a scenario where an add notification call followed by a
    // remove notification call ends up in not removing the notification.

So this sounds like are aware of a potential race and took measures to avoid it.

Also, the callbacks into our NotificationBackupObserver are both on 3917 7338, so the order of update and cancel should be observed.

In the code, there are some conditions when the system doesn't cancel our notification. We can test for those and see if they are at fault.

@grote
Copy link
Collaborator

grote commented Jul 4, 2024

hacky debug code we used to check for notification flags:

        Log.e("TEST", "CANCEL NOTIFICATION!")
        val mustHaveFlags = 0
        val mustNotHaveFlags =
            (FLAG_FOREGROUND_SERVICE or
                Notification.FLAG_USER_INITIATED_JOB or
                Notification.FLAG_AUTOGROUP_SUMMARY or 0x00010000)
        nm.activeNotifications.forEach {
            if (it.packageName == context.packageName) {
                val flags = it.notification.flags
                Log.e(
                    "TEST",
                    "${it.id} ${it.isOngoing} ${it.postTime} ${it.notification.channelId}"
                )
                Log.e(
                    "TEST",
                    "$  mustHaveFlags $flags - ${(flags and mustHaveFlags) != mustHaveFlags}"
                )
                Log.e("TEST", "$  mustNotHaveFlags $flags  - ${(flags and mustNotHaveFlags) != 0}")
            }
        }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants