Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable the automatic snapshots for TestBug297635 #623

Conversation

fedejeanne
Copy link
Contributor

@fedejeanne fedejeanne commented Aug 4, 2023

Automatic snapshots might cause TestBug297635::testBug to fail if the snapshot is taken right before assertStateTreesIsNotNull. The reason for this is that a by taking a snapshot, org.eclipse.core.internal.resources.SaveManager.forgetSavedTree(String) will be called and it will delete the state-trees.

This PR disables the automatic snapshots while running tests in TestBug297635 to avoid such random failures.

Fixes #460

@@ -28,6 +31,7 @@ public class DelayedSnapshotJob extends Job {
private static final String MSG_SNAPSHOT = Messages.resources_snapshot;
private SaveManager saveManager;
private Workspace workspace;
private boolean suspended;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be volatile

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might make this an integer and just increment and decrement it when suspend or resume are being called. This would relieve existing client code from checking the isSuspended state before resuming again. isSuspended might become obsolete even.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it an AtomicInteger, thanks for the tips to you both!

@@ -47,6 +51,10 @@ public IStatus run(IProgressMonitor monitor) {
if (!workspace.isOpen()) {
return Status.OK_STATUS;
}
if (suspended) {
System.out.println("Automatic snapshots are suspended... skipping"); //$NON-NLS-1$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and system.out must be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, "debug code". Thank you for noticing!

@fedejeanne fedejeanne force-pushed the bugs/460-suspend-automatic-snapshots-when-running-TestBug297635 branch 2 times, most recently from 97869ab to a6c7df4 Compare August 4, 2023 14:29
@fedejeanne fedejeanne marked this pull request as ready for review August 4, 2023 14:32
@github-actions
Copy link
Contributor

github-actions bot commented Aug 4, 2023

Test Results

       42 files  ±0         42 suites  ±0   50m 40s ⏱️ - 2m 59s
  3 760 tests ±0    3 756 ✔️ ±0    4 💤 ±0  0 ±0 
11 283 runs  ±0  11 253 ✔️ ±0  30 💤 ±0  0 ±0 

Results for commit 1b35a80. ± Comparison against base commit 3e3f88d.

♻️ This comment has been updated with latest results.

@fedejeanne fedejeanne force-pushed the bugs/460-suspend-automatic-snapshots-when-running-TestBug297635 branch 2 times, most recently from ac37216 to efa6b8b Compare August 7, 2023 04:40
Copy link
Contributor

@HeikoKlare HeikoKlare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some general remarks to this proposal:

  • The DelayedSnapshotJob is completely hidden by the SaveManager. The new API exposes that there is some snapshot job running, such that the user has to know about all potential (side) effects of a resume and suspend operation. Currently, these effects are not documented.
  • I am not sure whether the changes lead to consistent and expectable behavior w.r.t. snapshots. For example...
    • Shouldn't the resume also at least schedule the job if a snapshot was requested in between?
    • Shouldn't the SaveManager.shutdown still write a snapshot? It uses the snapshop job to do so, which was okay by now as the job was completely hidden/managed by the class. By now it can be influenced from outside, which has an unintended side effect on the shutdown behavior.

All in all, I am not sure if a single test is worth exposing internals of the SaveManager with all the efforts for documenting and adapting behavior accordingly. At least, I would expect a consistent, documented behavior of the extended SaveManager.
Are there further scenarios where disabling the snapshot job would be required, such that these changes are justified by being an enabler for further things?

Comment on lines 65 to 83
/**
* Suspend automatic snapshots until {@link #resume()} is called.
*
* @see #resume()
*/
void suspend() {
suspended.incrementAndGet();
}

/**
* Resume automatic snapshots. This method does not schedule this Job.
*
* @see #suspend()
*/
void resume() {
if (suspended.decrementAndGet() < 0) {
suspended.set(0);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation is inconsistent with the behavior. I would actually expect the behavior as specified in the Javadoc and not that resume has to be called as many times as suspend has been called to actually resume. If a caller really wants to resume, then they would call resume as long as isSuspended returns true. And whoever else has suspended the job before will then call resume without any effect.
I see that this conflicts with your proposal to use an integer instead of a boolean for the suspended state, @szarnekow (#623 (comment)), but I am not sure whether that behavior (even if properly documented) is what you expect from suspend and resume methods. The behavior of suspend/resume methods in the SaveManager would then, for example, be different from behavior of suspend/resume methods in the JobManager.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Sorry about that, I didn't consider the semantics of JobManager.suspend / resume. I was more thinking about the usage pattern where a boolean was stored to call / not call resume again. I think that's highly confusing and prone to invalid state in case multiple threads use these methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I haven't thought about the behavior of suspend / resume in the JobManager, only about nesting calls like the ones the prepare.../begin.../endOperation in Workspace.

Comment on lines 2209 to 2232
public void suspendSnapshotJob() {
snapshotJob.suspend();
}

public void resumeSnapshotJob() {
snapshotJob.resume();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods should be documented, in particular when the behavior of the methods in the DelayedSnapshotJob is supposed to stay as it is now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods might not survive, see my proposal below

Disable the automatic snapshots while running tests in TestBug297635 to
avoid random failures.

Fixes eclipse-platform#460
@fedejeanne fedejeanne force-pushed the bugs/460-suspend-automatic-snapshots-when-running-TestBug297635 branch from efa6b8b to 1b35a80 Compare August 8, 2023 11:21
@fedejeanne
Copy link
Contributor Author

  * Shouldn't the `resume` also at least schedule the job if a snapshot was requested in between?

Good point, the current implementation would have taken the snapshot the next time an operation finished and snapshotIfNeeded had been called. I refactored the code and called (a new version of) snapshotIfNeeded inside resumeSnapshotJob to take a belated snapshot in case one had been requested in between.

  * Shouldn't the `SaveManager.shutdown` still write a snapshot? It uses the snapshop job to do so, which was okay by now as the job was completely hidden/managed by the class. By now it can be influenced from outside, which has an unintended side effect on the shutdown behavior.

I wouldn't force that. I think that if it's clear that snapshots are disabled then they will be expected to be disabled all the way, even during shutdown.

Are there further scenarios where disabling the snapshot job would be required, such that these changes are justified by being an enabler for further things?

This is the only case I found for now but I haven't looked for other possible scenarios. I assume there aren't many though.

All in all, I am not sure if a single test is worth exposing internals of the SaveManager with all the efforts for documenting and adapting behavior accordingly.

I see your point: it might be too risky to allow to allow suspending snapshots without ever making sure the original state of the SaveManager is restored. What would you say if I change the current implementation and offer a method like this one?:

public void runWithSnapshotsDisabled(Runnable runnable) {
	snapshotJob.suspend();
	try {
		runnable.run();
	} finally {
		snapshotJob.resume();
		snapshotIfNeeded();
	}
}

This would still expose the fact that there are snapshots and that they can be deactivated but they could only be deactivated in a controlled environment and the caller wouldn't even have the chance to "forget" activating them afterwards. It would also make it clear in the stack-trace that snapshot have been temporarily deactivated, which would help debugging.

@HeikoKlare
Copy link
Contributor

When considering the issue rather locally, I would prefer the solution with a scoped disablement over the suspend and resume methods. However, there are some things to consider even with that solution:

  • Suspending the snapshot job does not disable snapshots. There are different ways of taking snapshots. The job is only responsible for taking them once in a while in case no one explicitly performs a save operation (whether in snapshot mode, full save mode, or something else)
  • The scoped disablement has to ensure that is does not fail through conflicting/concurrent calls, i.e., the proposed solution would be too simple as the implicit contract of the method is not fulfilled when performing concurrent calls to it.

Still, I am still not in favor of providing public API that is currently only justified by fixing tests. So I would ask a different question here: instead of "how can we change the API to make the test run?" I would ask "what is the current design flaw that seems to make it impossible to write a proper test?". From a design perspective, it looks to me like "auto-snapshots" are a kind of trait of the SaveManager, which is currently not properly separated from the SaveManager implementation (and thus cannot be disabled easily as it is integrated rather deeply at different places). So maybe the "correct" solution is to think of a proper separation of that trait from the SaveManager. At least in terms of software design and testability that might be the best solution, if though it may not be one that can be achieved with limited effort.

@fedejeanne
Copy link
Contributor Author

I see your point. Do you have any suggestions on how to separate that treat without changing the internal public APIs?

I thought of maybe overloading the constructors of SaveManager and Workspace with new protected constructors to be able to inject my own implementations of DelayedSnapshotJob and SaveManager, but that has a few downsides:

  1. protected is IMO another way of saying "is not as public a public but you can still change it"
  2. One needs to hack its way into the LC of the whole workspace to be even able to replace the objects (and this could also open another door which can also be easily abused by other methods).
  3. Separating that treat only for a test might be as much an overkill as offering a new internal public API and it would be much more cumbersome, wouldn't you agree?

What do you think about using reflection to replace the DelayedSnapshotJob with a NO-OP version of it during tests? I would put the method in a TestUtil class, making it only accessible for tests.

The scoped disablement has to ensure that is does not fail through conflicting/concurrent calls, i.e., the proposed solution would be too simple as the implicit contract of the method is not fulfilled when performing concurrent calls to it.

Could you please elaborate?

@HeikoKlare
Copy link
Contributor

You will probably need to change the API anyway, but since it's internal you are basically able to change it. The problem I currently see is that functionality of the snapshot job (maybe even the whole snapshot management) is woven into the SaveManager, which makes the originally proposed change difficult because the side effects of suspending the job are unclear. My comment is not really about how the API looks like (may be that suspend and resume methods for taking snapshots are fine), but about how that API is realized.

Separating that treat only for a test might be as much an overkill as offering a new internal public API and it would be much more cumbersome, wouldn't you agree?

I don't see why it should be more cumbersome, because you have to check and test every place in the SaveManager using the snapshot job anyway, which becomes more easy when separating the functionality into another class implementing the trait.

I thought of maybe overloading the constructors of SaveManager and Workspace with new protected constructors to be able to inject my own implementations of DelayedSnapshotJob and SaveManager, but that has a few downsides:

What do you think about using reflection to replace the DelayedSnapshotJob with a NO-OP version of it during tests? I would put the method in a TestUtil class, making it only accessible for tests.

As I mentioned before, in my opinion the need to provide additional API for the test either reveals a broken test or a design flaw and providing some test-specific injection functionality rather looks like a patch for the symptoms than a solution for the root cause to me.

Actually, the original issue is not about providing some API but about a failing test. And looking at the test, it is in a real bad state (validating internal states using reflection). Even if disabling snapshots solves the issue for now, it will fail again when internal implementations change. I think the test could be fixed by rewriting it in a much easier way such that it tests what it is actually supposed to test: #460 (comment)
Maybe than the API extension even becomes obsolete.

@fedejeanne
Copy link
Contributor Author

Superseded by #737

@fedejeanne fedejeanne closed this Oct 13, 2023
@fedejeanne fedejeanne deleted the bugs/460-suspend-automatic-snapshots-when-running-TestBug297635 branch October 13, 2023 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random failing of TestBug297635
4 participants