Use RUBY_ASSERT when available #12216

casperisfine · 2023-03-13T12:05:31Z

It should be present from Ruby 2.7 onwards.

The advantage over a raw assert is that it will trigger MRI's crash report on failure, making it much easier to debug at a glance.

For context I'm trying to debug an assertion failure we've been seeing on Ruby nigthly builds:

ruby: protobuf.c:404: ObjectCache_Add: Assertion `ObjectCache_Get(key) == val' failed.

It should be present from Ruby 2.7 onwards. The advantage over a raw `assert` is that it will trigger MRI's crash report on failure, making it much easier to debug at a glance. For context I'm trying to debug an assertion failure we've been seeing on Ruby nigthly builds: ``` ruby: protobuf.c:404: ObjectCache_Add: Assertion `ObjectCache_Get(key) == val' failed. ```

haberman · 2023-03-13T19:49:17Z

Looks like this PR triggers the same assertion failure you are referring to:

Assertion Failed: ruby/ext/google/protobuf_c/protobuf.c:406:ObjectCache_Add:ObjectCache_Get(key) == val

casperisfine · 2023-03-13T21:14:30Z

Oh, that's very interesting. I was suspecting it was because I recently added write barriers to WeakMap, but if it also fail on 2.7, then that's not it.

On our CI it failed in a bout 3% of workers, so is suspect it's caused by GC triggering at a very specifc place, I may be able to repro with GC.stress = true.

casperisfine · 2023-03-13T21:17:53Z

I was indeed able to reproduce on first try with:

diff --git a/conformance/conformance_ruby.rb b/conformance/conformance_ruby.rb
index 0fa43391d..bf9c00d15 100755
--- a/conformance/conformance_ruby.rb
+++ b/conformance/conformance_ruby.rb
@@ -31,7 +31,10 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 require 'conformance/conformance_pb'
+p [:here]
+GC.stress = true
 require 'google/protobuf/test_messages_proto3_pb'
+GC.stress = false
 require 'google/protobuf/test_messages_proto2_pb'
 
 $test_count = 0
diff --git a/ruby/ext/google/protobuf_c/protobuf.c b/ruby/ext/google/protobuf_c/protobuf.c
index bee8ac288..bc318366a 100644
--- a/ruby/ext/google/protobuf_c/protobuf.c
+++ b/ruby/ext/google/protobuf_c/protobuf.c
@@ -403,6 +403,9 @@ void ObjectCache_Add(const void *key, VALUE val) {
 #if USE_SECONDARY_MAP
   rb_mutex_unlock(secondary_map_mutex);
 #endif
+  if (ObjectCache_Get(key) != val) {
+      rb_bug("ObjectCache_Get(key) != val");
+  }
   PBRUBY_ASSERT(ObjectCache_Get(key) == val);
 }

casperisfine · 2023-03-13T22:28:07Z

I have a good repro loop, but I'm having a hard time figuring out why it's happening.

Clearly either val isn't added to the WeakMap, or it's ejected right after, but that can't possibly be via the finalizer since the object is still alive.

What is certain is that it happens frequently with GC.stress.

This may be a bug in Ruby. I'll try to find some time to dig deeper with @peterzhu2118

haberman · 2023-03-13T22:38:57Z

I wonder if we might need an RB_GC_GUARD(val): https://docs.ruby-lang.org/en/3.0/extension_rdoc.html#label-Appendix+E.+RB_GC_GUARD+to+protect+from+premature+GC

casperisfine · 2023-03-14T07:16:35Z

I wonder if we might need an RB_GC_GUARD(val)

That was my initial thought, but no, val isn't GCed, it's either not added to the WeakMap, or evicted:

but that can't possibly be via the finalizer since the object is still alive.

I tried nonetheless just to be sure, and it's still failing.

At this stage I suspect it might be a bug in WeakMap, as it uses two st_tables: https://github.com/ruby/ruby/blob/ac65ce16e904695ba45888d3fba641d12caf733a/weakmap.c#L396-L397, so perhaps it fails if GC triggers in the middle.

I'll try to reproduce this in a standalone script.

casperisfine · 2023-03-14T07:33:16Z

I'll try to reproduce this in a standalone script.

Seems like I can't repro, in isolation :/

casperisfine · 2023-03-14T07:38:15Z

I found an unrelated crash though -_-

wm = ObjectSpace::WeakMap.new
GC.stress = true
GC.auto_compact = true
obj = Object.new
1000.times do |i|
  wm[i] = obj #  [BUG] Segmentation fault at 0x0000000000000003
  raise "wat" unless wm[i].equal?(obj)
end

casperisfine · 2023-03-15T17:43:17Z

Ok, so I have a good news and a bad news.

The good new is that I have figured out the bug.

The bad news is that it's a bug in Ruby. Here's the reproduction:

wmap = ObjectSpace::WeakMap.new

a = "A"
b = "B"

wmap[1] = a
wmap[1] = b # the table entry with 1 is still in the list of entries to clear when `a` is GCed

a = nil

GC.start

p wmap[1] # Should be `"B"`, but is `nil`

The bug is here: https://github.com/ruby/ruby/blob/f613c18912f4bf5492a8dc02ab5e3661f4ae907c/weakmap.c#L193-L194

When an object is added in a WeakMap as a value, the weak map keeps a list of all keys associated with that object, e.g.

class WeakMap
  def []=(key, value)
    @hash[key] = value
    @reverse[value] << key
  end
end

However when we store another value for that given key, that reverse list isn't updated, so when the original value is GCed, it delete the new value too.

I think I can fix it upstream, but no idea how to make this protobuf code work on versions that won't receive a backported fix.

casperisfine · 2023-03-15T17:54:54Z

Upstream bug report: https://bugs.ruby-lang.org/issues/19531

haberman · 2023-03-15T20:20:29Z

Excellent work on the repro. Thanks so much for tracking this down, filing the upstream bug, and working on a fix!

As a side note, it would be so nice if WeakMap was capable of making only the values weak, but not the keys. Then we could remove the code that uses a secondary map.

Any chance this could be added to WeakMap upstream?

casperisfine · 2023-03-16T07:21:47Z

Any chance this could be added to WeakMap upstream?

We discussed this last year: https://bugs.ruby-lang.org/issues/16038, WeakKeysMap is added in 3.3, we could ask for WeakValuesMap too, but ultimately that wouldn't simplify this project code base until support for Rubies older than 3.3 is dropped, given that this project still support 2.3, that would take a decade before you could cleanup anything.

At this stage I wonder if the simplest wouldn't be to implement you own WeakMap, which I don't think would be too hard, especially given that your keys are pointer types, so a simple st_table and finalizer could do.

haberman · 2023-03-16T14:41:54Z

At this stage I wonder if the simplest wouldn't be to implement you own WeakMap, which I don't think would be too hard

I always figured that WeakMap had privileged access to the GC, and that its efficiency depended on this. But it looks like it is just defining finalizers for every map element.

In that case, I agree we could just define our own weak map that makes only the values weak. It would probably be more efficient, because we could use plain void* as the key instead of converting it to a VALUE. And we could dispense with the secondary map entirely.

casperisfine · 2023-03-16T17:58:14Z

It would probably be more efficient, because we could use plain void* as the key instead of converting it to a VALUE

We could also potentially avoid the finalizer by clearing the map from the free callbacks. Not sure if worth it, I'd have to review a bit how many objects actually end up in that map. If it's all of them, it would be a nice improvements, as finalizers are a bit annoying, they pin objects, and basically disable the FREE_IMMEDIATELY flag.

casperisfine · 2023-03-16T17:58:39Z

It's not high priority for me, but I'll see if I can find some time to try that around next week.

haberman · 2023-03-16T18:26:23Z

We could also potentially avoid the finalizer by clearing the map from the free callbacks.

I was going to say that these objects don't currently have free callbacks. But I guess that RUBY_DEFAULT_FREE is still a free callback.

Is there any guarantee about how promptly the free callbacks will be called? If the free() was delayed, it could run after a different insert had already happened, and delete the map entry inappropriately.

haberman · 2023-03-16T18:28:33Z

btw, an alternative to st_table would be to use upb's internal hash table: https://github.com/protocolbuffers/upb/blob/main/upb/hash/int_table.h

It could make us less dependent on the internals of Ruby, if that is a possible concern.

casperisfine · 2023-03-16T19:31:47Z

I guess that RUBY_DEFAULT_FREE is still a free callback.

Yes. And even if they didn't have one, it wouldn't be a problem, we could just define one without any worry.

Is there any guarantee about how promptly the free callbacks will be called?

If RUBY_TYPED_FREE_IMMEDIATELY is set, and the object has no finalizer defined, then it is freed as soon as it's collected.

If either the flag isn't set or a finalizer is set, then the callback is delayed to the dedicated "sweep" phase.

But either way we should be able to handle "race conditions" here, by cheking the map entry still points the GCed object.

The problem is that the free callback only receive the pointer to the C struct, not the VALUE reference, so it may be tricky and not worth it.

It could make us less dependent on the internals of Ruby, if that is a possible concern.

At this stage, ruby/st.h is public API, tons of gems depends on it, it's not gonna go away any time soon.

But either way, this is all conjecture, I'll have to dig more into the gem to figure what would be the most practical (finalizer vs free callback, ruby st_table vs upb int_table).

haberman · 2023-03-17T14:49:49Z

The problem is that the free callback only receive the pointer to the C struct, not the VALUE reference, so it may be tricky and not worth it.

The C struct contains the pointer to the upb object which is the cache key to the weak map, so I don't think this should be a problem.

casperisfine · 2023-04-03T10:07:19Z

Ok, so I tried implementing a custom weakmap, however It appears it's not possible, as we'd need access to rb_objspace_garbage_object_p to know the reference is scheduled to be swept.

casperisfine · 2023-04-03T10:07:47Z

ruby/ext/google/protobuf_c/repeated_field.c

@@ -432,7 +432,7 @@ static VALUE RepeatedField_dup(VALUE _self) {
 */
 VALUE RepeatedField_to_ary(VALUE _self) {
  RepeatedField* self = ruby_to_RepeatedField(_self);
-  int size = upb_Array_Size(self->array);
+  int size = self->array ? upb_Array_Size(self->array) : 0;


This I'll submit individually, with similar fixes.

Did you get a chance to submit a separate PR for this? I didn't see one.

What was the root cause that led you to this fix? I ask because it looks very similar to what I've been debugging in #11968

There is a similar symptom; self->array ends up being NULL. But in that case it's happening in RepeatedField_each. I've isolated it to a race condition involving the object cache.

Hum, I was working on it at some point but never submitted it. I think at some point I assumed this case wasn't supposed to happen and was just a fallout of another bug, but I might have been wrong not sure.

I can submit a PR if you want.

I don't think this case is supposed to happen, and I do think it is the fallout of another bug. I think this is probably not the right fix.

But I am curious what triggered this bug for you. I'm wondering if it's a similar root cause to #11968 (a race condition on the object cache) or something else.

casperisfine · 2023-04-03T10:09:06Z

As for the Ruby bug, it's fixed upstream but hasn't been backported yet unfortunately.

casperisfine requested a review from a team as a code owner March 13, 2023 12:05

casperisfine requested review from esorot and removed request for a team March 13, 2023 12:05

casperisfine force-pushed the use-ruby-assert branch from 8d38656 to cb3b121 Compare March 13, 2023 12:09

sbenzaquen assigned haberman Mar 13, 2023

haberman added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 13, 2023

github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 13, 2023

Repro for ObjectCache_Get(key) != val"

e834ad3

casperisfine force-pushed the use-ruby-assert branch from a732a44 to e834ad3 Compare March 13, 2023 22:26

casperisfine commented Apr 3, 2023

View reviewed changes

casperisfine closed this Apr 3, 2023

haberman mentioned this pull request May 25, 2023

ruby: segfault iterating over protobuf objects #11968

Closed

casperisfine mentioned this pull request May 25, 2023

ruby: Don't assume RepeatedField->array is not NULL #12899

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use RUBY_ASSERT when available #12216

Use RUBY_ASSERT when available #12216

casperisfine commented Mar 13, 2023

haberman commented Mar 13, 2023

casperisfine commented Mar 13, 2023

casperisfine commented Mar 13, 2023

casperisfine commented Mar 13, 2023

haberman commented Mar 13, 2023

casperisfine commented Mar 14, 2023

casperisfine commented Mar 14, 2023

casperisfine commented Mar 14, 2023

casperisfine commented Mar 15, 2023

casperisfine commented Mar 15, 2023

haberman commented Mar 15, 2023

casperisfine commented Mar 16, 2023

haberman commented Mar 16, 2023

casperisfine commented Mar 16, 2023

casperisfine commented Mar 16, 2023

haberman commented Mar 16, 2023

haberman commented Mar 16, 2023

casperisfine commented Mar 16, 2023

haberman commented Mar 17, 2023

casperisfine commented Apr 3, 2023

casperisfine Apr 3, 2023

haberman May 25, 2023

casperisfine May 25, 2023

haberman May 25, 2023

casperisfine commented Apr 3, 2023

Use RUBY_ASSERT when available #12216

Use RUBY_ASSERT when available #12216

Conversation

casperisfine commented Mar 13, 2023

haberman commented Mar 13, 2023

casperisfine commented Mar 13, 2023

casperisfine commented Mar 13, 2023

casperisfine commented Mar 13, 2023

haberman commented Mar 13, 2023

casperisfine commented Mar 14, 2023

casperisfine commented Mar 14, 2023

casperisfine commented Mar 14, 2023

casperisfine commented Mar 15, 2023

casperisfine commented Mar 15, 2023

haberman commented Mar 15, 2023

casperisfine commented Mar 16, 2023

haberman commented Mar 16, 2023

casperisfine commented Mar 16, 2023

casperisfine commented Mar 16, 2023

haberman commented Mar 16, 2023

haberman commented Mar 16, 2023

casperisfine commented Mar 16, 2023

haberman commented Mar 17, 2023

casperisfine commented Apr 3, 2023

casperisfine Apr 3, 2023

Choose a reason for hiding this comment

haberman May 25, 2023

Choose a reason for hiding this comment

casperisfine May 25, 2023

Choose a reason for hiding this comment

haberman May 25, 2023

Choose a reason for hiding this comment

casperisfine commented Apr 3, 2023