Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ruby] Unable to instantiate a descriptor in v3.24.0 on 32-bit system #13481

Closed
stanhu opened this issue Aug 9, 2023 · 3 comments
Closed

[ruby] Unable to instantiate a descriptor in v3.24.0 on 32-bit system #13481

stanhu opened this issue Aug 9, 2023 · 3 comments
Assignees

Comments

@stanhu
Copy link
Contributor

stanhu commented Aug 9, 2023

What version of protobuf and what language are you using?

Version: v3.24.0
Language: Ruby

What operating system (Linux, Windows, ...) and version?

Linux on a Raspberry Pi 2 Debian image (armv6l)

What runtime / compiler are you using (e.g., python version or gcc version)

Ruby 3.0.5

What did you do?

Steps to reproduce the behavior:

docker run -it registry.gitlab.com/gitlab-org/gitlab-omnibus-builder/rpi_10:4.20.0 bash
gem install pg_query
ruby -r pg_query -e "puts 'done'"

What did you expect to see

The library loads without error with done. This works fine on v3.23.0.

What did you see instead?

/usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:295:in `method_missing': wrong argument type nil (expected Google::Protobuf::DescriptorPool) (TypeError)
        from /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:295:in `initialize'
        from /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:85:in `new'
        from /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:85:in `add_message'
        from /usr/local/lib/ruby/gems/3.0.0/gems/pg_query-4.2.3/lib/pg_query/pg_query_pb.rb:8:in `block (2 levels) in <top (required)>'
        from /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:42:in `instance_eval'
        from /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:42:in `add_file'
        from /usr/local/lib/ruby/gems/3.0.0/gems/pg_query-4.2.3/lib/pg_query/pg_query_pb.rb:7:in `block in <top (required)>'
        from /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:460:in `instance_eval'
        from /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb:460:in `build'
        from /usr/local/lib/ruby/gems/3.0.0/gems/pg_query-4.2.3/lib/pg_query/pg_query_pb.rb:6:in `<top (required)>'
        from <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from /usr/local/lib/ruby/gems/3.0.0/gems/pg_query-4.2.3/lib/pg_query.rb:4:in `<top (required)>'
        from <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:159:in `require'
        from <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:159:in `rescue in require'
        from <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:39:in `require'
        ... 4 levels...
<internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require': cannot load such file -- pg_query (LoadError)
        from <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
        from (irb):1:in `<main>'
        from /usr/local/lib/ruby/gems/3.0.0/gems/irb-1.3.5/exe/irb:11:in `<top (required)>'
        from /usr/local/bin/irb:23:in `load'
        from /usr/local/bin/irb:23:in `<main>'

This might just be an emulation issue, but I'm seeing this strange behavior when trying to debug this:

[293, 302] in /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/descriptor_dsl.rb
   293:           require 'byebug'
   294:           byebug
   295:           @msg_proto = Google::Protobuf::DescriptorProto.new(
   296:             :name => name
   297:           )
=> 298:           file_proto.message_type << @msg_proto
   299:         end
   300:
   301:         def optional(name, type, number, type_class=nil, options=nil)
   302:           internal_add_field(:LABEL_OPTIONAL, name, type, number, type_class, options)
(byebug) @msg_proto
*Error in evaluation*
(byebug) step

[49, 58] in /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/object_cache.rb
   49:         @map = ObjectSpace::WeakMap.new
   50:         @mutex = Mutex.new
   51:       end
   52:
   53:       def get(key)
=> 54:         @map[key]
   55:       end
   56:
   57:       def try_add(key, value)
   58:         @map[key] || @mutex.synchronize do
(byebug) n

[49, 58] in /usr/local/lib/ruby/gems/3.0.0/gems/google-protobuf-3.24.0/lib/google/protobuf/object_cache.rb
   49:         @map = ObjectSpace::WeakMap.new
   50:         @mutex = Mutex.new
   51:       end
   52:
   53:       def get(key)
=> 54:         @map[key]
   55:       end
   56:
   57:       def try_add(key, value)
   58:         @map[key] || @mutex.synchronize do
(byebug) n
*** No sourcefile available for <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>
(byebug) n
*** No sourcefile available for <internal:/usr/local/lib/ruby/site_ruby/3.0.0/rubygems/core_ext/kernel_require.rb>
(byebug)

[581, 590] in /usr/local/lib/ruby/3.0.0/irb.rb
   581:           rescue Exception => exc
   582:           else
   583:             exc = nil
   584:             next
   585:           end
=> 586:           handle_exception(exc)
   587:           @context.workspace.local_variable_set(:_, exc)
   588:           exc = nil
   589:         end
   590:       end
(byebug) exc
#<TypeError: wrong argument type nil (expected Google::Protobuf::DescriptorPool)>

@fowles @casperisfine I think this is related to the object cache refactoring in #13204. Wondering if there is a pointer size issue here.

@stanhu stanhu added the untriaged auto added to all issues by default when created. label Aug 9, 2023
@fowles fowles added ruby 24.x and removed untriaged auto added to all issues by default when created. labels Aug 9, 2023
@stanhu
Copy link
Contributor Author

stanhu commented Aug 9, 2023

This diff appears to fix the issue:

diff --git a/ruby/ext/google/protobuf_c/protobuf.c b/ruby/ext/google/protobuf_c/protobuf.c
index c9354db4e..c575ffe4f 100644
--- a/ruby/ext/google/protobuf_c/protobuf.c
+++ b/ruby/ext/google/protobuf_c/protobuf.c
@@ -279,14 +279,14 @@ static void ObjectCache_Init(VALUE protobuf) {
 VALUE ObjectCache_TryAdd(const void *key, VALUE val) {
   VALUE key_val = (VALUE)key;
   PBRUBY_ASSERT((key_val & 3) == 0);
-  return rb_funcall(weak_obj_cache, item_try_add, 2, LL2NUM(key_val), val);
+  return rb_funcall(weak_obj_cache, item_try_add, 2, LL2NUM(key_val >> 2), val);
 }

 // Returns the cached object for this key, if any. Otherwise returns Qnil.
 VALUE ObjectCache_Get(const void *key) {
   VALUE key_val = (VALUE)key;
   PBRUBY_ASSERT((key_val & 3) == 0);
-  return rb_funcall(weak_obj_cache, item_get, 1, LL2NUM(key_val));
+  return rb_funcall(weak_obj_cache, item_get, 1, LL2NUM(key_val >> 2));
 }

 /*

This restores the previous behavior of ObjectCache_GetKey. I assume these lower bits are flags that can be discarded because the previous assertion ensures they are all zeros?

stanhu added a commit to stanhu/protobuf that referenced this issue Aug 9, 2023
protocolbuffers#13204 refactored the
Ruby object cache to use a key of `LL2NUM(key_val)` instead of
`LL2NUM(key_val >> 2)`. On 32-bit systems, it appears that
`LL2NUM(key_val)` returns inconsistent results, possibly due to
overflow. This causes cache lookups to fail. This commit restores the
previous behavior of using `ObjectCache_GetKey`, which discards the
lower 2 bits, which are zero.

Closes protocolbuffers#13481
stanhu added a commit to stanhu/protobuf that referenced this issue Aug 9, 2023
protocolbuffers#13204 refactored the
Ruby object cache to use a key of `LL2NUM(key_val)` instead of
`LL2NUM(key_val >> 2)`. On 32-bit systems, LL2NUM(key_val) returns
inconsistent results because a large value has to be stored as a
Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using
`ObjectCache_GetKey`, which discards the lower 2 bits, which are zero.
This enables a key to be stored as a Fixnum on both 32 and 64-bit
platforms.

Closes protocolbuffers#13481
@stanhu
Copy link
Contributor Author

stanhu commented Aug 10, 2023

I realize after updating #13494 that this problem may also show up on a 64-bit system if the VALUE address doesn't fit in a Fixnum because it starts with 0xff.

stanhu added a commit to stanhu/protobuf that referenced this issue Aug 11, 2023
protocolbuffers#13204 refactored the
Ruby object cache to use a key of `LL2NUM(key_val)` instead of
`LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns
inconsistent results because a large value has to be stored as a
Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using
`ObjectCache_GetKey`, which discards the lower 2 bits, which are
zero. This enables a key to be stored as a Fixnum on both 32 and
64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes,
a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign bit.

Therefore the largest possible Fixnum value on a 64-bit value is
4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value
is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these
flags. In the example above, we see that shifting by 2 turns the value
into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes protocolbuffers#13481
copybara-service bot pushed a commit that referenced this issue Aug 15, 2023
#13204 refactored the Ruby object cache to use a key of `LL2NUM(key_val)` instead of `LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns inconsistent results because a large value has to be stored as a Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using `ObjectCache_GetKey`, which discards the lower 2 bits, which are zero. This enables a key to be stored as a Fixnum on both 32 and 64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes, a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign flag.

Therefore the largest possible Fixnum value on a 64-bit value is 4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value  is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these flags. In the example above, we see that shifting by 2 turns the value into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes #13481

Closes #13494

COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557189479
copybara-service bot pushed a commit that referenced this issue Aug 15, 2023
#13204 refactored the Ruby object cache to use a key of `LL2NUM(key_val)` instead of `LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns inconsistent results because a large value has to be stored as a Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using `ObjectCache_GetKey`, which discards the lower 2 bits, which are zero. This enables a key to be stored as a Fixnum on both 32 and 64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes, a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign flag.

Therefore the largest possible Fixnum value on a 64-bit value is 4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value  is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these flags. In the example above, we see that shifting by 2 turns the value into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes #13481

Closes #13494

COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557189479
copybara-service bot pushed a commit that referenced this issue Aug 15, 2023
#13204 refactored the Ruby object cache to use a key of `LL2NUM(key_val)` instead of `LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns inconsistent results because a large value has to be stored as a Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using `ObjectCache_GetKey`, which discards the lower 2 bits, which are zero. This enables a key to be stored as a Fixnum on both 32 and 64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes, a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign flag.

Therefore the largest possible Fixnum value on a 64-bit value is 4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value  is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these flags. In the example above, we see that shifting by 2 turns the value into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes #13481

Closes #13494

COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557216800
@esrauchg
Copy link

I merged the proposed changes. Thank you @stanhu for including the new test config as well!

zhangskz pushed a commit that referenced this issue Aug 17, 2023
#13204 refactored the Ruby object cache to use a key of `LL2NUM(key_val)` instead of `LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns inconsistent results because a large value has to be stored as a Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using `ObjectCache_GetKey`, which discards the lower 2 bits, which are zero. This enables a key to be stored as a Fixnum on both 32 and 64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes, a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign flag.

Therefore the largest possible Fixnum value on a 64-bit value is 4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value  is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these flags. In the example above, we see that shifting by 2 turns the value into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes #13481

Closes #13494

COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557211768
esrauchg pushed a commit that referenced this issue Aug 17, 2023
#13204 refactored the Ruby object cache to use a key of `LL2NUM(key_val)` instead of `LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns inconsistent results because a large value has to be stored as a Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using `ObjectCache_GetKey`, which discards the lower 2 bits, which are zero. This enables a key to be stored as a Fixnum on both 32 and 64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes, a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign flag.

Therefore the largest possible Fixnum value on a 64-bit value is 4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value  is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these flags. In the example above, we see that shifting by 2 turns the value into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes #13481

Closes #13494

COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557211768
zhangskz pushed a commit that referenced this issue Aug 17, 2023
#13204 refactored the Ruby object cache to use a key of `LL2NUM(key_val)` instead of `LL2NUM(key_val >> 2)`. On 32-bit systems, `LL2NUM(key_val)` returns inconsistent results because a large value has to be stored as a Bignum on the heap. This causes cache lookups to fail.

This commit restores the previous behavior of using `ObjectCache_GetKey`, which discards the lower 2 bits, which are zero. This enables a key to be stored as a Fixnum on both 32 and 64-bit platforms.

As https://patshaughnessy.net/2014/1/9/how-big-is-a-bignum describes, a Fixnum uses:

* 1 bit for the `FIXNUM_FLAG`.
* 1 bit for the sign flag.

Therefore the largest possible Fixnum value on a 64-bit value is 4611686018427387903 (2^62 - 1). On a 32-bit system, the largest value  is 1073741823 (2^30 - 1).

For example, a possible VALUE pointer address on a 32-bit system:

0xff5b4af8 => 4284173048

Dropping the lower 2 bits makes up for the loss of range to these flags. In the example above, we see that shifting by 2 turns the value into a 30-bit number, which can be represented as a Fixnum:

(0xff5b4af8 >> 2) => 1071043262

This bug can also manifest on a 64-bit system if the upper bits are 0xff.

Closes #13481

Closes #13494

COPYBARA_INTEGRATE_REVIEW=#13494 from stanhu:sh-fix-ruby-protobuf-32bit d63122a
PiperOrigin-RevId: 557211768

Co-authored-by: Stan Hu <stanhu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment