Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer encoding #19

Merged
merged 7 commits into from
Jan 29, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Gemfile
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ group :test do
end

group :doc do
gem "redcarpet"
gem "redcarpet", :platform => :mri
gem "yard"
end

Expand Down
10 changes: 6 additions & 4 deletions lib/http/form_data/composite_io.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ class CompositeIO
# @param [Array<IO>] ios Array of IO objects
def initialize(ios)
@index = 0
@buffer = String.new
@buffer = "".b
@ios = ios.map do |io|
if io.is_a?(String)
StringIO.new(io)
Expand All @@ -29,14 +29,16 @@ def initialize(ios)
#
# @return [String, nil]
def read(length = nil, outbuf = nil)
outbuf = outbuf.to_s.replace("")
outbuf = outbuf.to_s.clear
Copy link
Member

@tarcieri tarcieri Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this is likely a bug in the previous version as well, #to_s may make a new object, at which point the purpose of passing outbuf is defeated.

Perhaps give outbuf a default value (e.g. outbuf = "".b instead of outbuf = nil as the argument definition), and raise if it's not a String?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#to_s will not create a new object if the object is a string. But the replace call might change its encoding. That was the reasoning behind it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hence "may make a new object". We never want to make a new object for the buffer... that defeats the point.

How about raising if the supplied buffer is not a String type with Encoding::BINARY?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never want to make a new object for the buffer... that defeats the point.

ahh, I see what you mean.

How about raising if the supplied buffer is ...

I'm still not sure. The encoding requirement is usually not mandatory, and the buffer passed may be a specialized buffer. Besides "fixing" jruby, I'm not even sure that forcing binary encoding there is the right thing to do.

By the way, what's the reasoning behind allowing length = nil? The length argument is usually required for IO#read and File#read.

Copy link
Member

@tarcieri tarcieri Dec 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the buffer passed may be a specialized buffer

In what case would it make sense for it to be anything but a String? Ultimately it needs to thunk to a Ruby I/O function which also accepts a preallocated buffer, and those all need to be a String unless JRuby has some fancy behavior I don't know about where it lets you pass a Java ByteBuffer or something.

Besides "fixing" jruby, I'm not even sure that forcing binary encoding there is the right thing to do.

For these types of buffers, I think mandating Encoding::BINARY is the only right option. They should really be a ByteArray/ByteBuffer type, but Ruby doesn't have those, so we need the next best thing, which is a String with Encoding::BINARY.

For other encodings there are all sorts of pathological performance implications, and the contract at this point in the code is to handle arbitrary binary data that may not be whatever-encoding-was-passed-in.

Given this is a rarely used power-user feature, I think it should have some pretty exacting semantics, which are:

  • Avoid all allocations. The whole point of this API is to avoid allocations. This means no #to_s or anything else that might potentially allocate memory
  • String is probably the only argument type that makes sense
  • Encoding::BINARY (i.e. Encoding::ASCII_8BIT) is probably the only sensible encoding for the buffer String. If you think other encodings should be supported, do you really think the answer to the question "Will outbuf always be a valid string under encoding X?" is yes?

By the way, what's the reasoning behind allowing length = nil?

No idea, like the outbuf being nil I think that probably doesn't make sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change, I'll try to answer the questions below.

# buffer in JRuby is sometimes US-ASCII, force to ASCII-8BIT
outbuf.force_encoding(Encoding::BINARY)

while current_io
current_io.read(length, @buffer)
outbuf << @buffer
outbuf << @buffer.force_encoding(Encoding::BINARY)

if length
length -= @buffer.length
length -= @buffer.bytesize
break if length.zero?
end

Expand Down
2 changes: 1 addition & 1 deletion lib/http/form_data/multipart/param.rb
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ def self.coerce(data)
private

def header
header = String.new
header = "".b
header << "Content-Disposition: form-data; #{parameters}#{CRLF}"
header << "Content-Type: #{content_type}#{CRLF}" if content_type
header << CRLF
Expand Down