Encoding weirdness in .named_any scope #533

Closed
Fodoj opened this issue May 14, 2014 · 8 comments

@Fodoj
Contributor

Fodoj commented May 14, 2014

acts-as-taggable-on 3.2.1
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-darwin13.0.0]

There is an issue with encodings. In some cases the sanitized tag comes back as ASCII-8BIT, in others as UTF-8, and mixing the two raises an encoding exception. Check it out:

# tag.rb
def self.named_any(list)
  if ActsAsTaggableOn.strict_case_match
    clause = list.map { |tag|
      sanitize_sql(["name = #{binary}?", as_8bit_ascii(tag)])
    }.join(' OR ')
    where(clause)
  else
    clause = list.map { |tag|
      puts as_8bit_ascii(unicode_downcase(tag))
      puts as_8bit_ascii(unicode_downcase(tag)).encoding
      puts sanitize_sql(['LOWER(name) = LOWER(?)', as_8bit_ascii(unicode_downcase(tag))])
      puts sanitize_sql(['SELECT (?)', as_8bit_ascii(unicode_downcase(tag))]).encoding
      puts "====="*10
      sanitize_sql(['LOWER(name) = LOWER(?)', as_8bit_ascii(unicode_downcase(tag))])
    }
    .join(' OR ')
    where(clause)
  end
end
=> str = ActsAsTaggableOn::Tag.named_any(["holä", "holä"])
=> holä
=> ASCII-8BIT
=> LOWER(name) = LOWER('holä')
=> ASCII-8BIT
=> ==================================================
=> holä
=> ASCII-8BIT
=> LOWER(name) = LOWER('holä')
=> ASCII-8BIT
### All is good, same encodings, but

=> str = ActsAsTaggableOn::Tag.named_any(["holä", "hol'ä"])
=> holä
=> ASCII-8BIT
=> LOWER(name) = LOWER('holä')
=> ASCII-8BIT
=> ==================================================
=> hol
=> ASCII-8BIT
=> LOWER(name) = LOWER('hol\'ä')
=> UTF-8 # BAM! Different encoding! And as a result, when the next part of the code, join(' OR '), is executed...
=> ... incompatible character encodings: ASCII-8BIT and UTF-8
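The failure can be reproduced in plain Ruby, without Rails or a database. This is a minimal sketch that assumes the two sanitized clauses have the encodings printed in the log; the string literals only stand in for `sanitize_sql` output. `Array#join` can only combine strings whose encodings are compatible, and an ASCII-8BIT string and a UTF-8 string that both contain non-ASCII bytes are not:

```ruby
# Reproduce the failing join without Rails or a database.
# Assumption: the two sanitized clauses carry the encodings printed in the log.
binary_clause = "LOWER(name) = LOWER('holä')".b  # ASCII-8BIT copy; still holds the UTF-8 bytes of "ä"
utf8_clause   = "LOWER(name) = LOWER('hol\\'ä')" # UTF-8 literal (default source encoding since Ruby 2.0)

# Same encoding on both sides (the ["holä", "holä"] case): joining works.
same_encoding = [binary_clause, binary_clause].join(' OR ')

# Mixed encodings, both containing non-ASCII bytes: Ruby refuses to concatenate.
mixed_error =
  begin
    [binary_clause, utf8_clause].join(' OR ')
    nil
  rescue Encoding::CompatibilityError => e
    e
  end

puts same_encoding.bytesize
puts mixed_error.class
```

The first join mirrors the "all is good" run; the second mirrors the `["holä", "hol'ä"]` run.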

So one solution that works is to do instead:

sanitize_sql(['LOWER(name) = LOWER(?)', as_8bit_ascii(unicode_downcase(tag))]).force_encoding('BINARY')

Then the result is always ASCII-8BIT and nothing explodes.

But it looks weird to me, and I am not sure why ASCII-8BIT is needed here at all.

What could be another possible solution? I can provide a pull request with the solution above, though.
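A minimal sketch of the `force_encoding('BINARY')` workaround, under the assumption that the clauses look like the ones in the log: every clause is given one common encoding before `join(' OR ')`. `String#b` is used instead of `force_encoding` because it returns a binary copy rather than relabelling the receiver in place. The second option, relabelling everything as UTF-8, is not something proposed in this thread; it is shown only as an alternative and is safe only when the underlying bytes really are valid UTF-8:

```ruby
# Stand-ins for sanitize_sql output, matching the log (not real ActiveRecord calls).
clauses = [
  "LOWER(name) = LOWER('holä')".b,  # ASCII-8BIT, like the first sanitized clause
  "LOWER(name) = LOWER('hol\\'ä')"  # UTF-8, like the clause that contained a quote
]

# Option 1, as proposed: treat every clause as raw bytes before joining.
# String#b returns a new binary string, so the originals keep their encodings.
binary_where = clauses.map(&:b).join(' OR ')

# Option 2 (assumption: the bytes are valid UTF-8, as they are here):
# relabel every clause as UTF-8 so the SQL stays an ordinary UTF-8 string.
utf8_where = clauses.map { |c| c.dup.force_encoding(Encoding::UTF_8) }.join(' OR ')

puts utf8_where
```

Either way the joined SQL has the same bytes; the choice only affects which encoding label the string carries afterwards.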

Fodoj changed the title from "Encoding weirdness" to "Encoding weirdness in .named_any scope" on May 14, 2014
@jensljungblad

I also noticed this problem. The solution above seems to work.

@Linuus

Linuus commented Sep 12, 2014

Does any maintainer have any insights regarding this matter?

Ping @mbleigh @seuros

@seuros
Collaborator

seuros commented Sep 12, 2014

I think that affects MySQL only.
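One plausible reading of why this would be MySQL-specific, offered as an assumption rather than something confirmed in this thread: the log shows the quote escaped as `\'`, which is MySQL style (the PostgreSQL and SQLite adapters typically escape it by doubling it to `''`), and only that escaped clause came back as UTF-8. That pattern would arise if the adapter's escaping returned its input untouched when there was nothing to escape, and a fresh string in the connection's encoding otherwise. `hypothetical_escape` below is a made-up model of that behaviour, not the mysql2 or ActiveRecord API:

```ruby
# HYPOTHETICAL model of adapter-side escaping; not the real mysql2/ActiveRecord API.
# Assumed behaviour: return the input object unchanged if nothing needs escaping,
# otherwise return a new string tagged with the "connection" encoding (UTF-8 here).
def hypothetical_escape(str)
  # Block form inserts backslash-quote literally; as a replacement string,
  # "\\'" would be read as a back-reference meaning "post-match".
  escaped = str.gsub("'") { "\\'" }
  return str if escaped == str            # nothing escaped: original object, original encoding
  escaped.force_encoding(Encoding::UTF_8) # escaped copy relabelled as UTF-8
end

plain_input = "holä".b
plain  = hypothetical_escape(plain_input) # stays ASCII-8BIT
quoted = hypothetical_escape("hol'ä".b)   # comes back as UTF-8

puts plain.encoding
puts quoted.encoding
```

Under this model only tags containing a quote switch encoding, which matches the `["holä", "hol'ä"]` run failing while `["holä", "holä"]` succeeds.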

@Linuus

Linuus commented Sep 12, 2014

@seuros OK. We are using MySQL, so that seems right.

Any ideas on how to fix it? Perhaps merge the proposal above by @Fodoj?

@Fodoj
Contributor Author

Fodoj commented Sep 12, 2014

If you'd like, I can prepare a PR with this change.

@seuros
Collaborator

seuros commented Sep 12, 2014

Submit a PR with tests; if it doesn't break other tests or adapters, I will merge it and release a new version.

@Linuus

Linuus commented Sep 26, 2014

This PR fixed this issue, right? #588

@seuros Any plans to release a new version soon?

@seuros
Collaborator

seuros commented Sep 26, 2014

Bundle update!

@seuros seuros closed this as completed Sep 26, 2014
cdmicacc pushed a commit to 500px/acts-as-taggable-on that referenced this issue Apr 26, 2016