Getting a Count of Occurrences of Items in a Ruby Array (and a Caveat for Rails)

I feel like I’m often wanting to count occurrences of items in an array (Rails has its own special case as well), and I’m always trying to do it the “long way.”

I finally stumbled upon this answer on StackOverflow that details the version-by-version options:

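  • Ruby 2.7+: use .tally directly on the array:
irb(main):006:0> %i{a b c c d c e b a a a b d}.tally
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
  • Ruby 2.4+ (where Hash#transform_values is available): group the items, then count each group:
irb(main):011:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).transform_values(&:count)
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
  • Ruby 2.2+ (where Object#itself is available): group the items, map each group to a [key, count] pair, and convert back to a hash:
irb(main):012:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}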
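
A minimal sketch that wraps those version-by-version options in one helper, picking whichever the running Ruby supports. The helper name count_occurrences is made up for illustration; it isn't from the StackOverflow answer.

```ruby
# Hypothetical helper: use Array#tally when it exists (Ruby 2.7+),
# otherwise fall back to group_by plus map, which works on older versions.
def count_occurrences(items)
  if items.respond_to?(:tally)
    items.tally
  else
    items.group_by(&:itself).map { |item, group| [item, group.length] }.to_h
  end
end

letters = %i[a b c c d c e b a a a b d]
counts = count_occurrences(letters)

p counts
```

Either branch returns the same hash, so callers don't need to care which Ruby they are on.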

The Rails Exception

It’s a pretty common temptation, especially once you start thinking in terms of the list of items you want to count, to reach for a pure Ruby solution. But what if your source is your database?

The key here is the database. You probably don’t want to load all of the records from the database just to count them using the above methods. SQL has a GROUP BY clause for exactly this, which ActiveRecord exposes as .group; chaining .count onto it runs the count inside the database.

D, [2021-08-26T02:49:43.996743 #4] DEBUG -- :    (1.2ms)  SELECT COUNT(*) AS count_all, "entries"."user_id" AS entries_user_id FROM "entries" GROUP BY "entries"."user_id"
=> {1=>231, 4=>15, 2=>2}

This output tallies entries by which User (via user_id) entered them. More importantly, the SQL did the counting inside the database, so nothing was pulled into the application except the counts themselves. (This used to be a pun on the :what column in the entries table, but apparently we’re not there yet with proper rendering and cutting and pasting of emoji between apps and OSes, and, well, I enter emoji as part of my entries in this app.)
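
To make the shape of that result concrete, here is a plain-Ruby sketch of the same tally done in memory, which is the work the grouped query pushes into the database. Entry here is a hypothetical Struct standing in for the model, and the sample rows are invented; nothing below talks to a real database.

```ruby
# Entry is a hypothetical stand-in Struct, not an ActiveRecord model.
Entry = Struct.new(:user_id, :what)

entries = [
  Entry.new(1, "coffee"),
  Entry.new(4, "tea"),
  Entry.new(1, "lunch"),
  Entry.new(2, "walk"),
  Entry.new(1, "coffee"),
]

# In-memory tally: every record has to be loaded before it can be counted.
in_memory_counts = entries.group_by(&:user_id).transform_values(&:count)
p in_memory_counts

# With ActiveRecord, and assuming a model named Entry backed by the entries
# table, the grouped query would be written roughly as:
#   Entry.group(:user_id).count
```

The hash has the same user_id-to-count shape as the query result above, but only after instantiating every row first.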


Ruby: Enumerable grep, grep_v

UPDATE: After I wrote this, I started finding myself doing a lot of caller.grep(/(project_directory|suspected_gem)/) to aid in debugging obscure interactions with internal gems and projects.
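
A small sketch of that debugging trick. Kernel#caller returns an array of strings, so grep with a regular expression filters it like any other array. The helper name and the sample frames below are invented for illustration, so the example gives the same result wherever it runs.

```ruby
# Hypothetical helper: keep only the backtrace frames whose text matches.
def frames_matching(frames, pattern)
  frames.grep(pattern)
end

# Sample frames in the "path:line:in `method'" shape that Kernel#caller returns.
# The paths are invented for illustration.
sample_frames = [
  "/gems/rack/lib/rack/handler.rb:12:in `call'",
  "/code/my_project/app/models/entry.rb:40:in `save_entry'",
  "/gems/suspected_gem/lib/suspected_gem.rb:7:in `wrap'",
  "/gems/activesupport/lib/callbacks.rb:98:in `run'",
]

interesting = frames_matching(sample_frames, /(my_project|suspected_gem)/)
puts interesting

# Inside a real method you would pass `caller` instead of sample_frames:
#   puts caller.grep(/(my_project|suspected_gem)/)
```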

In looking at a pull request and noticing some awkward “first” and “last” iteration detection which also required each_with_index, I started looking into what would be a cleaner way, and my first step was trying to figure out if there was an enumerable context.

Ultimately I landed on Enumerable#grep and Enumerable#grep_v, which somewhat perplexed me. It’s not really a “grep” unless your collection’s values are something a regular expression can match, such as strings:

=> ["100", "200", "300", "400", "500", "600", "700", "800", "900"]

Maybe you want to look for Classes in the ObjectSpace… The argument to #grep is compared against each element with the === operator (pattern === element), so anything that implements === can serve as the pattern. That means you could list everything in the ObjectSpace that’s a Class:

ObjectSpace.each_object.grep(Class) # too long to include here
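
A few smaller cases show how the === comparison plays out with different kinds of patterns. The sample values are made up; the methods are all core Ruby.

```ruby
mixed = [1, "two", :three, 4.0, 5, "six"]

# Class#=== is an is_a? check, so this keeps the Integers.
integers = mixed.grep(Integer)        # [1, 5]

# Range#=== is a membership check.
in_range = (1..20).grep(5..8)         # [5, 6, 7, 8]

# Regexp#=== is a match check against strings (and symbols).
with_an = %w[apple banana cherry mango].grep(/an/)  # ["banana", "mango"]

# grep_v inverts the test: everything the pattern does NOT match.
non_strings = mixed.grep_v(String)    # [1, :three, 4.0, 5]

p integers, in_range, with_an, non_strings
```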

One situation that I thought of that might be especially useful is Dir globbing:

irb(main):064:0> Dir['*'].grep(/(yarn|json)/)
=> ["package-lock.json", "package.json", "yarn.lock"]

If you were trying to use the Ruby REPL as a shell, you could even:

irb(main):005:0> Dir['**/*'].grep_v(%r{/packs/}).grep(/\.(js|erb|rb|json|lock)$/)
=> ["app/controllers/application_controller.rb", "app/controllers/posts_controller.rb", "app/helpers/application_helper.rb", "app/helpers/posts_helper.rb", "app/javascript/src/jets/crud.js", "app/jobs/application_job.rb", "app/models/application_item.rb", "app/models/application_record.rb", "app/models/post.rb", "app/views/layouts/application.html.erb", "app/views/posts/edit.html.erb", "app/views/posts/index.html.erb", "app/views/posts/new.html.erb", "app/views/posts/show.html.erb", "app/views/posts/_form.html.erb", "babel.config.js", "config/application.rb", "config/environments/development.rb", "config/environments/production.rb", "config/environments/test.rb", "config/routes.rb", "config/webpack/development.js", "config/webpack/environment.js", "config/webpack/production.js", "config/webpack/test.js", "db/migrate/20210610023540_create_posts.rb", "db/schema.rb", "Gemfile.lock", "postcss.config.js", "spec/controllers/posts_controller_spec.rb", "spec/fixtures/payloads/posts-index.json", "spec/fixtures/payloads/posts-show.json", "spec/spec_helper.rb"]
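
When filtering by extension like this, it helps to escape the dot and anchor the pattern, since a bare . in a regular expression matches any character. A sketch with an invented list of paths, so it doesn't depend on what is on disk:

```ruby
# Hypothetical paths, chosen to show the difference between the two patterns.
paths = [
  "app/models/post.rb",
  "app/javascript/packs/application.js",
  "config/routes.rb",
  "docs/notes_rb",   # no real extension, just ends in "_rb"
  "package.json",
  "README.md",
]

loose  = /(.rb$|.json$)/   # "." matches any character, including "_"
strict = /\.(rb|json)\z/   # literal dot, anchored at the end of the string

loose_matches  = paths.grep_v(%r{/packs/}).grep(loose)
strict_matches = paths.grep_v(%r{/packs/}).grep(strict)

p loose_matches
p strict_matches
```

The loose pattern lets docs/notes_rb through; the strict one does not.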

And, as with .each, .map, etc., you can pass a block to interact with each matching element. The return value of each iteration becomes the corresponding element of the output.

# array containing the contents of all the files matching:
Dir['**/*'].grep_v(%r{/packs/}).grep(/\.(js|erb|rb|json|lock)$/) { |f| File.read(f) }
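
A self-contained sketch of the block form that doesn't touch the filesystem. The sample lines are invented. With a block, grep behaves like select followed by map in a single call.

```ruby
lines = ["id=17", "name=post", "id=42", "id=abc"]

# The block receives each matching element; its return value is collected.
ids = lines.grep(/\Aid=\d+\z/) { |line| line.split("=").last.to_i }
p ids

# The same thing spelled out as select + map.
same = lines.select { |line| line =~ /\Aid=\d+\z/ }
            .map { |line| line.split("=").last.to_i }
p same
```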

I’m not sure if I’ve seen any code that would be made cleaner by grep (unless for utility scripting), and there’s always the risk of lowering maintainability of the code by using features no one else uses, but it’s nice to know that Ruby always has something more even after using it for many years.