Getting a Count of Occurrences of Items in a Ruby Array (and a Caveat for Rails)

I often want to count occurrences of items in an array (Rails has its own special case as well), and I always end up doing it the “long way.”

I finally stumbled upon this answer on StackOverflow that details the version-by-version options:

  • Ruby 2.7+ use .tally directly on the array:
irb(main):006:0> %i{a b c c d c e b a a a b d}.tally
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
  • Ruby 2.4+ (which added transform_values) use group_by(&:itself) and count each group:
irb(main):011:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).transform_values(&:count)
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
  • Ruby 2.2+ (which added itself) use group_by(&:itself), then map to pairs and convert back with to_h:
irb(main):012:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
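
For comparison, here is a rough sketch of the “long way” next to a small wrapper that picks whichever approach the running Ruby supports. Both method names (count_the_long_way and count_occurrences) are made up for illustration; they are not part of Ruby or of the StackOverflow answer.

```ruby
# The "long way": walk the array and bump a counter per item.
def count_the_long_way(items)
  items.each_with_object(Hash.new(0)) { |item, counts| counts[item] += 1 }
end

# Hypothetical helper: prefer tally (Ruby 2.7+), fall back to group_by on older Rubies.
def count_occurrences(items)
  return items.tally if items.respond_to?(:tally)

  items.group_by(&:itself).map { |item, group| [item, group.length] }.to_h
end

letters = %i{a b c c d c e b a a a b d}
# Both approaches agree: a is counted 4 times, b and c 3 times, d twice, e once.
count_the_long_way(letters) == count_occurrences(letters)
```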

The Rails Exception

It’s a pretty common temptation, especially once you start thinking in terms of the list of items you want to count, to reach for a pure Ruby solution. But what if your source is your database?

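The key here is the database. You probably don’t want to load all of the records from the database just to count them using the above methods. SQL has a GROUP BY clause, which ActiveRecord exposes as .group, and chaining .count onto a grouped query (something like Entry.group(:user_id).count, with the model name inferred from the entries table in the log below) returns a hash of counts: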

D, [2021-08-26T02:49:43.996743 #4] DEBUG -- :    (1.2ms)  SELECT COUNT(*) AS count_all, "entries"."user_id" AS entries_user_id FROM "entries" GROUP BY "entries"."user_id"
=> {1=>231, 4=>15, 2=>2}

This output tallies entries by the User (via user_id) who entered them. More importantly, the database did the counting: nothing was loaded into the application except the counts themselves. (This used to be a pun on the :what column in the entries table, but apparently we’re not there with proper rendering and cutting and pasting of emojis between apps and OSes, and, well, I enter emoji as part of my entries in this app.)
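
To make concrete what the GROUP BY is computing, here is the in-memory equivalent in plain Ruby. The hashes below are hypothetical stand-ins for loaded entries rows, and entries_per_user is an illustrative name only. This is the version you would want to avoid on a large table, since every row has to be in memory before it can be counted.

```ruby
# Hypothetical stand-ins for rows from the entries table; only user_id matters here.
def entries_per_user(entries)
  # Pull out each row's user_id, then count how often each one appears (Ruby 2.7+).
  entries.map { |entry| entry[:user_id] }.tally
end

rows = [1, 1, 4, 2, 1, 4].map { |id| { user_id: id } }
entries_per_user(rows) # user 1 has 3 entries, user 4 has 2, user 2 has 1
```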


Rails / ActiveRecord count, size, and length

When trying to be sensitive to N+1 queries and memory usage, knowing the differences between count, size, and length in ActiveRecord is important. It had been a while since I reviewed the usage, and I wanted to ensure that I hadn’t made some bad assumptions along the way that somehow stuck. The reality is that each method is pretty close to indicating what it will do, with length being the method that will load the data on (or for) you.


Back in the old days, count was a more sophisticated member of the ActiveRecord::Calculations::ClassMethods module. You could pass conditions to the method, or column names… basically a combination of where, includes, and joins.

The column/distinct counting moved to ActiveRecord::Calculations without all the extra conditions, joins, and includes. Note that you do not need to build a query first in order to “count”; you can call it directly on the model class:

irb(main):011:0> Model.count(:special_data) # count Model records with non-nil special_data
   (191.9ms)  SELECT COUNT(`models`.`special_data`) FROM `models`
=> 41828
irb(main):012:0> Model.distinct.count(:special_data) # count Model records with DISTINCT non-nil special_data
   (17.6ms)  SELECT COUNT(DISTINCT `models`.`special_data`) FROM `models`
=> 1909
irb(main):013:0> Model.count # count all records
   (3790.8ms)  SELECT COUNT(*) FROM `models`
=> 594383
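
The three SQL forms above differ in what they skip: COUNT(*) counts every row, COUNT(column) skips NULLs, and COUNT(DISTINCT column) skips NULLs and duplicates. A plain-Ruby sketch of the same arithmetic, using a made-up array in place of the special_data column (nil plays the role of NULL, and column_counts is an illustrative name):

```ruby
# Hypothetical column values standing in for models.special_data.
def column_counts(values)
  {
    all_rows: values.size,                       # like COUNT(*)
    non_null: values.compact.size,               # like COUNT(special_data)
    distinct_non_null: values.compact.uniq.size  # like COUNT(DISTINCT special_data)
  }
end

column_counts(["x", nil, "y", "x", nil]) # 5 rows, 3 non-nil values, 2 distinct non-nil values
```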

If you’re just looking for a count of records for a query that has not been loaded, call count on the relation itself (ActiveRecord::Associations::CollectionProxy offers the same method for associations).

irb(main):015:0> Model.all.count
   (744.2ms)  SELECT COUNT(*) FROM `models`
=> 594383
irb(main):017:0> Model.where('special_data is not null').count
   (24.0ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
=> 41828


length will load all of the records in a collection. That might be useful if you’re calling length on an association whose data you’re going to use anyway, but if you only need the number and are throwing the data away, you’ll be wasting time (and memory) on the operation.

irb(main):018:0> Model.where('special_data is not null').length
  Model Load (647.9ms)  SELECT ...
=> 41828

You also can’t call length on the model class itself, as it is not a collection:

irb(main):020:0> Model.length
Traceback (most recent call last):
        1: from (irb):20
NoMethodError (undefined method `length' for #<Class:0x00007f810ed6ec28>)


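size also requires a collection, but does not attempt to load that collection. If the records have not been loaded, it wraps a COUNT around the query; if they have already been loaded, it returns the length of what is in memory without another query:

irb(main):022:0> Model.where('special_data is not null').size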
   (22.8ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
=> 41828

Like with length, this doesn’t work:

irb(main):023:0> Model.size
Traceback (most recent call last):
        1: from (irb):20
NoMethodError (undefined method `size' for #<Class:0x00007f810ed6ec28>)
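
To summarize the three behaviors, here is a toy model in plain Ruby. ToyRelation is entirely hypothetical and is not how ActiveRecord is implemented; it only mimics which calls would hit the database by appending a string to a queries log.

```ruby
# Toy stand-in for a lazy relation; NOT ActiveRecord's implementation.
class ToyRelation
  attr_reader :queries

  def initialize(rows)
    @rows = rows     # pretend these rows live in the database
    @loaded = nil    # stays nil until the records are "fetched"
    @queries = []    # log of simulated SQL statements
  end

  def loaded?
    !@loaded.nil?
  end

  # Always asks the "database" for a count, loaded or not.
  def count
    @queries << "SELECT COUNT(*)"
    @rows.size
  end

  # Always works from loaded records, fetching them first if needed.
  def length
    load_records.length
  end

  # Uses the loaded records when present, otherwise falls back to count.
  def size
    loaded? ? @loaded.length : count
  end

  private

  def load_records
    return @loaded if loaded?

    @queries << "SELECT *"
    @loaded = @rows.dup
  end
end

relation = ToyRelation.new(%w[a b c])
relation.size    # not loaded yet, so this logs a COUNT
relation.length  # loads the records
relation.size    # already loaded, so no new query is logged
relation.queries # ["SELECT COUNT(*)", "SELECT *"]
```

The point of the sketch is the ordering: size is the safe default when you don’t know whether the records are loaded, count is for when you always want the database’s answer, and length is for when you need the records anyway.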


The behavior of these methods isn’t all that surprising, but sometimes we can let our guard down in Ruby and think of methods as synonyms when they actually have distinct behaviors. This is especially risky if you are working in more than one language or framework and might otherwise gravitate toward a method such as length because it’s second nature elsewhere.