Getting a Count of Occurrences of Items in a Ruby Array (and a Caveat for Rails)

I feel like I’m often wanting to count occurrences of items in an array (Rails has its own special case as well), and I’m always trying to do it the “long way.”

I finally stumbled upon this answer on StackOverflow that details the version-by-version options:

  • Ruby 2.7+: use .tally directly on the array:
irb(main):006:0> %i{a b c c d c e b a a a b d}.tally
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
  • Older Rubies that have transform_values: group by the item itself and count each group:
irb(main):011:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).transform_values(&:count)
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
  • Older still, without transform_values: map each group to a [key, length] pair and convert with to_h:
irb(main):012:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
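
If the same script has to run on Rubies both with and without .tally, a small wrapper can pick whichever is available. This is only a sketch; count_occurrences is a made-up name for illustration and does not come from the linked answer:

```ruby
# Hypothetical helper: count how often each item appears in a list.
def count_occurrences(items)
  # Prefer the built-in when this Ruby provides it (2.7+).
  return items.tally if items.respond_to?(:tally)

  # Fallback: a Hash with a default of 0 avoids nil checks while counting.
  items.each_with_object(Hash.new(0)) { |item, counts| counts[item] += 1 }
end

counts = count_occurrences(%i[a b c c d c e b a a a b d])
# Most frequent items first.
top_two = counts.sort_by { |_item, n| -n }.first(2)
```

Sorting the resulting hash by its values is usually the next step, which is why the sketch ends with it.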

The Rails Exception

It’s a pretty common temptation, especially once you start thinking in terms of the list of items you want to count, to reach for a pure Ruby solution. But what if your source is your database?

The key here is the database. You probably don’t want to load all of the records from the database just to count them using the above methods. SQL has a GROUP BY clause for exactly this, and ActiveRecord exposes it as .group.

irb(main):013:0> Entry.group(:user_id).count
D, [2021-08-26T02:49:43.996743 #4] DEBUG -- :    (1.2ms)  SELECT COUNT(*) AS count_all, "entries"."user_id" AS entries_user_id FROM "entries" GROUP BY "entries"."user_id"
=> {1=>231, 4=>15, 2=>2}

This output tallies entries by which User (via user_id) entered them. More importantly, the SQL did the counting within the database, so nothing was retrieved into the application except the counts themselves. (This used to be a pun on the :what column in the entries table, but apparently we’re not there with proper rendering and cutting and pasting of emojis between apps and OSes and, well, I enter emoji as part of my entries in this app.)

This original example in extreme wide screen glory

Use find_each vs. select or each when needing to process each record via Ruby

The problem:

This scenario is a much more practical case of a similar concept with Rails / ActiveRecord count, size, and length. In this case, you need to run Ruby code against either every record from a where result or a subset that is determined by Ruby code. For example, assume that a model Entry has a method after_sunrise (you probably want a real sunrise/sunset gem, but this is a simplistic example) as follows:

  # determine if a time included in the string description of the entry is after sunrise
  # nil (falsey) if not found
  def after_sunrise
    notated_time = what.match('\d+:?\d* ?[AP]M')
    return nil if notated_time.nil?

    (Time.parse([date.to_s, notated_time].join(' ')) > 
      Time.parse([date.to_s, "6:30 AM"].join(' ')))
  end
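
To poke at the matching logic without a database, the same method can be hung on a plain Struct. This is a sketch under assumptions: SketchEntry is a hypothetical stand-in for the model with only date and what, and it uses String#[] with a regular expression to pull out the matched text directly instead of a MatchData object.

```ruby
require 'date'
require 'time'

# Hypothetical stand-in for the Entry model: only the two attributes the method reads.
SketchEntry = Struct.new(:date, :what) do
  # true/false when a time is noted in the description, nil when none is found
  def after_sunrise
    notated_time = what[/\d+:?\d* ?[AP]M/]
    return nil if notated_time.nil?

    Time.parse("#{date} #{notated_time}") > Time.parse("#{date} 6:30 AM")
  end
end
```

Calling SketchEntry.new(Date.today, 'Walked the dog at 7:15 AM').after_sunrise is enough to try different descriptions from irb.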

Using select for filtering will cause the result from the where clause to be loaded all at once:

irb(main):178:0> entries = Entry.where(private: false).select { |entry| entry.after_sunrise } #.each { do something with each record here }
D, [2021-08-07T17:53:52.492261 #4] DEBUG -- :   Entry Load (10.4ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1  [["private", false]]
=>
[#<Entry:0x000055ddafc1b230

find_each for the batch

If you don’t have very many records, this isn’t a big deal, but if you have models with a lot of data per model instance and you have millions of rows, you risk running out of memory trying to load it all at once. Anytime you take the result of a query or association and try to switch to operating on it like an array of Ruby objects, you force the lazy loading of ActiveRecord::Relation and ActiveRecord::Associations::CollectionProxy to load those records.

If you switch to find_each you can load those records in batches:

irb(main):179:1* entries = Entry.where(private: false).find_each do |entry|
irb(main):180:1*   next unless entry.after_sunrise
irb(main):181:1*   # do something with each record here
irb(main):182:0> end
D, [2021-08-07T17:57:07.895360 #4] DEBUG -- :   Entry Load (5.1ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 ORDER BY "entries"."id" ASC LIMIT $2  [["private", false], ["LIMIT", 1000]]
=> nil

I only have 214 records in this example database, so the default batch size of 1000 makes this look almost identical to the original. However, you can use the batch_size: keyword argument to tune the size of the batches pulled, in case your records are small enough that more than 1000 can be loaded at a time, or large enough that 1000 is too many, or you have a contrived example and want to show it actually batching:

irb(main):184:1* entries = Entry.where(private: false).find_each(batch_size: 2) do |entry|
irb(main):185:1*   next unless entry.after_sunrise
irb(main):186:1*   # do something with each record here
irb(main):187:0> end
D, [2021-08-07T17:59:22.339190 #4] DEBUG -- :   Entry Load (1.9ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 ORDER BY "entries"."id" ASC LIMIT $2  [["private", false], ["LIMIT", 2]]
D, [2021-08-07T17:59:22.344225 #4] DEBUG -- :   Entry Load (1.8ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 AND "entries"."id" > $2 ORDER BY "entries"."id" ASC LIMIT $3  [["private", false], ["id", 8], ["LIMIT", 2]]
D, [2021-08-07T17:59:22.347152 #4] DEBUG -- :   Entry Load (1.9ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 AND "entries"."id" > $2 ORDER BY "entries"."id" ASC LIMIT $3  [["private", false], ["id", 10], ["LIMIT", 2]]
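
The queries in that log follow a simple pattern: order by id, take batch_size rows, then ask for rows with an id greater than the last one seen. Here is a plain-Ruby imitation of that loop. It is not ActiveRecord's implementation; each_in_batches and the array of hashes standing in for the table are made up for illustration.

```ruby
# Imitates keyset batching over an in-memory "table" (an Array of Hashes with :id).
def each_in_batches(rows, batch_size: 1000)
  last_id = nil
  loop do
    # Equivalent of: WHERE id > last_id ORDER BY id ASC LIMIT batch_size
    batch = rows.select { |row| last_id.nil? || row[:id] > last_id }
                .sort_by { |row| row[:id] }
                .first(batch_size)
    break if batch.empty?

    batch.each { |row| yield row }

    # A short batch means the table is exhausted, so no further query is needed.
    break if batch.size < batch_size

    last_id = batch.last[:id]
  end
end
```

Only one batch is held at a time, which is the whole point: memory use is bounded by batch_size rather than by the size of the table.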

Conclusion

The difference between chaining .each or .select off of a .where clause vs. chaining .find_each is something you won’t necessarily see the benefits of if you’re building something from the ground up and don’t have much data flowing through your application. You may even have a successful launch until you grow an order of magnitude or so. That’s part of the challenge of recognizing the need for it.


Rails / ActiveRecord count, size, and length

When trying to be sensitive to n+1 queries and memory usage, knowing the differences between count, size, and length in ActiveRecord is important. It had been a while since I reviewed the usage, and I wanted to ensure that I hadn’t made some bad assumptions along the way that somehow stuck. The reality is that each method is pretty close to indicating what it will do, with length being the method that will load the data on (or for) you.

count

Back in the old days count was a more sophisticated member of the ActiveRecord::Calculations::ClassMethods module. You could pass conditions to the method, or column names… basically a combination of where, includes, and joins.

The column/distinct counting moved to ActiveRecord::Calculations without all the extra conditionals, joins, and including. Note that you do not need a query to “count”:

irb(main):011:0> Model.count(:special_data) # count Model records with non-nil special_data
   (191.9ms)  SELECT COUNT(`models`.`special_data`) FROM `models`
=> 41828
irb(main):012:0> Model.distinct.count(:special_data) # count Model records with DISTINCT non-nil special_data
   (17.6ms)  SELECT COUNT(DISTINCT `models`.`special_data`) FROM `models`
=> 1909
irb(main):013:0> Model.count # count all records
   (3790.8ms)  SELECT COUNT(*) FROM `models`
=> 594383

If you’re just looking for a count of records for a query that has not been loaded, that’s now a member of ActiveRecord::Associations::CollectionProxy.

irb(main):015:0> Model.all.count
   (744.2ms)  SELECT COUNT(*) FROM `models`
=> 594383
irb(main):017:0> Model.where('special_data is not null').count
   (24.0ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
=> 41828

length

length will load all of the records indicated by a collection, which might be useful if calling length on an association that you’re going to use the data from anyway, but not if you are throwing that data away. You’ll be wasting time (and memory) on the operation.

irb(main):018:0> Model.where('special_data is not null').length
  Model Load (647.9ms)  SELECT ...
.
.
.
=> 41828

You also can’t call length on a model’s class name, as it is not a collection itself:

irb(main):020:0> Model.length
Traceback (most recent call last):
        1: from (irb):20
NoMethodError (undefined method `length' for #<Class:0x00007f810ed6ec28>)

size

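size also requires a collection, but does not attempt to load that collection, instead wrapping a COUNT around its query (if the collection has already been loaded, it returns the length of the loaded records without another query):

irb(main):022:0> Model.where('special_data is not null').size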
   (22.8ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
=> 41828

Like with length, this doesn’t work:

irb(main):023:0> Model.size
Traceback (most recent call last):
        1: from (irb):20
NoMethodError (undefined method `size' for #<Class:0x00007f810ed6ec28>)
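
The three behaviors can be imitated in a few lines of plain Ruby. This toy is not ActiveRecord code: ToyRelation and its queries log are invented for illustration, and the only thing it models is which calls would hit the database.

```ruby
# Toy stand-in for a lazily loaded collection (hypothetical, not ActiveRecord).
class ToyRelation
  attr_reader :queries

  def initialize(rows)
    @rows = rows      # pretend these live in the database
    @loaded = nil     # nothing in memory yet
    @queries = []     # log of simulated database round trips
  end

  def loaded?
    !@loaded.nil?
  end

  # Always asks the "database" for a COUNT.
  def count
    @queries << :count
    @rows.size
  end

  # Loads every row (once), then measures the in-memory array.
  def length
    load_rows.length
  end

  # Uses what is already in memory if loaded, otherwise falls back to COUNT.
  def size
    loaded? ? @loaded.length : count
  end

  private

  def load_rows
    return @loaded if @loaded

    @queries << :select_all
    @loaded = @rows.dup
  end
end
```

Calling size, then length, then size again on a ToyRelation leaves a log of [:count, :select_all]: the second size call is free because the rows are already in memory.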

Conclusion

The behavior of these methods isn’t all that surprising, but sometimes we can let our guard down in Ruby and think of methods as synonyms when they actually have distinct behaviors. This is especially risky if you are working in more than one language or framework and might otherwise gravitate toward a method such as length because it’s second nature elsewhere.


Raspberry Pi Zero W to monitor Enphase Envoy Solar Array

I decided to set up some form of monitoring for my solar installation after a fuse and the breaker panel broke down, leaving me without solar generation for a couple of stretches during near-peak. That cost me up to about 1,400 kWh, or about $140-210 worth of solar generation.

Missing output

Components (physical and software)

  • A Raspberry Pi Zero W on the same wireless network that the Envoy controller was set up on (I initially used PiBakery to configure hostname/wifi/username/password, but the project is a little bit stale at this point).
  • A Nexmo account (part of Vonage APIs now) to allow for SMS alerts on zero output when the sun is up.
  • RubySunrise for only emailing alerts from dusk until dawn.
  • Ruby Gmail and a Gmail account for email informational “down” alerts just to be aware that the cron job is running.
  • cron and gmail

Connections

These are described in the source code repo as well.

  • ENVOY_HOST for me was envoy.local, but depending on your DNS situation, your mileage may vary. I got my local DNS in a weird enough state that I just looked up the envoy.local IP on my wireless router’s status page and used that.
  • USERNAME and PASSWORD are the Gmail username and app-specific password credentials I generated for the gmail account I used.
  • INVERTER_COUNT is the number of inverters you should have; it is compared to the number actually reporting so that even if the array is producing, you can still generate an error if one of them isn’t reporting (only valid when producing).
  • LATITUDE and LONGITUDE plucked from a site that displays your geolocation… this, along with your TZ represented in a form within the TZInfo::Timezone list, and RubySunrise allow you to figure out if the sun’s up.
  • NEXMO* are api keys and config from the Nexmo site (NEXMO_SMS_TO is your personal mobile to alert to)
  • TO_EMAIL is the email to actually mail to
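
Put together, those settings feed a small decision: is the sun up, is the array producing, and are all the inverters reporting? Below is a minimal sketch of that check, assuming the readings have already been fetched and the sunrise lookup already done. The method name, keyword arguments, and return values are illustrative and are not taken from envoy-rpi-zero-monitor.

```ruby
# Hypothetical decision helper; names and return values are illustrative only.
def envoy_status(watts_now:, inverters_reporting:, inverter_count:, sun_up:)
  # After dark, zero output is expected, so there is nothing to escalate.
  return :dark unless sun_up

  # Daylight with no production is the case worth an SMS.
  return :no_output if watts_now.zero?

  # Producing, but fewer inverters are reporting than were installed.
  return :missing_inverters if inverters_reporting < inverter_count

  :ok
end
```

Keeping the decision separate from the HTTP polling and the alert delivery makes it easy to check from irb before trusting it with cron.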

Code

.config must be of the form that follows but the rest of the code can be cloned from envoy-rpi-zero-monitor

    USERNAME='some.burner.gmail.account'
    PASSWORD='gmai1@cc0untp@$$w0rd'
    TO_EMAIL='an.email.you.read@example.com'
    NEXMO_API_KEY="3ab3789123"
    NEXMO_API_SECRET="123456sSD8dh"
    NEXMO_SMS_FROM="19281123581"
    NEXMO_SMS_TO="15551112222"
    LATITUDE=20.1237899
    LONGITUDE=-57.3364631
    TZ='America/Chicago'
    ENVOY_HOST='192.168.1.222'
    INVERTER_COUNT=100

# crontab runs every hour and inits rbenv to use the right ruby version because
# I didn't really care about "production readiness"... it's a Raspberry Pi Zero W
0 * * * * cd /home/twill/envoy-rpi-zero-monitor && eval "$(rbenv init -)" && ruby read-envoy.rb

YMMV

This all depends on having an Enphase Enlighten Envoy (and a bunch of other random “E” names) as your solar monitor, but if you have a relatively recent solar install and your technician needed to configure the monitor for your wifi, then you probably have a similar device with a pollable endpoint. Look at your wireless router’s web console and you’ll see that monitor:

If you browse to that name or the IP address associated, you’ll probably get a web page with status. If you reload with the network tab up, you’ll probably see it retrieve the data via a .json endpoint:

From there, you can build your own monitor around it.


Referencing one trait from another trait in factory_bot

Sometimes you want to DRY up traits by referencing one trait from another trait in factory_bot. I tried searching on “inheriting traits” (that’s just for one factory inheriting traits from another and was in a factory_bot issue in GitHub). I accidentally stumbled upon the answer in a slightly unrelated StackOverflow question about calling a trait from another trait with params in factory_girl.

Ultimately, you use the trait name from the first trait as a method invocation in the referencing trait:

FactoryBot.define do
  factory :user do
    role
    trait :with_supervisor do
      # complex set up might go
      # here
      after(:create) do |user|
        # attribute blocks are not available inside a callback; assign on the record instead
        user.update!(supervisor: create(:user))
      end
    end
    trait :with_organization do
      with_supervisor # invoke the other trait first
      organization
    end
  end
end

RAW_POST_DATA in rspec rails for Rails 5.2 and beyond

The last time I was trying to specify RAW_POST_DATA in rspec was probably Rails 3 or 4, but I ran into a situation trying to test an edge case for error handling where I wanted that same functionality. I quickly found this issue [Unable to POST raw request body], but didn’t immediately figure out what wasn’t being set correctly.

In this case the test setup I was using was setting multipart/form-data instead of application/xml on the content types:

{:HTTP_ACCEPT=>"application/xml", :HTTP_CONTENT_TYPE=>"multipart/form-data", :CONTENT_TYPE=>"multipart/form-data"}

Because of this, the Rails controller tests that rspec hooks into were trying to break the following malformed XML down into parameters:

            <test>
              <data&nbsp;
              <![CDATA[THIS|IS|SENSITIVE|BUT|MALFORMED]]>
              </data>
            </test>

 Minitest::Assertion:
   Expected response to be a <400: bad_request>, but was a <422: Unprocessable Entity>
   Response body: <errors>
       <error>["<test>\n  <data", "nbsp;\n  <!"] are not permitted parameters</error>
   </errors>

I finally noticed that the mime-type might be involved. In this code, Content-Type was also an issue, so:

  • Removed HTTP_CONTENT_TYPE from the headers
  • Set CONTENT_TYPE header to 'application/xml' instead of 'multipart/form-data' to prevent automatic params parsing in this case.
  • Passed as: :xml into the test to get the 'mime-type' correct.

Ultimately, if your code hasn’t boxed you in, then the as: :xml and passing raw data as a parameter should work:

post things_path, params: raw_xml_data, headers: non_form_data_headers, as: :xml

## replacement for the following:
# @request.env['RAW_POST_DATA'] = raw_xml_data
# post things_path

String#tr in ruby (like tr in Linux) complete with figuring out slashes.

It seems like I’ve seen quite a few programming puzzles in the last few weeks that involved translating mistyped input in which the hands were shifted (right) on the keyboard. My first thought was the tr utility in *nix operating systems, but I didn’t immediately go looking for, or notice, that Ruby has a tr method on String. However, after doing a trivial implementation involving keyboard rows like the following, I stumbled on the tr method.

  # initial array of characters/strings to shift back to the left with [index-1]
  KEYBOARD_ROWS= [
    '`1234567890-=',
    'qwertyuiop[]\\', # need to escape the backslash or else debugging pain
    "asdfghjkl;'", # double-quotes here because single quote embedded
    'zxcvbnm,./',
    '~!@#$%^&*()_+', # no backslash before $: single quotes would keep it as a literal character
    'QWERTYUIOP{}|',
    'ASDFGHJKL:"',
    'ZXCVBNM<>?'
  ].join

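Attempting to rewrite this for .tr presented a few challenges, however. Within tr’s arguments, \ is the escape character, - denotes a range, and a leading ^ negates the set, so if you are substituting for those characters you have to escape them (escaping an ordinary character such as ~ is harmless, which the examples below rely on). You also have to escape them from their string representation, which makes for some head-spinning levels of escaping (zsh users who run shell commands through kubectl might be familiar with this pain as well):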

# puts '\\~-'.tr('\\', 'a') # a single \ reaches tr; a lone trailing backslash is taken literally
a~-
# puts '\\~-'.tr('\\\\', 'a') # now \\ is passed to tr, which is an escaped backslash
a~-
# puts '\\~-'.tr("\\\\\\", 'a') # six in the source is three for tr: an escaped backslash plus a literal trailing one
a~-
# puts '\\~-'.tr('\\~', 'b') # tr receives \~ (a backslash-escaped ~)
\b-
# puts '\\~-'.tr("\\\~", 'b') # in double quotes, \\ and \~ also collapse to \~
\b-
# puts '\\~-'.tr('\\-', 'c') # tr receives \- (an escaped hyphen, not a range)
\~c
# puts '\\~-'.tr("\\\-", 'c') # in double quotes, \\ and \- also collapse to \-
\~c

So if you’re going to use translate to “shift” hands back to the left, the two arguments to tr, SHIFTED_KEYBOARD_ROWS and UNSHIFTED_KEYBOARD_ROWS would have to be defined with the following escaping:

  SHIFTED_KEYBOARD_ROWS =
    [
      '1234567890\\-=',
      'wertyuiop[]\\\\', # 4x backslash = backslash
      "sdfghjkl;'",
      'xcvbnm,./',
      '!@#\$%\^&*()_+',
      'WERTYUIOP{}|',
      'SDFGHJKL:"',
      'XCVBNM<>?'
  ].join

  UNSHIFTED_KEYBOARD_ROWS = [
    '`1234567890\-',
    'qwertyuiop[]', # the backslash key is the last key in this row, so it is never a target
    'asdfghjkl;',
    'zxcvbnm,.',
    '~!@#\$%\^&*()_',
    'QWERTYUIOP{}',
    'ASDFGHJKL:',
    'ZXCVBNM<>' # one character shorter than the shifted row's source, like every other row
  ].join

  def self.translate(string)
    string.tr(SHIFTED_KEYBOARD_ROWS, UNSHIFTED_KEYBOARD_ROWS)
  end
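
Since the escaping is the error-prone part, another option is to keep the rows unescaped and generate both tr arguments from them. This is a sketch rather than the code above: PLAIN_ROWS, tr_escape, FROM_SHIFTED, TO_UNSHIFTED, and unshift_hands are illustrative names, and the helper escapes \, -, and ^ wherever they appear, which is more than tr strictly requires but harmless.

```ruby
# Unescaped keyboard rows, left to right.
PLAIN_ROWS = [
  '`1234567890-=',
  'qwertyuiop[]\\', # ends with a single backslash character
  "asdfghjkl;'",
  'zxcvbnm,./',
  '~!@#$%^&*()_+',
  'QWERTYUIOP{}|',
  'ASDFGHJKL:"',
  'ZXCVBNM<>?'
].freeze

# Prefix the characters tr treats specially with a backslash.
def tr_escape(chars)
  chars.gsub(/[\\\-^]/) { |c| "\\#{c}" }
end

# Every key except the first in a row can be the result of a right shift...
FROM_SHIFTED = PLAIN_ROWS.map { |row| tr_escape(row[1..-1]) }.join
# ...and maps back to the key immediately to its left.
TO_UNSHIFTED = PLAIN_ROWS.map { |row| tr_escape(row[0..-2]) }.join

def unshift_hands(string)
  string.tr(FROM_SHIFTED, TO_UNSHIFTED)
end
```

Using the block form of gsub sidesteps yet another layer of escaping, because the block's return value is inserted as-is instead of being interpreted as a replacement pattern.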

Tracing / Debugging ruby output like set -x in bash

In writing some shell scripts in ruby, I decided that I needed to be able to debug (trace) the lines that were being executed. I even ran across a closed StackOverflow question looking for the same thing.

code=ARGF.readlines.grep_v(/^$/)
eval code.map { |c| %Q|puts "+ #{c.gsub("\"", "\\\"").strip}"| }.zip(code).join($/)

After playing around with one of the other answers (see above), I ended up taking a different tactic to try and figure out how to debug the scripts. (By the way, the above code breaks if you have line breaks in a single statement, like the following contrived example):

y = 2
      + 4

The important search term here is “trace”, or Tracer to be exact.

Take the following example:

bind = binding
p bind

bind.local_variable_set(:bind, 2)

p bind

p binding.local_variables

bind = binding

p eval("bind", bind)

if 2 > 3
  puts 2
else
  puts 3
end


if true
  puts "looks like the if true is compiled out"
end

If you run the above (contained in a file named binding.rb) using ruby -r tracer binding.rb then you get the following:

#0:/home/tpowell/.rbenv/versions/2.5.5/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:Kernel:<:       return gem_original_require(path)
#0:binding.rb:1::-: bind = binding
#0:binding.rb:2::-: p bind
#<Binding:0x000055fe6879cd38>
#0:binding.rb:4::-: bind.local_variable_set(:bind, 2)
#0:binding.rb:6::-: p bind
2
#0:binding.rb:8::-: p binding.local_variables
[:bind]
#0:binding.rb:10::-: bind = binding
#0:binding.rb:12::-: p eval("bind", bind)
#0:binding.rb:10::-: bind = binding
#<Binding:0x000055fe68835718>
#0:binding.rb:14::-: if 2 > 3
#0:binding.rb:17::-:   puts 3
3
#0:binding.rb:22::-:   puts "looks like the if true is compiled out"
looks like the if true is compiled out

One interesting difference between how ruby -r tracer works and set -x is that the ruby tracer appears to skip evaluating the if true at all. The above runs were against ruby 2.5.5. Looking at 3.0.0 (and as far back as 2.6.x), I only get the output of the script:

#<Binding:0x000055af54e0c730>
2
[:bind]
#<Binding:0x000055af54e0c050>
3
looks like the if true is compiled out

Looking at Tracer, it’s using set_trace_func under the hood:

set_trace_func proc { |event, file, line, id, binding, classname|
  printf "%8s %s:%-2d %10s %8s\n", event, file, line, id, classname
}

Adding that in the 2.6.x+ world returns:

c-return binding.rb:1  set_trace_func   Kernel
    line binding.rb:5
  c-call binding.rb:5     binding   Kernel
c-return binding.rb:5     binding   Kernel
    line binding.rb:6
  c-call binding.rb:6           p   Kernel
  c-call binding.rb:6     inspect   Kernel
c-return binding.rb:6     inspect   Kernel
.
.
.

That output can be filtered by the event type, but the lines of code themselves aren’t output, and set_trace_func was apparently obsoleted as of 2.1.10. TracePoint is the updated way to accomplish this:

trace = TracePoint.new(:line) do |tp|
  p tp
end
trace.enable
.
.
.

But we still have the same problem:

#<TracePoint:line@binding_trace_point.rb:8>
#<TracePoint:line@binding_trace_point.rb:9>
#<Binding:0x00005652f2a125e8>
#<TracePoint:line@binding_trace_point.rb:11>
#<TracePoint:line@binding_trace_point.rb:13>
2
#<TracePoint:line@binding_trace_point.rb:15>
[:trace, :bind]

A crude solution I’ve found around this is to read the line from the file mentioned in the TracePoint from within the block (and this apparently doesn’t end up with a stack overflow).

trace = TracePoint.new(:line) do |tp|
  puts "+ #{File.open(tp.path) { |f| f.each_line.to_a[tp.lineno-1] }}"
end

trace.enable
.
.
.

Which produces somewhat set -x-like output:

+ bind = binding
+ p bind
#<Binding:0x000055ef6c9d9f70>
+ bind.local_variable_set(:bind, 2)
+ p bind
2
+ p binding.local_variables
[:trace, :bind]
+ bind = binding
+ p eval("bind", bind)
+ bind = binding
#<Binding:0x000055ef6c8321b8>
+ if 2 > 3
+   if 3 > 2
+     puts 3
3
+   puts "looks like the if true is compiled out"
looks like the if true is compiled out
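
Re-reading the whole file for every traced line gets slow on longer scripts. One possible refinement is to cache each file's lines the first time it is seen. This is a sketch only; line_tracer and its out parameter are made-up names, and paths that are not real files (such as eval'd code) are simply skipped.

```ruby
# Build a TracePoint that echoes each source line before it runs,
# reading every file once and caching its lines.
def line_tracer(out = $stdout)
  sources = Hash.new do |cache, path|
    # Paths such as "(eval)" or "-e" are not real files, so cache an empty list.
    cache[path] = File.file?(path) ? File.readlines(path) : []
  end

  TracePoint.new(:line) do |tp|
    line = sources[tp.path][tp.lineno - 1]
    out.puts "+ #{line.chomp}" if line
  end
end

# Usage: put this at the top of a script to trace everything after it.
# line_tracer.enable
```

Passing an IO object other than $stdout (a log file, or $stderr to mimic set -x more closely) keeps the trace separate from the script's own output.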


Rails’ and Ruby’s Hash transform_values

Rails: 4.2.1-5.2.3 and Ruby >= 2.5.5 have a transform_values method on Hash that allows you to pass a block to the method and transform the values of the key-value map in the hash based on the block contents. Essentially, it’s map but for the Hash values only, and with no weird Hash/array-element syntax. From the linked API documents above:

h = { a: 1, b: 2, c: 3 }
h.transform_values {|v| v * v + 1 }  #=> { a: 2, b: 5, c: 10 }
h.transform_values(&:to_s)           #=> { a: "1", b: "2", c: "3" }
h.transform_values.with_index {|v, i| "#{v}.#{i}" }
                                     #=> { a: "1.0", b: "2.1", c: "3.2" }


NoMethodError undefined method `shared_examples_for' for main:Object for bundle gem rspec

If you create a gem stub using bundle gem thing and select rspec as your test suite, you may get an error similar to the following:

❯ bundle exec rspec

An error occurred while loading ./spec/thing_spec.rb.
Failure/Error:
  shared_examples_for 'saying hello' do
    puts "hi"
  end

NoMethodError:
  undefined method `shared_examples_for' for main:Object
# ./spec/thing_spec.rb:1:in `<top (required)>'
No examples found.

whenever using describe, shared_examples, and shared_examples_for, etc…

After a lot of diving into rspec source code to verify where shared_examples_for was defined (rspec-core so…) I noticed the following code in the stubbed spec_helper.rb:

  # Disable RSpec exposing methods globally on `Module` and `main`
  config.disable_monkey_patching!

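If you comment out config.disable_monkey_patching!, then those methods will be included at the top level. Alternatively, leave it enabled and call the methods on the RSpec module instead, e.g. RSpec.describe and RSpec.shared_examples_for.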