Getting a Count of Occurrences of Items in a Ruby Array (and a Caveat for Rails)

I feel like I’m often wanting to count occurrences of items in an array (Rails has its own special case as well), and I’m always trying to do it the “long way.”

I finally stumbled upon this answer on StackOverflow that details the version-by-version options:

  • Ruby 2.7+ use .tally directly on the array:
irb(main):006:0> %i{a b c c d c e b a a a b d}.tally<br>=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
irb(main):011:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).transform_values(&:count)
=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}
irb(main):012:0> %i{a b c c d c e b a a a b d}.group_by(&:itself).map { |k,v| [k, v.length] }.to_h<br>=> {:a=>4, :b=>3, :c=>3, :d=>2, :e=>1}

The Rails Exception

It’s a pretty common temptation, especially once you start thinking in terms of the list of items you want to count, to try to use a pure Ruby solution for things. But what if your source is from the your database?

The key here is the database. You probably don’t want to load all of the records from the database just to count them using the above methods, and SQL has a GROUP BY clause which is just called .group.

irb(main):013:0> Entry.group(:user_id).count
D, [2021-08-26T02:49:43.996743 #4] DEBUG -- :    (1.2ms)  SELECT COUNT(*) AS count_all, "entries"."user_id" AS entries_user_id FROM "entries" GROUP BY "entries"."user_id"
=> {1=>231, 4=>15, 2=>2}

This output is tallying entries by what User (via user_id) entered them. More importantly, the SQL used did the counts within the database without retrieving any data contained into the application except what was counted. (This used to be a pun on the :what column in the entries table, but apparently we’re not there with proper rendering and cutting and pasting of emojis between apps and OSes and well, I enter emoji as part of my entries in this app.

This original example in extreme wide screen glory

Plotting Characters to the Commodore 128 80-column (8563) chip.

You can find the below source code at https://github.com/stringsn88keys/unnecessary-computer-things/blob/main/episodes/2021/001/commodore-128/CHASXYCOSINE.BAS and watch here for demonstration of the run time vs. the TRS-80 Model 16 emulator (a computer for which I never realized had a graphics mode when I had access to one). The BASIC and disassembly is also available on page 312 of the Commodore 128: Programmer’s Reference Guide… but beware, the OCR versions translate the “1”s occasionally to lowercase “l”s (which wouldn’t exist in a Commodore program listing unless all lowercase) and the “O”s to “0”s (but inconsistently).

Why is this a post?

If you’re here, you probably wrote some Commodore screen plotting code in which the screen was mappable via POKE commands for the contiguous video RAM (22×23 for VIC-20, 80×25 for Commodore 64/128) or by DRAW commands in graphics mode for the Commodore 128. (Also bitmap POKEable for the Commodore 64, if I recall correctly… haven’t sorted out the VIC-20’s situation yet… that’s another day.)

Well, the 80-column MOS 8563 chip has its own video ram. And given the 16-bit address space (yeah, there are BANKs to switch for the 128), it’s not readily addressable in contiguous address pointer space. Actually, it’s NOT ADDRESSABLE AT ALL by address space. Instead, there’s a convoluted process to write to it. (Thanks to this video for helping me “get it” fully)

The process

  • High memory byte write
    • Store video register number 18 for the 8563 high memory address byte in register X (register X = 18)
    • Store 8563 high memory address in register A (register A = [8563 address] / 256)
    • Write value of X to location $D600 (location 54784 in decimal)
    • Wait until $D600‘s high bit flips (instructions: do a bit test, check for “bit plus” (sign bit is bit 7, so loop if it’s still zero)
    • Write value of A to location $D601 (location 54785 in decimal)
  • Low memory byte write
    • Store video register number 19 for the 8563 low memory address byte in register X (register X = 19)
    • Store 8563 high memory address in register A (register A = [8563 address] AND 255)
    • Write value of X to location $D600
    • Wait until $D600‘s high bit flips
    • Write value of A to location $D601
  • Write the actual content!
    • Store video register number 31 in register X to signal that you want to interact with data in the address set by the last two steps.
    • Store byte you want to write in register A
    • Write value of X to location $D600
    • Wait until $D600‘s high bit flips
    • Write value of A to location $D601

The code

This can probably be done with POKE and PEEK, but this process is tedious enough for machine code. You can assemble it this way:

0180c 8e 00 d6  stx $d600
0180f 2c 00 d6  bit $d600
01812 10 fb     bpl $180f
01814 8d 01 d6  sta $d601
01817 60        rts

Or store it in data and have your basic routine load it into memory. The former is a lot saner for actual entry because you at least get the assembly mnemonics.

Invoking the code

100 addr = (x * 80) + y : rem x = 0 to 79, y = 0 to 24
110 c = 209 : rem the filled disc
120 gosub 11010
9999 end
10000 vo=dec("180c")
11000 rem vo is output routine location, addr address, c character to output
11010 sys vo, addr/256,18
11020 sys vo, addrand255,19
11030 sys vo, c, 31
11040 return

Further Challenges

You’ll notice in the video that the filled disc characters are reversed. That’s because while the characters in video RAM are in $0000$07FF (0-2047), the attributes are in $0800$0FFF. I haven’t confirmed, but I believe there are three different registers to set those as well.


Microphone Crackle and Distortion on Windows 10 with New ASUS Motherboard

I’ve had to remedy microphone crackle and distortion twice on two different builds/rebuilds of Windows 10 PCs, and somehow forgot from the last time how to remedy.

In my case, the problem was the onboard sound for the ASUS Motherboard (ASUS ROG Strix B550-E and ASUS ROG Strix B550-A motherboards.) The solution is to get the audio drivers for the Realtek Audio that come on the board and install them (B550-A and B550-E downloads.)


Craftsman V60 Mower (Cordless Electric) Two Years In

We’ve had mixed results with our Craftsman V60 Mower (CMCMW270) after a little over two years of ownership.

UPDATE: After a back and forth on a response to a review on their page, I appear to have gotten a check for the cost of the larger battery that’s still available (5.0Ah, though, not the original 7.5Ah). Still trying to decide whether to get a couple of batteries or upgrade mower.

The Good

While we’ve had plenty of cordless lawn equipment such as trimmers and blowers, this was our first lawn mower that wasn’t gas powered. Being able to mow without breathing in gas fumes for an hour is a huge benefit to an electric mower, and not having a cord attached seems like the only way to go.

Another huge benefit is that it starts and stops pretty easily. The initial battery life on the V60 was only about an hour or so for our cutting needs, but you could start and stop the mower in an instant in cases in which you might keep a gas mower running just to not have to fight with starting it back up again.

The bad

The biggest drawback is that the battery life of the 7.5AH battery now only lasts long enough for at most 2/3rds of the yard to be cut. This is only 2 years after purchase. You’ll also notice that if you search for CMCB6075 (the 7.5AH battery) or go to the above link, there are no sellers of the brand new battery, only the occasional eBay listing. We did get a decent discount over brands that weren’t dying out, but didn’t expect the support for this thing to disappear so quickly.

The CMCB6025 2.5AH battery is still available on Amazon (affiliate link) or Lowes, but it’s a whopping $150 for 15-20 minutes of battery life. Still, we manage to juggle the CMCB6025 for our blower with the CMCB6075 to get the job done.

A similar nuisance that is compounded by the abysmal battery life is the fact that the blade doesn’t quite spin up to the speed of a gas mower, which left us with missed blades of grass everywhere, even when the blade was fresh and sharp.

Conclusion

I’ve liked the experience enough with this mower to not really want to go back to gas-powered, but I don’t look forward to paying the premium and being on the hook for replacement batteries.


kubectl and eks… You must be logged in to the server (Unauthorized)

Say you have a setup with EKS using IAM API keys with Admin permissions and are using an AWS profile that you’ve confirmed can retrieve the eks config with aws eks update-kubeconfig --name context-name --region appropriate-region-name-n.

But then kubectl get pods --context context-name still fails, oddly with a “You must be logged in to the server (Unauthorized)”

Similarly, kubectl describe configmap -n kube-system aws-auth fails with the same message.

Is your username and/or role included/spelled correctly?

If you have access to all of the resources used by EKS then perhaps the ConfigMap is the issue. Check out How do I resolve an unauthorized server error when I connect to the Amazon EKS API server? for more details, and presumably the “You’re not the cluster creator” section.

Debugging steps:

  • Have the cluster creator or member of the system:masters group run kubectl describe configmap -n kube-system aws-auth and verify that the mapUsers or mapRoles section mention the identity with adequate permissions as returned by aws sts get-caller-identity
  • Double check that if the identity/roles are there that they have adequate permissions.
  • Double check that the identity/role ARN and username actually match. (This was a relatively simple setup, so in my case it was just a misspelling of the username that was the cause.)


Running a Commodore VICE emulator on a remote Ubuntu Linux machine with Xfvb

The Challenge

I want to have a Commodore 128 VICE emulator start up, run some arbitrary BASIC code, and get a snapshot of the output. There are a few settings configurable from the command line to accomplish this:

  • +sound (without this option you will get and error "pa_simple_new(): Connection refused" because you’re *probably* not going to have a PulseAudio option for your remote linux box)
  • -limitcycles 10000000 (intentionally timeout the machine after 10 million cycles… ~10 seconds)
  • -exitscreenscreenshotvicii – this is just -exitscreenshot for non-128 emulators
  • -keybuf so that you can “type” in your program to the BASIC emulator

Installing and Running VICE

In Ubuntu the VICE package can be installed with sudo apt install vice. You still have a couple of issues: First you have nothing to send your display to. I remedied this with Xfvb (the package is lowercase x, but the executable is uppercase)

sudo apt install vice xfvb 
Xfvb :1 & # if you exit your session, you'll have to kill this off or point to it again
export DISPLAY=:1 # use Xfvb for your "display"

At this point, if you try to run on Ubuntu, you’ll be missing ROMs for the various components (basic and the kernal are two of them). They don’t install with the vice package because they’re not appropriately licensed (understatement) for the Ubuntu distro. If you try to run the emulator without them, you’ll get something like the following:

*** VICE Version 3.4 ***

Welcome to x128, the free portable C128 Emulator.

Current VICE team members:
Marco van den Heuvel, Fabrizio Gennari, Groepaz, Errol Smith, Olaf Seibert,
Marcus Sutton, Kajtar Zsolt, AreaScout, Bas Wassink, Michael C. Martin,
David Hogan.

This is free software with ABSOLUTELY NO WARRANTY.
See the "About VICE" command for more info.

C128MEM: Error - Couldn't load kernal ROM `kernal'.
Error - Machine initialization failed.

Exiting...
Segmentation fault

Getting and installing the ROMs

You can download the ROMs from the release source file on the project page. I used the vice-3.4 source. Download/upload the file to your Ubuntu machine and then untar and copy the rom files from the vice-{version}/data directory to /usb/lib/vice:

tar -zxvf vice-3.4.tar.gz
cd vice-3.4/data
sudo ls **/* | grep  -v '\.' | sudo xargs -I {} cp -Rp --parents {} /usr/lib/vice

Run a test script

The following code should have the emulator draw a circle and then capture to the horribly named haha.png:

x128 -keybuf "10 graphic 1
20 scnclr
30 circle 1,100,100,30
run
" -sound -limitcycles 10000000 -exitscreenshotvicii haha.png

Next Steps

I don’t know… Hook up a lambda? Write a crude server that listens on a COM port? One thing I’m happy about discovering is the -keybuf argument, because I know now that I can inject BASIC (keystrokes to enter BASIC) into an emulator from a source code file without having to worry about the disk or tape image formats.


Use find_each vs. select or each when needing to process each record via Ruby

The problem:

This scenario is a much more practical case of a similar concept with Rails / ActiveRecord count, size, and length. In this case, you’re needing to run ruby code off of either every value from a where method result or a subset that is determined by ruby code. For example, assume that a model Entry has a method after_sunrise (probably want a real sunrise/sunset gem but this is a simplistic example) as follows:

  # determine if a time included in the string description of the entry is after sunrise
  # nil (falsey) if not found
  def after_sunrise
    notated_time = what.match('\d+:?\d* ?[AP]M')
    return nil if notated_time.nil?

    (Time.parse([date.to_s, notated_time].join(' ')) > 
      Time.parse([date.to_s, "6:30 AM"].join(' ')))
  end

Using select for filtering will cause the result from where clause to be loaded all at once:

irb(main):178:0> entries = Entry.where(private: false).select { |entry| entry.after_sunrise } #.each { do something with each record here }
D, [2021-08-07T17:53:52.492261 #4] DEBUG -- :   Entry Load (10.4ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1  [["private", false]]
=>
[#<Entry:0x000055ddafc1b230

find_each for the batch

If you don’t have very many records, this isn’t a big deal, but if you have models with a lot of data per model instance and you have millions of rows, you risk running out of memory trying to load it all at once. Anytime you take the result of a query or association and try to switch to operating on it like an array of ruby objects, you force the lazy loading of ActiveRecord::Relation and ActiveRecord::Association::CollectionProxy to load those records.

If you switch to find_each you can load those records in batches:

irb(main):179:1* entries = Entry.where(private: false).find_each do |entry|
irb(main):180:1*   next unless entry.after_sunrise
irb(main):181:1*   # do something with each record here
irb(main):182:0> end
D, [2021-08-07T17:57:07.895360 #4] DEBUG -- :   Entry Load (5.1ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 ORDER BY "entries"."id" ASC LIMIT $2  [["private", false], ["LIMIT", 1000]]
=> nil

I only have 214 records in this example database, so the default batch size of 1000 makes this look almost identical to the original. However… you can use a keyword argument of batch_size: to tune the size of the batches pulled, in case your records are small and so more than 1000 records can be loaded at a time or they’re large and 1000 records is too much, or you have contrived example and want to show it actually batching:

irb(main):184:1* entries = Entry.where(private: false).find_each(batch_size: 2) do |entry|
irb(main):185:1*   next unless entry.after_sunrise
irb(main):186:1*   # do something with each record here
irb(main):187:0> end
D, [2021-08-07T17:59:22.339190 #4] DEBUG -- :   Entry Load (1.9ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 ORDER BY "entries"."id" ASC LIMIT $2  [["private", false], ["LIMIT", 2]]
D, [2021-08-07T17:59:22.344225 #4] DEBUG -- :   Entry Load (1.8ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 AND "entries"."id" > $2 ORDER BY "entries"."id" ASC LIMIT $3  [["private", false], ["id", 8], ["LIMIT", 2]]
D, [2021-08-07T17:59:22.347152 #4] DEBUG -- :   Entry Load (1.9ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 AND "entries"."id" > $2 ORDER BY "entries"."id" ASC LIMIT $3  [["private", false], ["id", 10], ["LIMIT", 2]]

Conclusion

The difference between chaining .each or .select off of a .where clause vs. chaining .find_each is something you won’t necessarily see the benefits of if you’re building something from the ground up and don’t have much data flowing through your application. You may even even have a successful launch until you grow an order of magnitude or so. That’s part of the challenge of recognizing the need for it.


Rails / ActiveRecord count, size, and length

When trying to be sensitive to n+1 queries and memory usage, knowing the differences between count, size, and length in ActiveRecord is important. It had been a while since I reviewed the usage, and I wanted to ensure that I hadn’t made some bad assumptions along the way that somehow stuck. The reality is that each method is pretty close to indicating what it will do, with size being the method that will load the data on (or for) you.

count

Back in the old days count was a more sophisticated member of ActiveRecord::Calculations::ClassMethods module. You could pass conditions to the method, or column names… basically a combination where and includes and joins.

The column/distinct counting moved to ActiveRecord::Calculations without all the extra conditionals, joins, and including. Note that you do not need a query to “count”:

irb(main):011:0> Model.count(:special_data) # count Model records with non-nil special_data
   (191.9ms)  SELECT COUNT(`models`.`special_data`) FROM `models`
=> 41828
irb(main):012:0> Model.distinct.count(:special_data) # count Model records with DISTINCT non-nil special_data
   (17.6ms)  SELECT COUNT(DISTINCT `models`.`special_data`) FROM `models`
=> 1909
irb(main):013:0> Model.count # count all records
   (3790.8ms)  SELECT COUNT(*) FROM `models`
=> 594383

If you’re just looking for a count of records for a query that has not been loaded, that’s now a member of ActiveRecord::Associations::CollectionProxy.

irb(main):015:0> Model.all.count
   (744.2ms)  SELECT COUNT(*) FROM `models`
=> 594383
irb(main):017:0> Model.where('special_data is not null').count
   (24.0ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
=> 41828

length

length will load all of the records indicated by a collection, which might be useful if calling length on an association that you’re going to use the data from anyway, but not if you are throwing that data away. You’ll be wasting time (and memory) on the operation.

irb(main):018:0> Model.where('special_data is not null').length
  Model Load (647.9ms)  SELECT ...
.
.
.
=> 41828

You also can’t call length on a model’s class name, as it is not a collection itself:

irb(main):020:0> Model.length
Traceback (most recent call last):
        1: from (irb):20
NoMethodError (undefined method `length' for #<Class:0x00007f810ed6ec28>)

size

size also requires a collection, but does not attempt to load that collection, instead wrapping a COUNT around its query:

irb(main):022:0> Model.where('special_data is not null').count
   (22.8ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
=> 41828

Like with length, this doesn’t work:

irb(main):023:0> Model.size
Traceback (most recent call last):
        1: from (irb):20
NoMethodError (undefined method `size' for #<Class:0x00007f810ed6ec28>)

Conclusion

The behavior of these methods isn’t all that surprising, but sometimes we can let our guard down in Ruby and think of methods as synonyms when they actually have distinct behaviors. This is especially risky if you are working in more than one language or framework and might otherwise gravitate toward a method such as length because it’s second nature elsewhere.


Getting a count of the rows in all of your ActiveRecord models

Warning: Going through ActiveRecord is likely not something you want to do in production unless you have a replica database that your script or your model can point at. See the second part for MySQL and PostgreSQL examples in SQL.

How do you enumerate your models?

For apps created under Rails 5 app or later, all of your ActiveRecord models should be descendants ApplicationRecord as well. If this is an option, you’ll be able to filter out some noise from descendants such as SchemaMigration, etc…

Using the descendants of ApplicationRecord you can map the statistics… You’ll need to catch errors on tables that might only be in production-like environments if you’re running locally:

counts = ApplicationRecord.descendants.map do |klass|
  [klass.name, klass.table_name, klass.count]
rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
  nil
end


=> []

Oops. I’m in development so… no eager loading of the models.


Rails.application.eager_load! # if in development

counts = ApplicationRecord.descendants.map do |klass|
  [klass.name, klass.table_name, klass.count]
rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
  nil
end

# lots of SELECT COUNT(*) logs if you're on :debug level logging
# followed by an array of name/table name/counts

You might also want to compact that result due to nils:


Rails.application.eager_load!
counts = ApplicationRecord.descendants.map do |klass|
  [klass.name, klass.table_name, klass.count]
rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
  nil
end

From there, you can export to CSV:

Rails.application.eager_load!

counts = ApplicationRecord.descendants.map do |klass|
  [klass.name, klass.table_name, klass.count]
rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
  nil
end

output_filename = "/tmp/counts.csv"

CSV.open(output_filename, "wt") do |csv|
  csv << ['class name', 'table name', 'row count']
  counts.compact.each do |row|
    csv << row
  end
end

What about from the database itself?

The below options are *significantly* faster than going through the Rails stack, so unless you need additional information from Rails, you should refine the below options instead (especially for production)

  • MySQL:
select table_name, sum(table_rows) from information_schema.tables where table_schema = 'app_dbname_development' group by table_name;
select table_schema, 
       table_name, 
       (xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
  select table_name, table_schema, 
         query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
  from information_schema.tables
  where table_schema = 'public' --<< change here for the schema you want
) t order by 3 DESC


Self-Modifying Code on a Commodore VIC-20

Note: All code listings are in lower case so that they are pastable into the VICE emulator. Otherwise, you will get graphics/uppercase PETSCII characters on paste.

Examining the structure of how the BASIC code is stored

User program RAM is in locations 4096 to 7680 (decimal) on a VIC 20. The storage format of the basic programs can be dumped with the following BASIC:

for i=4096 to 7680 - fre(1): ? i,chr$(peek(i));peek(i): next i 

I’ve taken the extra step up adding a slightly more sophisticated version of the above at line 10000 in the below code so that I can RUN 10000 to dump memory locations with paging and skipping control and non-printable characters.

10 print "hi"
20 n=peek(4104)
30 x=peek(4105)
40 if n >= 90 then n=65
50 n=n+1
60 x=int(26*rnd(1)+65)
70 poke 4104,n
80 poke 4105,x
90 goto 10
9999 end
10000 b=4096:i=b
10010 e=7680-fre(0)
10020 c=0
10030 ls=20
10040 ? i,
10050 ch=peek(i)
10060 ? ch;
10070 if(ch>=32 and ch<=127)or(ch>=160 and ch<=254)then ? chr$(ch);
10075 ?
10080 if c>ls then ? "continue";: input wt$: c=0
10090 c=c+1
10100 i=i+1
10110 if i>e then end
10120 goto 10040
User program RAM dump

You’ll notice in the above that we start with a null character (0) followed by 12, 16, 10 and 0. 12 and 16 are a pointer to the the memory location of the next line of code (in “little endian” order, so 16 * 256 + 12 = 4108)

The next bytes, at location 4099 and 4100, are 10 and 0. This is the line number for that line of code (again, in little endian format).

Once you get past these 2 2 byte numbers, you have a code…. 153: 153 is the VIC 20 BASIC Keyword Code for the PRINT statement. All syntactically significant tokens (keywords and symbols) are reduced to a single byte (and TAB and SPC functions actually include their left parenthesis as part of this code). The VIC-20 Programmer’s Reference Guide lists out these values (some of these are just their PETSCII codes if individual characters):

VIC 20 BASIC Keyword Codes

You’ll notice that space (32) and double quote (34) are explicitly expressed, as are the individual digits of any number literals.

At the very end of the line is a 0/null again to terminate the line. (Fun part of this experiment: Setting a byte in the middle of the line to 0 makes the rest of the line unreadable by the BASIC interpreter!)

Modifying the code

For an easy first attempt at this, I’m going to just change location 4105 and 4106, which are the letters in HI

10 print "hi"
HI at 4104 and 4105

In the below code, I’m cycling the original H through the alphabet (65-90) and setting the original I with random values:

20 n=peek(4104)
30 x=peek(4105)
40 if n >= 90 then n=65
50 n=n+1
60 x=int(26*rnd(1)+65)
70 poke 4104,n
80 poke 4105,x
90 goto 10
The changing 2-letter strings from the self-modifying code

If you BREAK out of the program (Esc key in VICE emulator) after running and list the first few lines, you’ll see that the initial PRINT statement’s string has indeed changed:

The print statement has had its string changed.

What’s Next?

This is obviously a very trivial exercise of self-modifying code, but any modifications that require anything aside from 1:1 in-place replacement requires more planning: The lines of a program are variable in length, which means that inserting code requires shifting subsequent code in memory. Also, shifting code in memory requires updating all pointers that pointed to the original locations. The next exercise will probably be adding code to the end of the program rather than trying to insert it in the middle.