Tagged: activerecord Toggle Comment Threads | Keyboard Shortcuts

  • ThomasPowell 1:15 pm on August 7, 2021 Permalink | Reply
    Tags: activerecord, find_each, ,   

    Use find_each vs. select or each when needing to process each record via Ruby 

    The problem:

    This scenario is a much more practical case of a similar concept with Rails / ActiveRecord count, size, and length. In this case, you’re needing to run ruby code off of either every value from a where method result or a subset that is determined by ruby code. For example, assume that a model Entry has a method after_sunrise (probably want a real sunrise/sunset gem but this is a simplistic example) as follows:

      # determine if a time included in the string description of the entry is after sunrise
      # nil (falsey) if not found
      def after_sunrise
        notated_time = what.match('\d+:?\d* ?[AP]M')
        return nil if notated_time.nil?
    
        (Time.parse([date.to_s, notated_time].join(' ')) > 
          Time.parse([date.to_s, "6:30 AM"].join(' ')))
      end
    

    Using select for filtering will cause the result from where clause to be loaded all at once:

    irb(main):178:0> entries = Entry.where(private: false).select { |entry| entry.after_sunrise } #.each { do something with each record here }
    D, [2021-08-07T17:53:52.492261 #4] DEBUG -- :   Entry Load (10.4ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1  [["private", false]]
    =>
    [#<Entry:0x000055ddafc1b230
    

    find_each for the batch

    If you don’t have very many records, this isn’t a big deal, but if you have models with a lot of data per model instance and you have millions of rows, you risk running out of memory trying to load it all at once. Anytime you take the result of a query or association and try to switch to operating on it like an array of ruby objects, you force the lazy loading of ActiveRecord::Relation and ActiveRecord::Association::CollectionProxy to load those records.

    If you switch to find_each you can load those records in batches:

    irb(main):179:1* entries = Entry.where(private: false).find_each do |entry|
    irb(main):180:1*   next unless entry.after_sunrise
    irb(main):181:1*   # do something with each record here
    irb(main):182:0> end
    D, [2021-08-07T17:57:07.895360 #4] DEBUG -- :   Entry Load (5.1ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 ORDER BY "entries"."id" ASC LIMIT $2  [["private", false], ["LIMIT", 1000]]
    => nil
    

    I only have 214 records in this example database, so the default batch size of 1000 makes this look almost identical to the original. However… you can use a keyword argument of batch_size: to tune the size of the batches pulled, in case your records are small and so more than 1000 records can be loaded at a time or they’re large and 1000 records is too much, or you have contrived example and want to show it actually batching:

    irb(main):184:1* entries = Entry.where(private: false).find_each(batch_size: 2) do |entry|
    irb(main):185:1*   next unless entry.after_sunrise
    irb(main):186:1*   # do something with each record here
    irb(main):187:0> end
    D, [2021-08-07T17:59:22.339190 #4] DEBUG -- :   Entry Load (1.9ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 ORDER BY "entries"."id" ASC LIMIT $2  [["private", false], ["LIMIT", 2]]
    D, [2021-08-07T17:59:22.344225 #4] DEBUG -- :   Entry Load (1.8ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 AND "entries"."id" > $2 ORDER BY "entries"."id" ASC LIMIT $3  [["private", false], ["id", 8], ["LIMIT", 2]]
    D, [2021-08-07T17:59:22.347152 #4] DEBUG -- :   Entry Load (1.9ms)  SELECT "entries".* FROM "entries" WHERE "entries"."private" = $1 AND "entries"."id" > $2 ORDER BY "entries"."id" ASC LIMIT $3  [["private", false], ["id", 10], ["LIMIT", 2]]
    

    Conclusion

    The difference between chaining .each or .select off of a .where clause vs. chaining .find_each is something you won’t necessarily see the benefits of if you’re building something from the ground up and don’t have much data flowing through your application. You may even even have a successful launch until you grow an order of magnitude or so. That’s part of the challenge of recognizing the need for it.

     
  • ThomasPowell 5:34 pm on August 5, 2021 Permalink | Reply
    Tags: activerecord, , ,   

    Rails / ActiveRecord count, size, and length 

    When trying to be sensitive to n+1 queries and memory usage, knowing the differences between count, size, and length in ActiveRecord is important. It had been a while since I reviewed the usage, and I wanted to ensure that I hadn’t made some bad assumptions along the way that somehow stuck. The reality is that each method is pretty close to indicating what it will do, with size being the method that will load the data on (or for) you.

    count

    Back in the old days count was a more sophisticated member of ActiveRecord::Calculations::ClassMethods module. You could pass conditions to the method, or column names… basically a combination where and includes and joins.

    The column/distinct counting moved to ActiveRecord::Calculations without all the extra conditionals, joins, and including. Note that you do not need a query to “count”:

    irb(main):011:0> Model.count(:special_data) # count Model records with non-nil special_data
       (191.9ms)  SELECT COUNT(`models`.`special_data`) FROM `models`
    => 41828
    irb(main):012:0> Model.distinct.count(:special_data) # count Model records with DISTINCT non-nil special_data
       (17.6ms)  SELECT COUNT(DISTINCT `models`.`special_data`) FROM `models`
    => 1909
    irb(main):013:0> Model.count # count all records
       (3790.8ms)  SELECT COUNT(*) FROM `models`
    => 594383

    If you’re just looking for a count of records for a query that has not been loaded, that’s now a member of ActiveRecord::Associations::CollectionProxy.

    irb(main):015:0> Model.all.count
       (744.2ms)  SELECT COUNT(*) FROM `models`
    => 594383
    irb(main):017:0> Model.where('special_data is not null').count
       (24.0ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
    => 41828

    length

    length will load all of the records indicated by a collection, which might be useful if calling length on an association that you’re going to use the data from anyway, but not if you are throwing that data away. You’ll be wasting time (and memory) on the operation.

    irb(main):018:0> Model.where('special_data is not null').length
      Model Load (647.9ms)  SELECT ...
    .
    .
    .
    => 41828
    

    You also can’t call length on a model’s class name, as it is not a collection itself:

    irb(main):020:0> Model.length
    Traceback (most recent call last):
            1: from (irb):20
    NoMethodError (undefined method `length' for #<Class:0x00007f810ed6ec28>)
    

    size

    size also requires a collection, but does not attempt to load that collection, instead wrapping a COUNT around its query:

    irb(main):022:0> Model.where('special_data is not null').count
       (22.8ms)  SELECT COUNT(*) FROM `models` WHERE (special_data is not null)
    => 41828
    

    Like with length, this doesn’t work:

    irb(main):023:0> Model.size
    Traceback (most recent call last):
            1: from (irb):20
    NoMethodError (undefined method `size' for #<Class:0x00007f810ed6ec28>)
    
    

    Conclusion

    The behavior of these methods isn’t all that surprising, but sometimes we can let our guard down in Ruby and think of methods as synonyms when they actually have distinct behaviors. This is especially risky if you are working in more than one language or framework and might otherwise gravitate toward a method such as length because it’s second nature elsewhere.

     
  • ThomasPowell 6:02 pm on August 3, 2021 Permalink | Reply
    Tags: activerecord, , postgres,   

    Getting a count of the rows in all of your ActiveRecord models 

    Warning: Going through ActiveRecord is likely not something you want to do in production unless you have a replica database that your script or your model can point at. See the second part for MySQL and PostgreSQL examples in SQL.

    How do you enumerate your models?

    For apps created under Rails 5 app or later, all of your ActiveRecord models should be descendants ApplicationRecord as well. If this is an option, you’ll be able to filter out some noise from descendants such as SchemaMigration, etc…

    Using the descendants of ApplicationRecord you can map the statistics… You’ll need to catch errors on tables that might only be in production-like environments if you’re running locally:

    counts = ApplicationRecord.descendants.map do |klass|
      [klass.name, klass.table_name, klass.count]
    rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
      nil
    end
    
    
    => []
    

    Oops. I’m in development so… no eager loading of the models.

    Rails.application.eager_load! # if in development
    
    counts = ApplicationRecord.descendants.map do |klass|
      [klass.name, klass.table_name, klass.count]
    rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
      nil
    end
    
    # lots of SELECT COUNT(*) logs if you're on :debug level logging
    # followed by an array of name/table name/counts
    
    

    You might also want to compact that result due to nils:

    Rails.application.eager_load!
    counts = ApplicationRecord.descendants.map do |klass|
      [klass.name, klass.table_name, klass.count]
    rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
      nil
    end
    

    From there, you can export to CSV:

    Rails.application.eager_load!
    
    counts = ApplicationRecord.descendants.map do |klass|
      [klass.name, klass.table_name, klass.count]
    rescue Mysql2::Error, ActiveRecord::StatementInvalid # if certain tables aren't present in certain environments
      nil
    end
    
    output_filename = "/tmp/counts.csv"
    
    CSV.open(output_filename, "wt") do |csv|
      csv << ['class name', 'table name', 'row count']
      counts.compact.each do |row|
        csv << row
      end
    end
    

    What about from the database itself?

    The below options are *significantly* faster than going through the Rails stack, so unless you need additional information from Rails, you should refine the below options instead (especially for production)

    • MySQL:
    select table_name, sum(table_rows) from information_schema.tables where table_schema = 'app_dbname_development' group by table_name;
    
    select table_schema, 
           table_name, 
           (xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
    from (
      select table_name, table_schema, 
             query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
      from information_schema.tables
      where table_schema = 'public' --<< change here for the schema you want
    ) t order by 3 DESC
    

     
c
Compose new post
j
Next post/Next comment
k
Previous post/Previous comment
r
Reply
e
Edit
o
Show/Hide comments
t
Go to top
l
Go to login
h
Show/Hide help
shift + esc
Cancel
%d bloggers like this: