Getting Aggregate Statistics from Individual Realm Records

I’m trying to get aggregate data for individual Realm records so that it can be graphed on a chart.

I have a ‘WorkDay’ Realm Object which stores values like ‘startTime’, ‘endTime’, ‘singleTimeIncome’, ‘penaltyIncome’, ‘totalIncome’, etc.

A charting example is that I want to collect the ‘totalIncome’ of all Work Days for an entire year and graph that against the ‘totalIncome’ for all Work Days of previous years.

My current approach is to cycle all WorkDay Realm Objects for a given year with a ‘for loop’ and store the values from WorkDay in a custom data struct. I move the values from the Realm WorkDay Object to an array in the data struct. The reason I want to add them into an array as it will then allow me to calculate things like ‘averageTotalIncome’ for that year as well as getting the ‘totalIncome’ for the entire year by running ‘reduce(0, +)’ on the array.

The custom data struct looks similar to this:

import Foundation
import SwiftDate

struct TimePeriodData {
    
    //MARK: - Time Period
    
    var timePeriod: TimePeriod = .week
    var date: Date = Date()
    var localDate: Date = Date()
    
    //MARK: - Work Day Times
    
    var startTimeArray: [Date] = []
    var endTimeArray: [Date] = []
    
    //MARK: - Work Day Hours
    
    var hoursSingleTimeArray: [Double] = []
    var hoursPenaltyArray: [Double] = []
    var hoursTotalArray: [Double] = []
    
    //MARK: - Work Day Income
    
    var singleTimeIncomeArray: [Double] = []
    var penaltyIncomeArray: [Double] = []
    var totalIncomeArray: [Double] = []
}

I then collect all Work Days from the Realm, the function looks like this:

private func updateWorkDays() {
    // Get Time Period Dates
    let start = timePeriodStart()
    let end = timePeriodEnd()
    
    // Get Current User Work Days for Time Period
    let allUserWorkDays = realm.objects(WorkDay.self).where { workDay in
        workDay.date.contains(start ... end)
    }
    workDays = allUserWorkDays
}

Once I have the Work Days I then move the data into ‘TimePeriodData’ which will be used by SwiftUI to generate charts, etc. The code look something like this:

private func getData(for inputDate: Date, from inputWorkDays: Results<WorkDay>) -> TimePeriodData {
    var data = TimePeriodData()
    for workDay in inputWorkDays {
        // Time Period
        data.timePeriod = self.timePeriod
        data.date = inputDate
        data.localDate = workDayListBrain.getLocalDate(from: inputDate)
        
        // Work Day Times
        data.startTimeArray.append(workDay.startTime)
        data.endTimeArray.append(workDay.endTime)
        
        // Work Day Hours
        data.hoursSingleTimeArray.append(workDay.hoursSingleTime)
        data.hoursPenaltyArray.append(workDay.hoursPenalty)
        data.hoursTotalArray.append(workDay.hoursTotal)
        
        // Work Day Income
        data.singleTimeIncomeArray.append(workDay.singleTimeIncome)
        data.penaltyIncomeArray.append(workDay.penaltyIncome)

    }
    return data
}

When ‘getData’ returns the UI is updated. However what I am finding is that when I use ‘getData’ for an entire year I need to cycle 100 Work Day Objects and this takes around 2 Seconds. I know I will need to create ‘TimePeriodData’ for previous years as well so there is other data to graph against, assuming each of which will take 2 Seconds if they have 100 Work Days. This adds up quickly.

I’ve read other posts where people cycle 50,000 Realm Objects and it takes 0.1 Seconds. Given I am only cycling 100 and it takes two seconds, am I doing something fundamentally wrong here? Is there a better way to approach this?

What I have tried to date is using a map function instead of a ‘for loop’, this looks a bit like this:

    data.startTimeArray = inputWorkDays.compactMap { workDay in workDay.startTime } as [Date]
    data.endTimeArray = inputWorkDays.compactMap { workDay in workDay.endTime } as [Date]

But the ‘compactMap’ made no notable gains in speed. I also tried doing it all on a BG Thread, this made things ‘feel’ speedier as the UI was never stalled while processing data but it still took 2 Seconds to update.

Any help or advice you could offer would be very much appreciated. Thanks!

The question needs a bit of clarity; we probably need see what the WorkDay model looks like as that’s what will determine the queries.

Also, more complete and brief code will help - for example, we don’t know what workDays is or where it’s used as it doesn’t appear elsewhere in the question

workDays = allUserWorkDays

Part of this loop is questionable;

var data = TimePeriodData()
for workDay in inputWorkDays {
   data.timePeriod = self.timePeriod
   data.date = inputDate
   data.localDate = workDayListBrain.getLocalDate(from: inputDate)

The data var is instantiated once before the loop, but then the loop assigns some of the same properties over and over. e.g. if 10/23/2022 is passed in as input date and there are 365 input work days, data.date will be assigned 10/23/2022 365 times. The only changing values are the startTime, endTimeI etc. Is that intentional?

Lastly for those vars within the loop, workDay.hoursSingleTime for example. The loop iterates over all of the values to store them in an array (based on the array name) and that array appears to be used to get the totals for each property? (I could be way off on that guess)

If that’s the case, let Realm do the heavy lifting and return a total instead of iterating, maybe something like this?

let sum: Double = realm.objects(WorkDay.self).sum(ofProperty: "hoursSingleTime")

so then the iteration in the call is removed and getData becomes this

private var getData(inputDate: date, startDate: Date, endDate: Date) - > {
    var data = TimePeriodData()

    data.timePeriod = //use startDate and endDate
    data.date = inputDate
    data.localDate = workDayListBrain.getLocalDate(from: inputDate)
    
    data.singleTimeTotal: Double = realm.objects(WorkDay.self)
       .where { $0.startDate >= startDate && $0.endDate <= endDate }.sum(ofProperty: "hoursSingleTime")
        
   data.penaltyIncomeTotal: Double = realm.objects(WorkDay.self)
       .where { $0.startDate >= startDate && $0.endDate <= endDate }.sum(ofProperty: "penaltyIncome")
    
    //... the rest of the totals
    return data
}

If my guess is not correct, can you add a few more data points (per above) for clarity and we’ll take a look?

Hey Jay,

Thank you for taking the time to write back.

The WorkDay Model looks a bit like this:

    class WorkDay: Object, ObjectKeyIdentifiable {
        
        //MARK: - Realm Properties
        
        // Object Tracking - Used to Track Work Days in Database
        @objc dynamic var id = NSUUID().uuidString
        @objc dynamic var createdAt = Date()
        
        override static func primaryKey() -> String? {
            return "id"
        }
        
        // Realm Relationship
        var parentJob = LinkingObjects(fromType: Job.self, property: "workDays")
        
        // User Inputs
        @objc dynamic var date = Date()
        
        @objc dynamic var clockOn = Date()
        @objc dynamic var clockOff = Date()
        
        // Hours Worked
        @objc dynamic var hoursSingleTime: Double = 0
        @objc dynamic var hoursPenalty: Double = 0
        @objc dynamic var hoursTotal: Double = 0
        
        // Time Strings
        @objc dynamic var timesSingleTime: String = ""
        @objc dynamic var timesPenalty: String = ""
        @objc dynamic var timesTotal: String = ""
}

When I reference ‘workDays = allUserWorkDays’, ‘workDays’ is a local variable as follows, alongside other local variables:

@Published var workDays: Results<WorkDay>?
@Published var timePeriod: TimePeriod = .week
@Published var date: Date = Date()

And self.timePeriod is an enum as follows:

enum TimePeriod: String {
    case week = "week"
    case month = "month"
    case quarter = "quarter"
    case year = "year"
}

When the VC appears it is assigned the Current Date via Date(). The user can then browse via Week, Month, Quarter and Year, each of which correspond to the TimePeriod. When the TimePeriod is set, .week is default, the system gets the start/end of the time period which is detailed above by timePeriodStart() and timePeriodEnd(), in this example the first day of the week and the last day of the week. Then the Realm returns all WorkDay Objects that appear within that TimePeriod.

Thank you for pointing out that issue with the timePeriod, date and localDate at the start of the loop, this was a mistake on my part.

My reasoning for putting all of the ‘workDay.hoursSingleTime’ into an array was so that I could derive a number of data points from it. One would be the total that you point out, another would be to get the average and the median. My thinking was that I’d need to sort the array to get the median.

But what you are suggesting by getting the Realm to calculate the values instead which should be much faster. From what I understand Realm provides avg, count, max, min and sum in terms of calculating aggregates? So I’d be able to do pretty much everything I aim for except median via the Realm directly?

The only other thing I would note is that some values in the WorkDay Object are optionals and some are calculated via computed properties.

For example there is a property as follows:

@objc dynamic var driveAway: Date? = nil

Sometimes the user would record this, other times they would not. If I used avg on this via the Realm would all values that are nil be ignored? Or would the better approach be to filter the Results? for ‘driveAway’ values that aren’t nil prior to getting the avg?

In regards to the computed properties, they are based on values stored in the Realm but aren’t useable in the same way so I think for those values the only way to aggregate the data would be via running a for loop, do you agree?

No big deal. Easily handled with a filter (all the code below is a bit verbose for readability)

//average driveAway taking nils into account
let results = realm.objects(WorkDay.self).where { $0.driveAway != nil }
let count = results.count
let countDouble = Double(count)
let total: Double = results.sum(ofProperty: "driveAway")
print("Average: \(total / countDouble)")

You can do that too - no iteration necessary! Here’s the median of the hoursSingleTime

//find median of hoursSingleTime
let allResults = realm.objects(WorkDay.self).sorted(byKeyPath: "hoursSingleTime")
let allCount = allResults.count
let countIsEvenNum = allCount.isMultiple(of: 2) //even or odd number

if countIsEvenNum == true {
    print("even num") //leaving this for your homework :-)
    //get the objects values above and below the 'middle' index, add together and divide by 2
} else {
    let middleIndex = allCount / 2
    let medianWork = allResults[middleIndex]
    print("Median is: \(medianWork.hoursSingleTime)")
}

Hey Jay,

Thank you again for the detailed reply, very much appreciated.

After your previous reply, nothing was noted flagging anything I was doing as wildly wrong and that should be causing such a delay in my loop and processing times. So I started digging deeper through my code to see if I could identify a problem.

As I mentioned I had a bunch of computed properties in the WorkDay Realm Object, one of them is to return the total mileage, the code looks as follows:

var totalMileage: Double {
    var total: Double = 0
    for mileage in mileageLogged {
        total += mileage.unitsTravelled
    }
    return total
}

So I thought I’d looked for any ‘for loops’ in my computed properties and change them to use Realm Methods instead. When refactored to use the built in Realm Methods that you described the code looked as follows:

var totalMileage: Double {
    return mileageLogged.sum(ofProperty: "unitsTravelled")
}

I then setup some more accurate ways to benchmark the changes using DispatchTime.now() and comparing it when the function has finished.

I changed 10-15 Computed Properties and ran the benchmarks. What I found is that the function performed noticeably slower when using the Realm Methods. Below are my times between using a for loop and the Realm Methods.

Normal - For Loops
Diff: nanoseconds(967731375) | Secs: 0.9677313750000001
Diff: nanoseconds(871276166) | Secs: 0.871276166

Computed Property Optimisations
Diff: nanoseconds(1020237792) | Secs: 1.020237792
Diff: nanoseconds(921382750) | Secs: 0.92138275

Given this I reverted my code to the for loops and then it hit me, a computed property is computing. So I went down the rabbit hole of completely minimising the use of computed properties so things weren’t called twice and thus computations weren’t unnecessarily repeated. This turned out to be the crux of my problem. After a days of optimising this is where I landed:

With Augment Function + Super + Optimise Income + BG Thread
Diff: nanoseconds(160863416) | Secs: 0.160863416
Diff: nanoseconds(134789875) | Secs: 0.134789875

This was around 6x faster and made everything feel great again. Normally the computed properties are used to get info for 1x WorkDay Object at a time, this ensures calculations presented to the user are always up to date, but when running those computations for 100+ WorkDay Objects they are very much inefficient.

Thank you very much for providing those code examples above, they make sense and are easy to read. I find the font used on the Realm Documentation quite difficult to read so it’s nice to see it here in a familiar code format.

And yes, I appreciate you slotting in the homework about median. I actually use median in a different app and my code is as follows:

var medianHours: Double {
    let sortedHoursTotal = hoursTotalArray.sorted { $0 < $1 }
    if sortedHoursTotal.count % 2 == 0 {
        return Double((sortedHoursTotal[(sortedHoursTotal.count / 2)] + sortedHoursTotal[(sortedHoursTotal.count / 2) - 1])) / 2
    } else {
        return Double(sortedHoursTotal[(sortedHoursTotal.count - 1) / 2])
    }
}

Excellent! Sounds like you are well on your way.

One thing to note:

Realm is very memory friendly - as long as code uses Realm based calls, all of the objects are lazily loaded with very low memory impact. However, as soon as Realm objects are moved to Arrays or high-level Swift functions are used, that lazy-loading-ness goes out the window and all objects are loaded into memory.

This can not only bog down an app, but can actually overwhelm the devices memory.

It doesn’t sound like this will be an issue based on your dataset but just be aware to stick with Realm functions as much as possible; List vs Array for example, and .where vs .filter. Imagine a Person with a List of Dog objects:

let dogList = person.dogList //lazy-loading niceness
let foundDogList = dogList.where { $0.name == "Spot" } //memory friendly

let dogArray = Array(dogList) //gobble up memory
let foundDogArray = dogArray.filter { $0.name == "Spot" } //gobble up more memory

I mention it because of the Swift .sorted vs Realms .sorted(byKeyPath: in your code. Just a thought!