The Discovery
Our team was working on a release candidate for a client when we ran into a simple problem, one that could be found on any list of programming falsehoods. That’s when things got interesting: in addressing what we believed to be the problem, we ended up masking the real issue.
“I’m getting an error when checking in”
Defects are normal. Truth be told, we almost look forward to submitting a release candidate and getting a laundry list of things to fix. It’s part of the process. While we do our best, inevitably there’s something we missed.
On this particular day, the error seemed innocent enough; however, reproducing it proved challenging. We couldn’t isolate an issue that would cause the app to fail on our end, much less catastrophically and consistently.
We can thank our contacts and QA team for sticking with us as we worked together to find a solution. Many theories and several hair-pulling hours later, we landed on it.
“Could it be a time difference?”
The question came completely out of the blue. We hadn’t considered time up until this point. Then again, if you throw enough theories at the wall, one is bound to stick.
“Does your device say it’s 2:53pm right now?” we asked.
It should not have come as a total surprise when our QA team member responded “2:50.” After all, anyone can set their device back 3 minutes. “Oh yeah, I remember I did that a long time ago for scheduling concerns.”
And just like that… catastrophic failure.
The Problem
Our API design was twofold:
- The server would track activity any time the app made a request, using the server’s own local timestamp
- In cases where the app encountered a network error, the request would be cached. On certain endpoints, the backend would accept a timestamp from the device “when the request was first created” rather than relying on the time the request was received by the backend itself.
This can only work if a developer relies on item #14 from the list of falsehoods about time (part 1).
“The server clock and the client clock will always be set to the same time.”
The hard truth? Clocks are inconsistent, time zones are fickle, and phones can’t be trusted. It’s probably not ideal to design an API that relies on its own time for tracking purposes, and expects all clients to have clocks to match. In the event that the device reported an event “too early,” the backend would return an error. We were just now learning that our app would need to tolerate such an occasion gracefully.
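To make the failure concrete, here is a minimal sketch of how the two timestamp sources could collide. The exact validation rule on our backend isn't spelled out above, so this assumes a hypothetical one: an event may not be older than the last recorded activity. `ActivityLog` and `TimestampRejected` are illustrative names, not the real API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class TimestampRejected(Exception):
    """Raised when a reported timestamp predates recorded activity."""


class ActivityLog:
    """Hypothetical backend tracker that mixes server and device clocks."""

    def __init__(self) -> None:
        self.last_activity: Optional[datetime] = None

    def record(self, event_time: datetime) -> None:
        # Assumed rule: an event may not be older than the last recorded one.
        if self.last_activity is not None and event_time < self.last_activity:
            raise TimestampRejected(f"{event_time} is before {self.last_activity}")
        self.last_activity = event_time


server_now = datetime(2024, 1, 1, 14, 53, tzinfo=timezone.utc)
device_now = server_now - timedelta(minutes=3)  # device clock set 3 minutes back

log = ActivityLog()
log.record(server_now)  # ordinary request, stamped with the server's clock

try:
    log.record(device_now)  # cached request, stamped with the device's clock
    outcome = "accepted"
except TimestampRejected:
    outcome = "rejected"

print(outcome)  # rejected
```

Both events happened at the same real moment; only the clocks disagree, and that alone is enough to trip the check.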
The mitigations
- Enforce date and time settings on devices, which our client was in a unique position to do for their controlled release and user base
- Handle the error gracefully on the app side
- Trust that, moving forward, the device’s clock stays consistent with the server’s, so the app stops reporting events that appear to come “from the past” relative to the server
The fix
While changes to backend design can be tricky, there’s often no other remedy for flawed design. Rather than having the device send the timestamp of the relevant event in the request, we would have it send a delta: how long ago the request was first created. The delta would start at 0 and grow for as long as the request sits on the device after a failed send.
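As a rough sketch of the idea, assuming a hypothetical `age_seconds` field on the request and a hypothetical `resolve_event_time` helper on the server (neither name comes from our real API):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class QueuedRequest:
    """A request held on the device; carries an age, not a wall-clock time."""

    payload: str
    age_seconds: float = 0.0  # starts at 0, grows while the request waits

    def add_wait(self, seconds: float) -> None:
        # Called after a failed send, once `seconds` more have elapsed.
        self.age_seconds += seconds


def resolve_event_time(server_now: datetime, age_seconds: float) -> datetime:
    """Server side: derive the event time from the server clock alone."""
    if age_seconds < 0:
        raise ValueError("age_seconds must not be negative")
    return server_now - timedelta(seconds=age_seconds)


request = QueuedRequest(payload="check-in")
request.add_wait(90)  # first send failed; retried 90 seconds later
request.add_wait(30)  # second send failed; retried 30 seconds after that

server_now = datetime(2024, 1, 1, 14, 53, 0, tzinfo=timezone.utc)
event_time = resolve_event_time(server_now, request.age_seconds)
print(event_time.isoformat())  # 2024-01-01T14:51:00+00:00
```

The device never tells the server what time it thinks it is, so a phone set three minutes back produces the same result as one set correctly.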
Pros:
- The server can simply rely on its own clock for all time values, minus the incoming delta
- If the app encounters network trouble or the request initially fails to send, the app can hold the request for as long as it wants while incrementing the delta
- Only one machine, the server, is concerned with the time on the wall
Cons:
- All apps consuming the API would need to be updated
- The user might still be able to adjust the date and time settings while the app is locally maintaining its delta
Falsehood about time #22: “The duration of one minute on the system clock will be pretty close to the duration of one minute on most other clocks.”
The fix for the first edge case, a user changing the date and time while a delta is being tracked, would likely involve relying on a library that measures elapsed time independently of the device’s system clock while offline, so the delta stays accurate and is harder for the user to obstruct or alter.
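One way to picture this is a monotonic clock, which only moves forward and ignores changes to the date and time settings. The sketch below uses Python's `time.monotonic` purely for illustration; `RequestAge` is a hypothetical helper, and a mobile app would reach for its platform's equivalent. Monotonic readings are only meaningful as differences within one running process, so a request that must survive an app restart or reboot would need more than this.

```python
import time
from typing import Callable


class RequestAge:
    """Tracks how long a request has been waiting, ignoring the wall clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._created = clock()

    def seconds(self) -> float:
        # Difference of two monotonic readings; wall-clock edits cannot change it.
        return self._clock() - self._created


# Usage with a fake clock, standing in for time passing on the device.
ticks = iter([100.0, 130.0])
age = RequestAge(clock=lambda: next(ticks))
elapsed = age.seconds()
print(elapsed)  # 30.0
```

Injecting the clock keeps the example testable; in the app, the default `time.monotonic` would be used instead of the fake.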
Ultimately, clock drift will need to be dealt with in another manner and may not be a business concern. In this case, problem solved.