I’ve actually got something similar happening at my current workplace 🫣
We run a cron job once a week to back up various files, clean the cache and restart the physical box that runs our nightly builds.
When we requested this, we specified that it should happen at a time that wouldn’t interfere with the daily runs (midnight Sunday). On weekdays our test builds start at 7pm; on weekends, because no one is working, we start them earlier (10am, in case the Friday night ones ran significantly longer).
Unfortunately, for some reason it was set up to restart at 4pm on a Saturday, which usually ends up being while our second test build for the day is still going.
For context, our tests take a fair bit of time as they are entire e2e regression suites run against the develop branch using Detox. This is done twice, once for Android and once for iOS. We are limited to running one Bamboo plan at a time (more would cost extra), so they don’t run concurrently.
On average they take ~2.5 to 5hrs per run to completely build, run and clean up. While I consider this to be too long, it’s down from the ~13hrs for a single run that was happening earlier in the year when I first started with the company.
As I mentioned, they can’t be run concurrently. The same goes for any other CI processes we run on that box, such as linting, Snyk checks or building the app and pushing it to Google Play/TestFlight etc. This is why we don’t start the tests any earlier on weekdays: they can end up blocking development checks etc.
But it also means that if any of those other builds are triggered late Friday afternoon (say by a whole pile of PRs merged to develop), they may still be running at 7pm when our tests trigger. That means our tests are queued and will start once the box is available. You can see how the box restarting at 4pm on Saturday now becomes an issue.
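To put rough numbers on that, here is a minimal Python sketch of the Saturday timeline. It assumes both runs start back to back from 10am with nothing queued ahead of them, and it takes "midnight Sunday" to mean 00:00 at the start of Sunday. The function names and the calendar date are made up for illustration; nothing like this runs on our box.

```python
from datetime import datetime, timedelta


def run_windows(first_start, durations):
    """Return (start, end) pairs for runs executed one after another."""
    windows = []
    start = first_start
    for duration in durations:
        end = start + duration
        windows.append((start, end))
        start = end  # the next run can only begin once this one finishes
    return windows


def interrupted_runs(restart, windows):
    """Return the indexes of runs that are in progress at the restart time."""
    return [i for i, (start, end) in enumerate(windows) if start <= restart < end]


# Hypothetical date: 2024-06-01 is a Saturday.
saturday_10am = datetime(2024, 6, 1, 10, 0)
restart_4pm = datetime(2024, 6, 1, 16, 0)
midnight_sunday = datetime(2024, 6, 2, 0, 0)  # assumed meaning of "midnight Sunday"

# Worst case from the post: two runs of 5 hours each.
slow = run_windows(saturday_10am, [timedelta(hours=5)] * 2)
# Best case from the post: two runs of 2.5 hours each.
fast = run_windows(saturday_10am, [timedelta(hours=2.5)] * 2)

print(interrupted_runs(restart_4pm, slow))      # index 1 is the second run
print(interrupted_runs(restart_4pm, fast))      # both runs already finished
print(interrupted_runs(midnight_sunday, slow))  # nothing running at midnight
```

With two 5hr runs the second one is still going at 4pm, while a restart at midnight lands after even the slowest pair has finished. Anything queued ahead of the tests just pushes both windows later, which makes the 4pm slot worse.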
The secondary issue is that while the box restarts, it doesn’t sign back in. That causes the runs after the restart to fail as well, until I come in on Monday, remote into the box and sign in with the general team user credentials.
The best part about that, though, is that the failures due to the machine being logged out look different in the logs for Android and iOS.


For one, it appears to execute and pass each test; it’s only when it tries to find the test results, logs, screenshots etc. that it fails. The other doesn’t appear to execute the tests at all. So it took a bit of troubleshooting to confirm what the problem even was to begin with.
Oh another fun one, from two days ago…
Both our test runs for a release branch failed to execute the night before a 10am release that had a bugfix late the afternoon before. Both the develop test runs worked fine. I tried to manually trigger the release ones again in the morning, and they wouldn’t even start…
Turned out the dev had accidentally deleted the release branch when they back-merged their bugfix from release to develop.
Can’t run tests overnight against a branch that doesn’t exist 
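A cheap pre-flight check would have made that failure obvious straight away. This is only a sketch of the idea, not something we actually have in our Bamboo plan: it asks the remote for the branch with `git ls-remote --heads` and fails fast if the ref isn't there. The function name, the `run` parameter (there so it can be tested without a real repo) and the branch name in the usage note are all hypothetical.

```python
import subprocess


def branch_exists(remote, branch, run=subprocess.run):
    """Return True if `branch` exists on `remote`, based on `git ls-remote`."""
    ref = f"refs/heads/{branch}"
    result = run(
        ["git", "ls-remote", "--heads", remote, ref],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # e.g. remote unreachable or bad credentials: not the same as "missing"
        raise RuntimeError(f"git ls-remote failed: {result.stderr.strip()}")
    # Each output line is "<sha>\t<refname>"; require an exact ref match,
    # because ls-remote patterns can also match longer ref names.
    return any(
        line.split("\t")[-1] == ref for line in result.stdout.splitlines()
    )
```

Something like `if not branch_exists("origin", "release/1.0"): raise SystemExit("release branch is missing")` at the start of the nightly job would turn a confusing failure to start into a one-line explanation.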

