It's a really good article and covers a lot of the points I have run into.
Here are some of the things that have stood out for me in the past, from the way I saw it used; my view may or may not differ from others'.
The goal was often clean, efficient code; it's focused on the developer and the unit level.
That's a bit different from a testing goal and from a tester's remit, so whilst testers can collaborate and contribute, it can be a fool's errand for it to be tester driven, a way of working forced onto developers.
It does, though, have the side benefit of development by examples, which can be discussed and explored. I have found examples, whether behaviours in BDD or user flow charts in design, to be very useful in helping people understand what needs to be built and the value of that.
The secondary byproduct is often the automated unit level coverage it can provide.
The challenge was often others seeing it as something else, or focusing too much on the byproduct values over the specific clean code value.
Developers' unit testing has never been a substitute for a professional tester's risk investigations, and it creates a challenge when someone believes it is. Yes, it can reduce risk and allow testers to focus on deeper testing, because they are aware that developers have covered some of the basics and so do not need to repeat that.
I remain in two minds, though, as to whether it is training wheels for developers: use TDD to get them to appreciate and write clean, efficient code, but once that comes naturally, is the stricter workflow still of value?
On the other hand, if TDD itself does become natural to developers, you can get those secondary values on every project without much extra effort.
Measurable success is tough unless your starting point is dysfunction. This applies to a lot of development processes, though. Even with something like someone pushing shift left, its measurable value would only be relevant if there was a dysfunction in the first place, though that dysfunction often comes from ignorance of what is just good practice: considering all testing, both tester and developer testing, as early as possible.
Still, having people treat it as a testing activity is likely the main challenge, due to the knock-on downsides of this.