We have a standalone Java application handling around 200–300 TPS. The application integrates with multiple third-party systems and databases. Currently, it runs as a wrapper-based process without any built-in monitoring or health check mechanisms.
We plan to design and implement an application-level health check that continuously monitors the application's runtime health, including key integrations, thread usage, database connectivity, and transaction queue performance. Based on the health status, we aim to either trigger self-healing actions (such as restarting failed components or clearing queues) or send automated notifications to the operations team.
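For concreteness, a minimal sketch of the kind of health check registry described above might look like the following. It uses only JDK classes. `HealthRegistry`, `Status`, `Result`, `threadCheck`, `queueCheck`, and all thresholds are hypothetical names and values chosen for illustration, not part of any existing library or of our current code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Hypothetical registry: each named check returns a Result; the overall
// status is the worst status reported by any check.
final class HealthRegistry {
    public enum Status { UP, DEGRADED, DOWN } // ordered from best to worst

    public static final class Result {
        public final Status status;
        public final String detail;
        public Result(Status status, String detail) {
            this.status = status;
            this.detail = detail;
        }
        @Override public String toString() { return status + " (" + detail + ")"; }
    }

    private final Map<String, Supplier<Result>> checks = new LinkedHashMap<>();

    public void register(String name, Supplier<Result> check) {
        checks.put(name, check);
    }

    // Runs every check; a check that throws is reported as DOWN instead of
    // breaking the whole health run.
    public Map<String, Result> runAll() {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Result>> e : checks.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().get());
            } catch (RuntimeException ex) {
                results.put(e.getKey(), new Result(Status.DOWN, String.valueOf(ex.getMessage())));
            }
        }
        return results;
    }

    public static Status overall(Map<String, Result> results) {
        Status worst = Status.UP;
        for (Result r : results.values()) {
            if (r.status.compareTo(worst) > 0) worst = r.status;
        }
        return worst;
    }

    // Thread usage check: DOWN on a detected deadlock, DEGRADED above maxThreads.
    public static Supplier<Result> threadCheck(int maxThreads) {
        return () -> {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long[] deadlocked = mx.findDeadlockedThreads(); // null when none
            if (deadlocked != null) return new Result(Status.DOWN, "deadlocked threads: " + deadlocked.length);
            int live = mx.getThreadCount();
            return new Result(live > maxThreads ? Status.DEGRADED : Status.UP, "live threads: " + live);
        };
    }

    // Queue depth check with assumed warning and critical thresholds.
    public static Supplier<Result> queueCheck(BlockingQueue<?> queue, int warn, int critical) {
        return () -> {
            int size = queue.size();
            Status s = size >= critical ? Status.DOWN : size >= warn ? Status.DEGRADED : Status.UP;
            return new Result(s, "queue depth: " + size);
        };
    }

    public static void main(String[] args) {
        BlockingQueue<String> txQueue = new LinkedBlockingQueue<>();
        txQueue.add("tx-1");
        HealthRegistry registry = new HealthRegistry();
        registry.register("threads", threadCheck(500));
        registry.register("txQueue", queueCheck(txQueue, 1_000, 5_000));
        Map<String, Result> results = registry.runAll();
        results.forEach((name, r) -> System.out.println(name + ": " + r));
        System.out.println("overall: " + overall(results));
    }
}
```

In a real deployment, `runAll()` would presumably be called from a scheduled task, with the result exposed to monitoring and used to decide between alerting and a self-healing action. Database and third-party checks would be registered the same way, each with its own timeout.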
I’d like to get suggestions on:
- Best practices or design patterns for building health checks in standalone Java applications.
- Recommended metrics or indicators to monitor for overall application health.
- Tools or libraries suitable for non-Spring applications.
- How to integrate health check outputs with monitoring tools or alerting systems.
- Approaches for implementing self-healing logic safely in production.
Looking forward to insights or real-world examples from those who’ve implemented similar mechanisms in high-throughput environments.