One of the more interesting issues we have encountered in a Magento 2 production environment is the growth of the cron_schedule table.
A few weeks ago, one of our Magento 2 production environments started getting slow. There was constant CPU load on the server, most of it caused by MySQL queries. At times, more than six duplicate cron jobs were running at once. When we disabled the default cron job, the server load returned to normal.
We started to debug the cron jobs with flock. Flock holds a lock file for the lifetime of a cron execution, which means that only one cron job can run at a time.
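To make the locking idea concrete, here is a minimal Python sketch of the same kind of advisory file lock. It is not the command we ran on the server: the lock path and the try_lock helper are made up for illustration, and the fcntl module is only available on Unix-like systems.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already locked."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Hypothetical lock file path, used only for this example.
lock_path = os.path.join(tempfile.gettempdir(), "magento_cron_example.lock")

first = try_lock(lock_path)   # the first "cron run" takes the lock
second = try_lock(lock_path)  # an overlapping run is refused while the lock is held
print("first run got lock:", first is not None)
print("second run refused:", second is None)

first.close()  # closing the file releases the lock for the next run
```

The non-blocking variant is used here so that an overlapping run gives up straight away rather than queueing behind the one that is still working.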
This gave us a worrying result: a single cron job execution lasted almost 10 minutes. Normally a cron run should finish within its 1-minute time frame, before the next one starts.
Because one cron run took more than 1 minute, another cron job started while it was still running. Since cron jobs write completed tasks to the database only after they are executed, the second one started with the same tasks. After another minute a third one started, then a fourth, and so on until there was one cron process per CPU core.
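The pile-up described above can be sketched with a small simulation. This is a simplified model, not a measurement from our server: it assumes a new run starts every minute, every run takes the same number of minutes, and it ignores the CPU core limit. The function name concurrent_runs is made up for this example.

```python
def concurrent_runs(run_minutes, elapsed_minutes):
    """Count how many cron runs are active at each minute when one starts every minute."""
    counts = []
    for now in range(elapsed_minutes):
        # A run that started at minute `start` is active until start + run_minutes.
        running = sum(
            1 for start in range(elapsed_minutes)
            if start <= now < start + run_minutes
        )
        counts.append(running)
    return counts


# Healthy case: each run finishes within its own minute.
print(concurrent_runs(run_minutes=1, elapsed_minutes=5))

# Slow case: each run takes 10 minutes, so runs stack up minute by minute.
print(concurrent_runs(run_minutes=10, elapsed_minutes=12))
```

In the slow case the number of active runs grows by one every minute until it levels off at the run duration, which is the stacking behaviour we saw before the server ran out of cores.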
When we checked the cron_schedule table, it held more than 300,000 rows, 99% of them with a pending status.
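A check like this comes down to counting rows per status. The sketch below runs that GROUP BY query against an in-memory SQLite table so it is self-contained. The table here is a cut-down stand-in with invented sample rows, not our production data; on a real installation the same query would be run against the MySQL cron_schedule table.

```python
import sqlite3

# In-memory stand-in for cron_schedule with only the columns this example needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cron_schedule ("
    "schedule_id INTEGER PRIMARY KEY, job_code TEXT, status TEXT)"
)

# Invented sample rows: 99 pending and 1 successful entry for a hypothetical job.
sample_rows = [("example_job", "pending")] * 99 + [("example_job", "success")]
conn.executemany(
    "INSERT INTO cron_schedule (job_code, status) VALUES (?, ?)", sample_rows
)

# Count rows per status, then work out the share that is still pending.
status_counts = dict(
    conn.execute("SELECT status, COUNT(*) FROM cron_schedule GROUP BY status")
)
total = sum(status_counts.values())
pending_share = status_counts.get("pending", 0) / total

print(status_counts)
print(f"pending share: {pending_share:.0%}")
```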
By default, the cron_schedule table is cleaned for each cron group separately. How often each cron group is cleaned is set in the cron group's config definition: