Reinforcement learning is known to be unstable, or even to diverge, when a nonlinear function approximator such as a neural network is used to represent the action-value function (also known as the Q function).
A SIR model is an epidemiological model that calculates the theoretical number of people infected with a contagious disease in a closed population over time.
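
As a rough illustration of how such a model produces an infection curve, the sketch below integrates the standard SIR equations (dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I) with a simple forward Euler step. The function name `simulate_sir`, the parameter values for `beta` (transmission rate) and `gamma` (recovery rate), the population of 1000, and the step size are all assumptions chosen for illustration and are not taken from the text.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward Euler (illustrative sketch).

    beta  : assumed transmission rate per day
    gamma : assumed recovery rate per day
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    n = s0 + i0 + r0  # closed population: total stays constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # flow S -> I in this step
        new_recoveries = gamma * i * dt         # flow I -> R in this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Hypothetical scenario: 1000 people, 10 initially infected.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)

# Locate the time step at which the infected count is largest.
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"Infections peak near day {peak_time:.1f} with about {peak_infected:.0f} people infected")
```

Forward Euler is used here only because it keeps the example short; a smaller `dt` or a higher-order ODE solver would give a more accurate trajectory.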