Do you want to run the RL continuously, or do you want to use it to tune certain control gains and then leave them fixed? If the latter, you could run the RL in Python and expose the control gains as SCADA Inputs in the model. Then use the Typhoon API to drive the loop: have the RL propose a set of gains, set the SCADA Inputs, capture the relevant data, compute the metrics, and start another iteration.
If you go this route, I would also suggest using TyphoonTest, so you can write your tests more easily and get a better report.
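
To make the iteration loop concrete, here is a rough sketch of its structure. Everything in it is an assumption for illustration: `run_episode` is a hypothetical stand-in for one HIL run (a PI controller around a simple first-order plant, simulated in plain Python), the metric is the integral of absolute error, and the "agent" is just a random-search hill climb used as a placeholder for whatever RL algorithm you pick. In a real setup, the body of `run_episode` would be replaced by calls to the Typhoon API that set the SCADA Inputs, capture the signals, and return the data for the metric.

```python
import random


def run_episode(kp, ki, dt=0.001, t_end=1.0, tau=0.05):
    """Hypothetical stand-in for one HIL run.

    Simulates a PI controller driving a first-order plant toward a unit
    step reference and returns the integral of absolute error (IAE).
    With real hardware, this is where you would set the SCADA Inputs,
    capture data, and compute the metric from the captured signals.
    """
    y = 0.0       # plant output
    integ = 0.0   # integrator state of the PI controller
    iae = 0.0     # accumulated |error| * dt
    for _ in range(int(t_end / dt)):
        err = 1.0 - y
        integ += err * dt
        u = kp * err + ki * integ
        y += dt * (u - y) / tau   # forward-Euler step of the plant
        iae += abs(err) * dt
    return iae


def tune(iterations=200, seed=0):
    """Placeholder 'agent': random-search hill climb over (kp, ki)."""
    rng = random.Random(seed)
    best = (1.0, 1.0)
    best_cost = run_episode(*best)
    initial_cost = best_cost
    for _ in range(iterations):
        # Propose new gains near the current best, clamped to a safe range.
        kp = min(50.0, max(0.0, best[0] + rng.gauss(0.0, 0.5)))
        ki = min(500.0, max(0.0, best[1] + rng.gauss(0.0, 5.0)))
        cost = run_episode(kp, ki)   # apply gains, run, compute metric
        if cost < best_cost:         # keep the gains only if they improve
            best, best_cost = (kp, ki), cost
    return best, best_cost, initial_cost


if __name__ == "__main__":
    gains, cost, start_cost = tune()
    print("initial IAE:", start_cost)
    print("best gains (kp, ki):", gains, "IAE:", cost)
```

The point is only the shape of the loop (propose gains, apply them, run, score, repeat). Swapping the hill climb for an actual RL agent changes how the next set of gains is proposed, not the overall structure.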