I ran a bit over 3000 leg movement actions (only a proof of concept, not nearly enough to learn any real skill) directed by a TensorFlow-based SARSA reinforcement learning agent.
Most of the code came from Gelana Tostaeva, from an article she published on Medium titled Learning with Deep SARSA & OpenAI Gym.
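If SARSA is new to you, the core of it is a single update rule: nudge the value of the state-action pair you just took toward the reward you received plus the value of the action you actually take next. The deep version in the article swaps the table for a small network, but the tabular form shows the idea; the table sizes and hyperparameters below are just placeholder numbers, not values from the article or from my setup:

```python
import numpy as np

# Minimal tabular SARSA illustration; the sizes and hyperparameters
# here are placeholders, not the article's or my project's values.
n_states, n_actions = 16, 81
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def sarsa_update(state, action, reward, next_state, next_action):
    # Move Q(s, a) toward the reward plus the value of the action
    # the agent actually took next (on-policy, unlike Q-learning).
    td_target = reward + gamma * Q[next_state, next_action]
    Q[state, action] += alpha * (td_target - Q[state, action])
```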
After those 3000 steps, I learned something important. The agent was learning from its experience, using my 4-servo spider leg and the Python code I wrote to represent the leg as an OpenAI Gym environment. My reward code was letting it learn to make big, dramatic gestures without actually overheating the servos. Mwahahahaaa.
(There were a bunch of non-zero values in the Q matrix when I stopped, which I take as evidence that at least some learning happened.)
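To make the setup a bit more concrete, here is a bare-bones sketch of what a 4-servo leg wrapped as an OpenAI Gym environment can look like. The class name, action encoding, step size, and reward constant are simplified placeholders rather than my exact code, but the reward line captures the problem: it pays out for big angle changes and nothing else.

```python
import numpy as np
import gym
from gym import spaces

class SpiderLegEnv(gym.Env):
    """Hypothetical sketch of a 4-servo leg as a Gym environment.

    The hardware interface, angle limits, and reward constants are
    assumptions for illustration; the real code differs.
    """

    def __init__(self):
        # One discrete action encodes a move of -1, 0, or +1 steps per servo.
        self.action_space = spaces.Discrete(3 ** 4)
        # Observation: current angle of each servo, normalized to [0, 1].
        self.observation_space = spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32)
        self.angles = np.full(4, 0.5, dtype=np.float32)

    def reset(self):
        self.angles = np.full(4, 0.5, dtype=np.float32)
        return self.angles.copy()

    def step(self, action):
        # Decode the discrete action into a per-servo move of -1, 0, or +1.
        moves = np.array([(action // 3 ** i) % 3 - 1 for i in range(4)], dtype=np.float32)
        prev = self.angles.copy()
        self.angles = np.clip(self.angles + 0.05 * moves, 0.0, 1.0)
        # self._drive_servos(self.angles)  # would push the new angles to hardware
        # Reward raw motion: bigger total angle change earns more reward,
        # which is exactly what encouraged the dramatic gestures.
        reward = float(np.abs(self.angles - prev).sum())
        done = False
        return self.angles.copy(), reward, done, {}
```

With a reward like that, the best policy the agent can find is to swing every servo as far and as fast as it can on every step, which is exactly what it started doing.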
However, the jerky, fast, full-arc drama of those gestures eventually did overstress the servos and break the gears in one of them. I expect the agent might eventually learn on its own that it has to ease up to full speed and ease back down to a stop, but to save on replacement servos I will be futzing with the reward design a bit first. Thanks Gelana!
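One way to do that futzing is to keep rewarding motion but subtract a penalty for abrupt changes in commanded speed, so easing into and out of a move scores better than slamming between extremes. A rough sketch; the weight and the names are made up and would need tuning on the real leg:

```python
import numpy as np

def shaped_reward(prev_angles, new_angles, prev_velocity, jerk_weight=2.0):
    """Reward motion, but penalize abrupt changes in per-step speed.

    prev_velocity is the angle change from the previous step; the
    jerk_weight of 2.0 is a made-up starting point to tune on hardware.
    """
    velocity = new_angles - prev_angles
    motion_reward = float(np.abs(velocity).sum())
    # "Jerk" here is the change in per-step velocity; big values mean the
    # leg snapped from fast to stopped (or vice versa), which strips gears.
    jerk_penalty = float(np.abs(velocity - prev_velocity).sum())
    return motion_reward - jerk_weight * jerk_penalty, velocity
```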