Join @Corey on Thursday, January 28, 2021, at 1:30 PM - 2:30 PM UTC -6 (Chicago), which is 8:30 PM - 9:30 PM UTC +1 (Berlin).
This session will begin by introducing the concept of Reinforcement Learning, as well as some common use cases. After the high-level introduction, a more formal mathematical framework will be presented. Finally, the session will review and demonstrate these ideas by building a Tic-Tac-Toe playing AI, code-free, in the KNIME Analytics Platform.
Meet the trainer:
Corey Weisinger (@Corey) is a data scientist on the Evangelism team at KNIME. He currently focuses on signal processing and numeric prediction techniques for time series analysis. He is the author of the Alteryx to KNIME guidebook and a regular contributor to the KNIME blog.
Q&A Session
See what questions were asked at the webinar and check out the answers.
Anyone know of a good programmable traffic simulator?
A commonly referenced open-source option is Eclipse SUMO (Simulation of Urban MObility). I haven’t used it personally, but it’s worth checking out!
How does the interpreter assign a reward for a move when it only finds out 10 moves in the future if the Agent won or lost the game?
During game play we log each board state as the game progresses, and the interpreter waits until the game is completed to assign rewards.
Does the KNIME WebPortal require a KNIME Server?
The WebPortal is a part of KNIME Server.
Roughly how many tic-tac-toe games does the agent need to play so the policy function becomes perfect (always draw or win)?
This is a hard question, and the answer depends a lot not just on the use case but also on the network architecture. In this example the network is likely too small to achieve perfection, a trade-off made to keep training time low.
What does 'r' represent there?
r in this case stands for rate: it is the dropout rate in the dropout layer. This regularization technique randomly zeroes out some neurons in the prior layer at training time and helps the network generalize.
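As a minimal Keras sketch of where that rate appears (the layer sizes here are illustrative and not the exact architecture in the workflow):

```python
# Minimal sketch of a dropout layer in Keras.
# Layer sizes are illustrative, not the workflow's exact architecture.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation="relu", input_shape=(9,)),  # 9 cells of the board
    Dropout(rate=0.1),  # r = 0.1: 10% of the previous layer's units are zeroed each training step
    Dense(1, activation="linear"),  # predicted reward for the board state
])
model.compile(optimizer="adam", loss="mse")
```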
Can the learning quantity/quality be shown?
The struggle with this is defining the “quality” of the AI. One option might be to allow the tic-tac-toe agent to play against a different, static network and log its win rate after each training iteration.
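A rough sketch of that evaluation idea, where `play_game_between` is a hypothetical placeholder for a function that plays one game and reports whether the agent won:

```python
# Illustrative evaluation: after each training iteration, play a batch of
# games against a frozen "static" copy of an earlier network and log the
# win rate. `play_game_between` is a hypothetical placeholder, not a
# function from the workflow.
def evaluate(agent_model, static_model, n_games=100):
    wins = sum(play_game_between(agent_model, static_model) for _ in range(n_games))
    return wins / n_games
```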
Can you overfit a model?
You certainly can overfit a model here. Using a higher proportion of random moves and using a more aggressive (higher) dropout rate (r) in the dropout layer are options for addressing this concern.
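The random-move idea is essentially an epsilon-greedy policy; here is a hedged sketch where `epsilon` and `predict_reward` are illustrative names rather than settings from the workflow:

```python
import random

# Illustrative epsilon-greedy move selection: with probability epsilon the
# agent explores with a random legal move, otherwise it exploits the
# network's predicted rewards. `predict_reward` is a hypothetical stand-in
# for scoring a move with the trained network.
def choose_move(legal_moves, predict_reward, epsilon=0.2):
    if random.random() < epsilon:
        return random.choice(legal_moves)        # explore
    return max(legal_moves, key=predict_reward)  # exploit
```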
In terms of computational power, is KNIME 100% CPU bounded?
No; you can use a GPU as part of the deep learning tools in KNIME.
Wouldn't it be better to give a reward for every move of the game, perhaps in decaying fashion, instead of just one reward for win or lose?
In this use case we do assign a reward to every move, or more concisely, every board state. However, we don’t assign the rewards until the game is finished, as our reward function is tied to how many turns away from a win or loss the board was.
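A hedged sketch of that idea, assigning decayed rewards to the logged board states once the outcome is known (the reward value and decay factor here are illustrative, not the workflow's exact settings):

```python
# Illustrative reward assignment: the interpreter logs every board state
# during play and, once the outcome is known, works backwards assigning a
# reward that shrinks the further a state was from the final move.
# `final_reward` and `decay` are example values only.
def assign_rewards(board_states, won, final_reward=1.0, decay=0.9):
    reward = final_reward if won else -final_reward
    rewards = []
    for _ in reversed(board_states):   # last state gets the full reward
        rewards.append(reward)
        reward *= decay                # earlier states get less credit
    return list(reversed(rewards))     # align rewards with board_states order
```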
Can you explain a little the select Move block?
The select move block is a component; you can Ctrl + double-click it to open it and see the nodes inside. The component takes each legal move, evaluates the expected reward (using the network), and then selects the move with the highest expected value.
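In plain Python the same idea looks roughly like this; the board encoding (a length-9 vector with 0 = empty, 1 = agent, -1 = opponent) is an assumption for the sketch, not necessarily the workflow's encoding:

```python
import numpy as np

# Illustrative version of the select-move logic: apply each legal move to
# the current board, score the resulting state with the network, and keep
# the move with the highest expected reward.
def select_move(board, model, player=1):
    legal_moves = [i for i, cell in enumerate(board) if cell == 0]
    candidates = []
    for move in legal_moves:
        next_board = list(board)
        next_board[move] = player
        candidates.append(next_board)
    scores = model.predict(np.array(candidates), verbose=0).flatten()
    return legal_moves[int(np.argmax(scores))]
```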
How do you decide how much "Drop Out Layers" you put into the Model?
A dropout rate of 10% is a common starting point; it can be adjusted higher for more aggressive regularization, or lowered to let the model fit the training data more closely.
I retrained the output network with 500 more games over 15 iterations, but no improvement was noticeable. Is that expected behavior?
It is possible that the limit for the network architecture has been reached.
How many different tic-tac-toe boards are possible?
There are 3^9 = 19,683 board states (although many of them are not reachable in actual play), and even more ways a game could play out.
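A quick illustrative check of that count, filtering out encodings whose X/O piece counts are impossible (it ignores the further restriction that play stops once someone wins, so the filtered number is still an upper bound on reachable states):

```python
from itertools import product

# Each of the 9 cells is empty, X, or O, giving 3**9 = 19683 raw encodings.
boards = list(product(("", "X", "O"), repeat=9))
print(len(boards))  # 19683

# Keep only boards where X has moved as often as O, or exactly once more
# (assuming X moves first). Boards where play would already have ended are
# still included, so this is only an upper bound on reachable states.
plausible = [
    b for b in boards
    if b.count("X") in (b.count("O"), b.count("O") + 1)
]
print(len(plausible))
```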
Do you have just one neural network for playing tic-tac-toe (for the policy function)?
There is only one network, but technically speaking the policy function is the combination of the network, which evaluates expected rewards, and the rule that selects the move with the highest expected reward. This is represented by the entire select move component.
Why the Keras to Tensorflow block?
The Keras to TensorFlow node isn’t strictly required; however, since I am executing the network in a loop, I’ve chosen to convert the network to TensorFlow and execute it in Java for a slight speed increase.
When you are not playing a game, could you please explain again how you chose the reward for a move when there are many moves in the future before you know if the move was any good?
The network involved in the Agent’s policy function is trained to predict the reward we will later assign, so the policy is simply to make the move with the highest predicted reward. After the game, we assign the true reward with our reward function and update the network to better predict that reward.
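Putting the pieces together, the outer loop looks roughly like the sketch below. `play_game` is a hypothetical placeholder for a self-play game that uses the network's predicted rewards, and `assign_rewards` is the post-game reward assignment sketched above; the 500 games and 15 iterations simply echo the numbers from the earlier question.

```python
import numpy as np

# Illustrative outer training loop: the agent plays using predicted rewards,
# true rewards are assigned only after each game finishes, and the network
# is then updated to predict those true rewards better.
# `play_game` and `assign_rewards` are hypothetical placeholders for the
# corresponding parts of the workflow.
def train(model, iterations=15, games_per_iteration=500):
    for _ in range(iterations):
        states, rewards = [], []
        for _ in range(games_per_iteration):
            board_states, won = play_game(model)               # play using predicted rewards
            game_rewards = assign_rewards(board_states, won)   # true rewards, assigned post-game
            states.extend(board_states)
            rewards.extend(game_rewards)
        # Move the network's predictions toward the assigned rewards.
        model.fit(np.array(states), np.array(rewards), epochs=1, verbose=0)
    return model
```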