Can you do magic? Well, if you use KNIME, you can!
Let me explain:
A few months ago I wrote a forum post about how I combined two passions: KNIME and roleplaying (dragons, knights, swords - not the kinky stuff). In this post, you’re going to discover how to take this idea a step further.
But here comes the usual disclaimer: this is - once again - a somewhat longer post, so I tried to provide structure with headings (for all the people who skim).
Where do we come from?
If you remember, last time we talked about random tables and how to create “settings” with the help of KNIME. These days, meeting friends in real life to sit around a table and play games is still not possible.
But there is a way to turn a bad thing (lockdown) into something good: the situation makes it easier to find people who are willing to play online. But how do you get people together when you need maps, dice, tokens of your heroes and so on?
Digitization to the rescue - you use a virtual tabletop (or VTT for short).
These tools mimic all the good stuff you usually have on paper: rule books, character sheets (where you determine who you are and what your hero can and cannot do) and dice. Lots of dice. Like this:
Problems upon Problems
Besides installing a VTT, you also have to get the rules. If you remember, you buy rulebooks and supplements from the publishers of the games. But that comes with a caveat. Which one, you may ask?
As they say, a picture is worth a thousand words, so here is another one:
You know what that is? All the rules and supplements I have for D&D. But more than that, it’s 1,169 (!!!) pages of content, rules, information, images and tables.
How do I now get them into my VTT?
Well, you could copy and paste. Let me predict what you as a (poor) player might look like when you’re done:
There should be a better way, and this wouldn’t be the KNIME forum if there weren’t one.
Of course we should use KNIME for it.
But there’s still another problem. The data is available digitally and for free, in a so-called System Reference Document (SRD).
The one for Dungeons & Dragons is here: Systems Reference Document (SRD) | Dungeons & Dragons
Having the SRD online even gives you the data in a structured format.
The problem: most virtual tabletop programs do not offer an API or any other connection to these SRDs, so you would still need to copy and paste everything.
Digital and structure are the two key terms here.
Wouldn’t it be great if we could visit those websites, grab the (structured) data and enter it into something that can connect to a VTT? And all of that automated?
We can, thanks to the fantastic Selenium Nodes by @qqilihq.
But before we dive into that, let me show a short architectural drawing I did.
What the solution looks like
Very professional, isn’t it?
Let me introduce you to one more tool, before we dive into how KNIME can help.
So we found out that most SRDs do not connect to the VTTs. But there is a content management system called World Anvil that focuses on roleplaying (https://www.worldanvil.com/).
You can set up any structure you like and connect it to your VTT.
So the flow would be the following:
Extract data from SRD → Process in KNIME → Publish to World Anvil → Fetch content in VTT
And this is exactly the kind of flow where KNIME and the Selenium Nodes shine.
Before we go into the details of the workflow, I promised you some magic. So let’s head back to the 80s and this fantastic song by America:
Damn, the 80s were amazing. But I digress.
The KNIME Workflow using the Selenium Nodes
Below you will find my workflow shared on the KNIME Hub:
I used the game DSA (Das Schwarze Auge - The Dark Eye, a very popular roleplaying game in Germany). I tried to extract the data from its “RegelWiki” (Rules Wiki, also an SRD).
What I wanted to do here is extract the spells from their website and then use them in Foundry, my VTT solution that connects to World Anvil.
This page lists all the magic spells which I want to import in the first place: https://ulisses-regelwiki.de/zauberauswahl.html (German language, but if you want to look at it, you get the idea).
The workflow splits into several parts:
The first part extracts the links to the detail page of every spell from the overview page.
The second part scrapes each of these individual links. Then it extracts the HTML source data from each page.
The third part extracts the relevant properties from the downloaded HTML document. It prepares them for World Anvil after that.
Here is a snapshot of the XPath property extraction. This is core to this workflow and might be interesting.
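For readers without the workflow at hand, here is a minimal sketch of the same idea in plain Python rather than in the KNIME XPath node. Everything in it is an assumption made for illustration: the page snippet, the property labels and their values are made up, and the real wiki markup is neither this tidy nor well-formed XML (it would need an HTML-tolerant parser first). It only shows how a handful of XPath expressions turn one downloaded page into named properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified detail page with made-up labels and values.
# The real RegelWiki markup differs and is not well-formed XML.
PAGE = """
<html><body>
  <h1>Example Spell</h1>
  <div class="spell">
    <p><strong>Check:</strong> AA/BB/CC</p>
    <p><strong>Range:</strong> 16 paces</p>
    <p><strong>Casting time:</strong> 2 actions</p>
  </div>
</body></html>
"""


def extract_spell(page_source: str) -> dict:
    """Turn one page source into a dict of spell properties."""
    root = ET.fromstring(page_source.strip())
    # XPath 1: the spell name is the first <h1> anywhere in the page.
    spell = {"name": root.findtext(".//h1")}
    # XPath 2: every <p> inside the div that holds the properties.
    for paragraph in root.findall(".//div[@class='spell']/p"):
        label = paragraph.find("strong")
        if label is None or label.text is None:
            continue  # paragraph without a bold label, nothing to extract
        key = label.text.strip().rstrip(":")
        # The value is the text that follows the closing </strong> tag.
        spell[key] = (label.tail or "").strip()
    return spell


spell = extract_spell(PAGE)
print(spell)
```

In the workflow itself the XPath node does this job, and the resulting columns are what gets prepared for World Anvil.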
Let’s go on with the rest of the workflow:
The fourth part logs in to World Anvil and …
… finally the fifth part publishes the prepared data to World Anvil.
Once it is in our content management system, it is easy to get it into the VTT: you push a button and have it at your fingertips. That’s because both connect through an API.
What have I learned?
Here are some key learnings for me:
Use web browsing only when needed
My initial workflow used the Selenium Nodes to extract single HTML properties. It proved to be much more efficient to extract the whole page source once and then work with a combination of XPath and RegEx to get the desired “elements”. Thank you to @qqilihq for showing me this approach.
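To illustrate the principle outside of KNIME: once the page source is available as a single string, everything else is plain parsing with no further browser round trips. The sketch below collects the detail links from an overview page using only the Python standard library. The overview snippet, the example.org address and the link names are all hypothetical; this is not how the Selenium Nodes work internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs of all <a href="..."> tags in a page source."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without a target and in-page jumps like "#top".
        if href and not href.startswith("#"):
            # Resolve relative links against the overview page address.
            self.links.append(urljoin(self.base_url, href))


# Hypothetical overview page source, downloaded once by the browser.
OVERVIEW = """
<ul>
  <li><a href="example-spell-a.html">Example Spell A</a></li>
  <li><a href="/spells/example-spell-b.html">Example Spell B</a></li>
  <li><a href="#top">Back to top</a></li>
</ul>
"""

collector = LinkCollector("https://example.org/spells/overview.html")
collector.feed(OVERVIEW)
detail_links = collector.links
print(detail_links)
```

The browser is then only needed to open each of these links and hand over its source, which is the part that actually requires web browsing.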
RegEx is massive
The rules wiki linked above is of quite low quality programming-wise. Looking at the page source, it looks like it has been hand-coded all the way. Bold text was sometimes done using the <strong> tag; on the next page it was done with <span> tags and CSS. Using RegEx in a clever way helped to extract the required info. Once again, kudos to @qqilihq for providing support here.
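A rough sketch of how such inconsistent markup can be tamed with RegEx, written in Python for illustration. It assumes the CSS variant uses an inline style with font-weight: bold, which may not match the wiki (a class-based style would need a different pattern), and the labels and values are made up. The trick is to rewrite one variant into the other first, so that a single pattern can extract the label/value pairs afterwards.

```python
import re

# Assumed CSS variant: <span style="...font-weight: bold...">text</span>
SPAN_BOLD = re.compile(
    r"<span\b[^>]*style\s*=\s*[\"'][^\"']*font-weight\s*:\s*bold[^\"']*[\"'][^>]*>"
    r"(.*?)</span>",
    re.IGNORECASE | re.DOTALL,
)

# After normalization, a label is always <strong>Label:</strong> value
LABEL_VALUE = re.compile(r"<strong>\s*(.*?)\s*:?\s*</strong>\s*([^<]*)", re.DOTALL)


def normalize_bold(html: str) -> str:
    """Rewrite bold <span> tags as <strong> so both variants look alike."""
    return SPAN_BOLD.sub(r"<strong>\1</strong>", html)


def extract_properties(html: str) -> dict:
    """Return {label: value} for every bold label in the page fragment."""
    pairs = LABEL_VALUE.findall(normalize_bold(html))
    return {label.strip(): value.strip() for label, value in pairs}


# Two hypothetical pages that encode the same property in different ways.
page_a = '<p><strong>Range:</strong> 16 paces</p>'
page_b = '<p><span style="font-weight: bold;">Range:</span> 16 paces</p>'

props_a = extract_properties(page_a)
props_b = extract_properties(page_b)
print(props_a, props_b)
```

This simple pattern does not handle nested span tags; for markup that messy, a parser-based approach would be the safer choice.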
XPath to the rescue
This node seems a little bit like the Swiss Army knife of web data extraction and manipulation. Combine the XPath node with a little bit of web knowledge and think about the structure in which you find your data. This makes the info accessible even on a badly coded website, and it allowed me to automate the extraction.
Structure your workflows
My initial workflow was a mess. It was an interesting lesson to brainstorm what goes where. A clear structure helps me understand my workflows weeks after I last used them.
Selenium Nodes extend KNIME’s automation capabilities
These days, more and more things have web-based front ends. I know of cases where SAP tasks are executed through a web front end. This means you might no longer need expensive automation tools to automate that work. An intelligent combination of KNIME and the Selenium Nodes can take you very far.
In the end I was able to extract 178 spells in a little less than 10 minutes and push them to World Anvil, which makes them accessible in our games.
And I now have a workflow that I could easily adapt to all the other “sub modules” of the rule set / SRD. For example …
… advantages and disadvantages
… weapons (and their hit points, prices etc.)
… beasts (and how to deal with them)
How could YOU use it?
Now I know that not everyone is as much into roleplaying as I am (although I deeply believe you should be - it’s awesome!!!), but here are some other use cases that you can apply a very similar workflow to.
- A LinkedIn extractor: extract profiles from groups and connect with them semi-automatically. @mehrdad_bgh has done this already
- Booking goods receipts through a web-based workflow tool: you would extract the relevant orders and automatically approve them
- Extract key financial data (e.g. stock prices, exchange rates etc.) and push it to a dashboard or, e.g., an online spreadsheet
I would love to go a little bit deeper into the parts of this specific workflow. So I’m planning to do a YouTube series on the workflow. I would share each episode with you KNIMErs in the responses to this post. Let me know if this is something you would like to see.
Disclaimer: Selenium Nodes, Foundry VTT and World Anvil are paid products. I do consider them to be extremely valuable for players (and data citizens) but wanted to make sure that this is mentioned.