You can do magic!

kowisoft · May 18, 2021, 6:57pm

Dear KNIMErs,

Can you do magic? Well, if you use KNIME, you can!

Let me explain:

A few months ago I wrote a forum post about how I combined two passions: KNIME and roleplaying (dragons, knights, swords - not the ■■■■■ stuff). In this post, you’re going to discover how to take this idea a step further.

But here comes the usual disclaimer: this is, once again, a rather long post. So I tried to give it structure with headings (for all the people who skim).

Where do we come from?

If you remember, last time we talked about random tables and how to create “settings” with the help of KNIME. These days, real life meetings with friends to sit around a table and play games are still not possible.

But to turn a bad thing (lockdown) into something good: the situation makes it easier to find people who are willing to play online. But how do you get people together when you need maps, dice, tokens of your heroes and so on?

Digitization to the rescue: you use a virtual tabletop (or VTT for short).

These tools mimic all the good stuff you usually have on paper: rule books, character sheets (where you determine who you are and what your hero can and cannot do) and dice. Lots of dice. Like this:

Problems over Problems

Besides installing a VTT, you also have to get the rules. If you remember, you buy rulebooks and supplements from the publishers of the games. But that comes with a caveat. Which one, you may ask?

As they say, a picture is worth a thousand words, so here is another one:

You know what that is? All the rules and supplements I have for D&D. But more than that, it’s 1,169 (!!!) pages of content, rules, information, images and tables.

How do I now get them into my VTT?

Well, you could copy and paste. Let me predict what you as a (poor) player might look like when you’re done:

There should be a better way, and this wouldn’t be the KNIME forum if there weren’t one.

Of course we should use KNIME for it.

But there’s still another problem, even though the data is available digitally for free in a so-called System Reference Document (SRD).

The one for Dungeons & Dragons is here: Systems Reference Document | Dungeons & Dragons

Having the SRD online does provide a structured format.

But most of the virtual tabletop programs do not offer an API or any other connection to these SRDs. You would still need to copy and paste everything.

Digital and structure are the two key terms here.

Wouldn’t it be great if we could approach those websites? If we could get the (structured) data and enter it into something that can connect to a VTT? And all that automated?

We can, thanks to the fantastic Selenium Nodes by @qqilihq.

But before we dive into that, let me show a short architectural drawing I did.

What the solution looks like

Very professional, isn’t it?

Let me introduce you to one more tool, before we dive into how KNIME can help.

So we found out that most SRDs do not connect to the VTTs. But there is a content management system called World Anvil (https://www.worldanvil.com/) that focuses on roleplaying.

You can set up any structure you like and connect it to your VTT.

So the flow would be the following:

Extract data from SRD → Process in KNIME → Publish to World Anvil → Fetch content in VTT

And this is exactly the kind of flow where KNIME and the Selenium Nodes shine.

Before we go into the details of the workflow, I promised you some magic. So let’s head back to the 80s and this fantastic song by America:

Damn, the 80s were amazing. But I digress.

The KNIME Workflow using the Selenium Nodes

Below you will find my workflow shared on the KNIME Hub:

I used the game DSA (Das Schwarze Auge - The Dark Eye, a very popular roleplaying game in Germany). I tried to extract the data from their “RegelWiki” (Rules Wiki, also an SRD).

What I wanted to do here is extract the spells from their website and then use them in Foundry, my VTT solution that connects to World Anvil.

This page lists all the magic spells which I want to import in the first place: Zauberauswahl - DSA Regel Wiki (German language, but if you want to look at it, you get the idea).

The workflow is split into several parts:

The first part extracts the links to the detail page of every spell from the overview page.

The second part scrapes each of these individual links. Then it extracts the HTML source data from each page.

The third part extracts the relevant properties from the downloaded HTML document and then prepares them for World Anvil.

Here is a snapshot of the XPath property extraction. This is core to this workflow and might be interesting.
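For readers who think in code rather than nodes, the first part (collecting the detail links from the overview page) can be sketched roughly like this in Python. This is not the KNIME workflow itself, only an illustration of the idea: the markup, the class name "spell-list" and the example.org URL are made up, and ElementTree only accepts well-formed markup, so a real page would first need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed stand-in for the overview page (the real markup differs).
OVERVIEW_HTML = """
<html><body>
  <div id="nav"><a href="/index.html">Home</a></div>
  <div class="spell-list">
    <a href="spell-a.html">Spell A</a>
    <a href="spell-b.html">Spell B</a>
  </div>
</body></html>
"""

BASE_URL = "https://example.org/rules/overview.html"  # placeholder, not the real wiki


def extract_detail_links(page_source, base_url):
    """Return absolute URLs of all links inside the (assumed) spell list container."""
    root = ET.fromstring(page_source.strip())
    # Only anchors directly inside the list container, so navigation links are ignored.
    anchors = root.findall(".//div[@class='spell-list']/a[@href]")
    # Relative hrefs are resolved against the URL of the overview page.
    return [urljoin(base_url, a.get("href")) for a in anchors]


links = extract_detail_links(OVERVIEW_HTML, BASE_URL)
for link in links:
    print(link)
```

Each of the resulting URLs would then be handed to the second part, which downloads the page source of the individual spell.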

Let’s go on with the rest of the workflow:

The fourth part logs in to World Anvil and …

… finally the fifth part publishes the prepared data to World Anvil.

Once it is in our content management system, it is easy to get it into the VTT. You push a button and have it at your fingertips. That’s because both connect through an API connection.

What have I learned?

Here are some key learnings for me:

Use web browsing only when needed

My initial workflow used the Selenium Nodes to extract single HTML properties. It proved to be much more efficient to extract the whole page source once and then work with a combination of XPath and RegEx to get the desired “elements”. Thank you to @qqilihq for showing me this approach.
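To make that a bit more tangible, here is a minimal sketch of the "grab the source once, then dissect it offline" idea in Python. Everything in it is assumed for illustration: the page markup, the "content" id and the labels are invented, and the real wiki looks different. XPath narrows the page down to the content area, and a regular expression then splits each paragraph into a label and a value.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page source, as it might look after being downloaded once.
PAGE_SOURCE = """
<html><body>
  <div id="menu"><p>Range: not a spell property</p></div>
  <div id="content">
    <h1>Example Spell</h1>
    <p>Range: 8 steps</p>
    <p>Cost:   4 points</p>
    <p>Some descriptive text without a label.</p>
  </div>
</body></html>
"""

# "Label: value" with optional surrounding whitespace.
LABEL_VALUE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def extract_properties(page_source):
    """Parse the source once, then pull every 'Label: value' paragraph from the content div."""
    root = ET.fromstring(page_source.strip())
    properties = {}
    # XPath step: only paragraphs directly inside the content container.
    for paragraph in root.findall(".//div[@id='content']/p"):
        text = "".join(paragraph.itertext())
        # RegEx step: split the paragraph text into label and value.
        match = LABEL_VALUE.match(text)
        if match:
            properties[match.group("label")] = match.group("value")
    return properties


print(extract_properties(PAGE_SOURCE))
```

The point is that the browser is only needed to obtain the page source; all further extraction runs on a plain string.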

RegEx is massive

The rules wiki linked above is of quite low quality programming-wise. Looking at the page source, it appears to have been hand-coded all the way. Bold text was sometimes done with the <strong> tag; on the next page it was done with <span> tags and CSS. Using RegEx in a clever way helped to extract the required info. Once again, kudos to @qqilihq for providing support here.
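As an illustration of the kind of pattern that copes with both variants, here is a small Python sketch. It assumes that the CSS variant is an inline style with "font-weight: bold"; the actual pages may use classes or other markup, and the two sample snippets are invented.

```python
import re

# One pattern, two alternatives: <strong>...</strong> or a <span> with an inline bold style.
BOLD = re.compile(
    r'<strong[^>]*>(?P<strong>.*?)</strong>'
    r'|<span[^>]*style="[^"]*font-weight:\s*bold[^"]*"[^>]*>(?P<span>.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def bold_texts(html):
    """Return the text of every bold fragment, whichever markup variant was used."""
    # Exactly one of the two named groups takes part in each match;
    # lastgroup holds the name of that group.
    return [m.group(m.lastgroup).strip() for m in BOLD.finditer(html)]


# Hypothetical snippets showing the two styles of hand-written markup.
PAGE_A = '<p><strong>Range:</strong> 8 steps</p>'
PAGE_B = '<p><span style="font-weight: bold;">Range:</span> 8 steps</p>'

print(bold_texts(PAGE_A))
print(bold_texts(PAGE_B))
```

Both snippets yield the same label, so the downstream steps no longer need to care how a particular page was coded.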

XPath to the rescue

This node seems a little bit like the Swiss Army knife of web data extraction and manipulation. Combine the XPath node with a little bit of web knowledge, then think about the structure in which you find your data. This makes the info accessible even on a badly coded website, and it allowed me to automate the extraction.
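To show what "thinking about the structure" can mean, here is a tiny Python sketch that uses the XPath subset of the standard library. The table layout and the labels are assumptions made for this example (the wiki's real structure differs): the query says "find the row whose cell reads Range, then give me the second cell of that row".

```python
import xml.etree.ElementTree as ET

# Hypothetical detail page where each property sits in its own table row.
DETAIL_PAGE = """
<html><body>
  <table>
    <tr><td>Range</td><td>8 steps</td></tr>
    <tr><td>Duration</td><td>5 minutes</td></tr>
  </table>
</body></html>
"""


def value_for(root, label):
    """Return the text of the second cell in the row whose cell text equals the label."""
    # tr[td='...'] keeps rows that have a cell with exactly this text;
    # td[2] then selects the second cell of that row.
    cell = root.find(f".//tr[td='{label}']/td[2]")
    return None if cell is None else "".join(cell.itertext()).strip()


root = ET.fromstring(DETAIL_PAGE.strip())
print(value_for(root, "Range"))
print(value_for(root, "Duration"))
print(value_for(root, "Cost"))
```

A label that does not exist on the page simply yields None, which is handy when not every spell has every property.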

Structure your workflows

My initial workflow was a mess. Brainstorming what goes where was an interesting learning experience. A clear structure helps me to understand my workflows weeks after I last used them.

Selenium Nodes extend KNIME’s automation capabilities

These days, more and more things have web-based front ends. I know of cases where SAP tasks are executed through a web front end. This means you might no longer need expensive automation tools to automate that work. An intelligent combination of KNIME and the Selenium Nodes can take you very far.

In the end I was able to extract 178 spells in a little less than 10 minutes. And I pushed them to World Anvil, which makes them accessible in our games.

And I now have a workflow that I could easily adapt to all the other “sub modules” of the rule set / SRD. For example …

… advantages and disadvantages
… weapons (and their hit points, prices etc.)
… beasts (and how to deal with them)

How could YOU use it?

Now I know that not everyone is as much into roleplaying as I am (although I deeply believe you should be - it’s awesome!!!), but here are some other use cases that you can apply a very similar workflow to.

A LinkedIn extractor: extract profiles from groups and connect with them semi-automatically. @mehrdad_bgh has done this already
Booking goods receipts through a web-based workflow tool. You would extract the relevant orders and automatically approve them
Extract key financial data (e.g. stock prices, exchange rates etc.) and push it to a dashboard or e.g. an online spreadsheet

What’s next?

I would love to go a little bit deeper into the parts of this specific workflow. So I’m planning to do a YouTube series on the workflow. I would share each episode with you KNIMErs in the responses to this post. Let me know if this is something you would like to see.

Should I do a YouTube series on the details of this workflow?

yes
no
I don’t care
Can you show me the way to Hogwarts?


Disclaimer: Selenium Nodes, Foundry VTT and World Anvil are paid products. I do consider them to be extremely valuable for players (and data citizens) but wanted to make sure that this is mentioned.

qqilihq · May 29, 2021, 9:48am

@kowisoft

I had deliberately kept the reading of this post for a weekend noon with enough dedicated time and had been looking forward to it the entire week. IT IS SO COOL to have a non-everyday use case of the Selenium Nodes here!! I especially like your “Lessons learned” section, and I think this is extremely valuable advice for all magicians, novices and experts.

Thanks for sharing this, and I really enjoyed seeing the iterative progress and giving feedback on the way. I would really be delighted to see a YouTube series (if your time permits).

Have a good weekend – and I’m looking forward to the next adventures!

– Philipp

kowisoft · July 7, 2021, 8:56pm

Hi my dear Ladies and Gentlemen of the honorable KNIME order,

as requested (at least by some), here’s a video walkthrough of the first part of the workflow (more to come).

After having some beers with a friend I came up with a wonderful theme, so please allow me to introduce you to

The KNIME Knight in Limelight

Any feedback is highly appreciated (and don’t forget to subscribe on YouTube by clicking here because you don’t want to miss the future parts of this series)

danielesser · July 12, 2021, 2:40pm

Ha! Just found this one by accident! That is really hilarious! Thanks Phil for sharing this with the community. Looking forward to seeing the next episode!

By the way… what is your hardware and software tooling for producing these videos?

Best regards,
Daniel

kowisoft · July 12, 2021, 2:56pm

Thank you for the kind words, @danielesser

Maybe I secretly always wanted to be on some kind of stage, so this might be my way of following that dream.

Software-wise I mainly use Camtasia Studio, which makes editing videos really easy (beyond “just” screen recording).

Hardware is a standard desktop PC (Ryzen 5 3600 CPU, 32GB RAM, SSD, Logitech Brio cam and AUNA 200 condenser mic, plus a rollable green screen).

kowisoft · October 2, 2021, 9:58am

Dear fellow data science adventurers,

HE IS BACK!

See episode two of “the KNIME Knight in Limelight”.

In this episode, we cover the loop that scrapes the data from the web.

kowisoft · December 22, 2021, 7:33am

As a little Christmas break, our friend - the KNIME Knight in Limelight - is back with a young apprentice. This video tries to explain what is special about this forum post (Are you crazy? Show your weirdest use case - here’s mine) in a fun and nerdy way.

Hope you like it.

Happy holidays and stay safe and healthy, dear KNIMErs!!!

system · April 21, 2023, 9:38pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.