I’m not that familiar with Alteryx, so a clarification question - is this basically just an indicator of how much data may or may not be classified as missing?
If so, would it be sufficient for us to add something like this to the Node Monitor?
I’m probably missing some nuance here so feel free to correct a poor assumption on my part.
The coloured bars above the column names indicate the quality of the data. If you hover over one with the mouse you’ll see the % of the dataset with:
null values
missing values
trailing or leading whitespace
multirow formatting
In Alteryx you can inspect each node output like in KNIME, but you’re limited to a portion of it, so these data quality checks refer only to that subset. If you want to perform a full data quality check and inspection (like in KNIME), you have to drag in a Browse tool.
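To make the idea of the bars concrete, here is a rough sketch of how the percentages could be computed for one string column. The bucket names and rules are my own assumptions (I treat an empty string as "missing" and an embedded newline as "multirow formatting"); this is not how Alteryx or KNIME actually implement it.

```python
from typing import Dict, Optional, Sequence


def column_quality(values: Sequence[Optional[str]]) -> Dict[str, float]:
    """Return the percentage of values per quality bucket (hypothetical buckets)."""
    counts = {"null": 0, "empty": 0, "whitespace": 0, "multiline": 0, "ok": 0}
    for value in values:
        if value is None:
            counts["null"] += 1          # no value at all
        elif value == "":
            counts["empty"] += 1         # assumed meaning of "missing"
        elif value != value.strip():
            counts["whitespace"] += 1    # leading or trailing whitespace
        elif "\n" in value:
            counts["multiline"] += 1     # assumed meaning of "multirow formatting"
        else:
            counts["ok"] += 1
    total = len(values)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


sample = ["Rome", None, "", " Milan", "Turin\nItaly", "Naples", "Genoa ", "Bari"]
print(column_quality(sample))
```

Running it on the subset shown in the viewer versus the full table is what gives you the difference between the quick bars and a full Browse-style check.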
Where to implement this functionality? Well, not only in the Node Monitor, but also in the data viewer when you are just analyzing the output of a node.
How can you improve this? Maybe by adding more complete profiling, such as the number of distinct values (for categorical columns) and mean, median, min and max (for numerical ones).
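Something along these lines is what I mean by profiling. It is only a sketch in plain Python; the function names and the returned keys are made up for illustration, and `None` stands in for a missing cell.

```python
import statistics
from typing import Dict, Hashable, Optional, Sequence, Union

Number = Union[int, float]


def profile_numeric(values: Sequence[Optional[Number]]) -> Dict[str, Number]:
    """Summarise a numerical column, ignoring missing (None) cells."""
    present = [v for v in values if v is not None]
    summary: Dict[str, Number] = {
        "count": len(present),
        "missing": len(values) - len(present),
    }
    if present:
        summary["min"] = min(present)
        summary["max"] = max(present)
        summary["mean"] = statistics.mean(present)
        summary["median"] = statistics.median(present)
    return summary


def profile_categorical(values: Sequence[Optional[Hashable]]) -> Dict[str, int]:
    """Summarise a categorical column: non-missing count, missing count, distinct values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


print(profile_numeric([1, 2, None, 3, 10]))
print(profile_categorical(["a", "b", "a", None]))
```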
The Node Monitor in Knime is good but I love the Results window in Alteryx Designer… but it’s a bit of a misnomer. There are icons on the left side of the window that let you select the port whose data you want to see (not just the outputs like in the Node Monitor). I find this incredibly useful when configuring the node (tool) as I can see the incoming data. I can sort and filter, and copy and paste values if I need them… for instance when writing logic in a formula… that way there is less chance of a typo etc… Also you can see the metadata, and that also helps when you are writing formulas so you realize you might need a type change before a certain function etc… Would love to see enhancements like this to the Node Monitor.
As I dive deeper into using Knime, converting over my Alteryx workflows… I would like to suggest some UI enhancements, mostly around the Node Monitor. My mindset is that of a person who is fairly but not intimately familiar with the dataset (like marketing data that has come from many sources and isn’t always clean). So as I build a workflow, getting hints from the interface is very helpful in spotting errors.
The screenshot below shows both interfaces:
As you build and troubleshoot a complex workflow it’s really handy to see the config of each node you click on, as you may have many of the same… double-clicking every time is a pain, and as the window resizes it obscures other windows as you try to troubleshoot… Would love to see a dockable window with config settings… I would like to use that in place of the Description window (once I’ve gotten used to config settings)… or I might keep both. Want to see both data and config at once.
Please make no data (i.e. Null) more obvious and different from the ASCII character ?… In Alteryx it’s the word [null] in grey text… which is easy to differentiate from an ASCII ?
Being able to see the data at the same time as configuring the node is really important so you avoid typos or errors. In Alteryx you can select one of the incoming data tables or one of the output tables… with the icons on the side that dynamically populate based on the node.
Again to avoid typos etc, you can double-click on the data and a window opens that allows you to select all or part of the text of that cell to copy and paste into the config area or another app. You can also click on the column header to sort or filter the data… again this helps you keep the data in mind as you configure your node to avoid errors, and helps with troubleshooting if errors are found. There are also visual indicators if you have nulls or blanks in your data (which can be really hard to see). If the column has any nulls there will be some red in the bar under the column name. If any of the data has leading or trailing blanks there will be a red triangle in the corner of the cell of the offending data.
Metadata can be displayed to let you know what type of data the columns have been set to… this has saved me many times when I couldn’t figure out why a node failed… it was because of the data type.
Handy for pulling data out for investigation externally: you can copy or save the data right from this node monitor.
This is more about the RegEx split node… it would be nice to be able to rename the output field(s) in the node. The Alteryx RegEx tool allows you to define multiple groups of values and output them to multiple fields, and you can choose to replace the value in the same column. This doesn’t seem possible in the Knime node.
An example of debugging and wanting to pull out a couple of rows of data to quickly test… while trying to get the RegEx syntax right to cover all problem data values, I could filter the rows and then want to pull some samples of the data missed by my regex… and test them on this site: https://regexr.com/ I want to pull some of the data to play with the regex format to see if it catches all occurrences. Just want to cut and paste… don’t want to have to export to a file and open the file to cut and paste to a web site.
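To show what I mean by groups going to named output fields, here is a small Python sketch using named capture groups. The pattern, the field names (`city`, `state`, `zip`) and the sample strings are all made up for illustration; this is not the Alteryx tool or the Knime node, just the behaviour I am after.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: each named group becomes one output field.
PATTERN = re.compile(r"(?P<city>[A-Za-z ]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})")


def split_to_fields(text: str) -> Dict[str, Optional[str]]:
    """Return one value per named group, or None for every field when nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return dict.fromkeys(PATTERN.groupindex)
    return match.groupdict()


print(split_to_fields("Salt Lake City, UT 84101"))
print(split_to_fields("no address here"))
```

Renaming an output field is then just a matter of changing the group name in the pattern.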
Knime is really good but I want to push to make it better. Example of why having visual cues is so important. Debugging a problem with a Join… Had a list of 10 cities and a long table with 100s of cities… tried to join based on city… only 3 cities matched… maybe misconfigured the Join? Or a misspelling? Must be a leading or trailing space… added a String Manipulation (misconfigured it but no errors)… to check I had to right-click to open the output table, then select a cell and then copy from the menu (that’s a lot of steps to test something this simple). Did this over and over until I figured out what was misconfigured (30 mins)… would have been 10 sec in Alteryx because the interface makes it easy to inspect a value and highlights when there are nulls and leading and trailing spaces. It even has a tool (Data Cleansing) to remove such things automatically across numerous fields… perhaps this is a Component (haven’t looked).
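For anyone who hasn’t hit this before, here is a tiny Python sketch of the failure. The city names are invented and the function is hypothetical; it only shows why an exact-match join silently drops values with stray spaces, and how `repr()` makes them visible.

```python
from typing import List, Sequence


def match_cities(wanted: Sequence[str], table: Sequence[str], trim: bool = False) -> List[str]:
    """Return the entries of `wanted` that find a match in `table` (inner-join style)."""
    def key(value: str) -> str:
        # With trim=True, leading/trailing whitespace is ignored when comparing.
        return value.strip() if trim else value

    lookup = {key(city) for city in table}
    return [city for city in wanted if key(city) in lookup]


wanted = ["Boston", "Denver ", " Austin"]
table = ["Boston", "Denver", "Austin", "Seattle"]

print(match_cities(wanted, table))             # ['Boston']
print([repr(c) for c in wanted if c != c.strip()])  # shows the hidden spaces
print(match_cities(wanted, table, trim=True))  # all three match once trimmed
```

The middle line is the visual cue I am asking for: the quotes around the value make the extra space obvious at a glance.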
Adding to the feature request suggestion… if the node hasn’t been executed… then the default should be to display the incoming data (and highlight that icon or port in the monitor)… If the node has executed then default to displaying one of the outgoing tables (it should remember what was last selected)… if there isn’t an obvious one then just select the top one… This should never cause a node to run… as some have worried, since that would slow workflow development.
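Spelled out as logic, the default I have in mind would be roughly this. It is a sketch only: the function and port names are hypothetical, and I am assuming the top input is the one to show before execution. Note that it only picks a port to display and never triggers execution.

```python
from typing import Optional, Sequence


def default_port(
    executed: bool,
    inputs: Sequence[str],
    outputs: Sequence[str],
    last_selected: Optional[str] = None,
) -> Optional[str]:
    """Pick which port the monitor should show by default (never runs the node)."""
    if not executed:
        # Node not run yet: show incoming data, top input first (assumption).
        return inputs[0] if inputs else None
    if last_selected in outputs:
        # Remember the user's last choice among the outgoing tables.
        return last_selected
    # No obvious choice: fall back to the top output.
    return outputs[0] if outputs else None


print(default_port(False, ["in0", "in1"], ["out0", "out1"]))
print(default_port(True, ["in0", "in1"], ["out0", "out1"], last_selected="out1"))
```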
I’m sure you already know this… If the code used for the Node output table could be used in the Node Monitor Port Output (data display) it would take care of parts of the below asks. Also means you won’t have parallel code to maintain that essentially does the same function.
When you are working on any node with multiple inputs (like the Recursive Loop nodes) and it’s your first time using the node… being able to see the data at each input and output really helps you figure out what is going on. Also seeing the data on the Flow Variables would have helped a lot.
I know I’m beating a dead horse… but being able to see the incoming data table while configuring the current node is really handy when working/building with Components, where it’s a pain to see the incoming node and its data.
I would like to have a very nice and handy feature: auto wiring.
Like in Alteryx, when dragging a new node near the output of another node, the link between the two is made automatically.
Hi Luca,
KNIME has something similar. Just select the node you want to add a successor to and then double-click on the successor node in the node repository. This will add the successor node to the workflow and link it to the selected node.
Bye
Tobias
Fully agree with the config windows. What is annoying is the modal aspect. Having it as a separate window as now is fine as long as it isn’t modal. The double-clicking can be avoided by using shortcuts (F6).
Was confused about the comment on the missing values. I think the issue is that in the normal table it’s a red ?, hence easily distinguished from the character ?. But in the node monitor it is indeed a black ?, so there is no way to know if it’s missing or not.