I have an extensive internal generic column name approach that I utilize for most of my clients. This has a ton of benefits on the back end, but it often requires re-mapping 100+ column names when patching into new DB structures. At this point I am using an Excel conditional dropdown approach to align input columns for generic renaming, but I would love to hear if others are using KNIME tools for friendly UI on tasks along these lines.
I would especially be interested if anyone has pulled off something like an interactive table editor along these lines. Maybe something like the generic column names being uneditable / static, but the values in the other input column can be moved to align for name re-mapping?
My current approach for illustration - In order to avoid errors in my current excel approach, I build dropdown lists that remove all of the prior selected values from the available options. I also use conditional highlighting to help visually see which fields have been remapped and which are not.
Thank you for the workflow example. While it helps with the later actions of applying header changes, I am more aiming toward a user friendly tool that would need to occur before your example workflow… Your workflow begins with a Table Creator node that is already populated with pre-matched manually entered values. The goal is to have a KNIME built tool that would allow for simple error free manual alignment of those column header names in order to generate that initial table.
I did notice that they added dropdown selection to Table Editor (although access to dropdown values is still a bit finicky), but it lacks the capability to remove already selected values from the dropdown domain which makes the task of navigating and mapping 100 plus columns without error much tougher.
I find it quite difficult to follow this abstract explanation but somehow, based on the conversation, I start to understand the context. To break down the complexity is the following correct?
In order to: Map old to new column names I want to: Enable customers to visually map, i.e. by drawing connections or drag & drop old to new column names
To be frank, the first thing I thought off was the solution you already have. Column A with the current column names and column B with a dynamically generated drop down whos options are removed if already selected by other drop downs.
Thinking out loud and playing the devils advocate … I currently ask myself what you’d gain by an overly complex solution? Maybe you would like to break down the complexity first i.e. by categorizing column types. Basically making it a multi-filter approach which might also allow you to convert column types.
It is less that I want to necessarily improve upon my current user interface approach for re-mapping, and more that I wanted to re-create it as a tool that exists 100% in the KNIME platform. While I have found ways to better incorporate user friendly Excel based controller tools (such as single run only updaters & file openers for editing), there is always complexity in falling back on a multi-platform solution when you move past a single user.
I have not found much info on building custom user friendly tools to do manual user tasks within KNIME yet. (Although the newer dropdown addition to the Table Editor is a step in the right direction) I was just hoping that someone had found a trick or workaround that I might be able to expand upon.
I totally get what you are saying. And the solution to my question does pose some difficulties with the number of columns I have and keeping track of the matches.
I don’t see this actually as a KNIME issue but more of a communication issue. I think the better long term solution is to come to an agreement with your clients (or in my case different vendors I’m purchasing from) to agree on a standard communication template. Then it can be easy to build those dictionaries and then use string matching.
Do you have historical data that you can use to show past mappings? If you had that then you could auto match them and if there was a no match you could intervene, add that new info to the dictionary and continue to build it up.
I think it could be possible if you put the selection group of nodes in a recursive loop filtering out the selected match. Though I am not aware of a way to go backwards there.
Just some thoughts, sorry for no specific solution,
Jason
Gotcha. I might first would start to reproduce in Knime (almost) exactly what you already have in Excel.
One question, though. How respectively when do you determine the new column names? Do they already exists upon the moment you want to map or do you create them during the mapping process?
I could think about some sort of process which follows these steps:
Create temp table file
Use Column Selection and String Widget
Save value combination into temp table
Read temp table and display content
Whilst this is based on a novice level of interactive UI in Knime and I am pretty enthusiastic, I could nto at this moment figure out a proper way of triggering the table write (yet). Here is a very early alpha draft:
@j_ochoada,
I keep mapping on my end for now to avoid delays and reduce reverse engineering clues. My Excel approach works alright for now. It is relatively easy to work in, and an Excel file can be easily incorporated into a new client as a starting point.
I tend to incorporate a lot of “mom and pop” industry specific management software platforms to get more granular data. They pride themselves on shooting from the hip and customizing a ton of specialty “one off” use columns per client. (Which of course could be incorporated into a wider template, but then they couldn’t charge through the nose for every customization ) Most of the heavy setup work is per platform, but there is probably about 10% mapping variability and a few tweaks per year per client.
My core internal column names are definitely predefined and need to be set in stone (and limited to single route selection) to avoid problems.
I considered trying to build a component around a looped Table Editor node with a refresh button controlling an iterative loop cycle… Upon each loop I would need to remove the selected Column Header names from Dropdown Column in the Table Editor node and then update the domain. Seems pretty pie in the sky that I could get the refresh node to run an iteration and refresh the view in a user friendly way…
In that case if you have the historical mapping then I would create a dictionary with that then use string similarity node to match and convert the headers. Then like you said most of the time it goes 100% but when not (when the difference is not zero) then you intervene and fix and update the dictionary.
Definitely nothing a string similarity test would work on. It would be impossible to overstate how crazily different and seemingly unrelated the column headers can be. One could be some obscure casket descriptor and another could be some obscure electronic component configuration. Mapping to the template requires intimate business and system knowledge. I think that systematic mapping might push into the realm of legit Sci-Fi level AI.
Oh, sorry I misunderstood. I had assumed you have seen most of the permutations come through and have them matched in the past. If that was the case then it is just a case of string matching. Agreed if your two examples you mention point to your same internal header and are always changing then the objective becomes much more difficult.
I admit I’m not entirely getting what you want to do and why you need to do it. You read data from a proprietary system that has the option for admins to define custom columns?
This configuration must be stored somewhere right? likley the database? can’t you read out the configuration from the database or god forbid if it is stored in xml/json from there and have an automated mapping?
@iCFO For the sake of a clear understanding: You envision a node/component where you could select a “StandardColumnName” (ideally coming from a table at a second input port) for every weird column name on the incoming table at the first port? Did I get that right?
Sidenote: When pasting the image I realized that I subconsciously even introduced a typo in this minimal example to better reflect the real world situation…
Here is an Interactive View driven component that I just threw together that allows for easy dropdown selection for routing with an update button to easily refresh changes and ensure that they are saved.
I would consider this a relatively user friendly solution with some minor changes.
1- The dropdown list options included only the unselected column names from the domain
2- The dropdown list order remained in the order of the original input domain instead of auto sorting.
3- The Apply and Close “options” to exit the Interactive View could be forced to a simple default without additional user interaction.
I am implementing some cool workarounds now. I should have a workable dynamic solution in the next day or so. I will share it via my public hub as soon as it’s ready!
I somehow pulled it off thanks to the new Refresh Button Widget! I definitely had to use every trick in my bag to get there. Once new column names are used, they are removed from the dropdown with the Refresh Button (duplicates assignments are also blocked for good measure). The domain sorting issue is inherent in the Table Editor, but I added the ability to categorize and filter the dropdown column names which helps a good bit. I also made it so that the Refresh Button handles the saving function so users could simply close / dismiss when they are done working.
This actually went so well that I will probably build something similar for my final step of manual Find & Replace cleanup of entry errors that slip past my formula nets! (That controller is also currently excel based)
I don’t generally mark my own solutions, but I made an exception for this one to help people find the shared component / workflow on a search.
I honestly didn’t think this kind of thing was possible without developing custom nodes. I was expecting to fail, but maybe learn a few new tricks with this thread… I doubt that I would have put this much effort towards a long shot if it were not for forum back and forth. Thanks as always!
I moved the component’s log files into their own folder in the main workflow to avoid accidental deletion in the data file.
I doubt anyone is using it actively at this point since it has been less than a day, but Prior settings can be manually copied and pasted to replace the new one in the “Header_Re-Map_Logs” folder to recover old settings.