I have a joiner that compares data in my wf to a table coming from an Excel Reader node.
The top input has ~90K rows and the bottom has ~100K rows. I have compared them in Excel and see that ~4K do not match on the key I am defining. But in Knime, the joiner has zero output.
I am not using Row Number. The three join fields are: an ID field, a start date field, and a status field. The matched pairs are the same data type. (I had an issue getting the Excel Reader to import the ID field as Int, so I changed both fields to Long.)
I had it set to output all three ports, so I can’t understand how nothing can result at all.
Here is an example of two records that should match which I’ve pasted into Notepad (and added ‘|’ manually):
@LB_Knime maybe you can save an example of your workflow without resetting before save so we can see what the data look like. Or would it be possible to create a workflow that would reproduce the issue?
I have a simplified version of the workflow that has an Excel file that it pulls from. I saved it as a knwf file. It has the same error even though a couple of fields aren’t coming through with exactly the same issues as in the full workflow (the ID field is no longer a Double).
Do I need to upload the excel file as well as the knwf? Or does the Knime file package the data source?
I have a bit of an emergency at the moment so I won’t be able to get back here until tomorrow at the earliest.
You might want to select fewer lines. But if you include the Excel file in the /data/ sub-folder that should also be OK. If you have larger files I think you can upload that to the KNIME Hub for up to 50 MB.
I’ve added my test WF to my public Knime Hub: https://hub.knime.com/-/spaces/-/latest/~nQP0QrWVqCKkh2gZ/
When I created this version, the error disappeared. But, oddly, the joiner still is not behaving as expected.
I am only seeing joined rows with nothing in the left or right unmatched outputs.
So, let’s start with why there is no left or right output while I try to figure out how to reproduce the error.
I took the inputs to the joiner from the original WF and saved them to the Excel file hoping that would recreate the error and avoid having to slog through a bunch of other steps (and keep things confidential), but after we resolve this simple error, I will upload the full WF if necessary.
My understanding (and what has appeared to work so far), is that the following option section is what controls that output. The one you highlight controls, I believe, the difference between join types, so if you check all three, for example, you get a full join, not an inner join.
So, now I’m looking at this and thinking I may be misunderstanding the config of the Joiner. My interpretation was: Do an inner join and then capture the unmatched left/right leftovers - like ‘on a = b and b is null’. I couldn’t really grasp how to apply the two separate option sets in the tool.
I have adjusted my original joiner to include matching, left and right rows.
But it is getting zero rows in the matched output.
I have copied the incoming table data for the left and right inputs into an Excel sheet and used that as the source of a simple workflow. I copied the Joiner node from the original source into the new workflow. It runs correctly.
I am sure there are matching rows. I’ve tested this in Excel. 90% of the rows should match, which is what I get when I build the test WF with the Excel data source.
@bruno29a - I converted to date-time because I assumed it would be better to work with date data in date format. @mlauber71 - I also get it to run properly using the test version - as if something happens when I paste the WF table data into Excel and then read it in again, using the same Excel Reader config.
More details @bruno29a :
The dates start out as two fields: a date field and a time field, which I connected into one field to make it easier to work with them. That’s why they aren’t staying as Strings. But when I reimport the date fields, the Excel Reader makes them Strings again.
The full WF ingests data from Excel and cleans it up and then one output saves the clean results for comparison the following day. Each day it starts with new source data, cleans it, compares it to the previous day and then outputs the results of the new comparison and saves the new ‘prior day’ file for the next day’s run.
So, the data go from Excel into Knime, back to Excel, and back into Knime again the next day.
It’s very frustrating that it doesn’t stay the same between saving and retrieving.
This is a good test, @bruno29a . I tested the join with the date fields only and it worked.
But when I added the ID fields, I got no joins, which can’t be right.
Both ID fields are in Long Number type. One starts out as Int, but that type is not an option in the Excel Reader for the second input (which defaults it to String) - even though the values come from the same source.
@mlauber71 - Haven’t had a minute to look at the page you referenced. Will do asap.
I stopped using the Excel Reader to transform the data type and inserted another [String to Number] Node instead as I had done with the first input. With both ID fields coming in as INT, the Joiner works as expected. I don’t know if there’s a precision issue with the other numeric type that would have prevented any joining.
I am going to run it with real data again, now, and see if it’s working. I’ll report back here.