Joiner node creates empty data tables on all out-ports

LB_Knime · March 3, 2023, 1:34pm

I have a joiner that compares data in my wf to a table coming from an Excel Reader node.
The top input has ~90K rows and the bottom has ~100K rows. I have compared them in Excel and see that ~4K do not match on the key I am defining. But in Knime, the joiner has zero output.

I am not using Row Number. The three join fields are: an ID field, a start date field, and a status field. The matched pairs are the same data type. (I had an issue getting the Excel Reader to import the ID field as Int, so I changed both fields to Long.)

I had it set to output all three ports, so I can’t understand how nothing can result at all.

Thanks for any advice.

mlauber71 · March 3, 2023, 2:00pm

@LB_Knime maybe you can save an example of your workflow without resetting it first, so we can see what the data look like. Or would it be possible to create a workflow that reproduces the issue?

LB_Knime · March 3, 2023, 7:29pm

I have a simplified version of the workflow that pulls from an Excel file, and I saved it as a knwf file. It shows the same error, although a couple of fields don’t come through with exactly the same issues as in the full workflow (the ID field is no longer a Double).
Do I need to upload the excel file as well as the knwf? Or does the Knime file package the data source?
I have a bit of an emergency at the moment so I won’t be able to get back here until tomorrow at the earliest.
Thanks again.

LB_Knime · March 3, 2023, 7:30pm

Sorry for the Noob question: How do I upload a file bigger than 4MB?

mlauber71 · March 3, 2023, 8:24pm

You might want to select fewer lines. But if you include the Excel file in the /data/ sub-folder that should also be OK. If you have larger files, I think you can upload them to the KNIME Hub, up to 50 MB.

LB_Knime · March 6, 2023, 5:16pm

I’ve added my test WF to my public Knime Hub:
https://hub.knime.com/-/spaces/-/latest/~nQP0QrWVqCKkh2gZ/
When I created this version, the error disappeared. But, oddly, the joiner still is not behaving as expected.
I am only seeing joined rows with nothing in the left or right unmatched outputs.
So, let’s start with why there is no left or right output while I try to figure out how to reproduce the error.
I took the inputs to the joiner from the original WF and saved them to the Excel file hoping that would recreate the error and avoid having to slog through a bunch of other steps (and keep things confidential), but after we resolve this simple error, I will upload the full WF if necessary.

mlauber71 · March 6, 2023, 6:06pm

@LB_Knime is it possible that you would just have to select the options to also output the Left and Right unmatched rows?

LB_Knime · March 6, 2023, 7:25pm

My understanding (and what has appeared to work so far) is that the following option section is what controls that output. The one you highlight controls, I believe, the join type: if you check all three, for example, you get a full join, not an inner join.

LB_Knime · March 6, 2023, 7:34pm

So, now I’m looking at this and thinking I may be misunderstanding the config of the Joiner. My interpretation was: Do an inner join and then capture the unmatched left/right leftovers - like ‘on a = b and b is null’. I couldn’t really grasp how to apply the two separate option sets in the tool.

bruno29a · March 6, 2023, 8:46pm

Hi @LB_Knime it looks like you are looking for matching results only, so if there is no match, nothing will be returned.

As per @mlauber71 , if you want to see all the possible data, you need to also check the Left unmatched (Left Join) and Right unmatched (Right Join).

I think the Split option only applies if you select the left and/or right unmatched rows.
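To make the three row sets discussed above concrete, here is a minimal plain-Python sketch of a join that returns matched, left-unmatched, and right-unmatched rows separately. It is not KNIME's implementation; the function name `split_join` and the sample rows are made up for illustration.

```python
def split_join(left, right, keys):
    """Split two lists of dict rows into matched, left-only, and right-only rows."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    # Index the right table by its join key.
    right_index = {}
    for row in right:
        right_index.setdefault(key_of(row), []).append(row)

    matched, left_only = [], []
    matched_keys = set()
    for lrow in left:
        k = key_of(lrow)
        partners = right_index.get(k)
        if partners:
            matched_keys.add(k)
            for rrow in partners:
                # Left values win when column names collide.
                matched.append({**rrow, **lrow})
        else:
            left_only.append(lrow)

    # Right rows whose key never appeared on the left.
    right_only = [r for r in right if key_of(r) not in matched_keys]
    return matched, left_only, right_only


# Hypothetical sample rows
left = [{"id": 1, "status": "A"}, {"id": 2, "status": "B"}]
right = [{"id": 2, "status": "B"}, {"id": 3, "status": "C"}]
matched, left_only, right_only = split_join(left, right, ["id", "status"])
```

With these rows, only `id` 2 lands in `matched`, `id` 1 in `left_only`, and `id` 3 in `right_only`. An inner join alone would return just the first list.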

LB_Knime · March 6, 2023, 8:47pm

I have adjusted my original joiner to include matching, left and right rows.
But it is getting zero rows in the matched output.
I have copied the incoming table data for the left and right inputs into an Excel sheet and used that as the source of a simple workflow. I copied the Joiner node from the original source into the new workflow. It runs correctly.
I am sure there are matching rows. I’ve tested this in Excel. 90% of the rows should match, which is what I get when I build the test WF with the Excel data source.

bruno29a · March 6, 2023, 8:51pm

Hi @LB_Knime , I wonder if it has to do with the precision of the datetime column.

Can you try to remove the datetime column from the condition to match as a test?

Also, why do you need to convert from String to Time if both sources are strings?

mlauber71 · March 6, 2023, 9:51pm

@LB_Knime maybe you could check again. This is the match I get. The results are also stored in the /data/ folder, with the RowIDs marked:

LB_Knime · March 6, 2023, 9:58pm

@bruno29a - I converted to date-time because I assumed it would be better to work with date data in date format.
@mlauber71 - I also get it to run properly using the test version - as if something happens when I paste the WF table data into Excel and then read it in again, using the same Excel Reader config.

LB_Knime · March 6, 2023, 11:01pm

More details @bruno29a :
The dates start out as two fields: a date field and a time field, which I combined into one field to make it easier to work with them. That’s why they aren’t staying as Strings. But when I reimport the date fields, the Excel Reader makes them Strings again.
The full WF ingests data from Excel and cleans it up and then one output saves the clean results for comparison the following day. Each day it starts with new source data, cleans it, compares it to the previous day and then outputs the results of the new comparison and saves the new ‘prior day’ file for the next day’s run.
So, the data go from Excel into Knime, back to Excel, and back into Knime again the next day.
It’s very frustrating that it doesn’t stay the same between saving and retrieving.

mlauber71 · March 6, 2023, 11:13pm

@LB_Knime can’t you store the date and time in a format that would stay the same?

bruno29a · March 7, 2023, 4:34am

@LB_Knime , have you tried joining WITHOUT the datetime column as I suggested, to see if your Joiner node is working as expected?
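The same "one key at a time" test can be sketched outside KNIME. The snippet below is only an illustration in plain Python, with made-up rows and a hypothetical helper `overlap_per_key`; it counts, for each key column on its own, how many left rows have a value that also appears on the right. A column that reports zero is the one breaking the combined join.

```python
def overlap_per_key(left, right, keys):
    """For each key column, count left rows whose value also occurs on the right."""
    report = {}
    for k in keys:
        right_values = {row[k] for row in right}
        report[k] = sum(1 for row in left if row[k] in right_values)
    return report


# Hypothetical sample rows: the IDs look alike but differ in type (int vs. str).
left = [
    {"id": 101, "start": "2023-03-03 13:34:00"},
    {"id": 102, "start": "2023-03-04 08:00:00"},
]
right = [
    {"id": "101", "start": "2023-03-03 13:34:00"},
    {"id": "102", "start": "2023-03-04 08:00:00"},
]

report = overlap_per_key(left, right, ["id", "start"])
# report == {"id": 0, "start": 2}
```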

LB_Knime · March 7, 2023, 5:54pm

This is a good test, @bruno29a . I tested the join with the date fields only and it worked.
But when I added the ID fields, I got no joins, which can’t be right.
Both ID fields are in Long Number type. One starts out as Int, but that type is not an option in the Excel Reader for the second input (which defaults it to String) - even though the values come from the same source.

@mlauber71 - Haven’t had a minute to look at the page you referenced. Will do asap.

LB_Knime · March 7, 2023, 6:03pm

I stopped using the Excel Reader to transform the data type and inserted another [String to Number] Node instead as I had done with the first input. With both ID fields coming in as INT, the Joiner works as expected. I don’t know if there’s a precision issue with the other numeric type that would have prevented any joining.
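As a rough illustration of why bringing both ID columns to the same integer type helps, here is a plain-Python sketch. It does not reproduce what the Excel Reader or String to Number nodes do internally; `normalize_id` and the sample values are assumptions made for the example.

```python
def normalize_id(value):
    """Coerce an ID read as int, float, or string to a plain int.

    Raises ValueError if the value is not a whole number.
    """
    if isinstance(value, str):
        value = value.strip()
        # Accept "101" as well as "101.0", which spreadsheets sometimes produce.
        number = float(value) if "." in value else int(value)
    else:
        number = value
    if isinstance(number, float):
        if not number.is_integer():
            raise ValueError(f"not a whole-number ID: {value!r}")
        number = int(number)
    return number


# Hypothetical IDs: integers on one side, text on the other.
left_ids = [101, 102, 103]
right_ids = ["101", "102.0", " 104 "]

raw_matches = set(left_ids) & set(right_ids)  # nothing matches across types
clean_matches = set(left_ids) & {normalize_id(v) for v in right_ids}
```

Here `raw_matches` is empty, while `clean_matches` contains 101 and 102 once both sides are plain integers.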
I am going to run it with real data again, now, and see if it’s working. I’ll report back here.

bruno29a · March 8, 2023, 4:27am

Hi @LB_Knime , I think it does have to do with precision, which is why it is better to join on the same type in some cases. For the datetime, it might come down to milliseconds.

If the date-times are strings, you can join on them as strings and then convert them after the join if you really need them as type datetime.
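A small plain-Python sketch of the idea, assuming the mismatch comes from sub-second precision or from one side being re-read as text (neither is confirmed in this thread). The helper `datetime_key` and the `%Y-%m-%d %H:%M:%S` format are choices made for the example, not anything KNIME prescribes.

```python
from datetime import datetime


def datetime_key(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Render a datetime (or a string in `fmt`) as a string at second precision."""
    if isinstance(value, str):
        value = datetime.strptime(value.strip(), fmt)
    # Drop any fraction of a second before formatting.
    return value.replace(microsecond=0).strftime(fmt)


a = datetime(2023, 3, 3, 13, 34, 0, 250000)  # carries a fraction of a second
b = "2023-03-03 13:34:00"                    # the same moment re-read as text

same_raw = (a == b)                               # False: different types
same_key = datetime_key(a) == datetime_key(b)     # True: same string key
```

Joining on the string key and converting back to a datetime afterwards keeps the comparison independent of how each source stored the value.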