Excel Writer Node Creats 6 Extra Tabs (and large file size)

As per @elsamuel , it’s really hard for us to answer this question without seeing what is being done…

I’m not sure what else I can share, but thanks for trying to help.

Can you explain what are you trying to achieve with the Nominal Value Filter node? Are you just trying to filter on the data?

I am trying to isolate the customer transactions marked TIRE from the large list.

This is working for 7/8 industries.

Can’t you use a rule-engine filter instead?

Also, in these cases, you should run a group by on the transaction type to see how many records you expect per type.

You can then compare the amount of data you get using the Nominal Value Filter and the rule-engine filter. I think in this case, you need a rule-engine filter

Edit: rule-engine filter, I mean rule-based row filter :slight_smile:

1 Like

How do you figure? Did you run a group by and compare the numbers?

The Excel file in 7/8 cases makes sense. I have not counted rows in the raw data and compared (I saw no need).

You should run the groupby so you can compare, that would be the best practice and the real validation. You are just lucky (or unlucky, depending on a certain point of view) that you are getting a big number for one of them that attracted your attention. You may think that 7/8 of them worked, and it could be that they do, but if I think of it the other way, 1/8 is not working, it means that there is a chance that none of them worked since they were all processed the same way.

2 Likes

I will try this, but I hoped I could rely on KNIME without having to personally system test all results. I’ll check this in the morning. Thanks.

You can definitely rely on Knime without having to personally system test all results. Knime is only doing what it’s being told to do :slight_smile:

It comes down to how it is used and what we ask it to do :slight_smile:

Try with the groupby and the rule-based row filter (for all 8 that is), and let us know how it goes.

3 Likes

I’m still struggling with this. I’ve created a simple workflow which works perfectly. In my real example, the output from the JOINER has the proper number of columns, but 50x too many rows. I’m not sure what’s causing any rows to be appended.

Hi @Bob_Nelson , sorry to hear that you are still struggling with this.

Have you tried the row filter instead of the Nominal Value filter? I am not sure why you are using the Nominal Value filter here.

And for the Joiner part, can you elaborate what you are doing?

Again, re-iterating that Knime is only doing what it is supposed to do. You may have duplicates in your data if you are getting more rows than what you expected - hence why you should do groupbys as pre-validation of data before processing.

Based on your screen shot, the Joiner seems to be doing what you are telling it to do (Left join).

Can you share this workflow with the 3 sample files, and can you explain what is the result you want?

2 Likes

Hello @Bob_Nelson ,as @bruno29a has rightly said, the results that you have, whilst unexpected, will be because Knime has done what it’s been asked, but probably either the join condition, or the input data is not quite as you intended.

If you have a LEFT OUTER join between two tables A (left) and B (right), then under a proper join condition with a row on A matching no more than one row on B, the number of rows in the resultant table can never be greater than the number of rows in A. In fact, it should be exactly the same as the number of rows in A, since the result should be all the rows in A which match exactly one row in B plus the additional unmatched rows from A.

The only way the resultant table can be larger than A is if your join condition does not cause a one-to-one match. This is either because your join condition is incorrectly stated, or because there is duplication in B resulting in more than one row in B matching to a row in A.

I would therefore run the workflow, and then hover the mouse over the black output triangles of both the data feeds leading to your Joiner and make a note of the number of rows in each input flow. I would then compare that to the rows generated by the Joiner. If the joiner is generating wildly more rows than the “left” (top) input and your condition is a LEFT OUTER join, then either your join rule is missing something, or you have duplication in the rows in the Right (lower) table.

I would then add a duplicate row filter (for testing purposes only!) to the feed from the lower (right) table, telling it to remove duplication based on all the columns you are using in your JOINER’s join condition. Rerun, and see if the rows produced are more in line with expectations. If they are, then go and inspect the “lookup” data coming from your “right” table.

As an example of what I mean, please see the attached workflow.KNIME_join_unexpected_rows.knwf (20.2 KB)

1 Like

Thank you both for your suggestions.

I found the problem. I was filtering on a string. I used STRING TO NUMBER so that I was filtering on an integer instead. Filtering on a string must have been introducing many variants, which each produced a new row.

4 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.