Domain recalc after join?

Dear KNIMErs,

 

The Nominal Value Row Filter will only show one set of nominal columns after a JOIN node. While a domain calc node fixes this, I wonder whether it wouldn't make sense to recalc domains after a join automatically? That behaviour could be disabled manually in case of undue performance impact.

Thanks for considering it. :)

E

Interesting. If I set up Data Generator > Joiner > Nominal Value Row Filter, I see both domains for outer, left, and right joins. Can we see an example of when this is not the case?

 

Cheers,

 

Aaron

Uh, I'll try to dig it out, hoping I can still find it in my intensely messy data work this week. :-)

 

However, I think the key to the problem was that my join (on RowIDs) created something like {string columns LHS table},{int/double columns LHS table},{string columns RHS table}. I then had issues when trying to use the Nominal Value Row Filter node on RHS columns, which would simply not appear in the node's dropdown. I thought of blaming the filter node, but a domain recalc on the join results fixed its behaviour. Hence my speculation about (incomplete?) domain recalculation after JOIN. FWIW, I resorted to a "tabula rasa" domain recalc, dropping all existing domains and not imposing limits on the number of nominal values.

 

Hope that helps,

E

I'm beginning to suspect that the "Nominal Value Row Filter" is actually the one with domain handling issues. I just plugged it in to filter a table, aiming to reduce the number of conditional box plots I plot, and the old (filtered-out) dimensions keep appearing there - without box plots above them, of course.

 

-- E

Both nodes seem to be working as intended.  

 

One obscure gotcha in working with nominal value columns is that the unique values are only stored when there are fewer than 60 of them. This particular limit is a bit arbitrary, but it prevents us from accidentally copying an entire column into the table spec, which would cause some nasty performance problems. The domain calculator node is then designed to be used in cases where you want to track more than 60 nominal values for a column.
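
To make the limit concrete, here is a rough sketch in Python. It is a toy model only, not KNIME code: the function name `possible_values` and the constant `MAX_POSSIBLE_VALUES` are made up for illustration, and the exact boundary (59 values stored, 60 not) is an assumption based on the "fewer than 60" wording above.

```python
from typing import Iterable, List, Optional

# Assumed limit, taken from the description above (not read from KNIME itself).
MAX_POSSIBLE_VALUES = 60


def possible_values(column: Iterable[str],
                    limit: int = MAX_POSSIBLE_VALUES) -> Optional[List[str]]:
    """Return the distinct values in first-seen order, or None if there are too many.

    None stands for "no possible values stored in the table spec".
    """
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for value in column:
        seen.setdefault(value, None)
        if len(seen) >= limit:
            # Give up early instead of copying the whole column into the spec.
            return None
    return list(seen)


small = possible_values(["red", "green", "red", "blue"])
large = possible_values(str(i) for i in range(1000))
```

Here `small` holds the three distinct colours, while `large` is `None` because the column has too many distinct values to be worth storing.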

 

I would also note, with regard to your last post, that the domain of the table retains its historical values, so if you want to update the possible values, this needs to be done explicitly (again with domain calc.).
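
A small Python sketch of that "historical values" behaviour, again as a toy model rather than anything from the KNIME code base (the helper names `filter_rows` and `recalc_domain` are hypothetical): the filter passes the incoming domain through untouched, so downstream views still see the filtered-out categories until the domain is recomputed from the remaining rows.

```python
from typing import List, Set, Tuple


def recalc_domain(rows: List[str]) -> List[str]:
    """Recompute the possible values from the rows that are actually present."""
    return list(dict.fromkeys(rows))  # distinct values, first-seen order


def filter_rows(rows: List[str], domain: List[str],
                keep: Set[str]) -> Tuple[List[str], List[str]]:
    """Toy row filter: drops rows but hands the old domain on unchanged."""
    return [r for r in rows if r in keep], domain


rows = ["A", "B", "C", "A"]
domain = recalc_domain(rows)

filtered, stale_domain = filter_rows(rows, domain, keep={"A"})
fresh_domain = recalc_domain(filtered)
```

After the filter, `stale_domain` still lists A, B and C even though only A rows remain, which would match the empty box plot slots described earlier; `fresh_domain` contains only A.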

 

Does that clear things up?

Thanks Aaron,

 

I'm still puzzled, though, that some freshly joined columns would not appear in the "Nominal Value Row Filter" node's drop-down list *at all* before re-calcing the joined table's domain - is that intended behaviour?

 

Thanks

E

I am guessing that it didn't appear because the possible values were never stored to begin with because they exceeded the magic number of unique values.  In the test case I put together, the possible values are carried through a join as you expect.  
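
As a rough illustration of that guess (a toy Python model, not how the node is actually implemented; the `Spec` type, `selectable_columns` and the column names are all invented here): if a filter dialog only offers columns whose spec carries stored possible values, a string column that exceeded the limit would be missing from the drop-down until its domain is recalculated.

```python
from typing import Dict, List, Optional

# Hypothetical table spec: column name -> stored possible values (None = none stored).
Spec = Dict[str, Optional[List[str]]]


def selectable_columns(spec: Spec) -> List[str]:
    """Columns a nominal-value filter could offer: those with stored possible values."""
    return [name for name, values in spec.items() if values is not None]


joined_spec: Spec = {
    "lhs_category": ["x", "y"],  # few distinct values, so they were stored
    "lhs_count": None,           # numeric column, no nominal values
    "rhs_label": None,           # too many distinct values, nothing stored
}

offered = selectable_columns(joined_spec)
```

In this sketch `offered` contains only `lhs_category`; `rhs_label` would show up once a domain recalculation without a value limit fills in its possible values.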

I see, thanks Aaron. I'll keep an eye on the "magic threshold" next time I encounter issues with that; chances are you are right. :-)

 

Cheers

E