Find / replace individual items in an array/list/set?

roberting · April 5, 2022, 5:20pm

Hi forum,

Is it possible to find and replace individual items in an array (also known as a list or set),
be it in a cell or in a variable?
If there were a way to find/replace against a dictionary table, that would be extra cool!

In the replacement nodes I tried, array cells and variables don’t even show up as options.
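Independent of any particular node, the logic of a dictionary-based find/replace on a list is small. The sketch below is only an illustration: the dictionary table contents and the column name `length` are hypothetical. It looks each item up in a find → replace mapping, leaves unmatched items unchanged, and optionally removes duplicates afterwards so the list behaves like a set.

```python
# Hypothetical dictionary table: each entry maps a "find" value to a "replace" value.
DICTIONARY_TABLE = [
    ("diameter_d2_max", "diameter_d1_max"),
    ("diameter_d2_min", "diameter_d1_max"),
]


def replace_items(items, table):
    """Return a new list in which every item found in the table is replaced."""
    lookup = dict(table)  # find -> replace
    return [lookup.get(item, item) for item in items]


def unique_in_order(items):
    """Drop repeated items but keep the first occurrence (turns the list into a 'set')."""
    return list(dict.fromkeys(items))


columns = ["diameter_d1_max", "diameter_d2_max", "diameter_d2_min", "length"]
replaced = replace_items(columns, DICTIONARY_TABLE)
print(replaced)                   # ['diameter_d1_max', 'diameter_d1_max', 'diameter_d1_max', 'length']
print(unique_in_order(replaced))  # ['diameter_d1_max', 'length']
```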

bruno29a · April 5, 2022, 6:07pm

Hi @roberting, it is definitely possible, in a few different ways.

Do you have sample data and can you show us what you tried?

roberting · April 5, 2022, 7:17pm

Hi @bruno29a,

This is difficult to explain, and I don’t have a workflow for that problem right now, but I have laid out the background in my previous post: compare the uniqueness of two value columns per class

I need to differentiate between product variants by their features (columns of technical data), and I am tasked with using a) as many features as needed, but b) as few as possible.
I have implemented everything to achieve a) (see the other post mentioned; that workflow is too complex to post), but now I need to optimize and reduce.

The hint from @elsamuel in that thread gave me the idea to use the Linear Correlation node on my product feature columns, which gives me a list of column pairs with their correlation values (see example below).
I filter this list to get the pairs with high correlation and want to use it as a dictionary to sort out and replace “redundant” features from my variant axis (keeping only one column out of each group that closely correlates).

First column	Second column	Correlation Value	p	df
diameter_d1_max	diameter_d2_max	0.9875020661420031	0.0	87
diameter_d1_max	diameter_d2_min	0.9791718455122758	0.0	87
diameter_d1_min	diameter_d1_max	0.9981497249228584	0.0	158
diameter_d1_min	diameter_d2_max	0.9940460340862238	0.0	87
diameter_d1_min	diameter_d2_min	0.9892210896098003	0.0	87
diameter_d2_min	diameter_d2_max	0.9983489385188768	0.0	87
diameter_outer	diameter_inner	0.9954515453015725	0.0	28
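The filtering step described above can be sketched as a threshold on the absolute correlation value. The rows are copied from the table; the cut-off of 0.99 and the optional `min_df` parameter are assumptions for illustration, not values from the workflow. `abs()` is used because a strong negative correlation is just as redundant, and `min_df` can drop pairs computed from few rows (such as the df of 28 for the outer/inner diameter pair).

```python
# (first column, second column, correlation value, df) copied from the table above
CORRELATIONS = [
    ("diameter_d1_max", "diameter_d2_max", 0.9875020661420031, 87),
    ("diameter_d1_max", "diameter_d2_min", 0.9791718455122758, 87),
    ("diameter_d1_min", "diameter_d1_max", 0.9981497249228584, 158),
    ("diameter_d1_min", "diameter_d2_max", 0.9940460340862238, 87),
    ("diameter_d1_min", "diameter_d2_min", 0.9892210896098003, 87),
    ("diameter_d2_min", "diameter_d2_max", 0.9983489385188768, 87),
    ("diameter_outer", "diameter_inner", 0.9954515453015725, 28),
]


def high_correlation_pairs(rows, threshold=0.99, min_df=0):
    """Keep (first, second) pairs whose |r| reaches the threshold.

    threshold and min_df are hypothetical defaults; a small df means the
    pair was computed from few rows.
    """
    return [(a, b) for a, b, r, df in rows if abs(r) >= threshold and df >= min_df]


print(high_correlation_pairs(CORRELATIONS))
# [('diameter_d1_min', 'diameter_d1_max'), ('diameter_d1_min', 'diameter_d2_max'),
#  ('diameter_d2_min', 'diameter_d2_max'), ('diameter_outer', 'diameter_inner')]
print(high_correlation_pairs(CORRELATIONS, min_df=30))
# [('diameter_d1_min', 'diameter_d1_max'), ('diameter_d1_min', 'diameter_d2_max'),
#  ('diameter_d2_min', 'diameter_d2_max')]
```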

So I thought of using that list as a dictionary on an array of my column names, to say e.g.:
“If there is a value in column ‘diameter_d1_max’, then I don’t need the column ‘diameter_d2_max’, because it likely doesn’t contribute to further differentiation of the product variants.”

This way I would shrink the array (or list) of column names. This list will finally be concatenated into a string that defines the product model in the receiving system; however, that system is technically limited to a maximum of 5 features (columns) per variant dimension axis.
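The rule quoted above (“if the first column is present, drop the second”) can be sketched as a single pass over the filtered pairs, followed by the concatenation with the 5-feature limit. The pairs are the ones above a hypothetical 0.99 cut-off; the `|` separator is an assumption, since the receiving system's format isn't stated. The function raises instead of silently truncating, so an over-long axis is noticed.

```python
# Pairs from the table above with |r| >= 0.99 (hypothetical cut-off).
PAIRS = [
    ("diameter_d1_min", "diameter_d1_max"),
    ("diameter_d1_min", "diameter_d2_max"),
    ("diameter_d2_min", "diameter_d2_max"),
    ("diameter_outer", "diameter_inner"),
]
MAX_FEATURES = 5  # limit of the receiving system mentioned in the post


def drop_redundant(columns, pairs):
    """If both columns of a pair are still present, keep the first and drop the second."""
    kept = list(columns)
    for first, second in pairs:
        if first in kept and second in kept:
            kept.remove(second)
    return kept


def model_string(columns, max_features=MAX_FEATURES, sep="|"):
    """Concatenate the column names; the separator is a hypothetical choice."""
    if len(columns) > max_features:
        raise ValueError(f"{len(columns)} features exceed the limit of {max_features}")
    return sep.join(columns)


columns = ["diameter_d1_max", "diameter_d1_min", "diameter_d2_max",
           "diameter_d2_min", "diameter_outer", "diameter_inner"]
kept = drop_redundant(columns, PAIRS)
print(kept)                # ['diameter_d1_min', 'diameter_d2_min', 'diameter_outer']
print(model_string(kept))  # diameter_d1_min|diameter_d2_min|diameter_outer
```

Note that the result depends on the order of the pairs: whichever column happens to be listed first survives. That is exactly where a priority mechanism would come in.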

By the way, once that works, I will have to think of some mechanism to prioritize which features to keep and which ones to drop in case of redundancy, but that comes later.
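One possible prioritization mechanism, offered here only as a suggestion, is a greedy pass over the columns in order of preference: keep a column unless it is highly correlated with a column that was already kept. The preference order below is hypothetical; in practice it could come from criteria such as how often a column is filled, which would have to be decided separately.

```python
# Pairs from the table above with |r| >= 0.99 (hypothetical cut-off).
PAIRS = [
    ("diameter_d1_min", "diameter_d1_max"),
    ("diameter_d1_min", "diameter_d2_max"),
    ("diameter_d2_min", "diameter_d2_max"),
    ("diameter_outer", "diameter_inner"),
]


def select_by_priority(priority, pairs):
    """Walk columns from most to least preferred and keep a column only if it is
    not highly correlated with a column that has already been kept."""
    correlated = {}  # column -> set of columns it is highly correlated with
    for a, b in pairs:
        correlated.setdefault(a, set()).add(b)
        correlated.setdefault(b, set()).add(a)
    kept = []
    for column in priority:
        if not correlated.get(column, set()) & set(kept):
            kept.append(column)
    return kept


# Hypothetical preference order (most preferred first).
priority = ["diameter_outer", "diameter_d1_max", "diameter_d2_max",
            "diameter_d1_min", "diameter_d2_min", "diameter_inner"]
print(select_by_priority(priority, PAIRS))
# ['diameter_outer', 'diameter_d1_max', 'diameter_d2_max']
```

Unlike the order of the pairs, the preference list makes the choice explicit. Here `diameter_d1_max` and `diameter_d2_max` are both kept because their correlation (0.9875) is below the assumed 0.99 cut-off.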

Thanks!

duristef · April 5, 2022, 10:32pm

@roberting Maybe I’m completely wrong, but your problem sounds to me like the opposite of an anonymization problem. If I understand correctly, you are looking for the (minimum) subset of attributes that maximizes the differentiation between products. Subsets of this kind are called “quasi-identifiers”, because they can distinguish most of the items (whereas a key uniquely identifies each item). If the number of attributes (columns) is not very high, you could try the Anonymity Assessment node (part of the Redfield Privacy Nodes). It calculates Distinction and Separation for each possible combination of attributes (i.e. subset). If you have k columns, that means nearly 2^(k+1) calculations, so you’d better keep the number of columns (and rows) involved in the calculation to a minimum. If you sort the resulting table by Separation (descending) and by number of attributes in the subset (ascending), you can select the subset(s) with the highest Separation and the lowest number of attributes.
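To make the idea concrete, here is a minimal sketch of an exhaustive subset assessment. It assumes the common definitions (Distinction = distinct value combinations divided by the number of rows; Separation = share of row pairs that differ in at least one attribute); the Redfield node may compute these differently, so treat this as an illustration rather than a description of the node. The toy rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def _keys(rows, attrs):
    """Value combination of each row for the given attributes."""
    return [tuple(row[a] for a in attrs) for row in rows]


def distinction(rows, attrs):
    """Distinct value combinations divided by the number of rows (1.0 = key)."""
    return len(set(_keys(rows, attrs))) / len(rows)


def separation(rows, attrs):
    """Share of row pairs that differ in at least one attribute of the subset."""
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    counts = Counter(_keys(rows, attrs))
    # Rows with identical value combinations form pairs that are NOT separated.
    unseparated = sum(c * (c - 1) // 2 for c in counts.values())
    return (total_pairs - unseparated) / total_pairs


def assess(rows, attributes):
    """Score every non-empty subset (2**k - 1 of them for k attributes)."""
    results = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            results.append((attrs, distinction(rows, attrs), separation(rows, attrs)))
    # Highest separation first, then the fewest attributes (stable sort keeps ties in order).
    results.sort(key=lambda r: (-r[2], len(r[0])))
    return results


# Hypothetical toy products; d1 and d2 always move together, like correlated diameters.
rows = [
    {"d1": 10, "d2": 10, "length": 50, "material": "steel"},
    {"d1": 10, "d2": 10, "length": 60, "material": "steel"},
    {"d1": 12, "d2": 12, "length": 50, "material": "brass"},
    {"d1": 12, "d2": 12, "length": 60, "material": "steel"},
]
results = assess(rows, ["d1", "d2", "length", "material"])
for attrs, dist, sep in results[:3]:
    print(attrs, round(dist, 2), round(sep, 2))
```

With 4 attributes there are 15 subsets. The two best are `('d1', 'length')` and `('d2', 'length')`, both keys with only two attributes, and the correlated pair `d1`/`d2` never needs to appear together.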
Here’s the output of an AAN. The columns were selected from a FIFA dataset, just for the example. We can see that all the subsets shown are equally good at separating the rows (in fact, they are keys), but the first 6 are those with the minimum number of attributes.

(I apologize if I have misunderstood the problem)

roberting · April 6, 2022, 8:05am

Thanks @duristef,
I will have a look at this. Typically I have between 10 and 30 attribute columns, but not all of them apply to all product classes in the table. I will see how the node copes with this.
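Because an exhaustive search grows as 2^k − 1 subsets, it helps to split the table by product class first and only keep the columns that actually have values within each class. The sketch below assumes missing values are represented as `None`; the class column name `class`, the class values, and the sample rows are hypothetical.

```python
def rows_by_class(rows, class_column):
    """Group rows by their product class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[class_column], []).append(row)
    return groups


def applicable_columns(rows, columns):
    """Columns with at least one non-missing (non-None) value in these rows."""
    return [c for c in columns if any(row.get(c) is not None for row in rows)]


def subset_count(k):
    """Number of non-empty attribute subsets an exhaustive search must score."""
    return 2 ** k - 1


rows = [
    {"class": "bearing", "diameter_inner": 20, "diameter_outer": 47, "length": None},
    {"class": "bearing", "diameter_inner": 25, "diameter_outer": 52, "length": None},
    {"class": "shaft", "diameter_inner": None, "diameter_outer": 20, "length": 300},
]
columns = ["diameter_inner", "diameter_outer", "length"]

for product_class, class_rows in rows_by_class(rows, "class").items():
    cols = applicable_columns(class_rows, columns)
    print(product_class, cols, subset_count(len(cols)))

print(subset_count(10), subset_count(30))  # 1023 1073741823
```

The jump from about a thousand subsets at 10 columns to over a billion at 30 is why trimming inapplicable columns per class (and pre-filtering highly correlated ones) matters before running an exhaustive assessment.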

system · July 5, 2022, 8:05am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.