unusual behavior

anon33357744 · May 22, 2019, 5:18pm

Hi KNIME Team,

I would like to know whether there is an example workflow that solves my problem.

I have five pieces of information about each transaction. The model I need should find unusual behavior in this data. For example, Aksu Ltd sends mobile phones to Canan Ltd from Singapore to Germany.

According to the historical data, this transaction occurs 20 times in this constellation.

Now suppose Aksu Ltd sends mobile phones to Canan Ltd from Singapore to Thailand, or Aksu Ltd sends pens to Canan Ltd from Singapore to Germany, or Aksu Ltd sends mobile phones to Canan Ltd from China to Germany. The outlier model should find these transactions. In other words, if the goods, the country of origin, or the destination changes, the outlier detection model should be able to identify it.

Best,
Canan

Aswin · May 23, 2019, 8:24am

Not sure if it will work, but perhaps you can try the following: convert the categorical columns to numerical ones using the One to Many node, and feed the new columns into the Hierarchical Clustering node. If all goes well, the outliers should be in “lonely” branches of the tree.
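
A minimal Python sketch of this idea, outside KNIME and only as an illustration: the sample rows and the helper names `one_hot` and `first_merge_heights` are made up. Each row is turned into 0/1 dummy columns, as the One to Many node would do. Under single-linkage clustering, a row first joins the tree at the distance to its nearest neighbour, so a large value marks a “lonely” branch.

```python
from itertools import combinations

# Hypothetical sample rows: (sender, receiver, goods, origin, destination)
rows = [
    ("Aksu Ltd", "Canan Ltd", "mobile phones", "Singapore", "Germany"),
    ("Aksu Ltd", "Canan Ltd", "mobile phones", "Singapore", "Germany"),
    ("Aksu Ltd", "Canan Ltd", "mobile phones", "Singapore", "Germany"),
    ("Aksu Ltd", "Canan Ltd", "pens", "Singapore", "Germany"),
]

def one_hot(rows):
    """Create one 0/1 dummy column per (column index, category) pair."""
    columns = sorted({(i, value) for row in rows for i, value in enumerate(row)})
    return [[1 if row[i] == value else 0 for i, value in columns] for row in rows]

def first_merge_heights(vectors):
    """Distance at which each row first merges under single linkage,
    i.e. the Hamming distance to its nearest neighbour."""
    heights = [float("inf")] * len(vectors)
    for a, b in combinations(range(len(vectors)), 2):
        d = sum(x != y for x, y in zip(vectors[a], vectors[b]))
        heights[a] = min(heights[a], d)
        heights[b] = min(heights[b], d)
    return heights

heights = first_merge_heights(one_hot(rows))
print(heights)  # the "pens" row is the only one with a non-zero height
```

Note that a deviating combination that occurs twice would still merge at height 0 with its duplicate, so this only singles out rows that are unique.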

anon33357744 · May 23, 2019, 12:38pm

Hi @Aswin,

Can I also use the Category to Number node? I don't know why, but the One to Many node doesn't work.
KNIME_project.knwf (1.2 KB)

Please have a look at my workflow. I think there's no getting away from building an isolation forest.
Do you know another way to find these outliers? Maybe something like this: m_074_weka_isolation_forest.knwf (1.9 MB)

But I don't know how to adapt it to my model.

Best,
Canan

Aswin · May 23, 2019, 3:06pm

I don’t think Category to Number will work, because subsequent nodes will then treat the data as quantitative, but your categories are nominal. That’s why I picked the One to Many node, which creates a dummy variable for every category.
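
To make the difference concrete, here is a small Python sketch (not KNIME code; the category "shoes" and the function names are invented for the example). Integer codes make some categories look closer to each other than others, while dummy variables keep every pair of different categories equally far apart.

```python
goods = ["mobile phones", "pens", "shoes"]

# Category to Number style: each category gets an arbitrary integer code.
codes = {g: i for i, g in enumerate(goods)}

# One to Many style: each category gets its own 0/1 dummy column.
dummies = {g: [1 if g == other else 0 for other in goods] for g in goods}

def code_distance(a, b):
    """Absolute difference of the integer codes."""
    return abs(codes[a] - codes[b])

def dummy_distance(a, b):
    """Number of dummy columns in which the two categories differ."""
    return sum(x != y for x, y in zip(dummies[a], dummies[b]))

for a, b in [("mobile phones", "pens"), ("mobile phones", "shoes")]:
    print(a, "vs", b, "| codes:", code_distance(a, b), "| dummies:", dummy_distance(a, b))
```

With the integer codes, "shoes" appears twice as far from "mobile phones" as "pens" does, which is an artifact of the numbering. With dummy variables both distances are the same.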

anon33357744 · May 23, 2019, 3:07pm

Do you know why I get this error message? You can see it in my workflow.

Andrew_Steel · May 23, 2019, 7:41pm

Hi @anon33357744,

The workflow KNIME_project.knwf (1.2 KB) is empty.

Best,
Andrew

anon33357744 · May 23, 2019, 8:47pm

Hi Andrew,

Can I send it to you by email? It is too large to upload here.

Kind regards,
Canan

Andrew_Steel · May 24, 2019, 12:16pm

Hi Canan,

Upload the workflow to Dropbox, GitHub, or a similar service and post the link here.

Best,
Andrew

anon33357744 · May 24, 2019, 1:37pm

Hi @Andrew_Steel,

I hope you can help me. I would really appreciate it.

Thanks,
Canan

Andrew_Steel · May 24, 2019, 9:17pm

Hi @anon33357744,

I don’t know whether I understood your problem correctly, or whether this is a possible solution for you.

Based on this topic and on your topic https://forum.knime.com/t/outlierdetection/15247/16 I understand:

Normal Transaction >= m times of Transactions(BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS)
Outlier < m times of Transactions(BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS)

The outlier calculation should find the value of m.
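
As a rough Python illustration of this definition (not part of the workflow; the sample rows, the function name `label_transactions`, and the value m = 2 are all assumptions for the example):

```python
from collections import Counter

# Hypothetical rows: (BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS)
transactions = [
    ("Aksu Ltd", "Canan Ltd", "Singapore", "Germany", "mobile phones"),
    ("Aksu Ltd", "Canan Ltd", "Singapore", "Germany", "mobile phones"),
    ("Aksu Ltd", "Canan Ltd", "Singapore", "Germany", "mobile phones"),
    ("Aksu Ltd", "Canan Ltd", "Singapore", "Thailand", "mobile phones"),
]

def label_transactions(transactions, m):
    """Label a row 'normal' if its exact combination occurs at least m times."""
    counts = Counter(transactions)
    return [
        (t, counts[t], "normal" if counts[t] >= m else "outlier")
        for t in transactions
    ]

# m = 2 is an arbitrary threshold chosen for this toy data.
labelled = label_transactions(transactions, m=2)
for row, count, label in labelled:
    print(count, label, row)
```

Here m is given by hand. The open question is how to derive a sensible m from the data.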

I started with your workflow from the Dropbox link and extended it with these nodes:

[Screenshot from 2019-05-24 22-27-04]

To get all the necessary transaction columns, I extended the first Column Filter (in the Data Selection annotation) to include the column GOOD.

The Partitioning node is a copy of your node in the annotation "Calculate Average/Interquartile of Amount with Numeric Outlier Technique".

With GroupBy I counted how often each combination of the transaction columns occurs.

[Screenshot from 2019-05-24 22-53-13]

With Joiner nodes I added these counts to the original data sets.

Finally come the Numeric Outliers and Numeric Outliers (Apply) nodes.

The summary from Numeric Outliers shows that the outliers lie outside the lower and upper bounds, but by your definition the upper bound is your variable m.
For the usual transactions, the Counter value in the Treated table (Numeric Outliers *) is a missing value.
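
A small Python sketch of what happens to the count column, assuming the common interquartile-range rule with a multiplier of 1.5 and replacement of outliers by missing values. The quartile method, the function names, and the sample counts are assumptions, so the exact bounds may differ from what the Numeric Outliers node reports.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Lower and upper bound as Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def treat(values, k=1.5):
    """Replace values outside the bounds with None (a missing value)."""
    lower, upper = iqr_bounds(values, k)
    return [v if lower <= v <= upper else None for v in values]

# Hypothetical per-row counts: most combinations are rare, one occurs 20 times.
counts = [1, 1, 2, 1, 3, 2, 1, 20, 2, 1]
print(iqr_bounds(counts))
print(treat(counts))
```

Because rare combinations dominate this toy data, the frequently repeated combination is the one that falls outside the bounds and gets the missing value.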

I hope this is a possible solution.
Andrew

anon33357744 · May 24, 2019, 9:55pm

Hi @Andrew_Steel,

Is it possible to upload the workflow that you built?

So that I can see all the configurations that you applied.

Thank you

Andrew_Steel · May 25, 2019, 8:21am

Hi @anon33357744,

Here is the workflow without your data.

Andrew
TradeFinanceFinal.knwf (287.2 KB)

anon33357744 · May 25, 2019, 12:03pm

Thank you very very much

anon33357744 · May 27, 2019, 11:48am

Hi @Andrew_Steel,

Could you please explain to me what you have done? I am afraid I don’t understand it, especially the column (Count (Applicant)).

Thank you very much.

Kind regards,
Canan

Andrew_Steel · May 27, 2019, 2:22pm

Hi @anon33357744,

this: “… for example Aksu Ltd sends mobile phones to Canan ltd from Singapore to Germany. According to historical data this Transaction in this constellation occurs 20 (m) times …” is your definition of normal transactions.

Normal Transaction >= m times of Transactions(BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS)

The GroupBy node counts the transactions. As the aggregation column I used Applicant (see the image Dialog - 0:309 - GroupBy). On the left side you will find the available columns for the Count aggregation function. You can use any other column instead. We only need a column with the counted values; which column it is, and what it is called, doesn’t matter.

The Numeric Outliers node uses this count column to find the outliers. But as the summary shows, the transactions that match your definition of normal are in fact the outliers in the dataset. Most of your rows are inside the lower and upper bounds. Only 2055 rows are outside these bounds, which means that the same combination of (BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS) occurs 7 times or more in your dataset.

Best regards
Andrew

system · June 3, 2019, 2:22pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.