unusual behavior

Hi KNIME Team,

I would like to know whether there is an example workflow that solves my problem.

I have five pieces of information about each transaction. The model I need should find unusual behavior within this data. For example, Aksu Ltd sends mobile phones to Canan Ltd from Singapore to Germany.

According to historical data, this transaction occurs 20 times in this constellation.

Now suppose Aksu Ltd sends mobile phones to Canan Ltd from Singapore to Thailand, or Aksu Ltd sends pens to Canan Ltd from Singapore to Germany, or Aksu Ltd sends mobile phones to Canan Ltd from China to Germany. The outlier model should then flag these transactions. In other words, if the goods, the country of origin, or the destination changes, the outlier detection model should be able to identify it.

Best,
Canan

Not sure if it will work, but perhaps you can try the following: convert the categorical columns to numerical ones using the One to Many node, and feed the new columns into the Hierarchical Clustering node. If all goes well, the outliers should end up in “lonely” branches of the tree.

Hi @Aswin,

Can I also use the Category to Number node? I don't know why, but the One to Many node doesn't work.
KNIME_project.knwf (1.2 KB)

Please have a look at my workflow. I think there is no getting around building an isolation forest.
Do you know another way to find these outliers, maybe something along the lines of m_074_weka_isolation_forest.knwf (1.9 MB)?

But I don't know how to adapt it to my model :frowning:

Best,
Canan

I don’t think Category to Number will work, because subsequent nodes will then treat the data as quantitative, but your categories are nominal. That’s why I picked the One to Many node, which creates a dummy variable for every category.
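
To illustrate the difference, here is a minimal Python sketch (plain Python, not KNIME code). The country list and the helper names are made up for illustration. It compares how far apart two categories appear under an integer encoding, roughly what Category to Number produces, and under dummy variables, roughly what One to Many produces.

```python
# Hypothetical category values; not taken from the real data set.
countries = ["China", "Germany", "Singapore", "Thailand"]

# Integer encoding: each category gets an arbitrary number.
int_code = {c: i for i, c in enumerate(countries)}


def one_hot(value):
    """Return one dummy (0/1) column per category."""
    return [1 if value == c else 0 for c in countries]


def int_distance(a, b):
    """Distance between two categories under the integer encoding."""
    return abs(int_code[a] - int_code[b])


def one_hot_distance(a, b):
    """Number of dummy columns in which the two categories differ."""
    return sum(x != y for x, y in zip(one_hot(a), one_hot(b)))


for other in ["Germany", "Singapore", "Thailand"]:
    print(other, int_distance("China", other), one_hot_distance("China", other))
```

With integer codes, China looks three times as far from Thailand as from Germany, purely because of the order in which the codes were assigned. With dummy variables, every pair of different countries is equally far apart, which is what you want for nominal data.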

Do you know why I get this error message? You can see it in my workflow.

Hi @anon33357744,

the KNIME_project.knwf (1.2 KB) is empty:

Best,
Andrew

Hi Andrew,

Can I send it to you by email? It is too large to upload here.

Kind regards,
Canan

Hi Canan,

upload the workflow to Dropbox, GitHub, or a similar service and post the link here.

Best,
Andrew

Hi @Andrew_Steel,

I hope you can help me. I would really appreciate it.

Thanks,
Canan

Hi @anon33357744,

I am not sure I understood your problem correctly, or whether this is a possible solution for you.

Based on this topic and on your topic https://forum.knime.com/t/outlierdetection/15247/16 I understand:

Normal transaction: the combination (BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS) occurs >= m times
Outlier: the combination (BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS) occurs < m times

The outlier calculation should find the variable m.
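
Outside of KNIME, the rule above can be sketched in a few lines of Python. This is only an illustration: the rows are invented from the example in this thread, the assignment of the two companies to BENEFICIARY and APPLICANT is an assumption, and the threshold `m = 5` is an arbitrary value chosen by hand rather than one derived from the data.

```python
from collections import Counter

# Hypothetical rows in the column order
# (BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS).
# Which company is BENEFICIARY and which is APPLICANT is an assumption.
usual = ("Aksu Ltd", "Canan Ltd", "Singapore", "Germany", "mobile phones")
rows = [usual] * 20 + [
    ("Aksu Ltd", "Canan Ltd", "Singapore", "Thailand", "mobile phones"),
    ("Aksu Ltd", "Canan Ltd", "Singapore", "Germany", "pens"),
    ("Aksu Ltd", "Canan Ltd", "China", "Germany", "mobile phones"),
]


def flag_rare(rows, m):
    """Return every row whose exact combination occurs fewer than m times."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] < m]


m = 5  # assumed threshold, picked by hand for this sketch
outliers = flag_rare(rows, m)
for row in outliers:
    print(row)
```

The three changed transactions are printed, while the combination that occurs 20 times is treated as normal.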

I started with your workflow from the Dropbox link and extended it with these nodes:

[Screenshot from 2019-05-24 22:27:04]

To get all the necessary transaction columns, I extended the first Column Filter (Data Selection annotation) with the column GOOD.

The Partitioning node is a copy of your node in the "Calculate Average/Interquartile of Amount with Numeric Outlier Technique" annotation.

With GroupBy I counted the occurrences of each combination of the necessary transaction columns.

[Screenshot from 2019-05-24 22:53:13]

With Joiner nodes I added these counts to the original data sets.

Finally, I added the Numeric Outliers and Numeric Outliers (Apply) nodes.

In the summary from Numeric Outliers we will see:

Strictly speaking, the outliers are the rows that lie outside the lower and upper bounds, but by your definition the upper bound corresponds to your variable m.
For the usual transactions, the Counter value in the Treated table (Numeric Outliers *) is a missing value.
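
As a rough illustration of what happens to the count column, here is a small Python sketch. It assumes that the bounds are computed with the interquartile-range rule (Q1 - k * IQR and Q3 + k * IQR with k = 1.5) and that values outside the bounds are replaced by missing values. The counts are invented, and the quartile estimate from Python's `statistics` module may differ from the one the node uses.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper bound from the interquartile range (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical Counter column after the join: one value per transaction row,
# holding how often that row's combination occurs in the data.
row_counts = [1] * 40 + [2] * 30 + [3] * 15 + [7] * 7 + [20] * 20

lower, upper = iqr_bounds(row_counts)

# Values outside the bounds become missing (None), as in the Treated table.
treated = [c if lower <= c <= upper else None for c in row_counts]

print("bounds:", lower, upper)
print("rows treated as outliers:", treated.count(None))
```

In this made-up data the frequent combinations end up above the upper bound, so their count turns into a missing value, while the rare combinations keep their count.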

I hope this is a possible solution.
Andrew

Hi @Andrew_Steel,

Could you upload the workflow that you built?

So that I can see all the configurations that you applied. :slight_smile:

Thank you

Hi @anon33357744,

here is the workflow without your data.

Andrew
TradeFinanceFinal.knwf (287.2 KB)

Thank you very very much :pray::pray::pray:

Hi @Andrew_Steel,

Could you please explain to me what you have done? I am afraid I don’t understand it, especially the column Count (Applicant).

Thank you very much.

Kind regards,
Canan

Hi @anon33357744,

This: “… for example Aksu Ltd sends mobile phones to Canan ltd from Singapore to Germany. According to historical data this Transaction in this constellation occurs 20 (m) times …” is your definition of a normal transaction.

Normal transaction: the combination (BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS) occurs >= m times

The GroupBy node counts the transactions. As the aggregation column I used the column Applicant (see the image Dialog - 0:309 - GroupBy). On the left side you will find the available columns for the count aggregation function. You can use any other column instead. We only need a column that holds the counted values, so neither its name nor the column it is based on matters.

The Numeric Outliers node uses this count column to find the outliers. But as the summary shows, the transactions that are normal by your definition are the outliers in the dataset. Most of your rows lie inside the lower and upper bounds. Only 2055 rows are outside these bounds, which means that for those rows the same combination of (BENEFICIARY, APPLICANT, PortOfOrigin, PortOfDestination, GOODS) occurs 7 times or more in your dataset.

Best regards
Andrew
