Insider Threat Detection and Prediction

I am working on the CMU Insider Threat dataset, version 6.2, and have been unable to find a similar workflow. The dataset consists of 8 CSV files, structured as follows:

  • decoy_file.csv: decoy_file, pc
  • device.csv: id, date, user, pc, file_tree, activity
  • email.csv: id, date, user, pc, to, cc, bcc, from, activity, size, attachment, content
  • file.csv: id, date, user, pc, filename, activity, to_removable_media, from_removable_media, content
  • logon.csv: id, date, user, pc, activity
  • LDAP.csv: employee_name, user_id, email, role, projects, business_unit, functional_unit, department, team, supervisor
  • http.csv: id, date, user, pc, url, activity, content
  • psychometric.csv: employee_name, user_id, O, C, E, A, N

I would appreciate help formulating a workflow to predict which users are malicious.
Request help in formulating the workflow for prediction of a malicious user

Hey @ashokkumar21,

Generally speaking, you can divide your workflow into three sections:

  1. Data input
  2. Process Data
  3. Train/Prediction

You mention multiple files, which can complicate the initial step, but for threat prediction alone I would say you only need a few of them, such as logon, file, email, http, and LDAP. Basically, you need anything that records user activity, since you want to find outliers in that activity. You can use the CSV Reader node to read them.
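If you want to inspect the files before building the workflow, the data-input step can be sketched in plain Python outside KNIME. This is a minimal sketch, assuming the column names listed in the question and CERT timestamps in month/day/year form such as "01/02/2010 07:11:45"; read_events, load_activity and ACTIVITY_FILES are hypothetical names, not part of any library.

```python
import csv
import io
from datetime import datetime

# Assumption: CERT r6.2 timestamps look like "01/02/2010 07:11:45" (month/day/year).
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"

# Hypothetical list of the activity files suggested above (LDAP.csv is joined later).
ACTIVITY_FILES = ("logon.csv", "file.csv", "email.csv", "http.csv")


def read_events(handle, source):
    """Yield one event dict per CSV row, keeping only the columns the files share."""
    for raw in csv.DictReader(handle):
        # Lower-case header names so "PC" and "pc" are treated alike;
        # skip the None key DictReader uses for surplus fields.
        row = {key.strip().lower(): value for key, value in raw.items() if key is not None}
        yield {
            "source": source,
            "user": row["user"],
            "pc": row["pc"],
            "timestamp": datetime.strptime(row["date"], DATE_FORMAT),
            "activity": row.get("activity") or "",
            # Only email.csv has a size column; other files default to 0.
            "size": int(row["size"]) if row.get("size") else 0,
        }


def load_activity(folder):
    """Read every file in ACTIVITY_FILES from `folder` into one list of events."""
    events = []
    for name in ACTIVITY_FILES:
        with open(f"{folder}/{name}", newline="", encoding="utf-8") as handle:
            events.extend(read_events(handle, name))
    return events


# Usage with an in-memory sample (made-up values, not real dataset rows):
sample = io.StringIO(
    "id,date,user,PC,activity\n"
    "{X1},01/02/2010 07:11:45,ABC0001,PC-1234,Logon\n"
)
events = list(read_events(sample, "logon.csv"))
```

read_events is a generator, so for the larger files (http.csv in particular can be very large) you may prefer to aggregate while streaming rather than collecting everything into one list as load_activity does.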

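You would then use the Joiner node to combine the tables on the user: the activity files share the ‘user’ column, and LDAP.csv calls the same value ‘user_id’. The ‘id’ column appears to be a per-event identifier within each file, so it does not link the tables. You could then aggregate features by grouping on user and date (for example with the GroupBy node), such as: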

  • count(logon) – number of logins per day
  • sum(email.size) – total size of the emails the user sends per day
  • count(email) – number of emails the user sends per day
  • etc.
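The grouping and join above can be sketched in plain Python as follows. This is a minimal sketch, assuming each event is a dict with "source", "user", "timestamp", "activity" and "size" keys, that logon.csv uses the activity values "Logon"/"Logoff", and that email.csv marks sent mail with "Send"; daily_features and attach_ldap are hypothetical helper names, and the sample records are made up. In KNIME the same steps correspond to the GroupBy and Joiner nodes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical per-day features mirroring the list above.
FEATURES = ("logons", "emails_sent", "email_bytes", "file_events", "http_events")


def daily_features(events):
    """Aggregate events into one feature row per (user, calendar day)."""
    table = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for event in events:
        row = table[(event["user"], event["timestamp"].date())]
        source, activity = event["source"], event["activity"]
        # Assumption: "Logon" and "Send" are the activity values of interest.
        if source == "logon.csv" and activity == "Logon":
            row["logons"] += 1                   # count(logon)
        elif source == "email.csv" and activity == "Send":
            row["emails_sent"] += 1              # count(email)
            row["email_bytes"] += event["size"]  # sum(email.size)
        elif source == "file.csv":
            row["file_events"] += 1
        elif source == "http.csv":
            row["http_events"] += 1
    return dict(table)


def attach_ldap(features, ldap_rows):
    """Join each (user, day) row to LDAP.csv on user == user_id."""
    by_user = {r["user_id"]: r for r in ldap_rows}
    joined = {}
    for (user, day), values in features.items():
        ldap = by_user.get(user, {})  # users missing from LDAP get None attributes
        joined[(user, day)] = {
            **values,
            "role": ldap.get("role"),
            "department": ldap.get("department"),
        }
    return joined


# Made-up example events for one user on one day.
events = [
    {"source": "logon.csv", "user": "ABC0001", "timestamp": datetime(2010, 1, 2, 7, 11),
     "activity": "Logon", "size": 0},
    {"source": "email.csv", "user": "ABC0001", "timestamp": datetime(2010, 1, 2, 9, 30),
     "activity": "Send", "size": 24000},
    {"source": "email.csv", "user": "ABC0001", "timestamp": datetime(2010, 1, 2, 10, 5),
     "activity": "Send", "size": 16000},
]
ldap = [{"user_id": "ABC0001", "role": "Salesman", "department": "Sales"}]
rows = attach_ldap(daily_features(events), ldap)
```

Each key of rows is a (user, day) pair, so the result is already one row per user per day, which is the shape most models expect as input.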

These are the kinds of features to look for. You can feed them into a few different models to test which performs best; if you are not sure which model to use, I would point you to the AutoML component.
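As a simple hand-rolled baseline to compare against (a swapped-in technique, not what the AutoML component does), each user's daily feature can be scored against that user's own history with a z-score, flagging unusually high days for review. This is a minimal sketch: flag_outlier_days and the threshold of 3.0 are illustrative assumptions, and the counts in the example are made up.

```python
from datetime import date
from statistics import mean, pstdev


def flag_outlier_days(daily_values, threshold=3.0):
    """Return (user, day, z) for days far above that user's own average.

    daily_values maps user -> {day: feature value}, e.g. logons per day.
    """
    flagged = []
    for user, per_day in daily_values.items():
        values = list(per_day.values())
        if len(values) < 2:
            continue  # not enough history to form a baseline
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # constant behaviour, nothing stands out
        for day, value in per_day.items():
            z = (value - mu) / sigma
            if z > threshold:
                flagged.append((user, day, round(z, 2)))
    return flagged


# Made-up example: 19 ordinary days for one user, then a spike on day 20.
counts = [2, 3, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 1, 2, 3, 2, 2, 2, 15]
history = {"ABC0001": {date(2010, 1, d): n for d, n in enumerate(counts, start=1)}}
flagged = flag_outlier_days(history)
```

Because pstdev includes the day being scored, the largest possible z-score over n days is sqrt(n - 1), so with ten or fewer days of history a threshold of 3.0 can never be exceeded; a longer window, or a baseline computed from earlier days only, avoids this.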

TL
