Java Snippet Help

Hi Everyone,

I am trying to do the following, and I think the way to do it is a loop with a Java Snippet. I’ve played around with it a few times but have not gotten it to work. I’m also surprised there aren’t more KNIME tutorials on Shopify data, but that’s another topic…

What I am trying to do for a given customer in column C is the following:

- Look at the “Customer #”, then sort by the “order day” (so the first order is listed first).
- For that “order_id”, check if “customer_type” is “First-time”. If it is, append a new column “Order #” starting at 1 for all rows with that “order_id”. So in the data, Customer # 1 would have all 1s in rows 2-5, since that was their first order. If “customer_type” is “Returning” for their first order, I want to add some identifier showing that their first order is outside the date range of my data pull and that I do not know their actual Order #. Maybe something like r-1 in “Order #”, so I know they are returning and this is the 1st order in my data set.
- Then I want to look at the next order ID for the given customer and have that one labeled 2, or r-2 if the 1st order was returning.

I want this to continue for all subsequent orders for a given customer, so I can then see how many times they ordered, the average time between orders, what they bought next if their first order contained a given product, whether they spent more on their 2nd or 3rd order if their first order was small (broken down by first-purchase product), etc.
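One of those follow-up metrics, the average time between orders, can be sketched in plain Java once each customer’s orders have been numbered. This is a minimal sketch assuming you already have one date per distinct order for a single customer, sorted oldest first. The consecutive gaps add up to the span between the first and last order, so the average gap is that span divided by (number of orders - 1). The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;

// Hypothetical helper: metrics for ONE customer, given one entry per distinct
// order (its order date), already sorted oldest first.
class OrderGapStats {

    // Average number of days between consecutive orders.
    // Returns NaN when there are fewer than two orders (no gap to measure).
    static double averageDaysBetweenOrders(List<LocalDate> sortedOrderDates) {
        int n = sortedOrderDates.size();
        if (n < 2) {
            return Double.NaN;
        }
        // The consecutive gaps sum to (last - first), so average = span / number of gaps.
        long spanDays = ChronoUnit.DAYS.between(sortedOrderDates.get(0), sortedOrderDates.get(n - 1));
        return (double) spanDays / (n - 1);
    }
}
```

For example, made-up order dates of 2024-01-01, 2024-01-11 and 2024-01-31 have gaps of 10 and 20 days, so the method returns 15.0.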

Test Data.xlsx (15.6 KB)

This is how I started the Java Snippet, but I’m running into two issues: first, building a string whose numerical part increases, and second, making sure that when it moves on to the second order it doesn’t fall through to the else if branch meant for a returning 1st order.

I previously used a Lag Column on the order IDs to see when they changed so I could compare them, but maybe there is a better way.

Hi @BSFL89 , I must confess that I haven’t quite managed to follow exactly what you are trying to do but that may be because I’m not at my computer at the moment and am just reading this on my mobile.

I wonder if you could give a sample of the output for a given customer as it is hard to visualise just from a long description.

In terms of the Java Snippet, whilst it really has direct access to only a single row of data, it can be made to work ‘cumulatively’ and as it processes each row, it can be made to ‘remember’ information that has been acquired from previous rows which I think is possibly what you are trying to do here.

Maybe give this post a read. It gives some examples of this feature of Java snippets, and something there may be adaptable to what you need, or at least give some pointers…
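A minimal plain-Java sketch of that ‘remembering’ behaviour, runnable outside KNIME: the class below is a hypothetical stand-in for the class the node generates, not KNIME’s actual code. It assumes, as described later in this thread, that variables declared under “Your custom variables” keep their values from one row to the next, while variables declared inside snippet() start fresh on every row.

```java
// Hypothetical stand-in for a Java Snippet's generated class (not KNIME's real code).
// Assumption: the node calls snippet() once per row, in table order, on the same object.
class CumulativeSnippetDemo {

    // Like "Your custom variables": fields keep their values from one row to the next.
    private double runningTotal = 0;
    private int rowsSeen = 0;

    // Like the body of snippet(): called once per row with that row's input value.
    double snippet(double inAmount) {
        // A local variable: created fresh on every call, so it cannot remember anything.
        double previousTotal = runningTotal;
        runningTotal = previousTotal + inAmount;
        rowsSeen++;
        return runningTotal; // e.g. written to an output column "running total"
    }

    int getRowsSeen() {
        return rowsSeen;
    }
}
```

Feeding three rows with amounts 10, 5 and 2.5 through one instance returns 10.0, 15.0 and 17.5. A cumulative result like this is only possible because the total lives in a field rather than a local variable.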

@BSFL89, in addition to what @takbb has said, the Column Expressions node now allows multi-row access. This might also be worth exploring.

I did pull up that thread and was playing with the examples to try to get my workflow working, which is where I started initially. I’ve updated the data set with the desired results; see the callouts below:

Customer 7: their first order in the range I pulled was a returning order, so I don’t know if it’s their 2nd or 8th order. I just marked it as r-1 so I can analyze those separately.
Customer 8: their first order on 10/25 was a first-time order, so I labeled it Order # 1. Then they made a second order on 12/17, so that one was labeled 2. If they made a third, that would be 3, and so on.

If customer 8’s first order had been returning rather than first-time, it would be r-1 and the second order would be r-2. I filtered the data down to 100 rows, but there are over 100k rows in the full data set.
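For comparison, the numbering rule described above can also be sketched in plain Java without row-by-row state. The approach is to sort by customer and order day, collect each customer’s distinct order IDs in date order, then label them 1, 2, 3 and so on. If the customer’s first order in the data is “Returning”, the labels become r-1, r-2 and so on. OrderLine, OrderNumbering and their field names are hypothetical. The sketch assumes order IDs are unique across customers and breaks same-day ties by order ID.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type: one line item of an order export (field names are assumptions).
final class OrderLine {
    final String customer;
    final long orderId;
    final LocalDate orderDay;
    final String customerType; // "First-time" or "Returning"

    OrderLine(String customer, long orderId, LocalDate orderDay, String customerType) {
        this.customer = customer;
        this.orderId = orderId;
        this.orderDay = orderDay;
        this.customerType = customerType;
    }
}

final class OrderNumbering {

    // Returns order_id -> "Order #" label: "1", "2", ... or "r-1", "r-2", ...
    static Map<Long, String> label(List<OrderLine> lines) {
        // Sort by customer, then order day, then order id so lines of one order sit together.
        List<OrderLine> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing((OrderLine l) -> l.customer)
                .thenComparing(l -> l.orderDay)
                .thenComparingLong(l -> l.orderId));

        // Per customer: distinct order ids in date order, plus whether the first order was "Returning".
        Map<String, List<Long>> ordersByCustomer = new LinkedHashMap<>();
        Map<String, Boolean> firstOrderReturning = new HashMap<>();
        for (OrderLine l : sorted) {
            List<Long> orders = ordersByCustomer.computeIfAbsent(l.customer, k -> new ArrayList<>());
            if (orders.isEmpty()) {
                firstOrderReturning.put(l.customer, "Returning".equals(l.customerType));
            }
            if (!orders.contains(l.orderId)) {
                orders.add(l.orderId);
            }
        }

        // Number each customer's orders, prefixing "r-" when their first order was returning.
        Map<Long, String> labels = new HashMap<>();
        for (Map.Entry<String, List<Long>> e : ordersByCustomer.entrySet()) {
            String prefix = firstOrderReturning.get(e.getKey()) ? "r-" : "";
            List<Long> orders = e.getValue();
            for (int i = 0; i < orders.size(); i++) {
                labels.put(orders.get(i), prefix + (i + 1));
            }
        }
        return labels;
    }
}
```

With made-up IDs and years, a customer like Customer 8 (first-time order on 10/25, another order on 12/17) gets labels 1 and 2. A customer like Customer 7, whose first order in range is returning, gets r-1, r-2 and so on. The labels can then be joined back to every line item by order ID.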

Test Data.xlsx (15.9 KB)

I will play around with this when I get out of my morning meetings, thank you!!

Hi @BSFL89, the thing with using the Java Snippet in this way is that you need to ensure that at the end of the snippet you store away all values that you want to know about on the next invocation (i.e. the next record). The snippet remembers nothing from one record to the next unless the values are stored in the “Custom Variables” section.

If you don’t do that there is no way to keep track of what is happening. So you need to have variables that tell you things like who was the last customer I processed, and what was the order number, and what sequence of orders for this customer did we get up to?

You also need to make sure you re-initialise those values at the correct time (e.g. when the customer changes).

If you do those things it can work well, but if you don’t it won’t work the way you want it.

I generally advocate the use of java snippets where:
(1) the processing requires “cumulative” information to be calculated on each row, or
(2) where the algorithm means that use of standard nodes becomes so complicated that writing code is arguably the better solution.

In this case I have not attempted to find a solution using standard nodes, which is something I would normally do. This is primarily because you are asking for help with a java snippet, so I’ll assist. It doesn’t mean, though, that there isn’t a better non-java solution available, such as in the direction that @mlauber71 has pointed, or others may also guide you to.

However it does feel to me that you have “cumulative” calculations in your query (i.e. you are wanting to retain information calculated across multiple previous rows). So in this case I’m willing to accept that the use of the java snippet meets my own “entry-criteria” :wink:

I have put together a snippet to give you the framework. It loosely does what you want but I’m still not 100% certain I understand exactly what that is :-). However please take a look and see how it works. Hopefully if there are any corrections required they should be relatively straightforward but feel free to ask further if it is unclear.

Most importantly, please allow for the fact I may have unintentionally introduced bugs in the code, and it is only very basically tested against my own understanding (the primary reason that I don’t advocate coding in my own case these days unless I have to, as I’m only human!) :wink:

Java snippet:

// Your custom variables:

/* variables defined here carry their value across to next iteration */
int lastCustomerNo=0;
int currentCustomerLineNo=0;
int currentCustomerOrderSeq=0;
boolean isCustOutsideRange=false;
long lastOrderForCustomer=0;


// expression start
    public void snippet() throws TypeException, ColumnException, Abort {
// Enter your code here:

/* variables defined here are for this iteration only, and reset on next iteration */
boolean isFirstLineForCustomer=false;

// customer is either returning or not returning (i.e. First Time)
boolean isReturningCustomer=c_customer_type.equals("Returning");

// initialise things based on whether this is a new customer or not
if (c_Customer!=lastCustomerNo)
{
	// change of customer means this line is first for this customer
	isFirstLineForCustomer=true;
	currentCustomerLineNo=1;   // reset line no for customer
	currentCustomerOrderSeq=1; // used to number the orders
	lastOrderForCustomer=0;     // reset last order number
}
else
{
	// same customer as last time
	isFirstLineForCustomer=false;
	currentCustomerLineNo+=1;  // increment line no for customer
	if (c_order_id!=lastOrderForCustomer)
	{
		// the order id has changed
		currentCustomerOrderSeq+=1;
	}
}


if(isFirstLineForCustomer)
{
	// do some stuff when this is the first line for a customer:
	
	if (isReturningCustomer)
	{
		// is "First line" but also "Returning"
		isCustOutsideRange=true;
		out_outsidedaterange = true;
	}
	else
	{
		isCustOutsideRange=false;
		out_outsidedaterange = false;
	}
	
}
else
{
	
	// do some different stuff:
	// set if they are outside date range based on what happened on the first 
	// line for this customer
	out_outsidedaterange = isCustOutsideRange;
}

if (isCustOutsideRange)
{
	// OUT OF RANGE: if customer is considered out of range we put "r-" in front of sequence
	out_order_sequence_no = "r-" + currentCustomerOrderSeq ;
}
else
{
	// IN RANGE: use the plain order sequence number (no "r-" prefix)
	out_order_sequence_no = "" + currentCustomerOrderSeq;
}

// make a note of what the customer no , last order no is so we can check it on next row
// variables here need to be created in "Your Custom Variables"
lastCustomerNo=c_Customer; 
lastOrderForCustomer=c_order_id;

// expression end

java snippet help with returning customers.knwf (25.9 KB)
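To try the snippet’s logic outside KNIME, here is a condensed plain-Java port. The class and method names are hypothetical, and currentCustomerLineNo is left out because it does not affect the output. It assumes the node calls snippet() once per row, in table order, on the same object, so the fields play the role of the custom variables.

```java
import java.util.ArrayList;
import java.util.List;

// Condensed plain-Java port of the snippet above (hypothetical harness, not KNIME code).
class OrderSequenceSnippet {

    // "Your custom variables": keep their values from one row to the next.
    private int lastCustomerNo = 0;
    private int currentCustomerOrderSeq = 0;
    private boolean isCustOutsideRange = false;
    private long lastOrderForCustomer = 0;

    // One call per row; returns what the snippet writes to out_order_sequence_no.
    String snippet(int customer, long orderId, String customerType) {
        boolean isReturningCustomer = customerType.equals("Returning");
        boolean isFirstLineForCustomer = customer != lastCustomerNo;

        if (isFirstLineForCustomer) {
            currentCustomerOrderSeq = 1;
            // The customer's first line decides whether all their orders get the "r-" prefix.
            isCustOutsideRange = isReturningCustomer;
        } else if (orderId != lastOrderForCustomer) {
            currentCustomerOrderSeq += 1; // same customer, new order
        }

        // Remember this row's customer and order for the next call.
        lastCustomerNo = customer;
        lastOrderForCustomer = orderId;
        return (isCustOutsideRange ? "r-" : "") + currentCustomerOrderSeq;
    }

    // Feeds rows (already sorted by customer, then order day) through ONE instance.
    static List<String> run(int[] customers, long[] orderIds, String[] types) {
        OrderSequenceSnippet s = new OrderSequenceSnippet();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < customers.length; i++) {
            out.add(s.snippet(customers[i], orderIds[i], types[i]));
        }
        return out;
    }
}
```

As in the original, lastCustomerNo starts at 0, so this assumes no real customer number is 0. Otherwise the very first row would not be treated as a new customer.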

OK, so I had to change a few things (i.e. the customer number is actually an email, so I had to change it to a string). But here is the weird thing: when I created test emails, it works. So I copied the node to the actual file I made the test data from, and there it behaves differently.

Customer Pink: the highlighted row should be Order #2, not r-1, as the order above it was their first-time order. Somehow it was flagged as outside of range. Yet in the test data, the same customer (where I had just substituted a dummy email) came out correctly. Both customers’ emails show up as strings, so I am not sure why copying the node to the real data with real emails caused the different behavior.

Java Snippet Help.knwf (21.0 KB)

Hi @bsfl89

I think I know what the issue is. You changed the numeric to a string, but the code which is comparing with previous customer number (i.e. email address) still says this:


// initialise things based on whether this is a new customer or not
if (c_customer_email!=lastCustomerNo)
{
	// change of customer means this line is first for this customer
	isFirstLineForCustomer=true;
	currentCustomerLineNo=1;   // reset line no for customer
	currentCustomerOrderSeq=1; // used to number the orders
	lastOrderForCustomer=0;     // reset last order number
}

As it’s Java, you need to change the condition, because it is now comparing String objects. For objects, != checks whether the two variables refer to the same object, not whether the text is the same. The condition needs to be written using ! and .equals():

// initialise things based on whether this is a new customer or not
if (!c_customer_email.equals(lastCustomerNo))
{
... etc

A further possibility (in addition to the above) is that some of the email addresses contain extra spaces, so they look the same but are actually different. I’d check the comparison first, though. If any data cleanup is required, a String Manipulation node could be added between the Excel Reader and the Java Snippet (e.g. strip($customer_email$)).
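A small standalone illustration of why the != comparison misbehaved, and what a fixed check can look like. For Strings, != compares object identity while equals() compares the characters. Values read from separate table cells can be separate String objects even when the text is identical. The customerChanged helper is hypothetical. It uses java.util.Objects.equals so that a missing (null) email does not throw, and it trims spaces as an alternative to cleaning the column upstream. The email addresses are made up.

```java
import java.util.Objects;

class EmailComparisonDemo {

    // Hypothetical null-safe "did the customer change?" check, ignoring stray spaces.
    static boolean customerChanged(String currentEmail, String lastEmail) {
        String a = currentEmail == null ? null : currentEmail.trim();
        String b = lastEmail == null ? null : lastEmail.trim();
        return !Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String fromRow1 = new String("pink@example.com"); // two distinct objects
        String fromRow2 = new String("pink@example.com"); // with the same characters
        System.out.println(fromRow1 != fromRow2);         // true: different objects
        System.out.println(!fromRow1.equals(fromRow2));   // false: same text
        System.out.println(customerChanged("pink@example.com ", "pink@example.com")); // false
        System.out.println(customerChanged("pink@example.com", null)); // true: no previous customer yet
    }
}
```

In the snippet itself, the same idea means declaring lastCustomerNo as a String in “Your custom variables” and testing the change of customer with .equals() (or Objects.equals) instead of !=.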

I was literally about to post back. I had been adding some tests (just having it output data to see where in the sequence it was going) and realized it was not entering that part of the if statement.

That fixed it! Thank you so much for your help!

Excellent! Glad you got it sorted :slight_smile: