Strategies to find matching rows in a database

Hello all,

I am looking for ideas on how to tackle a problem using knime.

I have a table containing many rows of data where any of the fields can be NULL. For example:

NAME AGE Cubicle COMMENT1 COMMENT2
John 23   rhinocerous  
Alice 37 1    
Tim 56 2   Banana

I want to query an Oracle databse to see if each of these rows already exists in the database. For a row to exist, all fields must match. My initial thorugh was to use a "Table Row To Variable Loop Start" --> "Database Reader," however, the NULL/missing values don't work in this paradigm since:

  1.  "Table Row To Variable Loop Start"  doesn't allow NULL values to be passed
  2. The SQL query used in the database requires me to use for "IS NULL" instead of "=" when comparing NULL values.

The only other way I can think to do this is to use a separate SQL query, each in its own "Database Reader" node, to indiviaually handle the different cases of possible NULL values. This is a bit unweildy since it would require dozens of separate database reader nodes.

I'm sure there is something simpler that is escaping my mind, so all suggestions are welcome.

Thank you.

Hi,

maybe there are more KNIMEish solutions, but as I'm usually faster with code snippets, here is a quick solution using the Java Snippet node. It builds the query string by iterating all input columns with dedicated treatment for missing values. You should then be able to input the query to the DB nodes through a flow variable.

StringBuilder query = new StringBuilder();
query.append("SELECT * FROM \"table\" WHERE ");
for (int i = 0; i < getColumnCount(); i++) {
	if (i > 0) {
		query.append(" AND ");
	}
	String columnName = getColumnName(i);
	query.append('"').append(columnName).append('"');
	if (isMissing(i)) {
		query.append(" IS NULL");
	} else if (isType(i, tString)) {
		String value = getCell(i, tString);
		query.append(" = '").append(value).append("'");
	} else if (isType(i, tInt)) {
		int value = getCell(i, tInt);
		query.append(" = ").append(value);
	} else {
		// TODO add treatment for further types if necessary,
		// e.g. double, boolean, ...
	}
}
out_query = query.toString();

Maybe you need to make some adjustments depending on your SQL dialect. If you use further data types beside String and Integer, you can add them following the given schema.

Hope that helps,
Philipp

Philipp. That is a really slick way to do it. I have adapted your code and it works incredibly well. And most importantly: it is a very flexible solution.  

Thank you!

--Josh

Hi Josh,

just as a comment, the Table Row to Variable does handle missing values, you just need to enable in the configuration dialog.

Best, Iris