Database table query

Hi, All

I need help with something.

Say I have a master database and need to check how many times a given person and their information occur in that database. How would I go about doing that?

In more detail, I have generated a table consisting of points of interest. I then need to query the entire master database and, any time one of these points of interest occurs, isolate all the information associated with it so that I can work it into a table and analyze it from there.

Does this make sense? I know how to export the information to a table; I just can’t figure out how to run the query through the master data file and isolate the points of interest every time they occur so I can export them later.

Thanks for any help!

MK

I do not really get what you want to do. There are several Database nodes that might help you, and you have the full power of SQL at your disposal in the Database connections.

And as always, it is much easier to understand what you want to do if you provide some (dummy) data that illustrates your question.

My apologies for not making sense.

Okay. I have attached two files, each with dummy data, noting that the real master file is 3 million rows of Excel data, not a few hundred.

Essentially, what I need is a query run through the master data file that references the Insured Member Identifier deidentified and pulls all the information associated with it (i.e., the other 80 or so columns).

So when the query finds an occurrence of a specific Insured Member Identifier deidentified sequence, it knows to pull all the other information with it and put it in a table. My goal is that, given a list of 200 Insured Member Identifier deidentified values, it looks through the 3 million rows of data and, every time an identifier we are referencing occurs, pulls all the other data associated with it.

Does this make it any clearer?

Person Code.xlsx (9.7 KB)

DummyCLaimsData2.xlsx (83.5 KB)

Perhaps I’m reading this incorrectly, but it sounds like a simple join would do the trick? That won’t be possible based on the two files you provided, though: you would need a common ‘key’ between the two files/databases. To clarify, there would have to be a common identifier (such as an ID number) between the two sets of data.
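If the two sets of data did share a key, the lookup would just be a join. Here is a minimal sketch in SQL with made-up table and column names (master_claims, person_codes and member_id are placeholders, not columns from your files):

```sql
-- Keep every master row whose key also appears in the person-code lookup table.
-- All table and column names here are hypothetical placeholders.
SELECT m.*
FROM master_claims m
JOIN person_codes p
  ON m.member_id = p.member_id;
```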


Hmmmm… I thought of that. It is essentially a large-scale row filter where I am filtering for everyone in the Person Code table from my earlier attachment. I am just trying to automate it so I don’t need to intervene and build the filter step manually. It’s possible I could be filtering for 450 names, so going through and looking for each one by hand is not an option. Hence I want a way to cross-reference one table to search for those data points in the master data file…

Ya know, let me know if I can clear up what I am trying to do at all.

Any help is appreciated

The answer is simple, although disappointing, as @Snowy said: you need the key to be present in both files, or you need components like name, address, date of birth … to construct an artificial key (fingerprint).

With the data you uploaded it is not possible.
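As a rough illustration of the fingerprint idea in SQL (all table and column names below are made up, not taken from your uploads), you concatenate a few identifying components on both sides and join on the result:

```sql
-- Construct an artificial key (fingerprint) from name and date of birth on
-- both sides, then join on it. The || concatenation operator works in most
-- databases; adjust for your SQL dialect if needed.
SELECT m.*
FROM master_claims m
JOIN person_codes p
  ON LOWER(m.last_name) || '|' || LOWER(m.first_name) || '|' || m.date_of_birth
   = LOWER(p.last_name) || '|' || LOWER(p.first_name) || '|' || p.date_of_birth;
```

Such a fingerprint is only as good as the fields behind it; typos or formatting differences will cause missed matches.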

Okay, let’s double back to this after I have given it some thought over the weekend. In regards to the common key between the files, there is one: it is the Insured Member Identifier deidentified. The reason the connection did not show in the files I uploaded is that what I uploaded was a very brief snippet of the data, and the results in that table were pulled from various files that make up the master data file.

Here is the idea I have, and @Snowy or @mlauber71, please let me know your thoughts. Is there a way to use the Rule-based Row Filter node but make the rule the results of the GroupBy node?

If this is the case, I will have a list of the Insured Member Identifier deidentified values I am seeking to target in the master data file and can create a new subset database, since it will be filtering for all the unique keys from the GroupBy node.

As for more context on what I am seeking to do and why: the data in the GroupBy node at the end of the workflow is the Insured Member Identifier deidentified values for just three CPT codes. Now I need to take those Insured Member Identifier deidentified strings and check whether they occur in conjunction with any of the other CPT codes throughout the master data file.
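To put that goal into SQL terms purely for illustration (the table and column names here are invented, and I would still want to build this with KNIME nodes), it is roughly:

```sql
-- For the members identified via the three CPT codes of interest,
-- list every CPT code they appear with in the master data and how often.
SELECT d.cpt_code, COUNT(*) AS occurrences
FROM master_claims d
WHERE d.insured_member_id_deidentified IN (
  SELECT g.insured_member_id_deidentified
  FROM groupby_output g
)
GROUP BY d.cpt_code
ORDER BY occurrences DESC;
```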

Note, the metanode just loads all 12 files and combines them into one.

Updated Files for Reference

DummyData.xlsx (3.7 MB)

Person Code.xlsx (9.7 KB)

Hi @mkaplan603 -

Using your updated files, here’s a simple workflow that filters the Dummy Data file based on whether a person’s unique identifier appears in the Person Code table. It uses the Reference Row Filter node.

[Screenshot: Reference Row Filter example workflow in KNIME Analytics Platform]

Reference_Row_Filter_Example.knwf (10.1 KB)

In the context of your workflow above, the output of the GroupBy node could be fed into the bottom port of a Reference Row Filter node as the lookup, and the complete dataset could be fed into the top port of the same node.
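If it helps to think of it in SQL terms, the Reference Row Filter is effectively doing a semi-join between the two tables; a rough sketch (table and column names are placeholders, not your actual headers) would be:

```sql
-- Keep every row of the complete dataset whose deidentified identifier
-- appears in the lookup (GroupBy output) table.
SELECT d.*
FROM dummy_claims d
WHERE d.insured_member_id_deidentified IN (
  SELECT p.insured_member_id_deidentified
  FROM person_codes p
);
```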

Is this what you need?


Yes. It looks like it is! Thank you so much!