Combining Column Headers and Values

Ewerton · October 16, 2020, 2:11am

IHi Everyone,

I am new to Knime but I did some homework and learned some of the basics. I created some workflows to practice but I came out with a use case that I cannot figure out.

I am trying to create a workflow that sends data to InfluxDB. The easier way would be making HTTP calls and sending the POST content using what InfluxDB calls 'Inline Write Protocol’.

In order to do that I’d need to format the content in a certain way, combining some of the names of the column with its value for each entry. As an example:
If I have a table with columns temperature, humidity, and timestamp I’d have to arrange the string in the following format:

weather,location=us-midwest temperature=82,humidity=71 1465839830100400200

where:
weather is the Influx Measurement (something like a Table in InfluxDB)
location=us-midwest is a Tag in InfluxDB, a way of grouping information
column/value sequence separated by a comma
1465839830100400200 is the timestamp

Once I have the content in this format, I can have multiple lines and push a block of data. My original file has 250k lines so I think it would be hard to do it in one single call but I could break it into chunks.

I looked at the REST Post request and it allows me to indicate the column that has the body so I think I’d need to generate a column that has the string described above as its content.

BTW: I have a Python script that does this transformation but I was not able to get it working in Knime

Any insights/ideas? Any feedback is appreciated.

Thanks!
Ewerton

elsamuel · October 16, 2020, 4:25am

There are probably a few different ways to accomplish this, I think.

One would be to extract the column headers then convert them to flow variables, and use a string manipulation node to assemble the query piece by piece.

Another way would be to use the Column Expressions node function “columnNames” which returns an array of column names. If you expand this array you have access to the column names, each in its own column. You can then assemble the query using a string manipulation node.

Here’s an illustration of these 2 methods:

It would be helpful if you provided us with your workflow and sample data so that we could give more specific suggestions for your use case.

morpheus · October 16, 2020, 4:29am

Hi @Ewerton,

first of all welcome to the forum.

i suggest using string manimulation node with join option e.g.
join(“weather,loacation=”,"$location$," temperature=",string($temperature$),",humidity=",$string($humidity$)," ",sting($timestamp$))

BR

morpheus · October 16, 2020, 5:12am

ups… some tipos inside:
here is the correct one.

join(“weather,loacation=”,$location$," temperature=",string($temperature$),",humidity=",string($humidity$)," ",string($timestamp$))

Ewerton · October 16, 2020, 1:47pm

Thanks @morpheus, I will try both approaches but the challenge I see with the approach is that the columns are dynamic so I would have to iterate through the columns to get the current columns.

I tried the approach of unpivoting and combining the headers with the value and I am almost they. Olny thing is that Pivoting again created a new column for every Row, and Ideally all of them should be in a single Column.

I will prepare some dummy data to create the sample workflow.

Ewerton · October 16, 2020, 6:03pm

Thanks guys,

I created a flow (possibly more complicated than I should). It does almost what I want but it seems I am hitting on the string length limit because the Node 23 (Column Combiner) has a different length for each column.

Another point I will look at it later is that the columns should be dynamic as it depends on the input CSV.
I am attaching the flow and a sample file. Had to change the sample file to TXT but it is a CSV file.

Also, the current flow will make a lot of requests and once I got it figured out I will try to combine chunks of data (something like having the body column to have content of N rows combined)

SampleData.txt (2.4 KB)
Send File to InfluxDB.knwf (84.3 KB)

morpheus · October 17, 2020, 6:36am

Hi @Ewerton
a first impression on the data you provided.
The column names are not meaningful all starts with “column_” followed by a number. If your data structure is dynamic how you are able to indentify which column contains which parameter value? Only based on table content i have my doubts dealing with such data.

If you have meaningful column names available and the columns are dynamic you can unpivot them after looping other the files and reading the data. After this is done you can pivot the data again for further processing.

BR

system · April 17, 2021, 6:37pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.