Split a string with comma-separated values into rows

gstoel · October 8, 2013, 2:47pm

Hi there,

I am very new to KNIME and am trying to parse some HTML code.

I have got as far as a string that contains the value below (this is just a snippet):

{ date: '2013-10-08T14:28:56.151072+02:00', value: 3 }, { date: '2013-10-08T14:43:56.151072+02:00', value: 5 }, { date: '2013-10-08T14:58:56.151072+02:00', value: 1 }, { date: '2013-10-08T15:13:56.151072+02:00', value: 2 }, { date: '2013-10-08T15:28:56.151072+02:00', value: 4 }, { date: '2013-10-08T15:43:56.151072+02:00', value: 7 }, { date: '2013-10-08T15:58:56.151072+02:00', value: 3 }, { date: '2013-10-08T16:13:56.151072+02:00', value: 4 },  etc....

I would like each set of values between curly braces to end up on its own row in a table, but I have no clue how to do this in KNIME.

Any tips would be helpful.

Regards,

Geoffrey

shinwachi · October 21, 2013, 4:54pm

For a quick-and-dirty solution, you could use the Python Snippet node (community contribution), with code like this:

import re, collections

# initialize output table (add first row to prevent null pointer error when nothing is found - delete this in the next node)
pyOut = collections.OrderedDict()
pyOut["original"] = ["#deleteme"]
pyOut["sepval"] = ["#deleteme"]

# iterate through each row
for row in zip(*kIn.values()):

    # create row dictionary to locate the input values
    rowdict = dict(zip(kIn.keys(), row))

    # assume 'column1' is the column holding the list of {}-bracketed values
    colval = rowdict['column1']

    # use regular expression to find all bracketed strings
    for sepval in re.findall(r'\{([^}]*)\}', colval):
        # append original value (for later reference - optional)
        pyOut["original"].append(colval)
        # append found values (get rid of surrounding spaces using .strip <- optional)
        pyOut["sepval"].append(sepval.strip())

This assumes the string is in "column1". Each input row is expanded into as many output rows as it has bracketed values.
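
The regular expression can be tried outside KNIME first. Here is a minimal standalone sketch, assuming plain Python without the `kIn`/`pyOut` tables; the helper name `split_braced` is made up for illustration, and the sample text is shortened from the question:

```python
import re

# Sample input copied from the question (shortened to three entries).
text = (
    "{ date: '2013-10-08T14:28:56.151072+02:00', value: 3 }, "
    "{ date: '2013-10-08T14:43:56.151072+02:00', value: 5 }, "
    "{ date: '2013-10-08T14:58:56.151072+02:00', value: 1 },"
)


def split_braced(s):
    """Return the text inside each {...} pair, without surrounding spaces.

    Hypothetical helper for illustration only. The braces are escaped so
    they are clearly treated as literal characters.
    """
    return [match.strip() for match in re.findall(r"\{([^}]*)\}", s)]


rows = split_braced(text)
for row in rows:
    print(row)
```

Each printed line corresponds to one row that the snippet would append to the "sepval" column.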

There are other nodes available for scripting, such as R and Perl.

richards99 · October 22, 2013, 7:56am

If you want to avoid Python, you can use:

Cell Splitter node with , as the delimiter, choosing output as list.

Ungroup node to separate the list cells held in one row into individual rows.

Simon
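
The split-then-ungroup idea can be sketched in plain Python to see what each delimiter choice produces. This only illustrates the logic and is not what the KNIME nodes execute; the function names `split_to_list` and `ungroup`, the shortened sample cell, and the `},` delimiter are all assumptions made for the example:

```python
def split_to_list(cell, delimiter):
    """Split one cell into a list, trimming spaces and curly braces."""
    parts = (part.strip(" {}") for part in cell.split(delimiter))
    return [part for part in parts if part]


def ungroup(list_cells):
    """Flatten a column of list cells into one item per row."""
    return [item for cell in list_cells for item in cell]


# One input cell holding two bracketed records (shortened sample).
table = ["{ date: 'a', value: 3 }, { date: 'b', value: 5 }"]

# A bare comma also occurs inside the braces, so each record is cut in two.
by_comma = ungroup([split_to_list(cell, ",") for cell in table])

# Splitting on "}," keeps the date and value of each record together.
by_brace = ungroup([split_to_list(cell, "},") for cell in table])

print(by_comma)
print(by_brace)
```

With a bare comma the date and the value of each record land on separate rows, so they would need to be paired up again in a later step.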