Replacing first n instances of a character

casm · December 1, 2019, 6:20pm

Hi, all. I’m running into an issue with importing space-delimited data from a flat file, and it’s not totally clear how best to go about it.

In the flat file, each line is formatted as follows:

abc def ghi jkl mno pqr stu natural_language_sentence(s)

All fields are space-delimited strings of varying lengths. The eighth contains sentences written in plain English, which in turn contain spaces of their own.

Because the first seven spaces are always present, I would like to replace those with tabs. It should then be easy to process the result into eight separate columns.
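Outside KNIME, this is a bounded replace: swap only the first seven spaces and leave the rest alone. Here is a minimal Python sketch. It assumes fields are separated by exactly one space, so consecutive spaces would each be counted. The function name `tabify_first_fields` and the sample sentence are hypothetical.

```python
def tabify_first_fields(line: str, n: int = 7) -> str:
    """Replace only the first n spaces in line with tabs; later spaces are kept."""
    # The third (count) argument of str.replace caps how many replacements are made.
    return line.replace(" ", "\t", n)


example = "abc def ghi jkl mno pqr stu The quick brown fox jumps."
print(repr(tabify_first_fields(example)))
```

The sentence in the eighth field keeps its spaces, because the count stops replacements after the seventh.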

What I can’t figure out is which of the nodes I’ve looked at, if any, can do this. I’ve found some examples that come close to what I’m looking for, but none quite fit the bill. Any assistance would be appreciated.

HansS · December 1, 2019, 6:28pm

Hi @casm welcome to the forum,

I’m not sure if I understood your question well. But it looks like you want to split a string based on a space-delimiter to convert it into columns. Have you checked the Cell Splitter node?
It would also help if you provided a sample/dummy dataset that resembles your input, along with an example of your desired output.
gr. Hans

casm · December 1, 2019, 6:56pm

Hi @HansS,

Sorry, I could’ve been clearer about this. I understand what you’re saying regarding the cell splitter, but I’ve realised that my data will need to be preprocessed before I can run it through the cell splitter. This is because the file is space-delimited, but one of the resulting cells will contain data that itself has spaces in it.

Let me see if I can describe the data structure better:

abc_def_ghi_jkl_mno_pqr_stu_natural_language_sentence(s)

Every underscore between ‘abc’ and ‘stu’ represents a space that I’d like to convert to a tab. This also applies to the underscore between ‘stu’ and ‘natural’.

Every underscore after ‘natural’ should remain unchanged. Basically, this means only changing the first seven instances of a space in the line, and ignoring any after that.

Each line follows this format, so there will always be seven delimiting spaces before the free-form string data that I don’t want to change.

Does this help to describe what it is I am attempting to do? Essentially, it comes down to, “convert the first seven spaces found in the line to tabs, then move on to the next line.”
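Applied to a whole file, "convert the first seven spaces to tabs, then move on to the next line" can be followed by an ordinary tab-separated parse. Here is a minimal Python sketch. It assumes the sentences themselves contain no tab characters, and it sets `csv.QUOTE_NONE` so that any quote marks in the sentences are left untouched. The function name and sample lines are hypothetical.

```python
import csv
import io


def flatfile_to_rows(text: str, n_fields: int = 8) -> list[list[str]]:
    """Turn the first n_fields - 1 spaces of each line into tabs, then parse as TSV."""
    converted = []
    for line in text.splitlines():
        # Only the first seven spaces become delimiters; the sentence keeps its spaces.
        converted.append(line.replace(" ", "\t", n_fields - 1))
    # Reading the result with a tab delimiter yields one column per field.
    reader = csv.reader(
        io.StringIO("\n".join(converted)), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


sample = (
    "abc def ghi jkl mno pqr stu Hello world.\n"
    "q w e r t y u Another plain English sentence here."
)
for row in flatfile_to_rows(sample):
    print(len(row), row[-1])
```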

Thanks

HansS · December 1, 2019, 7:38pm

Hi @casm

That makes it a little more complex, but take a look at this workflow: KNIME_project2.knwf (47.5 KB)
Hope this helps

gr, Hans

casm · December 1, 2019, 8:21pm

@HansS, that did it! Thank you!

HansS · December 1, 2019, 8:37pm

@casm, glad I could help you out; I’m still learning myself. (You can mark the topic as solved.)

ipazin · December 2, 2019, 3:43pm

Hi there @casm,

welcome to KNIME Community!

Maybe I’m missing something, but if you read your file as one column and then use the Cell Splitter with an array size of 8, shouldn’t you get your 8 columns separated properly?
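Fixing the number of output cells works because splitting stops after the seventh delimiter, and the remainder of the line goes into the last cell. The same idea can be shown with the `maxsplit` argument of Python's `str.split`. This is only an analogue of the Cell Splitter's array-size option, not how the node is implemented, and `split_fixed` is a hypothetical name.

```python
def split_fixed(line: str, n_columns: int = 8) -> list[str]:
    """Split on spaces into at most n_columns cells; the last cell keeps the remainder."""
    # maxsplit = n_columns - 1 stops splitting after the seventh space.
    return line.split(" ", n_columns - 1)


cells = split_fixed("abc def ghi jkl mno pqr stu Some sentence with spaces.")
print(len(cells))  # 8
print(cells[-1])   # Some sentence with spaces.
```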

Br,
Ivan

casm · December 2, 2019, 6:45pm

Hi @ipazin,

What you’re saying makes sense, but when I tried it a cell was created at every space. For the first seven cells this was fine, but depending on the contents of the eighth cell this could result in up to a couple of hundred cells being created.
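This is what splitting on every space without a cap would produce. Each row gets one cell per space plus one, so the column count grows with the length of the sentence. A small Python illustration with invented sample lines compares an unbounded split to one capped at eight cells:

```python
def column_counts(lines: list[str]) -> tuple[list[int], list[int]]:
    """Return per-line cell counts for an unbounded split and a split capped at 8 cells."""
    unbounded = [len(line.split(" ")) for line in lines]   # one cell per space, plus one
    bounded = [len(line.split(" ", 7)) for line in lines]  # never more than 8 cells
    return unbounded, bounded


lines = [
    "abc def ghi jkl mno pqr stu Short.",
    "abc def ghi jkl mno pqr stu A much longer sentence with many more words in it.",
]
print(column_counts(lines))  # ([8, 17], [8, 8])
```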

It seemed odd to me, but as I’m new to KNIME I just put it down to not being familiar with how it handles operations like that.

Thanks!

ipazin · December 4, 2019, 4:32pm

Hi there @casm,

But the solution @HansS provided does exactly that as its first step and produces 8 columns, so I’m a bit confused.

[Screenshot: CellSplitterArray]

Br,
Ivan

system · December 11, 2019, 4:42pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.