String splitting and recomposing

Hello
I have a very common problem when transferring data from one system to another.
I have found several ways to deal with it, but none that is really “nice”.

The problem:

  • In my first system, which I’ll call “input”, company names are stored as strings 60 characters long.
  • In my second system, which I’ll call “output”, I have 2 string fields, each 40 characters long, to receive the company name from the input system.

So I have to separate the words coming from the input system (using space as the separator character) and distribute them across the 2 fields of the second system whenever the input name is longer than 40 characters. Of course, I must not cut a word in two when transferring it to the second system.

The solution I’m looking for is one that takes advantage of standard nodes without having to write Java code, because I’m not a Java programmer.

I used a string manipulation to split the input name into a string list containing all the separate words that compose the company name. With a simple Java node I’m able to get the number of words generated.

The remaining task is to distribute the words across the 2 new destination fields.

Any idea?

Best regards
JMarc

Your problem isn’t completely clear. Is the input always exactly 60 characters? That’s what you imply. If so, do you want the first 40 characters in one output field and the remaining 20 in the second field, adjusted so that no word is split? Some sample data would be very helpful.

Hi @JM64 , if I’m understanding correctly…

  1. Any company whose name is longer than 40 characters needs to be split into two strings at the position of the last space that appears within the first 40 characters.

  2. Any company whose name is 40 characters or fewer remains as a single string.

You can achieve this by using String Manipulation nodes to find the position of the last space that appears within the first 40 characters, and then using subsequent String Manipulations to “substring” the string at that point. A Rule Engine can be included to handle the case where the name does not already exceed 40 characters:

String Manipulation - get length of company name:
(appends “CompanyNameLength” column)

length($Company Name$)

String Manipulation - find last space within the first 40 characters
(appends “lastSpaceBefore40” column)

lastIndexOfChar(substr($Company Name$,0,40),' ')

(Note the use of single quotes in the above, rather than double quotes. This is a Java detail to be aware of: the lastIndexOfChar function takes a “char” rather than a “string” as its last parameter, so the space must be passed in single quotes.)

Rule Engine - determine whether to use whole string or string up to located space
(appends “splitAtPosition” column)

$CompanyNameLength$ <= 40 => $CompanyNameLength$
TRUE => $lastSpaceBefore40$

String Manipulation - everything before split
(appends “CompanyName1” column)

strip(substr($Company Name$,0,$splitAtPosition$))

String Manipulation - everything after split
(appends “CompanyName2” column)

strip(substr($Company Name$,$splitAtPosition$+1))
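
For readers who find the logic easier to follow as code, the five nodes above can be sketched in Python. This is only an illustration of the same steps, not part of the workflow: the function name split_steps and the limit parameter are made up for the example, and the dictionary keys are named after the appended columns. One assumption is flagged in the comments: the thread does not say what should happen when a name longer than 40 characters has no space in its first 40 characters, so the sketch simply raises an error in that case.

```python
def split_steps(company_name: str, limit: int = 40) -> dict:
    """Mirror the five-node pipeline; keys are named after the appended columns."""
    # String Manipulation: length($Company Name$)
    company_name_length = len(company_name)

    # String Manipulation: lastIndexOfChar(substr($Company Name$,0,40),' ')
    # str.rfind returns -1 when the slice contains no space.
    last_space_before_40 = company_name[:limit].rfind(" ")

    # Rule Engine: short names are kept whole, long names split at the space.
    if company_name_length <= limit:
        split_at_position = company_name_length
    else:
        split_at_position = last_space_before_40

    # Assumption (not covered in the thread): a long name with no space in
    # its first `limit` characters cannot be split without cutting a word.
    if split_at_position < 0:
        raise ValueError("no space found in the first %d characters" % limit)

    return {
        "CompanyNameLength": company_name_length,
        "lastSpaceBefore40": last_space_before_40,
        "splitAtPosition": split_at_position,
        "CompanyName1": company_name[:split_at_position].strip(),
        "CompanyName2": company_name[split_at_position + 1:].strip(),
    }


row = split_steps("The Standard Company to test splitting process")
print(row["CompanyName1"])  # The Standard Company to test splitting
print(row["CompanyName2"])  # process
```

A name of 40 characters or fewer comes back unchanged in CompanyName1, with an empty CompanyName2.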

Alternatively, this could be achieved with just two String Manipulation nodes, using a more “codey” String Manipulation hack (the conditional part of which you can find documented here) that builds in all of the above logic. It does make it feel more like you are programming! :wink:

String Manipulation for CompanyName1

string(
  length($Company Name$) <= 40
    ? $Company Name$
    : strip(substr($Company Name$, 0,
        lastIndexOfChar(substr($Company Name$, 0, 40), ' ')))
)

String Manipulation for CompanyName2

string(
  length($Company Name$) <= 40
    ? ""
    : strip(substr($Company Name$,
        lastIndexOfChar(substr($Company Name$, 0, 40), ' ') + 1))
)
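
Two edge cases are worth checking on real data, neither of which came up in this thread. First, a name longer than 40 characters might have no space in its first 40 characters. Second, if a word ends exactly at character 40, the space that follows it sits at position 40 (counting from 0), just outside substr($Company Name$,0,40), so the split falls back to an earlier space and the second part can end up longer than 40 characters. The Python sketch below is a hypothetical variation, not the expressions above: it searches the first 41 characters instead, and raises an error when no clean split exists. The names split_name and limit are invented for the illustration.

```python
def split_name(name: str, limit: int = 40) -> tuple:
    """Split name into two parts of at most `limit` characters, on a space."""
    name = name.strip()
    if len(name) <= limit:
        return name, ""

    # Search positions 0..limit inclusive, so a space sitting right after
    # a word that ends exactly at `limit` characters is also considered.
    cut = name.rfind(" ", 0, limit + 1)
    if cut == -1:
        raise ValueError("first word is longer than %d characters" % limit)

    first, second = name[:cut].strip(), name[cut + 1:].strip()
    if len(second) > limit:
        raise ValueError("second part is longer than %d characters" % limit)
    return first, second


# A 40-character first part followed by a 19-character word (60 in total).
name = "A" * 10 + " " + "B" * 29 + " " + "C" * 19
first, second = split_name(name)
print(len(first), len(second))  # 40 19
```

Whether raising an error is the right reaction depends on the target system; flagging such rows for manual review would be an equally reasonable choice.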

split string at last space prior to specific position.knwf (83.3 KB)

Hello Rfeigel,
Thanks for taking the time to help.

No, the input is not necessarily 60 characters long; 60 is the maximum. Some company names can be as short as 2 characters.

But as input I have a name that can be composed of several space-separated parts, which must be stored in another structure composed of 2 elements of 40 characters each (maximum).

So I must cut the input name, but I must not cut inside a part.

Here is a simple example :slight_smile:

| Input company name | Size | Output first part | Size | Output second part | Size |
| --- | --- | --- | --- | --- | --- |
| The Standard Company to test splitting process | 47 | The Standard Company to test splitting | 39 | process | 7 |
| The Company to test splitting process | 38 | The Company to test splitting process | 38 | | 0 |

Best regards

JMarc

Hello takbb
It is exactly what I was looking for :slight_smile:

I like both solutions. I had not thought of solving it this way. What I built used Java. This is really much more elegant and more understandable for a non-programmer.

Best regards
JMarc

You’re welcome @JM64 and glad it works for you, and thanks for marking the solution.

I just added additional strip() calls around the returned names in the code of the second solution above, similar to what I did in the first solution.

This isn’t strictly necessary, but it removes any superfluous spaces if they happen to appear at the beginning or end of the returned names.
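
To see what the strip() calls change, here is a small Python illustration with a made-up name that contains two consecutive spaces just before the split point. It only mimics the expressions above, with Python's str.strip playing the role of strip().

```python
# Hypothetical input: note the two spaces before "process".
name = "The Standard Company to test splitting  process"

cut = name[:40].rfind(" ")  # last space within the first 40 characters
raw_first, raw_second = name[:cut], name[cut + 1:]

print(repr(raw_first))           # 'The Standard Company to test splitting '
print(repr(raw_first.strip()))   # 'The Standard Company to test splitting'
print(repr(raw_second.strip()))  # 'process'
```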

Thank you again.
I’ll have a look. I already implemented the first solution in my project :slight_smile:

Regards
JMarc
