Complex filtering and dismembering data

Hi!

I just downloaded KNIME and have been trying to do a data manipulation that seems quite complex.

7897813800032 BISC PALU BROA
7897826900125 BISCOITO BOKINHA 80G
7897949901597 BISC TARANTELLA 350GR ROSCA BRANCA
7898046910680 COOKIES GRANULADOS 1KG VABENE BAUNILHA

I have 2 full columns: one with the barcode and the other with the description.

I need to:

  1. Extract all the rows in column 2 where the text contains no number.
  2. Put only the numbers from each row into another column (note that some of them are followed by G or GR, and the numbers can be in the middle of the string as well as at the end).
  3. Row 4 is an example of cases given in kilograms, which I would have to convert to grams in the same column mentioned in point 2.

I tried to use regex from other forums, but I can't figure out why it didn't work.

The ideal output would be:

7897813800032 BISC PALU BROA ---------- NO NUMBERS
7897826900125 BISCOITO BOKINHA 80G ---------- 80
7897949901597 BISC TARANTELLA 350GR ROSCA BRANCA ---------- 350
7898046910680 COOKIES GRANULADOS 1KG VABENE BAUNILHA ---------- 1000

Can someone help me please?

Thank you

You can try the String Manipulation node with regexReplace, replacing "[^\\d+]" with "" to get the numbers. If you would like to keep the original column, you can use a String Manipulation node to create a copy of your column first.
br
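
For readers who want to see what that pattern does outside KNIME, here is a minimal Python sketch of the same replacement. The function name `digits_only` and the row "PACK 2X80G" are made up for illustration; the other rows are the sample descriptions from the question.

```python
import re


def digits_only(description: str) -> str:
    """Delete every character that is not a digit (or a literal '+')."""
    # Inside a character class, '+' is a literal plus sign, not a quantifier,
    # so "[^\d+]" keeps digits and '+' and removes everything else.
    return re.sub(r"[^\d+]", "", description)


rows = [
    "BISC PALU BROA",
    "BISCOITO BOKINHA 80G",
    "BISC TARANTELLA 350GR ROSCA BRANCA",
    "COOKIES GRANULADOS 1KG VABENE BAUNILHA",
    "PACK 2X80G",  # hypothetical row with two numbers
]

for row in rows:
    print(repr(digits_only(row)))
```

Note that this only strips characters: the unit is lost (1KG becomes 1, not 1000), and a description with two numbers ends up with the digits run together (the hypothetical "PACK 2X80G" becomes 280).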

Hi Daniel,

image

Which one would be the ideal one?

Hi @lawrencetu, welcome to the KNIME Forum.

See this workflow: some_filtering.knwf (35.6 KB)
It uses the regexReplace function suggested by @Daniel_Weikert, which helps to identify the numbers. With that information available, you can use some additional KNIME nodes to work toward a possible solution.


gr. Hans

Thank you so much!

I still have some cases where an item has 2 numbers or where it didn't work, but it has already helped a lot!

Hi,

Question 1: First of all, can you change the position of this information in the export, or add a delimiter to separate the different kinds of information? If so, it will be easier for you when you import the file with the CSV Reader node.

Question 2: You can use regex to find the item and move it to the end of the string, using something like this:
regexReplace(string, "(.*)( \\d+\\w*)( .*)", "$1$3$2")
This captures the first part of the string and keeps it as the first part of the result, then takes the number-and-letters token in the middle and puts it at the end of the string.

\d+ => one or more digits (0 to 9)
\w* => zero or more word characters (letters, digits, underscore)

Example:

7897949901597 BISC TARANTELLA 350GR ROSCA BRANCA
$1 = 7897949901597 BISC TARANTELLA
$2 = 350GR
$3 = ROSCA BRANCA

Swapping the positions of $3 and $2 gives you the result:

$1$3$2 = 7897949901597 BISC TARANTELLA ROSCA BRANCA 350GR

And you can replace each double space "  " with a single space " " to clean up the string too.
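
The rearrangement above can be sketched in Python as follows. This is only an illustration, assuming the pattern is meant to be `(.*)( \d+\w*)( .*)`, which is what the worked example implies; the function name `move_size_to_end` is made up here.

```python
import re

# Assumed pattern: group 1 = text before the size, group 2 = the size token
# (a space, digits, then optional letters), group 3 = the text after it.
PATTERN = re.compile(r"(.*)( \d+\w*)( .*)")


def move_size_to_end(description: str) -> str:
    """Move a size token such as ' 350GR' from the middle to the end."""
    # \1\3\2 is Python's spelling of the $1$3$2 replacement.
    moved = PATTERN.sub(r"\1\3\2", description)
    # Collapse any run of spaces left behind and trim the ends.
    return re.sub(r" {2,}", " ", moved).strip()


print(move_size_to_end("7897949901597 BISC TARANTELLA 350GR ROSCA BRANCA"))
print(move_size_to_end("BISCOITO BOKINHA 80G"))
```

A string whose size token is already at the end (like the second one) does not match the pattern and is returned unchanged.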

You can explore regular expressions further to help you handle some of the other cases…

If you can export the file with the information in a different position, I'd suggest putting the weight/size at the end… or just adding another separator symbol… a much quicker and easier way.

Seeya,

Denis

Hi denisfi

Yes! It would actually help, because I could standardize everything.

Where can I find a video or article to learn more about regex replace syntax?

Thank you!

Hi Lawrence,

There are a lot of websites for learning regex.

Online tools:

Some YouTube videos:

There are a lot of books too…

Hi @lawrencetu

Well, another approach to filtering out only the relevant numbers is to use the KNIME Text Processing extension.
The BagOfWords node expands every line into separate words, which gives more opportunities to filter and modify. See: KNIME_project_2.knwf (68.9 KB)
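
The word-by-word idea can also be sketched in plain Python. This is not the attached workflow, just a rough illustration of the same approach: split the description into words, keep the first word that looks like a size, and convert kilograms to grams. The unit list (G, GR, KG) is an assumption taken from the sample rows, and the names `SIZE_TOKEN`, `GRAMS_PER_UNIT` and `grams_from_description` are made up for this example.

```python
import re

# A word counts as a size if it is digits followed by an optional unit.
# Assumption: only the units G, GR and KG occur, as in the sample rows.
SIZE_TOKEN = re.compile(r"(\d+)(KG|GR|G)?")
GRAMS_PER_UNIT = {"KG": 1000, "GR": 1, "G": 1, None: 1}


def grams_from_description(description: str) -> str:
    """Return the first size found in the description, in grams."""
    for token in description.upper().split():
        match = SIZE_TOKEN.fullmatch(token)
        if match:
            amount, unit = match.groups()  # unit is None for a bare number
            return str(int(amount) * GRAMS_PER_UNIT[unit])
    return "NO NUMBERS"


for text in [
    "BISC PALU BROA",
    "BISCOITO BOKINHA 80G",
    "BISC TARANTELLA 350GR ROSCA BRANCA",
    "COOKIES GRANULADOS 1KG VABENE BAUNILHA",
]:
    print(text, "->", grams_from_description(text))
```

The sketch expects the description column only (not the barcode), returns just the first size it finds, and does not handle decimal values, so rows with two numbers would still need their own rule.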



gr. Hans

Thank you so much! It solved the problem
