Regex Parsing different types of lines

Hi, I am trying to parse a BGP route table export:

   Network          Next Hop          Metric LocPrf Weight Path
*  3.0.0.0          131.103.0.2                          0 1225 1239 701 80 ?
*                   205.238.48.3                         0 2914 1280 701 80 ?
*                   158.43.206.96                        0 1849 702 701 80 ?
*                   204.212.44.129         1             0 234 1225 1239 701 80 ?
*                   194.68.130.254                       0 5459 5413 701 80 ?
*                   144.228.240.93         4             0 1239 701 80 ?
*                   204.70.4.89                          0 3561 701 80 ?

BGP Full Route Parse Only.knwf (58.5 KB)

The file is in the data folder within the workflow. It has 82 lines and at least 4-5 types of lines. What is the best practice here? I am trying to find the perfect regex for the whole thing, but my regex knowledge is limited to what I have done so far.

The full route table also has a weird format: the columns are fixed-width, so the number of spaces between fields changes with the length of the preceding field.

Anyway, your recommendations are appreciated.

PS: the regex I came up with after 2 hours is as follows:

^*\s+(?[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}/[1-3][0-9]{1,2}|[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3})\s+(?<next_hop>[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3})\s+(?\d+)\s+(?<local_pref>\d+)\s+(?\d+)\s+(?.*)\s+(?\w|?)$

Hi @hakandurgut,

I could get to a good point by doing a little manipulation. Just after the Line Reader node I used a String Manipulation with this expression:

regexReplace(substr($lines$,3), "\\s{2,20}", "_")

Then I added a Cell Splitter with underscore “_” as the delimiter, which splits each line into separate columns.

From here, it should be easy to get everything as desired.
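For anyone who wants to try the same idea outside KNIME, here is a minimal Python sketch of it. It assumes that `substr($lines$,3)` drops the first three characters (a zero-based offset), and the function name `split_on_gaps` is just made up for illustration. The two sample rows are copied from the export in the question.

```python
import re


def split_on_gaps(line: str) -> list[str]:
    # Assumption: equivalent of substr($lines$, 3), i.e. drop the status
    # marker and the two characters that follow it.
    body = line[3:]
    # Replace every run of 2 to 20 whitespace characters with "_",
    # then split on "_" the way a delimiter-based cell splitter would.
    return re.sub(r"\s{2,20}", "_", body).split("_")


if __name__ == "__main__":
    sample = [
        "*  3.0.0.0          131.103.0.2                          0 1225 1239 701 80 ?",
        "*                   204.212.44.129         1             0 234 1225 1239 701 80 ?",
    ]
    for row in sample:
        print(split_on_gaps(row))
```

With these sample rows, the upper bound of 20 seems to do some useful work: a gap longer than 20 spaces (a row with no Metric) is replaced by two underscores, which leaves an empty field where Metric would be. That is an observation about this sample rather than a guarantee for every line type in the file. Weight and Path stay together in the last field because they are separated by a single space.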


Hi @hakandurgut, I know you mentioned that the “fixed length” fields appear to vary, but with the example data I found that the Cell Splitter by Position node was also able to produce what appears to be a reasonable result.

I used split positions of:
2,20,38,45,52,59
and column names of:
Ind,Network,Next Hop,Metric,LocPrf,Weight,Path

At the end I used String Cleaner to trim spaces from the returned columns. In KNIME 4.7, this could be done using String Manipulation (Multi Column) instead.
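As a rough illustration of the same split outside KNIME, here is a small Python sketch. It assumes the split positions are zero-based character offsets at which each new column starts, which matches the header row of the sample; `split_by_position` is a hypothetical helper name, not a KNIME function. The `strip()` call stands in for the trimming step.

```python
SPLIT_POSITIONS = [2, 20, 38, 45, 52, 59]
COLUMN_NAMES = ["Ind", "Network", "Next Hop", "Metric", "LocPrf", "Weight", "Path"]


def split_by_position(line: str) -> dict[str, str]:
    # Column i runs from bounds[i] to bounds[i + 1]; the last column is
    # open-ended (None) so it takes the rest of the line.
    bounds = [0, *SPLIT_POSITIONS, None]
    return {
        name: line[start:end].strip()  # trim the padding spaces
        for name, start, end in zip(COLUMN_NAMES, bounds, bounds[1:])
    }


if __name__ == "__main__":
    sample = [
        "*  3.0.0.0          131.103.0.2                          0 1225 1239 701 80 ?",
        "*                   204.212.44.129         1             0 234 1225 1239 701 80 ?",
    ]
    for row in sample:
        print(split_by_position(row))
```

Compared with collapsing runs of spaces, slicing by position keeps empty Network, Metric and LocPrf cells in their own columns, but it only works as long as every line type in the file follows the same column layout.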

BGP Full Route - split by position1.knwf (80.9 KB)


Thank you so much for your replies. I am now able to parse it much more easily.

