Extract Coordinates with Regular Expression


#1

Hello,

I need to extract coordinates from a polyline.

I am using KNIME 3.4.1, and the extraction fails for two rows with this pattern: ([\\d.]+\\s[\\d.]+,\\s){1,}[\\d.]+\\s[\\d.]+

- The node is Java Snippet (simple)

- The messages are:

WARN  Java Snippet (simple) 0:2        Evaluation of expression failed for row "Row0": null
WARN  Java Snippet (simple) 0:2        Evaluation of expression failed for row "Row1": null

- I have uploaded the data and the model

 

Thank you for the help

 


#2

Hi,

I'm not entirely sure what your final data structure should look like, but I guess you want two columns for the two coordinates and one row for each record. Why don't you just use Cell Splitter nodes for this (you are already using them after your RegEx anyway)? First split at every comma and select list output. Then, after the Ungroup node, use another Cell Splitter to split at every space, appending the output as new columns. Hope that helps.

Cheers,
Marten


#3

Hi Marten,

 

Thanks for your response.

 

I attach a new model for further clarification.

As shown, there are four rows, and only two are evaluated by the regular expression.

Is there anything more to configure?

 

Thank you


#4

Hey,

I hunted it down for you. If you look into your log, you will find a java.lang.StackOverflowError. Probably two of the strings you are parsing are too long.

Once again, what are you looking for in the data, and what should your final dataset look like? Do you want to extract a specific pair of coordinates, or do you just want to find every pair within the string? In that case you are probably better off using a String Manipulation node.
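If the goal is every pair within the string, one way to sidestep the overflow is to match a single pair at a time and loop with Matcher.find(), instead of repeating a group over the whole string. Java's regex engine can recurse once per repetition of a group, which is a likely cause of a StackOverflowError on long inputs. Below is a minimal sketch in plain Java, outside KNIME. The class name CoordinatePairs and the method extract are made up for illustration, and like the original pattern it assumes unsigned numbers made of digits and dots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of KNIME.
public class CoordinatePairs {
    // Matches ONE pair: two numbers separated by whitespace.
    // There is no repeated group, so the match depth does not grow with input length.
    private static final Pattern PAIR = Pattern.compile("([\\d.]+)\\s+([\\d.]+)");

    // Returns every "x y" pair found in the input as a {x, y} array.
    public static List<double[]> extract(String linestring) {
        List<double[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(linestring);
        // find() moves on to the next pair on each call; the loop is iterative.
        while (m.find()) {
            pairs.add(new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2))
            });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (double[] p : extract("LINESTRING (30 20, 30 10, 40 5)")) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

Negative values or exponent notation would need a wider number pattern than [\\d.]+.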

Cheers,
Marten


#5

Hi Marten,

I want to find every pair within the string.

The final dataset should look like this (for a linestring with coordinates 30 20, 30 10, 40 5):

Id, X, Y
1, 30, 20
1, 30, 10
1, 40, 5

Thank you very much,


#6

As your input data always has the same structure, like LINESTRING (coordinateX coordinateY, ...), you don't need such a generic regex to get the results you want. Instead, I'd suggest using the following nodes:

1. String Manipulation node: Strip everything except the coordinates with the following function: regexReplace($PUNTOS$, "LINESTRING \\(|\\)", "")

2. Cell Splitter node: Set comma as the delimiter and output as list.

3. Ungroup node: Ungroup the newly created list to get a single row for each ID and pair of coordinates.

4. Cell Splitter node: Select the coordinate column as input, set whitespace as the delimiter and output as columns.

This should give you the desired results. To tidy things up a bit, you can use a Column Filter and a Column Rename node afterwards. Hope this helps.
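For anyone who wants to check the logic outside a workflow, the four steps above can be sketched in plain Java. This is only an illustration of what the nodes do, not KNIME code. The class name LinestringSplit and the method toRows are hypothetical, and the trim() call is an assumption to handle the space that follows each comma in the sample data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the node chain; not a KNIME API.
public class LinestringSplit {
    // Returns one {id, x, y} row per coordinate pair.
    public static List<String[]> toRows(String id, String linestring) {
        // Step 1 (String Manipulation): strip "LINESTRING (" and ")".
        String coords = linestring.replaceAll("LINESTRING \\(|\\)", "");
        List<String[]> rows = new ArrayList<>();
        // Steps 2 and 3 (Cell Splitter + Ungroup): split at commas, one row per pair.
        for (String pair : coords.split(",")) {
            // Step 4 (Cell Splitter): split the pair at whitespace into X and Y.
            // trim() removes the leading space left over after the comma (assumption).
            String[] xy = pair.trim().split("\\s+");
            rows.add(new String[] { id, xy[0], xy[1] });
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : toRows("1", "LINESTRING (30 20, 30 10, 40 5)")) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

In the workflow itself, the same leading space may show up after the first Cell Splitter, so it is worth checking the split results before the second split.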

Cheers,
Marten


#7

Hi Marten,

Thanks