I am importing a CSV file that is the output from a preceding workflow.
As some feedback: I set up a VM with a clean install of KNIME 4.3.4 and ran the same flow with the same CSV file as input, and the output was exactly as expected.
I want to do two further tests before I call foul.
1. Set up a new VM with a clean install of KNIME 4.4.1 and run the test with the same flow and the same CSV file to confirm the result. This will rule out an issue with the install on my computer, which has had a number of KNIME upgrades over the past 2-3 years.
2. Rebuild the same flow from scratch on a clean 4.4.1 build. It may be that the flow built in 4.3.4 has an issue somewhere that fresh 4.4.1 nodes resolve.
If both of the above tests fail I will export the flow with anonymised data and hopefully, with great community and Knime Team support, figure out what is happening and what the solution might be.
@TigerCole CSV is typically an insufficient way to store information (it does not preserve data types), especially if the data comes from another KNIME (?) workflow. KNIME's .table format, Parquet or ORC would be much more stable, as would a local H2 or SQLite database. If you absolutely must have a TEXT file, ARFF is another choice that preserves structure and data types.
Thanks for the suggestions on alternative formats for saving data for downstream flows. The SQLite idea sounds good because I already use it for another application. I will give it a try.
Indeed, CSV is an insufficient format, much as JSON is. On the other hand, CSV is widespread and non-proprietary. Using SDMX or a proprietary format is not always an option, and let us not even mention the unpredictable Excel format.
If type safety is a concern with CSV, one way to deal with it is to insert a “dummy row” just below the header row (so basically row number 1) in which, for each column, there is a value of either abc or 123 to push that column into string or numeric format. For date columns, ISO-formatted date strings are probably the best choice. After importing, simply filtering out that row accomplishes the trick.
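The dummy-row trick above can be sketched roughly as follows (the class and column names are hypothetical, purely for illustration): row 1 carries one type-forcing value per column, and after import the data rows are taken from row 2 onward.

```java
import java.util.List;

// Sketch of the "dummy row" trick for CSV type safety. The first data row
// holds a type-forcing value per column: "abc" pushes the reader toward
// String, "123" toward a numeric type, and an ISO date string toward a date.
public class DummyRowCsv {
    public static String buildCsv() {
        List<String> lines = List.of(
            "name,amount,created",         // header
            "abc,123,2021-01-01",          // dummy row: forces column types
            "Widget,19.95,2021-09-14"      // real data starts here
        );
        return String.join("\n", lines);
    }

    // After import, drop the dummy row (the row just below the header).
    public static List<String> dataRows(String csv) {
        String[] all = csv.split("\n");
        return List.of(all).subList(2, all.length); // skip header + dummy row
    }
}
```

In a KNIME flow the equivalent of `dataRows` would simply be a Row Filter node that removes row 1 after the CSV Reader.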
Congratulations btw for having included more options for enforcing type safety in the readers of the recent KNIME version.
I found the problem… I have a Java Snippet node in the KNIME flow that takes the values in a latitude array and a longitude array and creates a WKT polygon.
The values in the arrays are doubles with “.” as the decimal separator. For some reason, in 4.4.1, when the Java Snippet node creates the WKT polygon (a string), the “.” in each double is changed to a “,”, which is incorrect.
Check your language and regional settings if you are on Windows. You may have “,” set as the decimal separator. It's possible that 4.3 just used “.” by default and in 4.4 it was “fixed” to use your OS setting.
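The locale dependence suspected here is easy to demonstrate outside KNIME (the class name is just for illustration): `DecimalFormat` takes its separator symbols from the locale it is built with, and when none is given it falls back to the JVM default locale, which normally mirrors the OS regional settings.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Minimal demonstration that DecimalFormat renders the same pattern with
// "." or "," depending on the locale it is constructed with.
public class LocaleDemo {
    public static String format(double value, Locale locale) {
        DecimalFormat fmt = new DecimalFormat("###.###",
            DecimalFormatSymbols.getInstance(locale));
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(28.123, Locale.US));      // prints 28.123
        System.out.println(format(28.123, Locale.GERMANY)); // prints 28,123
    }
}
```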
I wish it were that simple; it was the first thing that I checked. The default decimal separator on my computer is definitely “.” … checked, confirmed, and tested.
Maybe it is due to the change in Java version with KNIME v4.4.0? (see here). I don't know your Java code or whether you can address it there (or whether it is related to OS settings), but can't you replace the comma with a dot using a String Manipulation node, for example?
And please, don’t open multiple topics for same issue.
Hi @TigerCole , not sure why it is happening, but maybe you can force it to be “.” instead of “,”. There are a few ways to do this; I think it could even be done in Java. If you can show what you are doing in your Java snippet, someone can surely help.
Alternatively, you can always manipulate the results as @ipazin suggested via String Manipulation.
My apologies about the new topic. I assumed it was a new issue and therefore warranted a new topic.
It does not look like a change in the Java version is an issue.
I probably can replace the “,” with a String Manipulation node, but it is an extra step in a flow processing 12 million rows that I didn't need in 4.3.4, and I would like to figure out why it is happening. Adding the extra node is like a band-aid on a bleeder.
I don’t think that the Java code in the snippet node is complicated.
// Your custom imports:
import java.util.List;
import java.util.ArrayList;
import java.io.StringWriter;
import java.util.stream.Collectors;
import java.text.NumberFormat;
import java.text.DecimalFormat;

// system variables
public class JSnippet extends AbstractJSnippet {
    // Fields for input columns
    /** Input column: "latArr" */
    public Double[] c_latArr;
    /** Input column: "lngArr" */
    public Double[] c_lngArr;

    // Fields for output columns
    /** Output column: "the_geom" */
    public String out_WKT;

    // Your custom variables:

    // expression start
    public void snippet() throws TypeException, ColumnException, Abort {
        // Enter your code here:
        // Target: MULTIPOINT ((LON1 LAT1), (LON2 LAT2), (LON3 LAT3), (LON4 LAT4), (LON5 LAT5), (LON6 LAT6))
        NumberFormat fmtr = new DecimalFormat("###.###############");
        List<String> points = new ArrayList<>();
        for (int idx = 0; idx < c_latArr.length; idx++) {
            double lat = c_latArr[idx];
            double lng = c_lngArr[idx];
            StringWriter segment = new StringWriter();
            // segment.append("(");
            segment.append(fmtr.format(lng));
            segment.append(" ");
            segment.append(fmtr.format(lat));
            // segment.append(")");
            points.add(segment.toString());
        }
        // Add the 1st lat and lng to the tail to close out the ring.
        {
            double lat = c_latArr[0];
            double lng = c_lngArr[0];
            StringWriter segment = new StringWriter();
            // segment.append("(");
            segment.append(fmtr.format(lng));
            segment.append(" ");
            segment.append(fmtr.format(lat));
            // segment.append(")");
            points.add(segment.toString());
        }
        out_WKT = "POLYGON ((" + points.stream().collect(Collectors.joining(", ")) + "))";
    }
    // expression end
}
Is there anything above that may be the cause, or a change that may resolve the problem?
If an export of the node will help, I can send it with a data sample.
I still get the “broken” output after copying and pasting your code. I think I am going to have to speak to a developer and find a way to force the “.” when doubles are converted to strings.
It does not happen when I use a Number To String node, so it must be something in this particular snippet.
You can force a specific locale during conversion, to ensure consistent results, by setting it explicitly in your code, like in this example from Stack Overflow:
DecimalFormat df2 = new DecimalFormat("#.##");
df2.setDecimalFormatSymbols(DecimalFormatSymbols.getInstance(Locale.ENGLISH));
I agree with @gab1one , this kind of behaviour is usually because of the region setting. For example, I know that comma is used instead of dot in French.
@TigerCole I think the setDecimalFormatSymbols(DecimalFormatSymbols.getInstance(Locale.ENGLISH)) suggested by @gab1one should work, as it looks like it will force the locale to English.
My apologies for the delayed response to the comments. I am not sure why, but for some reason having my computer's regional settings set to “South Africa” seemed to be the problem. I changed to UK, configured it to work for me, and “voila”, my workflow works.
I have spoken to our developers, who are really experienced Java developers, and they could not figure out why the South African regional setting caused a problem. It may be that Java is reading the default for the region and not picking up the changes in my local configuration.
Thanks for all the support. It is much appreciated.