XLS Border Formatter

There is no special formatting needed with the python program. Only said that so I can review the data in the csv file. My main concern is making sure the data is clean.
Here is the history.
Downloaded excel file from the web
Removed the macros
Combine the excel files using knime
Added an additional column in knime.
Filled in some blank cells with data
Still have some blank cells
This is where I am now.

If you have a rule for filling in the blank cells (e.g “all blank cells get ‘DATA MISSING’ inserted”) then it seems like you could use an existing KNIME node like the String Manipulation node to say ‘if the trimmed [all leading and trailing whitespace removed] cell content has a zero length, then insert the value of XXX into the cell’.

Thanks for the information. I appreciate all the feedback. I ended my Current workflow with the CSV Writer and it looks similar to the Excel File. Which is what I expected. Now when I try to load the file into the File Reader I can’t get it to show the columns like the workflow example I found. Plus If I try the CSV File Reader I get an error message New line in quoted string (or closing quote missing) in line 29 but when I look at the file table there is no data.
Thanks
Scott

Just to be a repetitive pedant, that your CSV file “looks similar to the Excel File” is total random luck since the CSV file contains no styling information and no data type information, so whatever application is displaying the CSV file content for you just happens to be guessing the formatting and types that your Excel file actually has in it.

Now with that PSA out of the way, if you can’t attach a portion of your CSV file (like the first 10 lines) to this thread, could you include a screenshot of the configuration dialog for your File Reader and a screen shot of some of the output table of the File Reader node, and a more descriptive description of the difference between what you’re expecting and what you’re seeing?

Hi,
Here is what the file looks like from the CVS Writer.

Master_id Names Address_1 Address_2 Town State Interests_at_this_location Primary_id_text Primary_id_value Secondary_id_text Secondary_id_value
10001832 ----- 3 Center Rd NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.1249 Map: - Lot: -
10143622 14-lot Subdivision Plan Tax Map: 004 Lot No: 142 Amherst NH Alteration Of Terrain Permit File No: 20050606-02 NO INFO NO INFO
14803 27 Lyndeboro Rd. Residence 27 Lyndeboro Rd NO LOT NO Amherst NH On-premise Use Facility Containing Fuel Oil Site No: 199411004 Project No: 5325
10137969 305.7 Acre Subdivision ADDRESS UNKNOWN NO LOT NO Amherst NH Alteration Of Terrain Permit File No: 19870508 B Permit No: WPS-2166
10125091 4nh Homes Llc Old Manchester Rd. NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.1309 Map: 8 Lot: 85-1
10138958 9 Columbia Drive ADDRESS UNKNOWN NO LOT NO Amherst NH Alteration Of Terrain Permit File No: 19881004 A Permit No: WPP-3387A
10007868 9 Wilkins Rd Wilkins Rd NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.0921 Map: 5 Lot: 118-0
61 ??? Overlook Dr Rtes 122101a NO LOT NO Amherst NH Underground Storage Tank Program Facility Id: 112990 Site No: 200106045
10004895 A. Arsenault 9 Warren Way NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.0187 Map: 2 Lot: 146-42
10003858 A. Barraford 13 Juniper Dr NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.1217 Map: 40 Lot: 3
10107440 A. Bourzikas 41 Embankment Rd NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.0415 Map: 8 Lot: 116
10007352 A. Brown 36 Standish Way NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.117 Map: 1 Lot: 35-24
10007351 A. Brown 36 Standish Way NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.1169 Map: 1 Lot: 35-24
10005218 A. Carraccio 44 Green Rd NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.0408 Map: 7 Lot: 94-3
10004318 A. Chawla 46 Lyndeborough Rd NO LOT NO Amherst NH Water Well Record Well Id (WRB#): 7.1186 Map: 5 Lot: 64-1

File Reader Viewer

File Reader View from Sample example

When you look at the File Reader that my output put together and the File Reader Sample example.
I would like mine to look like the File Reader example with the Columns. I would like it to look like this so there is no special characters @#%/n etc in my csv files

In your file reader configuration screenshot, your “Column delimiter” is listed as “none” - try making this a comma.

Thanks works with my Sample File but not my large combined file with my 5 Combined EXCEL files.
So for now I am trying to figure out the errors.

From that screen shot, it looks like your CSV’s first row is the names of columns, so you probably want to tick the ‘read column headers’ checkbox seen in the screenshot

If your CSV has a heterogeneous per-row column cardinality, click on the “Advanced…” button seen in the screenshot, then the “Short Lines” tab, then the check box for “allow short lines”

I am now missing Data not sure why my Data would get out of sync. Plus I want to get rid of the quotes. I removed the quote support but the quotes are still there.


Thanks for all your support

If you want to strip out quotes, this has been discussed previously in the forum - like: Remove double quotes beginning and ending of a string

This may be a dumb question
I used Csv writer to create the csv file
I just imported it into file reader and created a csv table in the reader but it messed up my table and it is out of sync. Meaning data isn’t lined up in the right columns in certain rows and columns. Not sure if removing the quotes will fix it. But it is worth a try. So if I connect the string manipulation to remove the quotes this may fix my issues.
Is there any other nodes you would recommend before adding the csv file into my python script in order to clean the csv file.

no dumb questions - only dumb answers! (or something like that…)

Here’s another thing that seems kind of weird in my understanding of what you’re doing. As i understand it, you’re using the CSV Writer node to make a CSV file, but then you’re using the File Reader node to read it and do further stuff with it. … Why are you creating that CSV file instead of linking the node just before the CSV Writer node to the node just after the File Reader node? Are you doing some sort of processing on the CSV file before you use the File Reader to read it back in?

So once I use the csv writer to convert the file from an excel to a csv file then I should be ready to use with my python script and I shouldn’t need to do anything else?
Or am I missing something and getting confused from things I read from others in the forum documents and sample workflows?
I was just using the file reader since I thought there was more I needed to do to prepare my csv file to make sure it is clean and ready to use in my script. I will try my csv file in the script to test for errors with my file
Thanks for all your help,
Scott

Hmmm… so, again i’m not super clear here, but i understand your workflow to be:

[Excel Reader] → [Some Nodes that fix empty Cells] → [CSV Writer] (→ Python Script)

Is this the case? Is the output table of the last of the ‘some nodes’ lacking in correct data?

((Once this much is nailed down, you should then be able to improve your workflow to use Python nodes to continue to control the execution from KNIME without needing to step outside of KNIME.))

Sounds Great Thanks
So I want to Format the Columns so I can see where the empty cells are using the XLS Cell Formatter then I will use your suggestion of fixing the empty cells with the string Manipulation before making a csv file.
I tried doing something similar to the sample workflow but it failed. Do I need to add another node to get this to work.

Oooph - i feel like we’ve just jumped to square one again.

I thought the only reason you were formatting and writing an Excel file out again was to do some sort of human-review for empty cells; … and that after that human review, you’d eventually write to your preferred format of CSV. Are we on the same page here?

If we’re on the same page, why not look for the cells with missing data (by which i assume you are saying “blank cells”… ?) in KNIME.

(In other words, i’m not sure why we keep coming back to “I want to Format the Columns so I can see where the empty cells are using the XLS Cell Formatter.” Why is a human reviewer needed in this situation?)

The guy I am working with keeps asking so to satisfy him I would give him what he wants and the plan is to add the Missing Value Node to the XLS Formatter (apply) then the CSV Writer I figure I can give him what he wants if the XLS Formatter Node


XLS Formatter (apply ------------------Missing Value Node---------------------then the CSV Writer
I don’t think I really need it but I am also taking it as a learning experience as I learn the KNIME Analytical Tool.

The Missing Value node might not do what you want. There is a specific kind of cell value for missing data - for example, it might be used if a null value has been read from a database. The Excel Reader node may not be using this, but may instead just be producing empty (e.g zero length strings) cells.)

Hmm… ok - so something like:

[Excel Reader] → [Missing Value or other node] → [CSV Writer]
                     \
                      \ → [XLS Cell Formatter] → [XLS Formatter (apply)]
                       \                             / (via flow variable output)
                        \→ [Excel Writer]   ←-------/

i guess. (I’ve never used the XLS formatter nodes, but just looking at what they’re doing, it doesn’t make sense for the Writer’s output to flow into the XLS Formatter apply node - how would the formatting information ever make it to the writer?)

The flow variable connection between Writer and Formatter (apply) is not for passing variables, but for setting a temporal dependency. The XLS file needs to be completely written first, before the formatting can start.

In the last workflow screenshot provided, you are missing a XLS Control Table Generator node. You can never pass direct data into the XLS formatter nodes, it expects column names and rowids just like Excel (e.g. A to 7, 1 to 10). Generating these ‘control tables’ from existing data makes the usage of the extension very dynamic and adjusting to varying input table specifications. To get started, you could rather try the XLS Control Table from Cell Range node (without any inputs connected). The tag you assign there (e.g. ‘header’) should re-appear in the cell & border formatter nodes.

It’s also a good idea to look at the examples (either on the KNIME server or KNIME hub) and the documentation:
https://www.knime.com/community/continental-nodes-for-knime-xls-formatter

Best regards

Arne

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.