Files Corrupted During File Transfer

I am connecting to an SFTP site via the SSH node, using the File List node to get the addresses of the files, and looping through the files one at a time to transfer each of them to a local folder.

They are PDF files, and all of them are corrupted after the transfer, no matter what I do.

[Screenshot: error message shown when opening one of the transferred PDF files]

If the files are downloaded from the SFTP site outside of KNIME, they are not corrupted. However, there are 3,700 files spread across 760 subfolders, so downloading them manually is not a viable option.

What can I do to prevent the files from becoming corrupted?

Hi @djarrett, you may need to give more details about what you are doing: we can connect to SFTP sites via the SSH node and download files without them being corrupted. Knowing only that the file is corrupted, without any further details, it is hard to tell what is going on.

Can you show us what your workflow looks like, which nodes you are using for downloading, and how you have configured them?

Hi @bruno29a, below is my workflow. I am simply connecting to the SFTP address with the credentials provided, listing the paths of all the files, turning those paths into row variables, and then feeding them one by one into a Transfer Files node to download them. All of the files download, but every one of them is corrupted when I try to open it.

Hi @djarrett, thank you for the screenshot.

In a situation like this, you should first try downloading a single file without the loop and check whether the downloaded file is still corrupted.

Secondly, I think you can avoid the loop entirely if you use the Transfer Files (Table) node:
[Screenshot: Transfer Files (Table) node]


Hi @bruno29a, thank you for the suggestion to use the Transfer Files (Table) node.

I followed your advice and downloaded a single file, and I get the same result: the file is corrupted.

@djarrett, could you provide more details about your operating system, your file system, and the location where the files are to be stored? Could you also provide a log file in debug mode, along with a successful file and a corrupted one?

How is it corrupted? Is the file size roughly the same? Do the file names contain special characters? Are there any blanks (spaces) or special characters in the folder names?

Could you show us a screenshot of the file transfer configuration?


Hi @mlauber71, I am transferring files from an SFTP site given to me by a client. I am transferring PDF files from the SFTP site to the Downloads folder of my Windows 10 machine. The file size after the transfer is the same as on the SFTP site. I get the error message shown in my initial post when I try to open the downloaded files.

I cannot share the files here for confidentiality reasons. The file names contain underscores (“_”). However, I have attached my logs:
knime log June 2 2022.log (873.7 KB)

@djarrett, I could not find any obvious problems at first. You could get rid of the Vernalis error messages by installing RDKit.

Could you create a log in DEBUG mode? Open the existing log and delete its content:

[Screenshot: opening the existing KNIME log]

or delete the file from the workspace directory

{workspace_directory}/.metadata/knime/knime.log

Then activate the DEBUG level. Make sure the log does not contain any sensitive information, and remember to set the level back to WARN when you are finished:

[Screenshot: setting the log level to DEBUG]

You might want to include further information about your system and environment, such as the KNIME version you are using (cf. also this):

In other cases, providing a minimal example of your workflow might also help, but here that might not be possible.

Could you also provide screenshots of the settings in the data transfer nodes?


@mlauber71 I am using:

Windows 10 Pro
KNIME Analytics Platform 4.5.2.v202203171119 org.knime.desktop.product

Below is a screenshot of my workflow:

I get the following error when I try to open the PDF files that were transferred from the SFTP site to a Downloads folder on my PC.

When I transfer the same PDF files manually via WinSCP, they open without issues.
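With a good copy (downloaded via WinSCP) and a bad copy (downloaded via KNIME) of the same file, a byte-level comparison can show how the file was changed. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and it simply reports both sizes, whether the SHA-256 hashes match, and the offset of the first differing byte.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(a: Path, b: Path) -> Optional[int]:
    """Return the byte offset where the two files first differ, or None if identical."""
    data_a, data_b = a.read_bytes(), b.read_bytes()
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            return offset
    if len(data_a) != len(data_b):
        # One file is a prefix of the other: they diverge where the shorter one ends.
        return min(len(data_a), len(data_b))
    return None


def compare_copies(good: Path, bad: Path) -> dict:
    """Summarize size, hash equality and first mismatch between two copies of a file."""
    return {
        "size_good": good.stat().st_size,
        "size_bad": bad.stat().st_size,
        "same_hash": sha256_of(good) == sha256_of(bad),
        "first_diff_offset": first_difference(good, bad),
    }
```

For example, call `compare_copies(Path("winscp/report.pdf"), Path("knime/report.pdf"))` with two (hypothetical) paths. Equal sizes combined with different hashes would mean bytes were altered in place rather than the file being truncated, and the offset shows where the damage starts.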

Attached are my logs after switching to debug mode:

Knime log June 3 2022.log (20.1 KB)

Below is a screenshot of the file transfer node settings:


Not sure if this helps, but when I open the files with an alternative PDF reader, I get the following message:

[Screenshot: error message from the alternative PDF reader]
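When a second reader also rejects the files, a quick structural check can tell whether the damage is visible at the file level. A PDF file normally starts with a `%PDF-` header and ends with a `%%EOF` marker, so the Python sketch below (standard library only, function names hypothetical) flags files that lack either one and scans a whole download folder recursively. It is a heuristic, not a full PDF validator: passing it does not prove a file is intact, and some readers tolerate extra bytes before the header, which this strict check would flag.

```python
from pathlib import Path
from typing import List


def looks_like_valid_pdf(path: Path, tail_bytes: int = 1024) -> bool:
    """Heuristic check: '%PDF-' header at the start and '%%EOF' near the end.

    Failing this check strongly suggests the bytes were altered or truncated;
    passing it does not guarantee the PDF is intact.
    """
    with path.open("rb") as f:
        if f.read(5) != b"%PDF-":
            return False
        f.seek(0, 2)                       # move to the end of the file
        size = f.tell()
        f.seek(max(0, size - tail_bytes))  # read only the last tail_bytes bytes
        tail = f.read()
    return b"%%EOF" in tail


def find_suspect_pdfs(root: Path) -> List[Path]:
    """Recursively list files matching *.pdf under root that fail the heuristic check."""
    return sorted(p for p in root.rglob("*.pdf") if not looks_like_valid_pdf(p))
```

Running `find_suspect_pdfs(Path("..."))` on the folder the transfer wrote to would give a quick count of how many of the downloaded files are affected, without opening each one by hand.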


@djarrett, thank you for this information. I do not see an immediate problem, but I would suggest trying these options. First, switch the internal storage compression in the knime.ini from Snappy to one of the following:

-Dknime.compress.io=GZIP
-Dknime.compress.io=NONE

There are some error messages about the storage format in the log, so this is just an idea.

Then, just to be sure, could you try to download the PDF file into a location provided by a path variable or a relative path (instead of an absolute c:\xyz path)?

The next thing to try would be the deprecated nodes, to see whether that makes a difference. If it does, that would give the KNIME developers a hint that the issue lies with the transfer node. You might find examples of their usage attached below the node descriptions.


Have you tried downloading one of them without KNIME to see whether the file is corrupted?
This sounds similar to an Excel question where the file itself was the issue.
If I overlooked that you already did that, sorry.
br

@Daniel_Weikert Yes, I have. The files themselves are not the issue; they open without any errors.

@mlauber71, thank you for these suggestions. The legacy Download node did not produce the same issue. The PDF files that did transfer opened without any problems.


@djarrett, glad this works. You could build a workflow that you can restart if not all of the file transfers succeed.
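The restart idea boils down to working out which files still need to be fetched. A minimal Python sketch, assuming (hypothetically) that the remote listing has been exported as a mapping from POSIX-style paths, relative to the SFTP root, to their remote file sizes, and that the local folder mirrors the remote directory tree: it returns the files whose local copy is missing or has a different size, so only those are retried. As noted earlier in the thread, the corrupted files had the correct size, so this only catches incomplete transfers, not altered content.

```python
from pathlib import Path, PurePosixPath
from typing import List, Mapping


def pending_files(remote_sizes: Mapping[str, int], local_root: Path) -> List[str]:
    """Return remote paths whose local copy is missing or has the wrong size.

    remote_sizes: hypothetical export of the remote listing, e.g.
        {"folder_a/report_1.pdf": 48213, ...}
    local_root: local folder that mirrors the remote directory tree.
    """
    pending = []
    for rel, size in remote_sizes.items():
        # Strip a leading "/" so joinpath stays inside local_root.
        local = local_root.joinpath(*PurePosixPath(rel.lstrip("/")).parts)
        if not local.is_file() or local.stat().st_size != size:
            pending.append(rel)
    return sorted(pending)
```

Feeding only the returned paths back into the transfer step means a failed run can simply be repeated until the list is empty.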

@tobias.koetter, this sounds like the Transfer Files node might have a problem with SFTP, or something else is altering the content when downloading a PDF (possibly only under special circumstances). Maybe investigating this together with @djarrett could yield further insight. It reminds me of this:

