Instead of working with the VariableColumn and VariablePrimaryKey variables, I replaced them with strings.
But your solution is very good.
upload.zip (824 Bytes)
BUT both solutions have one mistake. I'll show you with a new XML file and upload.
The List Files output is sorted alphanumerically, but the XML is not. It's important to match the file name in the XML XPath with the data file names; otherwise you get the wrong column names.
The change in this version is the added String Manipulation (Variable) node, in which the file name is extracted one step before the XPath creation, so we can use the file name in the XPaths. I used the file name to specify the correct tags we are looking for.
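Just to illustrate why matching by file name is safer than relying on the order, here is a small sketch outside KNIME. The XML layout and all names are made up; only the idea is the same as in the XPath used later:

```python
import xml.etree.ElementTree as ET

# Illustration only: headers are looked up by file name instead of by position,
# so the alphabetic sort order of the file list no longer matters.
xml = """<root>
  <table>
    <name>DE_Berlin</name>
    <variablelength><name>id</name><name>amount</name></variablelength>
  </table>
  <table>
    <name>GB_London</name>
    <variablelength><name>id</name><name>value</name></variablelength>
  </table>
</root>"""

headers_by_file = {
    t.findtext("name"): [n.text for n in t.findall("variablelength/name")]
    for t in ET.fromstring(xml).findall("table")
}

for file_name in sorted(["GB_London", "DE_Berlin"]):   # alphabetic, like List Files
    print(file_name, headers_by_file[file_name])
```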
Regarding the changing structure of the CSV files, I had to replace the File Reader node with a CSV Reader node and add a Missing Value node.
#1 If you have file names with a blank space you get errors. For example the file name "GB London.csv" becomes "GB%20London" in the URL, and then you can't match the file name against the XPath. In the XML files the blank space in the file names is replaced with a "_". I don't know how good my solution is, but in the String Manipulation node I corrected the line regexReplace($${SURL}$$, "(.*/)|(\.csv)", "") to
replace(regexReplace($${SURL}$$, "(.*/)|(\.csv)", ""), "%20", "_").
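For illustration only (the URL is made up), this is what the corrected expression does, written with Python's re module:

```python
import re

surl = "file:/C:/data/GB%20London.csv"      # made-up example URL

# regexReplace($${SURL}$$, "(.*/)|(\.csv)", ""): strip the path and the extension
name = re.sub(r"(.*/)|(\.csv)", "", surl)   # -> "GB%20London"

# replace(..., "%20", "_"): undo the URL encoding so the name matches the XML
name = name.replace("%20", "_")             # -> "GB_London"
print(name)
```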
And in the String Manipulation node next to the XML Reader I wrote "replace(replace(replace(replace($XML$,"</VariableColumn>",""),"<VariableColumn>",""),"</VariablePrimaryKey>",""),"<VariablePrimaryKey>","")".
Is there a way to write multiple commands on separate lines instead of nesting them inside each other?
For example:
replace($XML$,"<VariablePrimaryKey>","")
replace($XML$,"</VariablePrimaryKey>","")
replace($XML$,"<VariableColumn>","")
replace($XML$,"</VariableColumn>","")
Change: Now uses the "Location" instead of the "URL" to overcome the file name issue when there is a space in the file name. This change applies to the String Manipulation (Variable) and CSV Reader nodes.
Yes, you can remove those tags and have all the column headers at the same level, so you can skip the loop: regexReplace($XML$, "(<VariableColumn>)|(</VariableColumn>)|(<VariablePrimaryKey>)|(</VariablePrimaryKey>)", "")
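For example (the cell values here are made up), the pattern strips the wrapper tags from every value coming out of the XML Reader:

```python
import re

pattern = r"(<VariableColumn>)|(</VariableColumn>)|(<VariablePrimaryKey>)|(</VariablePrimaryKey>)"

# Made-up cell values as they might come out of the XML Reader
cells = [
    "<VariablePrimaryKey>id</VariablePrimaryKey>",
    "<VariableColumn>city</VariableColumn>",
    "<VariableColumn>population</VariableColumn>",
]

print([re.sub(pattern, "", c) for c in cells])   # -> ['id', 'city', 'population']
```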
You're right. Next time I'll take more time to provide good examples. Sorry.
I'm new to KNIME. I understand the principles, but it's hard for me to understand the syntax of the code.
Sometimes I get CSV files with a comma at the end of each line.
Then the reader gives me one column more than expected.
The match in the Insert Column Header node then fails and the workflow breaks.
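To make the problem concrete (column names are made up):

```python
# A trailing comma makes the parser see one extra, empty column.
line = "id,city,population,"
print(line.split(","))               # -> ['id', 'city', 'population', ''] - 4 fields instead of 3

# One possible workaround, not necessarily the best one: strip the trailing
# comma before parsing, or drop the empty extra column after reading.
print(line.rstrip(",").split(","))   # -> ['id', 'city', 'population']
```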
I also sometimes get a file with the same name but written in lowercase letters only.
How can I change the workflow to handle this problem? I'll upload a CSV for this problem. file_headers_new.zip (385 Bytes)
And also change all capital letters in the constant strings of the Variable Expression node (Node 13) to lowercase: join("/root/table[name/text() = \"", variable("fileName"), "\"]/variablelength/name")
That's possible with the "capitalize" function, but I considered cases like "GB London" and thought it's better to make all the characters lowercase or uppercase to prevent any problems.
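A small illustration of the idea (names are made up):

```python
file_name = "gb_london"     # second file, lowercase only
xml_name  = "GB_London"     # name as it appears in the XML

# Lowercasing both sides makes the match case-insensitive
print(file_name.lower() == xml_name.lower())   # -> True

# Capitalizing instead would turn "GB" into "Gb", so names like "GB London"
# would no longer match - hence everything lowercase (or uppercase).
```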
The workflow is already very good, but a few special cases would make it even better.
The final file is an XLSX sheet. But if I have more than 1.5 million rows, Excel cannot create the file and I have to use a CSV Writer instead. Does KNIME have a node that makes an IF decision based on the row count?
For example:
Below 1,000,000 rows do an XLSX write, and above 1,000,000 write to CSV?
For workflow control there are the IF Switch node and several CASE Switch nodes. You can get the number of rows in a couple of ways, for example with the Extract Table Dimensions node.
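Outside of KNIME, the decision the switch has to implement is just a row-count check; a rough sketch (the threshold and file names are placeholders, and the writers stand in for the Excel Writer and CSV Writer nodes):

```python
import pandas as pd

def write_result(df: pd.DataFrame, base_name: str = "result") -> None:
    # Below the threshold Excel can handle the data, above it fall back to CSV.
    if len(df) <= 1_000_000:
        df.to_excel(f"{base_name}.xlsx", index=False)
    else:
        df.to_csv(f"{base_name}.csv", index=False)

write_result(pd.DataFrame({"id": range(5)}))
```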
But in this case you didn't use the IF Switch node, just the CASE Switch Data node.
I have a small problem: I added your workflow to mine, but I have more than one file. That's why I have a loop like in the workflow given above. I get the error "Active Scope End node in inactive branch not allowed." The error occurs because I have the Variable Loop End after the CSV Writer and the Excel Writer. I'll give you a screenshot.