I’m trying to upload the Australian Business Register data to my company SQL server.
This is the link with the files for download: https://data.gov.au/dataset/abn-bulk-extract
The data is stored in an XML format. What would a workflow that converts the data to a row by column format look like? The website also supplies the XML schema.
I don’t need to worry about privacy either since this is publicly available data.
Thanks for your help everyone!
See attached workflow. However, due to the memory problem with reading large XML file,
I have done it just on a small sample file abn.xml.
Hope it helps !
ABN.knwf (9.0 KB)
abn.xml (4.1 KB)
Thanks for your response. I managed to get it to work using
File Reader --> String to XML --> Xpath --> Database Writer
The present challenge is how I can make the workflow loop through a folder which divides the ABN data into 20 files and append each new upload to the existing set.
See attached workflow - it reads data from multiple *.table files (created by the Table Writer node)
into a single table. I hope it’s that what you need.
ABN_read_table.knwf (10.2 KB)
Thank you for your quick response, your example workflow is incredibly helpful. I’ve attached a screenshot of my workflow with your chunk loop start and loop end nodes. My goal is to upload this data to my companies’ SQL server, with this in mind I’ve used the database writer node. Would this be the right setup?
that workflow looks fine, but you need to use a Variable Loop End node at the end of the loop, instead of the Loop End node. It is meant for exactly this use case, where you do not have data at the end of your loop execution that you want to keep in the workflow, e.g. because you have written it to disk or to a database.
I’m not sure if this should be working, but it writes data on server in the loop. IMHO, it’s better to collect all data which has to be written in “Loop End” node and write them on server in single step which look like this.
What seems to me more important, before writing data on production database, I would setup some testing environment like MS SQL Server Express to check if data are written correctly to avoid problem with data consistency.
Good luck !