Hey, I have thousands of log files with an average size of 150 MB each (in txt format). I am starting to plan how to store and ETL the data. First, I was wondering what's the best way to store the data: as log files or in a SQL DB? Second, I would like to know whether this amount of data is big enough to justify Big Data tools (Hive, for example). P.S. The entire data set is more than 10 billion rows. Thanks
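More than 10B rows will inevitably give you a headache - that count is well past the signed 32-bit integer limit of roughly 2.1B (2,147,483,647), which caps row counters and INT-typed keys in many RDBMS unless 64-bit types such as BIGINT are used. I'd advise you to do some research on this beforehand.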
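To put numbers on that limit, here is a small sketch of the arithmetic in Python. The helper name fits_in and the constant names are made up for illustration, and ROW_COUNT is just the "more than 10 billion" figure from the question taken as a round number.

```python
# Largest value a signed 32-bit integer can hold (the roughly 2B limit).
INT32_MAX = 2**31 - 1
# Largest value a signed 64-bit integer can hold (e.g. a BIGINT column).
INT64_MAX = 2**63 - 1

# Round figure taken from the question; the real count is "more than" this.
ROW_COUNT = 10_000_000_000


def fits_in(row_count: int, limit: int) -> bool:
    """Return True if a counter or key of this size stays within the limit."""
    return row_count <= limit


print(INT32_MAX)                      # 2147483647
print(fits_in(ROW_COUNT, INT32_MAX))  # False: a 32-bit key or counter overflows
print(fits_in(ROW_COUNT, INT64_MAX))  # True: a 64-bit type has ample headroom
```

So the thing to check in any candidate database is whether keys, sequences and row counts can be 64-bit throughout.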
Switching to "Big Data" to avoid such constraints is fashionable, but implies a steep learning curve in a very dynamic ecosystem. Maybe other scalable NoSQL systems like MongoDB or CouchDB could fill the gap, though KNIME doesn't quite facilitate their use yet (this is due to change soon).
Finally, flat files have all kinds of speed issues, so you don't really want to go there with this volume of data.
Bottom line: to my mind, ETL into an adequate RDBMS should give you the best result in the near to mid-term.
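As a rough illustration of what such an ETL step could look like outside KNIME, here is a minimal Python sketch that parses log lines and loads them in batches. Everything specific in it is an assumption: the tab-separated "timestamp, level, message" line layout, the table name log_entries, the batch size, and the use of SQLite purely as a stand-in for whichever RDBMS you end up choosing.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[str, str, str]
BATCH_SIZE = 50_000  # hypothetical value; tune it for your database


def parse_line(line: str) -> Optional[Row]:
    """Parse an assumed 'timestamp<TAB>level<TAB>message' line; None if malformed."""
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3:
        return None
    return (parts[0], parts[1], parts[2])


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows so memory use stays bounded."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_lines(conn: sqlite3.Connection, lines: Iterable[str],
               batch_size: int = BATCH_SIZE) -> int:
    """Insert all well-formed lines, one transaction per batch; return rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_entries ("
        "id INTEGER PRIMARY KEY, "  # SQLite stores this as a 64-bit integer
        "ts TEXT NOT NULL, level TEXT NOT NULL, message TEXT NOT NULL)"
    )
    parsed = (row for row in map(parse_line, lines) if row is not None)
    loaded = 0
    for chunk in batches(parsed, batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO log_entries (ts, level, message) VALUES (?, ?, ?)",
                chunk,
            )
        loaded += len(chunk)
    return loaded


# Tiny in-memory demo with made-up lines; the second one is skipped as malformed.
demo_conn = sqlite3.connect(":memory:")
demo_lines = [
    "2020-01-01T00:00:00\tINFO\tstarted\n",
    "malformed line\n",
    "2020-01-01T00:00:01\tERROR\tdisk full\n",
]
print(load_lines(demo_conn, demo_lines))  # prints 2
```

In practice you would pass an open file object instead of the demo list, since iterating a file yields one line at a time and never holds a whole 150 MB file in memory. For this volume most databases also ship a bulk loader that is much faster than row inserts.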
Thank you very much.
If we change the file format to TDMS, will that change anything?
Never heard of TDMS, so I googled it - suffice to say it probably doesn't change anything for KNIME, no matter what benefits it may have for certain NI applications.