Hello everyone,
I need to process a large number of files (~3 billion) and split them into hot and cold data. The data comes from a database and should be written to one of two nodes, depending on whether each item is classified as hot or cold. The bottleneck is the DB reader: it has to read every row one at a time and then pass it on to one of the two downstream nodes.
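To make the setup concrete, here is a minimal sketch of how the pipeline is structured at the moment. Everything in it is a placeholder for illustration: `read_rows`, `is_hot`, the `last_access_days` field and the 30-day threshold are assumptions, not the real schema or rule, and the two lists stand in for the hot and cold nodes.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, int]


def read_rows(n: int) -> Iterator[Row]:
    """Stand-in for the DB reader: yields one row at a time (hypothetical schema)."""
    for i in range(n):
        yield {"id": i, "last_access_days": i % 60}


def is_hot(row: Row, threshold_days: int = 30) -> bool:
    """Hypothetical classification rule: recently accessed rows count as hot."""
    return row["last_access_days"] < threshold_days


def route(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Single reader loop: every row passes through here before reaching a sink."""
    hot: List[Row] = []   # placeholder for the hot node
    cold: List[Row] = []  # placeholder for the cold node
    for row in rows:
        (hot if is_hot(row) else cold).append(row)
    return hot, cold


hot, cold = route(read_rows(120))
print(len(hot), len(cold))
```

The point of the sketch is that all rows go through the one `route` loop, so throughput is capped by that single reader no matter how fast the two sinks are.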
Does anyone have ideas on how this could be structured differently so that it runs faster?
Thanks in advance!