We are running into a problem while processing WhatsApp chats. Every new message starts with date, time - user: “ChatText”. If “ChatText” contains a line break (the sender pressed Enter while typing in WhatsApp), the rest of the message continues on a new line.
When we read the file with the File Reader node, the first row contains the start of the message (date, time - user), but the text after each line break goes into the next row. The number of rows grows with the number of line breaks.
We want the full text of each message as a single record in a single row instead of spread across multiple rows.
A sample input text file is shown below:
11/30/18, 10:09 AM - xxxxxxxxxxx: Hi all ,
Could anybody please help with engineering management universities
I’m give TOEFL tomorrow
Need to add score recipients
11/30/18, 12:20 PM - xxxxxxxxxxx: Manhattan premium account for sale at a low price. If anyone interested PM.
11/30/18, 12:51 PM - xxxxxxxxxxx: its fine for Germany???
11/30/18, 12:52 PM - xxxxxxxxxxx: Can anybody review my profile for rwth and Mannheim data science
TOEFL - 103
3+ years of experience in data field as a software engineer
BTech - 8.12 CGPA
12th- 74
11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh
Expected output:
11/30/18, 10:09 AM - xxxxxxxxxxx: Hi all ,Could anybody please help with engineering management universities I’m give TOEFL tomorrow Need to add score recipients
11/30/18, 12:20 PM - xxxxxxxxxxx: Manhattan premium account for sale at a low price. If anyone interested PM.
11/30/18, 12:51 PM - xxxxxxxxxxx: its fine for Germany???
11/30/18, 12:52 PM - xxxxxxxxxxx: Can anybody review my profile for rwth and Mannheim data science TOEFL - 103 3+ years of experience in data field as a software engineer BTech - 8.12 CGPA 12th- 74
11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh
Every row should start with date, time - user: .
Any text on the following lines, up to the start of the next message, should be appended to that row.
Depending on the format of your file, you can either play with the many File Reader options and try to get the expected output directly from the File Reader node, or read the file as shown above and then apply logic in KNIME to get your output.
Logic (mine at least) would be something like:
add a unique indicator (a number, for example) to the rows belonging to the same message
use a Group Loop Start node on that indicator and, in each iteration, transpose the rows and then combine the columns
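Outside KNIME, the same two steps can be sketched in plain Python. This is only an illustration of the idea, not the attached workflow: the `HEADER` pattern is my assumption about the date, time - user: prefix seen in the sample, and the function names `assign_message_ids` and `merge_messages` are made up for this example.

```python
import re
from itertools import groupby

# Assumed header format, taken from the sample: "11/30/18, 10:09 AM - user: "
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} [AP]M - [^:]+: ")


def assign_message_ids(rows):
    """Step 1: tag every row with a counter that increases at each header row."""
    message_id = 0
    tagged = []
    for row in rows:
        if HEADER.match(row):
            message_id += 1
        tagged.append((message_id, row))
    return tagged


def merge_messages(rows, delimiter=" "):
    """Step 2: group rows by their indicator and join each group into one row."""
    tagged = assign_message_ids(rows)
    return [
        delimiter.join(row.strip() for _, row in group)
        for _, group in groupby(tagged, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    sample = [
        "11/30/18, 12:52 PM - xxxxxxxxxxx: Can anybody review my profile",
        "TOEFL - 103",
        "BTech - 8.12 CGPA",
        "11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh",
    ]
    for line in merge_messages(sample):
        print(line)
```

Rows that appear before the first header (if any) end up together under indicator 0, so nothing is silently dropped.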
The logic used in the attached workflow is a bit unusual, but there are some comments, and the idea is pretty much what I wrote above. It just uses some funny methods to get there.
I see now. That is fine as well. The only thing is that two different messages can have the same time, I guess, so the grouping might be wrong. That is why I would rather go with an identifier, which should be unique. Additionally, you can change the delimiter in the GroupBy node. The default is a comma, but you can use a space instead.
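To show what I mean about the time not being unique, here is a small Python sketch (not KNIME, just for illustration). The user names `alice` and `bob`, the `TIMESTAMP` pattern, and both function names are assumptions made up for this example.

```python
import re

# Assumed timestamp prefix, e.g. "11/30/18, 12:52 PM - "
TIMESTAMP = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} [AP]M) - ")


def group_by_timestamp(rows, delimiter=" "):
    """Group on the timestamp text: messages sent in the same minute collide."""
    groups = {}
    current = None
    for row in rows:
        match = TIMESTAMP.match(row)
        if match:
            current = match.group(1)
        groups.setdefault(current, []).append(row)
    return [delimiter.join(parts) for parts in groups.values()]


def group_by_identifier(rows, delimiter=" "):
    """Start a new record at every header row, so each message stays separate."""
    merged = []
    for row in rows:
        if TIMESTAMP.match(row) or not merged:
            merged.append(row)
        else:
            merged[-1] = merged[-1] + delimiter + row
    return merged


if __name__ == "__main__":
    rows = [
        "11/30/18, 12:52 PM - alice: first message",
        "second line of it",
        "11/30/18, 12:52 PM - bob: another message",
    ]
    print(len(group_by_timestamp(rows)))   # both messages end up in one record
    print(len(group_by_identifier(rows)))  # one record per message
```

The `delimiter` argument plays the role of the delimiter setting in the GroupBy node: pass `","` for the default behaviour or `" "` for a space.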
Actually, your solution gave me an idea that simplifies the workflow a lot! Instead of rowindex() -1, a missing value is inserted. This way you can use the Missing Value node immediately. Check it out. The regex is also a bit improved.
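For readers without the workflow at hand, here is a rough Python sketch of how the missing-value idea can be read: header rows get an ID, continuation rows get a missing value (`None`), and the gaps are then filled with the previous value before grouping. It is an assumption about the approach, not a transcription of the workflow; the `HEADER` pattern and all function names are hypothetical.

```python
import re

# Assumed header prefix, e.g. "11/30/18, 10:09 AM - "
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} [AP]M - ")


def mark_headers(rows):
    """Header rows get their row index as ID; continuation rows get None."""
    return [i if HEADER.match(row) else None for i, row in enumerate(rows)]


def forward_fill(values):
    """Replace each None with the most recent non-missing value before it."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def concatenate(rows, delimiter=" "):
    """Group rows by the filled ID and join each group into a single row."""
    ids = forward_fill(mark_headers(rows))
    groups = {}
    for message_id, row in zip(ids, rows):
        groups.setdefault(message_id, []).append(row)
    return [delimiter.join(parts) for parts in groups.values()]


if __name__ == "__main__":
    rows = [
        "11/30/18, 10:09 AM - xxxxxxxxxxx: Hi all ,",
        "Could anybody please help with engineering management universities",
        "11/30/18, 12:20 PM - xxxxxxxxxxx: Manhattan premium account for sale",
    ]
    print(mark_headers(rows))
    for line in concatenate(rows):
        print(line)
```

Because the ID comes from the row position rather than from the timestamp, two messages sent in the same minute still get different IDs.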