WhatsApp Chat Analysis

Hi,

While processing WhatsApp chats, we are facing a problem. Every new message starts with date, time - user: “ChatText”. If “ChatText” contains an “Enter” (pressing "Enter" while typing in WhatsApp), the rest of the message continues on a new line.

When reading the file with the “File Reader” node, the first row contains the start of the message (date, time - user), but the text after each “Enter” goes into the next row. The number of rows grows with the number of “Enter”s.

We want the whole text of each message as a single record in a single row, instead of spread over multiple rows.

The sample input text file is as below:

11/30/18, 10:09 AM - xxxxxxxxxxx: Hi all ,
Could anybody please help with engineering management universities
I’m give TOEFL tomorrow
Need to add score recipients
11/30/18, 12:20 PM - xxxxxxxxxxx: Manhattan premium account for sale at a low price. If anyone interested PM.
11/30/18, 12:51 PM - xxxxxxxxxxx: its fine for Germany???
11/30/18, 12:52 PM - xxxxxxxxxxx: Can anybody review my profile for rwth and Mannheim data science
TOEFL - 103
3+ years of experience in data field as a software engineer
BTech - 8.12 CGPA
12th- 74
11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh

Expected output:

11/30/18, 10:09 AM - xxxxxxxxxxx: Hi all ,Could anybody please help with engineering management universities I’m give TOEFL tomorrow Need to add score recipients
11/30/18, 12:20 PM - xxxxxxxxxxx: Manhattan premium account for sale at a low price. If anyone interested PM.
11/30/18, 12:51 PM - xxxxxxxxxxx: its fine for Germany???
11/30/18, 12:52 PM - xxxxxxxxxxx: Can anybody review my profile for rwth and Mannheim data science TOEFL - 103 3+ years of experience in data field as a software engineer BTech - 8.12 CGPA 12th- 74
11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh

  1. Every row should start with “date, time - user:”.
  2. Whatever text comes in the following rows, up to the next message, should be appended to the row above.

In other words, every message in the discussion should be read as a new record so that it can be processed further for the analysis.
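Outside KNIME, the two rules above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original thread: the `HEADER` regex is an assumption that only covers the export format in the sample (`M/D/YY, h:mm AM - user: `), and continuation lines are joined with a single space.

```python
import re

# Assumed header format, taken from the sample export above:
# "M/D/YY, h:mm AM - user: text". Other WhatsApp locales use other formats.
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [AP]M - ")

def merge_messages(lines):
    """Append every continuation line to the message that precedes it."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines inside a message carry no text
        if HEADER.match(line) or not records:
            records.append(line)  # a header row starts a new record
        else:
            records[-1] += " " + line  # no header: belongs to the previous message
    return records

sample = """11/30/18, 10:09 AM - xxxxxxxxxxx: Hi all ,
Could anybody please help with engineering management universities
11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh""".splitlines()

for record in merge_messages(sample):
    print(record)
```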

Hi there,

You can either try to get the expected output directly from the File Reader node, depending on the format of your file and by playing with its many options, or read the file as shown above and then apply logic in KNIME to get your output.

Logic (mine at least) would be something like:

  • add a unique indicator (a number, for example) to rows belonging to the same message
  • use the Group Loop Start node on that indicator and, in each iteration, transpose the rows and then combine the columns
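For comparison, here is a rough Python analogue of the two steps above: a running message ID plays the role of the unique indicator, and collecting rows per ID stands in for the Group Loop Start / transpose / combine-columns part. It is a sketch under the same header-format assumption (`HEADER` is hypothetical), not the KNIME workflow itself.

```python
import re

# Assumed header pattern for this export format ("M/D/YY, h:mm AM - user: ").
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [AP]M - ")

def assign_message_ids(rows):
    """Step 1: give every row the ID of the message it belongs to."""
    ids, current = [], -1
    for row in rows:
        if HEADER.match(row):
            current += 1  # a header row opens a new message
        ids.append(current)  # continuation rows reuse the current ID
    return ids

def combine_by_id(rows, ids):
    """Step 2: collect the rows of each ID and join them into one line."""
    groups = {}
    for msg_id, row in zip(ids, rows):
        groups.setdefault(msg_id, []).append(row)
    # dicts keep insertion order and IDs only increase, so message order is kept
    return [" ".join(parts) for parts in groups.values()]

rows = [
    "11/30/18, 12:52 PM - xxxxxxxxxxx: Can anybody review my profile for rwth and Mannheim data science",
    "TOEFL - 103",
    "3+ years of experience in data field as a software engineer",
    "BTech - 8.12 CGPA",
    "12th- 74",
    "11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh",
]
ids = assign_message_ids(rows)
merged = combine_by_id(rows, ids)
```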

Anyway, sharing a sample file could help.

Br,
Ivan


Thank you for your reply.

Please find the attached file: ForProcessing.txt (3.1 KB)

The output file should look like ForProcessingOutput.txt (3.0 KB).

Hi there,

The logic used in the attached workflow is a bit unusual, but there are some comments, and the idea is pretty much what I wrote above. Some funny methods are used just to get there.

If you have any questions, feel free to ask.
2019_06_25_WhatsUp_Analysis.knwf (19.4 KB)

Br,
Ivan


Hi ipazin,

Thanks for your reply. I have used the nodes below.

Please find attached: WhatsApp.knwf (12.4 KB)

Hi @prashantk,

I see now. That works as well. The only thing is that two different messages can have the same time, I guess, so the grouping might be wrong. That is why I would rather go with an identifier, which should be unique. Additionally, you can change the delimiter in the GroupBy node: the default is a comma, but you can put a space.
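To illustrate the concern about grouping on time: the export only has minute precision, so two messages sent in the same minute share a timestamp. The hypothetical rows below (the second message is invented) show that grouping by time merges them, while grouping by a unique ID keeps them apart; the `sep` argument mirrors changing the GroupBy delimiter from a comma to a space. `group_and_join` is an illustrative helper, not a KNIME function.

```python
def group_and_join(rows, key, sep=" "):
    """Concatenate the 'text' of all rows that share the same value of `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row["text"])
    return {k: sep.join(parts) for k, parts in groups.items()}

# Hypothetical rows after the timestamp has been filled down onto
# continuation lines; the second message is invented for illustration.
rows = [
    {"time": "11/30/18, 12:52 PM", "msg_id": 0, "text": "Can anybody review my profile"},
    {"time": "11/30/18, 12:52 PM", "msg_id": 0, "text": "TOEFL - 103"},
    {"time": "11/30/18, 12:52 PM", "msg_id": 1, "text": "Sure, send it over"},
]

by_time = group_and_join(rows, "time")    # one group: two messages merged
by_id = group_and_join(rows, "msg_id")    # two groups: messages kept apart
by_id_comma = group_and_join(rows, "msg_id", sep=",")
```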

Actually, your solution gave me an idea that simplifies the workflow a lot! Instead of rowindex() - 1, a missing value is inserted. This way you can use the Missing Value node immediately. Check it out. The regex is also a bit improved.
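The missing-value idea can be sketched in plain Python: header rows get their row index, continuation rows get a missing value, and filling each gap with the previous value yields the message ID. In KNIME the filling step would be the Missing Value node, presumably set to use the previous value; the helper names and the `HEADER` pattern here are assumptions for illustration, and the rows are a few lines from the sample above.

```python
import re

# Assumed header pattern for this export format ("M/D/YY, h:mm AM - user: ").
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [AP]M - ")

def ids_with_gaps(rows):
    """Header rows keep their row index; continuation rows get None (missing)."""
    return [i if HEADER.match(row) else None for i, row in enumerate(rows)]

def fill_previous(values):
    """Replace each missing value with the last non-missing value above it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

rows = [
    "11/30/18, 12:51 PM - xxxxxxxxxxx: its fine for Germany???",
    "11/30/18, 12:52 PM - xxxxxxxxxxx: Can anybody review my profile for rwth and Mannheim data science",
    "TOEFL - 103",
    "BTech - 8.12 CGPA",
    "11/30/18, 12:53 PM - xxxxxxxxxxx: Woaajh",
]
gaps = ids_with_gaps(rows)
message_ids = fill_previous(gaps)
```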

2019_06_25_WhatsUp_Analysis.knwf (14.0 KB)

Br,
Ivan


Hi @ipazin,

It's a great solution…

Thank you.


Hi,

Now we have the date and time. How can we run a report of the number of discussions per hour of the day?

From this report we can see at what times most discussions happen and at what times fewer happen.

I have split the date and time but am unable to work with the time part. It is in HH:mm a format, but stored as a string.

Thank you.

Hi,

Solved by using the pattern “h:mm a” in the “String to Date&Time” node.
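The same hourly report can be sketched in Python. `%I:%M %p` is Python's counterpart of the `h:mm a` pattern (12-hour clock hour, minutes, AM/PM marker). In the Java-style date patterns that KNIME presumably relies on, `HH` is the two-digit 0-23 hour of day, which may be why `HH:mm a` did not parse strings such as `2:20 PM`. `messages_per_hour` is a hypothetical helper, and `%p` matching depends on the locale.

```python
from collections import Counter
from datetime import datetime

def messages_per_hour(time_strings):
    """Return 24 counts: how many messages were sent in each hour of the day."""
    # '%I:%M %p' = 12-hour clock hour, minutes, AM/PM marker (like 'h:mm a')
    hours = Counter(datetime.strptime(t, "%I:%M %p").hour for t in time_strings)
    return [hours[h] for h in range(24)]  # Counter returns 0 for unseen hours

# Times taken from the sample messages at the top of the thread.
times = ["10:09 AM", "12:20 PM", "12:51 PM", "12:52 PM", "12:53 PM"]
for hour, count in enumerate(messages_per_hour(times)):
    if count:
        print(f"{hour:02d}:00  {count}")
```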

Thank you.


Based on your example, I started testing multiple chat exports. When they come from different sources, the chat structure changes.

(He said, She said Group Chats)

I’ve been testing how to split each chat line into Date - Sender - Message.

This is the idea.

Regex Split:
[\[]?((?:\d{1,4}.+[ap]\..m\.)|(?:\d{1,4}.+\d{2}:\d{2})|(?:.+))(?:\s-\s|\]\s)(.+?)[:](.*$)
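A quick way to check what the regex above captures is to run it outside KNIME. The Python sketch below applies the pattern unchanged with `re.match`; the bracketed and `p. m.` lines are hypothetical examples of other export styles, not taken from real exports. Each match yields three groups: date/time, sender, message.

```python
import re

# The pattern from the post, split over two string literals but otherwise unchanged.
CHAT_LINE = re.compile(
    r"[\[]?((?:\d{1,4}.+[ap]\..m\.)|(?:\d{1,4}.+\d{2}:\d{2})|(?:.+))"
    r"(?:\s-\s|\]\s)(.+?)[:](.*$)"
)

def split_chat_line(line):
    """Return (date_time, sender, message), or None when the line does not match."""
    match = CHAT_LINE.match(line)
    if match is None:
        return None
    date_time, sender, message = match.groups()
    return date_time, sender, message.strip()

examples = [
    "11/30/18, 12:20 PM - xxxxxxxxxxx: Manhattan premium account for sale at a low price. If anyone interested PM.",
    "[30/11/18, 10:09:15] Ana: Hola",     # hypothetical bracketed style
    "30/11/18 10:09 p. m. - Ana: Hola",   # hypothetical "p. m." style
]
for line in examples:
    print(split_chat_line(line))
```

A continuation line such as `TOEFL - 103` has no colon after the separator, so it returns `None` and can be treated as part of the previous message.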

It requires a definition file to load the chats:
chatCode,encode,dateFormat,path
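A minimal sketch of reading such a definition file with Python's `csv` module. Only the header comes from the post; the sample row and the meaning of each column (chat identifier, file encoding, date pattern, file path) are assumptions inferred from the column names.

```python
import csv
import io

# Hypothetical definition file: the header is from the post, the data row is invented.
DEFINITIONS = """chatCode,encode,dateFormat,path
groupA,UTF-8,M/d/yy h:mm a,chats/groupA.txt
"""

def load_definitions(text):
    """Parse the definition CSV into one dict per chat export."""
    return list(csv.DictReader(io.StringIO(text)))

for definition in load_definitions(DEFINITIONS):
    print(definition["chatCode"], definition["encode"], definition["dateFormat"], definition["path"])
```

Each entry could then be used to open one export, for instance with `open(definition["path"], encoding=definition["encode"])`, and to parse its dates with the stored pattern.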

Not finished yet.


Hello,

did you ever figure this out?