A few uncommon tasks

Fingers crossed! Internally, the component adds 3 nodes to do this. A Column Filter uses a regex to exclude any columns whose names contain the repeat pattern.

It then counts the columns remaining and passes that value to the math formula node. Hope it works! :wink:
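For anyone reading along without KNIME open, here is a minimal Python sketch of the same idea: drop the "repeat" columns by regex, then count what is left. The column names and the `.*Iter #.*` pattern are assumptions for illustration, and `count_base_columns` is a hypothetical helper, not something inside the component.

```python
import re

# Assumed repeat pattern: any column name that contains "Iter #".
REPEAT_PATTERN = re.compile(r".*Iter #.*")


def count_base_columns(column_names, pattern=REPEAT_PATTERN):
    """Exclude columns whose whole name matches the repeat pattern, then count the rest."""
    remaining = [name for name in column_names if not pattern.fullmatch(name)]
    return len(remaining)


# Hypothetical column names after a column-append loop.
columns = [
    "Text", "User", "Time",
    "Text (Iter #1)", "User (Iter #1)", "Time (Iter #1)",
]

base_count = count_base_columns(columns)
print(base_count)  # this count is the value that would be handed to the formula step
```

The pattern is matched against the whole column name here, which is why it is wrapped in `.*` on both sides.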

Somehow I missed this part of your reply. I agree that we can further automate the divisor part of the workflow. The divisor is actually the number of the different selections during the configuration of the Twitter Connector. Let me double check if I can extract this info as a variable somehow.

Update: Yes, the automation is possible via this formula:

(Total no. of columns in the loop end column append node)/(The iteration number of the last column of that particular node + 1)

The +1 is needed because the first iteration starts with zero. Hope you get the concept @takbb. The sample workflow I uploaded didn’t include #Iter 1, by the way. I skipped it since it was just for demonstration purposes.
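The formula above can be sketched in Python as follows. This is only an illustration: the column names are made up, and it assumes the last column’s name carries its iteration number in the form `Iter #N`.

```python
import re


def divisor_from_columns(column_names):
    """Total columns / (iteration number of the last column + 1)."""
    match = re.search(r"Iter #(\d+)", column_names[-1])
    if match is None:
        raise ValueError("the last column name carries no iteration number")
    iterations = int(match.group(1)) + 1  # iterations are numbered from zero
    quotient, remainder = divmod(len(column_names), iterations)
    if remainder:
        raise ValueError("column count is not a multiple of the iteration count")
    return quotient


# Hypothetical output of a column-append loop with three iterations.
columns = [
    "Text", "User", "Time",
    "Text (Iter #1)", "User (Iter #1)", "Time (Iter #1)",
    "Text (Iter #2)", "User (Iter #2)", "Time (Iter #2)",
]

print(divisor_from_columns(columns))  # 9 columns / (2 + 1) iterations = 3
```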

I’ll continue reading (and probably writing more if there’s more discussion) tomorrow :ok_hand: I’ll take a rest for the remaining hours of today. Really appreciate all replies so far!

Alrighty, I have tested the component with my own workflow. It seems that the regex in your column filter should be changed to this:

(^.*Iter #.*)

The mistake originated in my initial dummy workflow, which had incorrect syntax.

Other than that, everything is okay. The string widget can be removed as it serves no purpose since the fixed regex works just fine :grin:

I went back and looked at the component and realised from what you said, that there was a bug in it. The String configuration wasn’t being passed through to the Column Filter!

I’ll fix that. I won’t remove the String configuration, but I’ll update its default to reflect this different string. Obviously I only tested it against the original sample data, and unfortunately the pattern that I had manually entered in the column filter worked for the sample data, so I didn’t spot the mistake.

I prefer that a component can be kept versatile, and the whole point is somebody can then use the component in future for a slightly different use. Ideally you should never have to edit a component or even look inside it to use it, just change the config in your workflow.
So if, for example, a different scenario has simple fields like
Name
Address
Name (#1)
Address (#1)
and so on, it can still work by simple reconfiguration.
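To illustrate what “simple reconfiguration” means, here is a small Python sketch in which the repeat pattern is a parameter rather than something hard-coded. The function name and the default pattern are hypothetical; the default shown is just one way to cover both the `(Iter #1)` and the `(#1)` style of name.

```python
import re


def keep_non_repeat(column_names, repeat_pattern=r".*\((Iter )?#\d+\)"):
    """Return the columns whose whole name does NOT match the repeat pattern."""
    regex = re.compile(repeat_pattern)
    return [name for name in column_names if not regex.fullmatch(name)]


# The scenario from the post: plain fields plus "(#1)" repeats.
print(keep_non_repeat(["Name", "Address", "Name (#1)", "Address (#1)"]))

# The same call also handles loop-style names without touching the function.
print(keep_non_repeat(["Text", "Text (Iter #1)", "Text (Iter #2)"]))

# A different naming scheme only needs a different pattern argument.
print(keep_non_repeat(["Name", "Name_copy"], repeat_pattern=r".*_copy"))
```

The pattern describes the columns to exclude, so whoever configures it never has to write a negated regex.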

I’ll set the default pattern slightly differently from how you suggested, as this is the pattern which identifies “repeat” columns, and is used to exclude them. I thought that was more user friendly than including “negation” on the config dialog. Just a shame I had misconfigured the column filter internally in the component. :wink:

Anyway, glad you got it working

Edit:

I’ve now updated the component. As you can see from the following screenshot, I’ve set the default config to allow for two different variations and of course this can be overridden if a different pattern is required.


btw @badger101, when you use components off the hub, do you get them into your workflow by using drag and drop or do you click the “download” link?

I’m asking simply because I’ve noted that both with this and another component you have “dived into” the component’s innards to sort out problems which whilst not an issue, I find unusual. On a different thread you talked about “losing connections” when the nodes in the component were cut and pasted into a different workflow, which is something I’ve not encountered other people doing before. Normally you take the whole component!

Your comments got me wondering. Especially when you suggested I could then remove the String Configuration node… You know that if you drag and drop the component from the hub, directly into your workflow, it can be used straight off as if it were a “node” without having to open it up and look inside, and that the String configuration node does actually serve the purpose of allowing configuration without digging around inside the component, don’t you?

@badger101 @takbb this is the expression I usually use to remove the Iter #x columns:
^((?!Iter #).)*$

Maybe there’s some redundancy in there though, since I’m not an expert at Regex :rofl:

But that’s what I’ve been using since it works. Maybe someone can optimize it for me :wink:
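On the question of optimizing it: the expression above checks a negative lookahead before every single character. A commonly used alternative does one lookahead at the start of the name instead. The Python sketch below compares the two on some made-up column names; it is only an illustration, not a statement about how KNIME evaluates the pattern.

```python
import re

# The expression from the post: at each position, assert "Iter #" does not start here.
PER_CHARACTER = re.compile(r"^((?!Iter #).)*$")

# Alternative form: a single lookahead asserting "Iter #" appears nowhere in the name.
SINGLE_LOOKAHEAD = re.compile(r"^(?!.*Iter #).*$")

# Hypothetical column names.
names = ["Text", "User", "Text (Iter #1)", "User (Iter #2)"]

kept_original = [n for n in names if PER_CHARACTER.fullmatch(n)]
kept_alternative = [n for n in names if SINGLE_LOOKAHEAD.fullmatch(n)]

print(kept_original)
print(kept_alternative)
```

Both forms keep the same names for single-line strings like these, so which one to use is mostly a matter of readability.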

do you get them into your workflow by using drag and drop or do you click the “download” link?

With regard to drag and drop, I wish I could solve this issue I mentioned in my other post:

I am not able to drag and drop items between my Knime window and my browser. It’s a longstanding issue that can’t be solved by the many ‘solutions’ I Googled on the net. I believe the majority of the active forum contributors here use Apple products, so I’m lacking help in this area.

Interesting, I’m on Windows 10 and have used KNIME on four different machines without problems with dragging and dropping components from hub.

It might be good to post a new thread (apologies if you already have) describing the problem and what happens when you try. I’d be interested in helping to resolve it, as it’s a useful feature.

Thank you! When I do, I’ll tag you. But it won’t be in the near future, since I am preoccupied with my Twitter project at the moment. :ok_hand:

I have to shrink the browser window so that I can see both open programs to get it to work, but it has always worked for me on Windows 10 / 11 on dozens of PCs.

Thank you @bruno29a for sharing an alternative syntax. At the moment, I am using that simple syntax I wrote, and it seems to be working fine; it’s merely for identification rather than removal.

@bruno29a @takbb @iCFO, if you guys are still online, would you be kind enough to do a quick check for me? I want to confirm whether the timestamp we’re seeing on Twitter is the same for everyone regardless of timezone, or whether it differs according to one’s device. For example, this tweet by NBCNews from about 20 minutes ago appears to me like this:

Is it different for you?

I see it in my local timezone (BST) in the UK.

Thanks! It seems that both the Twitter web app and Knime’s Twitter API results show updated timestamps as soon as I changed the timezone of my computer.
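That behaviour is what you would expect if a single instant is stored and only the display follows the viewer’s timezone. A small Python sketch of the idea, with a made-up timestamp and fixed UTC offsets standing in for real timezones (none of this comes from the Twitter API itself):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical posting time, stored once as UTC.
posted_utc = datetime(2022, 8, 12, 14, 30, tzinfo=timezone.utc)

# Fixed offsets used as stand-ins for two viewers' timezones.
bst = timezone(timedelta(hours=1), "BST")
plus_eight = timezone(timedelta(hours=8), "UTC+8")


def render(timestamp, viewer_timezone):
    """Format the same instant the way a viewer in the given timezone would see it."""
    return timestamp.astimezone(viewer_timezone).strftime("%Y-%m-%d %H:%M %Z")


print(render(posted_utc, bst))         # 2022-08-12 15:30 BST
print(render(posted_utc, plus_eight))  # 2022-08-12 22:30 UTC+8

# Different clock readings, same moment in time.
print(posted_utc.astimezone(bst) == posted_utc.astimezone(plus_eight))  # True
```

So if timestamps need to be compared across machines, converting them to one fixed timezone first avoids surprises.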

Good day everybody. It was never my intention to delve into the Column Aggregator topic further, but since I made a new Twitter query today, I decided to adopt @DiaAzul’s assumption into my revised workflow. In alignment with the assumption that

which I thought made sense, I went with that choice. I had not tested the Missing value count method the other day.

To my surprise, it took quite a long time to finish. After that, my curiosity (which kills the cat) got the better of me, so I decided to run a proper trial of all 3 options. Here is some background, and the results:

  1. Test table dimension was 7816 x 11, where most columns were of String types, along with Long, PNG Image and Integer types. All methods result in a table of 3844 x 11 dimension.

  2. In between each test, I closed and re-opened the workflow, and ran the Garbage Collector before and after the workflow re-opening.

  3. The Timer Info Node was used to measure the execution time.
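The test procedure above (clean up memory, run one method, record the time) can be sketched outside KNIME like this. The two “methods” are trivial Python stand-ins chosen only so the harness runs; they say nothing about how fast the actual Column Aggregator options are, and `time_methods` is a hypothetical helper.

```python
import gc
import time


def time_methods(methods, data, repeats=3):
    """Time each method on the same data, collecting garbage before every run."""
    timings = {}
    for name, func in methods.items():
        best = float("inf")
        for _ in range(repeats):
            gc.collect()                      # start each run from a clean state
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        timings[name] = best                  # keep the fastest of the repeats
    return timings


# Made-up table: 1000 rows of 11 short strings.
rows = [[f"r{r}c{c}" for c in range(11)] for r in range(1000)]

# Stand-in aggregation methods, NOT the KNIME implementations.
methods = {
    "concat": lambda table: [",".join(row) for row in table],
    "set": lambda table: [set(row) for row in table],
}

timings = time_methods(methods, rows)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Taking the best of several repeats reduces the influence of one-off delays, which is the same reason for resetting the workflow between tests.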

Curiosity did kill the cat. We all (or most of us) sided with Concat, but it turned out that Set won the race!

Maybe what @takbb wrote above may help explain this:

:man_shrugging:

Anyways, have a good weekend!

If you want to do benchmarking of different methods you can also use the Benchmark Start / End nodes from the Vernalis community contribution

You can also run the garbage collector during each iteration, if you wish, using either of the 2 Vernalis Garbage Collector nodes:

Steve

Thought I’d tag your other post re drag and drop here, in case future people find this and have a similar question…
