A few uncommon tasks

Hi again @takbb, first of all, thank you for such a detailed post.

As a component, that column rename element is a nice addition. I haven't checked the component, but based on what you've written (please correct me if I'm wrong), it seems that for this particular issue of mine the component needs to be fed column names containing #Iter.* strings, right? I have no doubt that it'll work when all column names contain that substring. As I mentioned in the post, the issue with the Loop End (Column Append) node is that the first iteration (Iteration 0) doesn't have that substring. Here's a dummy outcome based on a real result from that node, to re-emphasize my point:

Since we know how many columns each iteration contributes, it's possible to use this to rename the first x columns so that they include an #Iter 0 substring before feeding the table into your component. I could rename the first x columns manually with the Column Rename node; I can see that this is doable. Do you have any pointers on how I can do it less tediously using Column Rename (Regex)? (I'm guessing @ArjenEX might have figured something out by now, since this node was mentioned earlier.) I would appreciate any workaround that automates things as much as possible.
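For what it's worth, the renaming step being asked about amounts to appending an iteration-0 marker to the first x column names. A minimal sketch of that logic in Python (outside KNIME), assuming the repeat columns are named like `Tweet (Iter #1)`; the helper `add_iter0_suffix` and the column names are hypothetical:

```python
def add_iter0_suffix(columns, n, suffix=" (Iter #0)"):
    """Hypothetical helper: mark the first n columns (iteration 0) with an
    explicit iteration suffix so they look like the later repeat columns."""
    # Columns 0..n-1 come from iteration 0 and carry no "Iter #" marker yet.
    return [name + suffix if i < n else name for i, name in enumerate(columns)]


columns = ["Tweet", "Date", "Tweet (Iter #1)", "Date (Iter #1)"]
renamed = add_iter0_suffix(columns, n=2)
print(renamed)
# ['Tweet (Iter #0)', 'Date (Iter #0)', 'Tweet (Iter #1)', 'Date (Iter #1)']
```

Inside KNIME the same "first n columns" rule would still have to be expressed with rename nodes; the sketch only shows the intended result.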

Let me first point out that, judging by the example workflows using the Twitter API on the Hub, iterating over query variations in a loop is uncommon. Most of the workflows there show people querying high volumes of Tweets and analyzing them one query at a time. While I do that too, my task also involves many variations of that one query.

The typical Loop End node cannot hold the result of a previous query, as the incoming results overwrite the table. The usual way to overcome this hurdle is to write the results to an output file, as shown in this workflow on the Hub (not mine), or as suggested in this post.

It's just my personal choice to work around it differently, by using the Loop End (Column Append) node. I like to keep everything inside the workflow, and I don't like creating unnecessary files that I won't use. It saves space and avoids cluttering the destination folder. Again, it's just my personal choice :grin:

Hi @badger101, the component should deal with that automatically. In fact, it relies on the #iter pattern (or whatever pattern is typed into the component's config dialog) NOT being present in the first n columns, and from this it determines the value of n.

It's intended to be fully automatic. If your repeat columns contain the stated pattern in their names, the component should do the rest, no matter how many columns wide each iteration is or how many iterations there are.

@takbb If that's so, I believe that's the solution for #2. I'll continue with my work tomorrow and test it out! Thank you in the meantime!

Fingers crossed! Internally, the component adds 3 nodes to do this. It uses a regex in a Column Filter to exclude any columns whose names contain the repeat pattern.

It then counts the remaining columns and passes that value to the Math Formula node. Hope it works! :wink:
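The filter-then-count idea described above can be sketched in Python as follows. This is not the component's actual implementation; the function name, the default pattern `Iter #\d+`, the column names, and the final "iterations = total // n" step are assumptions for illustration. The pattern is a parameter, mirroring the idea of keeping it configurable:

```python
import re


def count_base_columns(columns, repeat_pattern=r"Iter #\d+"):
    """Count columns whose names do NOT contain the repeat pattern.

    With Loop End (Column Append) output these are the iteration-0 columns,
    so the count equals the number of columns added per iteration (n).
    """
    regex = re.compile(repeat_pattern)
    # Mimic the Column Filter step: drop every column containing the pattern.
    kept = [name for name in columns if not regex.search(name)]
    return len(kept)


twitter_cols = ["Tweet", "Date", "Tweet (Iter #1)", "Date (Iter #1)",
                "Tweet (Iter #2)", "Date (Iter #2)"]
n = count_base_columns(twitter_cols)
iterations = len(twitter_cols) // n  # assumes every iteration has the same width
print(n, iterations)  # 2 3

# Same logic with a different naming scheme, changing only the pattern.
other_cols = ["Name", "Address", "Name (#1)", "Address (#1)"]
print(count_base_columns(other_cols, repeat_pattern=r"\(#\d+\)"))  # 2
```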

Somehow I missed this part of your reply. I agree that we can further automate the divisor part of the workflow. The divisor is actually the number of different selections made when configuring the Twitter Connector. Let me double-check whether I can extract this info as a variable somehow.

Update: Yes, the automation is possible via this formula:

(Total no. of columns in the Loop End (Column Append) node) / (Iteration number of the last column of that node + 1)

The +1 is needed because iterations are numbered from zero. Hope you get the concept, @takbb. By the way, the sample workflow I uploaded didn't include #Iter 1; I skipped it since it was just for demonstration purposes.
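As a worked check of that formula, here is a small Python sketch that reads the iteration number from the last column name and divides. The `Iter #<number>` naming, the function name and the sample columns are assumptions for illustration:

```python
import re


def columns_per_iteration(columns):
    """Divisor from the post: total columns / (last iteration number + 1)."""
    # The last column belongs to the highest iteration; iteration 0 has no marker.
    match = re.search(r"Iter #(\d+)", columns[-1])
    last_iter = int(match.group(1)) if match else 0
    groups = last_iter + 1  # +1 because iterations are numbered from zero
    if len(columns) % groups != 0:
        raise ValueError("column count is not a multiple of the iteration count")
    return len(columns) // groups


cols = ["Tweet", "Date",
        "Tweet (Iter #1)", "Date (Iter #1)",
        "Tweet (Iter #2)", "Date (Iter #2)"]
print(columns_per_iteration(cols))  # 6 / (2 + 1) = 2
```

The formula assumes no iteration is missing. In a table that skips Iter #1 (columns going straight from iteration 0 to Iter #2), 4 columns divided by 3 no longer comes out even, so the sketch raises an error instead of returning a wrong divisor.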

I’ll continue reading (and probably writing more if there’s more discussion) tomorrow :ok_hand: I’ll take a rest for the remaining hours of today. Really appreciate all replies so far!

Alrighty, I have tested the component with my own workflow. It seems that the regex in your Column Filter should be changed to this one:

(^.*Iter #.*)

The mistake originated in my initial dummy workflow, which used incorrect syntax.

Other than that, everything is okay. The String widget can be removed, as it serves no purpose now that the fixed regex works just fine :grin:
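Why the `.*` matters: assuming the filter requires the pattern to match the whole column name (an assumption here about how KNIME's regex column selection behaves; Python's `re.fullmatch` has those semantics), a bare `.` stands for exactly one character, while `.*` stands for any run of characters. A quick check with made-up column names:

```python
import re

names = ["Tweet", "Tweet (Iter #1)", "Date (Iter #12)"]


def full_matches(pattern, columns):
    # re.fullmatch requires the pattern to cover the entire column name.
    return [c for c in columns if re.fullmatch(pattern, c)]


print(full_matches(r"(^.Iter #.)", names))    # []
print(full_matches(r"(^.*Iter #.*)", names))  # ['Tweet (Iter #1)', 'Date (Iter #12)']
```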

I went back to the component and, from what you said, realised that there was a bug in it. The String Configuration value wasn't being passed through to the Column Filter!

I'll fix that. I won't remove the String Configuration, but I'll update its default to reflect this different string. I had obviously only tested it against the original sample data, and unfortunately the pattern I had manually entered in the Column Filter worked for the sample data, so I didn't spot the mistake.

I prefer to keep a component versatile; the whole point is that somebody can reuse it in future for a slightly different purpose. Ideally, you should never have to edit a component, or even look inside it, to use it: you just change its configuration in your workflow.
So if, for example, a different scenario has simple fields like

  Name
  Address
  Name (#1)
  Address (#1)

and so on, it can still work through simple reconfiguration.

I'll set the default pattern slightly differently from what you suggested, since this pattern identifies the "repeat" columns so that they can be excluded. I thought that was more user-friendly than putting a "negation" in the config dialog. Just a shame I had misconfigured the Column Filter inside the component. :wink:

Anyway, glad you got it working.

Edit:

I've now updated the component. As you can see from the following screenshot, I've set the default config to allow for two different variations, and of course this can be overridden if a different pattern is required.


btw @badger101, when you use components from the Hub, do you get them into your workflow by drag and drop, or do you click the "download" link?

I'm asking simply because I've noticed that, with both this and another component, you have "dived into" the component's innards to sort out problems, which, whilst not an issue, I find unusual. On a different thread you talked about "losing connections" when the nodes in the component were cut and pasted into a different workflow, which is something I haven't seen other people do before. Normally you take the whole component!

Your comments got me wondering, especially when you suggested I could then remove the String Configuration node… You know that if you drag and drop the component from the Hub directly into your workflow, it can be used straight away like a "node", without having to open it up and look inside, and that the String Configuration node actually serves the purpose of allowing configuration without digging around inside the component, don't you?

@badger101 @takbb this is the expression I usually use to remove the Iter #x columns:
^((?!Iter #).)*$

Maybe there's some redundancy in there, though, since I'm not a regex expert :rofl:

But that's what I've been using, since it works. Maybe someone can optimize it for me :wink:
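The posted pattern is a "tempered dot": `((?!Iter #).)*` consumes one character at a time, but only where `Iter #` does not begin, so the whole name matches only if `Iter #` never appears in it. A slightly more compact equivalent uses a single negative lookahead at the start, `^(?!.*Iter #).*$`. If the filter already requires a full match, the `^` and `$` anchors are harmless but not strictly needed. Lookahead is supported by Java's regex engine (which KNIME is built on) and by Python's `re`; the check below, with made-up column names, compares both patterns against a plain substring test:

```python
import re

columns = ["Tweet", "Date", "Tweet (Iter #1)", "Date (Iter #1)"]

# Pattern from the post: at every position, check that "Iter #" does not start there.
tempered = re.compile(r"^((?!Iter #).)*$")
# Alternative: one lookahead at the start that scans the whole name once.
single_lookahead = re.compile(r"^(?!.*Iter #).*$")

kept_tempered = [c for c in columns if tempered.fullmatch(c)]
kept_single = [c for c in columns if single_lookahead.fullmatch(c)]
kept_plain = [c for c in columns if "Iter #" not in c]

print(kept_tempered)  # ['Tweet', 'Date']
assert kept_tempered == kept_single == kept_plain
```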

do you get them into your workflow by using drag and drop or do you click the "download" link?

With regard to drag and drop, I wish I could solve this issue I mentioned in my other post:

I am not able to drag and drop items between my KNIME window and my browser. It's a longstanding issue that none of the many 'solutions' I found by Googling could fix. I believe the majority of active forum contributors here use Apple products, so I'm lacking help in this area.

Interesting. I'm on Windows 10 and have used KNIME on four different machines without any problems dragging and dropping components from the Hub.

It might be good to post a new thread (apologies if you already have) about the problems you get, describing what happens when you try. I'd be interested in helping to resolve it, as it's a useful feature.

Thank you! When I do, I’ll tag you. But it won’t be in the near future, since I am preoccupied with my Twitter project at the moment. :ok_hand:

To get it to work, I have to make the browser window smaller so that I can see both programs at once, but it has always worked for me on Windows 10/11 on dozens of PCs.

Thank you @bruno29a for sharing an alternative syntax. At the moment, I am using that simple syntax I wrote, and it seems to be working fine; it’s merely for identification rather than removal.

@bruno29a @takbb @iCFO, if you guys are still online, would you be kind enough to do a quick check for me? I want to confirm whether the timestamp we see on Twitter is the same for everyone regardless of timezone, or whether it depends on one's device. For example, this tweet posted by NBCNews about 20 minutes ago appears to me like this:

Is it different for you?

I see it in my local timezone (BST) in the UK.

Thanks! It seems that both the Twitter web app and KNIME's Twitter API results show adjusted timestamps as soon as I change my computer's timezone.
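That behaviour is what you'd expect if each tweet carries one absolute timestamp that every client renders in the viewer's local timezone. A minimal Python illustration, assuming a UTC timestamp in the Twitter API v1.1 `created_at` style; the value and the fixed BST/EDT offsets are illustrative only (real code would use proper timezone rules, e.g. `zoneinfo`):

```python
from datetime import datetime, timedelta, timezone

# Illustrative timestamp in the v1.1 created_at style; "+0000" marks UTC.
created_at = "Wed Oct 10 20:19:24 +0000 2018"
instant = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")

# Fixed offsets for illustration only; they ignore daylight-saving rules.
bst = timezone(timedelta(hours=1), "BST")
edt = timezone(timedelta(hours=-4), "EDT")

uk_view = instant.astimezone(bst).strftime("%Y-%m-%d %H:%M %Z")
us_view = instant.astimezone(edt).strftime("%Y-%m-%d %H:%M %Z")
print(uk_view)  # 2018-10-10 21:19 BST
print(us_view)  # 2018-10-10 16:19 EDT

# Different wall-clock text, but the same moment in time.
assert instant.astimezone(bst) == instant.astimezone(edt)
```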

Good day everybody. I never intended to delve into the Column Aggregator topic further, but since I made a new Twitter query today, I decided to adopt @DiaAzul's assumption in my revised workflow. In alignment with the assumption that

which I thought made sense, I went with that choice. I hadn't tested the Missing value count method the other day.

To my surprise, it took quite a long time to finish. Then my curiosity (which kills the cat) got the better of me, so I decided to run a proper trial of all 3 options. Here's some background, and the results:

  1. The test table was 7816 x 11; most columns were of String type, along with Long, PNG Image and Integer types. All methods resulted in a 3844 x 11 table.

  2. Between tests, I closed and reopened the workflow, and ran the Garbage Collector before and after reopening it.

  3. The Timer Info Node was used to measure the execution time.
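The same measurement idea (collect garbage, then time only the method itself, repeating to reduce noise) carries over to code outside KNIME. A rough Python sketch; the three aggregation functions are hypothetical stand-ins named after the options compared here, not KNIME's implementations, and the data is synthetic, so these timings say nothing about the KNIME result:

```python
import gc
import time


def time_method(func, data, repeats=3):
    """Best wall-clock time of func(data) over several runs.

    Garbage is collected before each run so leftovers from one run
    don't distort the next, loosely mirroring the protocol above.
    """
    best = float("inf")
    for _ in range(repeats):
        gc.collect()
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical stand-ins for the three aggregation methods (not KNIME's code).
def agg_set(rows):
    return [set(v for v in row if v is not None) for row in rows]


def agg_concat(rows):
    return [", ".join(v for v in row if v is not None) for row in rows]


def agg_missing_count(rows):
    return [sum(v is None for v in row) for row in rows]


rows = [("a", None, "b"), ("c", "c", None)] * 5000
timings = {f.__name__: time_method(f, rows)
           for f in (agg_set, agg_concat, agg_missing_count)}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s")
```

Taking the best of several runs, rather than the mean, is one common way to reduce the influence of background activity on short measurements.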

Curiosity did kill the cat. We all (or most of us) sided with Concat, but it turned out that Set won the race!

Maybe what @takbb wrote above helps explain this:

:man_shrugging:

Anyways, have a good weekend!

If you want to benchmark different methods, you can also use the Benchmark Start / End nodes from the Vernalis community contribution.

You can also run the garbage collector during each iteration, if you wish, using either of the 2 Vernalis Garbage Collector nodes:

Steve

Thought I’d tag your other post re drag and drop here, in case future people find this and have a similar question…
