Create a lsrt day month from an text

I all, I new in Knime, and I want to create a date from a text like this: “20201205” but in a date of last day of this data as: “31-12-2020” or at first day of month like: “01-12-2020”.

I need do this at many columns that originaly are text, and I know that this is possible using “java snipped” but Im not an Java advance programer. And I know that this is posible using “String to date” and using some extra node, but my data is really big and I really want to reduce the number of steps/node to use to reduce the time and memory use…

I really appreciate your help.

Thanks.

Take a look at these components provided by @takbb.

1 Like

Hi again,

There are an example of I nead:

=+FECHA(EXTRAE(A3;1;4);EXTRAE(A3;5;2)+1;1)-1

OK. Did you look at the components I posted?

hi rfeigel, yes I look them, but any of this do that I need :frowning: , As I said I think that the best way is using “java snipped” node but I dont know how to program it. Due that I have a really big big DB more than 200 million of lines.

Thanks for your fast answer :slight_smile:

Could you upload some sample input and output? It seems to me you need to convert your input strings to dates to be able to calculate month beginnings/ends. Why are you convinced a Java snippet is the most efficient way to do this?

1 Like

hi rfeigel, because as my DB have many columns dates to convert, and if I use other nodes I must to use at least two nodes to convert each one this string columns in to the dates with my needest.

thanks a lot :slight_smile:

Input output.xlsx (10.2 KB)

1 Like

hey @frepez ,

great question and welcome to the forum.

Here’s how I would solve it:

Some remarks:

Definitely have a look at the components that @rfeigel recommended. @takbb is a legend and especially for larger datasets Brian’s approach might be more efficient :wink:

I just had exactly the same problem some time ago and solved it as in the attached workflow:

  • convert the text (string) to a data column
  • shift the whole date one month into the future
  • extract the “day of month number” and make it negative (multiply with -1)
  • subtract it from the shifted date. Mathematically this would result in the 0th of the month, but as there is never a 0th of the month it will result in the last day before that shifted date which is the last day of month of the original date column
  • then take that same helper column (Day of Month), which is currently negative and add 1 (+1) to it
  • Do the date&time shift again but this time subtract it from the original date column . Voilà → you have the first day of the month.
  • clean up the table

Now all you would have to do is to basically loop through the columns one by and apply the same calculations.

Might be worth making a component out of it.

Let us know, if you need any further help with this.

2 Likes

Hi @frepez , Welcome to the KNIME community, and thank you to both @rfeigel and @kowisoft for the kind comments.

I think @kowisoft 's solution is excellent and a good demonstration of how this can be achieved using standard nodes and no scripting.

As my date components got a mention, I thought I’d take a look at your particular question anyway. The components I posted only handle months relative to the “current” date, and so would need some modification to handle specific dates as you have. This is something I have been thinking about because I have seen similar asked on the forum a couple of times now. The components I posted on the hub make use of some “hidden” features of String Manipulation which allows it to make some basic java calls, provided you can write it in a way that can be translated into a single line of java code.

This workflow contains two String Manipulation (Multi-Column) nodes to return the required dates as Strings

First and Last day of month with String Manipulation java call.knwf (21.0 KB)

This is a chatGPT chat I had to let it do the work of providing me with the java that I could adapt. I commonly take this approach now of letting AI do the hard work, and then I adapt it. :slight_smile:

This resulted in the following code for “last day of month”

string(
$$CURRENTCOLUMN$$==null
?$$CURRENTCOLUMN$$	
:java.time.LocalDate.parse($$CURRENTCOLUMN$$, 
	java.time.format.DateTimeFormatter.ofPattern("yyyyMMdd")).withDayOfMonth(
		java.time.LocalDate.parse($$CURRENTCOLUMN$$, 
			java.time.format.DateTimeFormatter.ofPattern("yyyyMMdd")).lengthOfMonth()
		)
	)

and this code for “first day of month”

string(
$$CURRENTCOLUMN$$==null
?$$CURRENTCOLUMN$$	
:java.time.LocalDate.parse($$CURRENTCOLUMN$$, 
	java.time.format.DateTimeFormatter.ofPattern("yyyyMMdd")).withDayOfMonth(1)		
	)

That code is definitely not something that everybody would enjoy, lol.

Unlike its non-multi column counterpart, String Manipulation (Multi Column) is very fussy about the data types it can handle, and I was not able to pass it actual Dates. I therefore had to pass it your dates as Strings in yyyyMMdd format.

At the end of this, I had two different formats of dates. Your originals in yyyyMMdd format, and the output from the String Manipulations which was yyyy-MM-dd format.

You may be interested to know that I was able to convert both of these with a single String to Date&Time call, by passing it two “optional” date formats as a single format:

[yyyyMMdd][yyyy-M-d]

It can be a useful trick, provided the formats are sufficiently different that it can decide which one a value conforms to.

As @kowisoft mentioned, it is possibly worth making a component, and I will probably turn these into components to add to the others listed as I can see they could be useful.

I think for your purposes that I would recommend using @kowisoft 's approach, at least at first and see how it goes, as although the node count is higher, it is probably more simple to understand the steps, but this String Manipulation/java hybrid approach is an option if you are happy with “more scripty”.:wink:

btw, @frepez if you really want to know how to do this with Java Snippet, I can demo how to do it for you (let me know and I’ll find some time), but these would have to be written either hard-wired to your four columns, or you’d have to have a snippet to deal with a single column at a time, and put it in a column loop.

It is true to say that java snippet performance is often much better than combinations of other nodes for large data sets simply because the code is compiled and it runs as a single module. But in my view, Java Snippets are best tested and turned into a component to hide the complexity, as otherwise they do have a tendency to frighten people away… a bit like my above String Manipulations, :rofl:

2 Likes

Hi @takbb,

I really appreciate your help, this really woks perfectly for me!!!

In any case, I was thinking in java snipped due this video or code tha I found, previously:

Please letme know if in the future you create a node that do this! :slight_smile:
Because after do this I need to do some calculations with this dates (days between, months between, etc)

Again thanks !!!

1 Like

Hi all,

Using your idea I do someting similar and I want to share: usin the same “string manipulation” node to create a first day extracting the first 6 caracters thar represent year and month and concatenate “01” for for all dates in db and then was used standar “string to date” node.

this is the used code:
join(substr($$CURRENTCOLUMN$$, 0,6),“01”)

I really appreciate if in the near future is created a node or adapted the node “string to date” to do this for the last day or firs date, in order to use the less number of nodes posible and increase the speed of my workflow.

Thanks again.

Hi @frepez , yes you are correct that the first day of the month can always be found with a simple String manipulation. Finding the last day though obviously requires more calculations and has to take into account leap years, and what happens on the century etc, which I think could actually be achieved with a single String Manipulation node using all the standard functions, but life is too short :wink:

I haven’t been building actual nodes, only components, so there are limitations both technically and to the level of effort required, so I’m not going to look to adapt the “string to date” node. However, I have built a component that can accept one or more Date columns, and return a selection of date offsets:

I figure that it isn’t that arduous to convert strings into dates first. After all if the component accepted strings or numbers, it wouldn’t know what format they were in so the entire String to Date&Time node would have to be replicated. !

Configuring your Excel to return your “integer dates” as strings was simply a case of modifying the transformation tab in the config of the Excel Reader.

This component does make use of a Java Snippet (instead of String Manipulation) to perform all the date calculations. It’s a relatively simple snippet and once again the actual body of the code was written for me by chatGPT, which I then adapted to work with the Java Snippet.

Of course when it comes to building this as a component, whilst the Java Snippet is the heart of it, there is a lot of extra baggage added to make the component generic, and allow user selections. This is the internals of the component just to be the “life support” for the Java Snippet (highlighted by yellow annotation):

Here is an updated workflow demonstrating the component:
First and Last day of month with Component.knwf (90.9 KB)

The component can be found on the hub here

Aaaah… @takbb - once again, you were faster. Wanted to do the same (of course not daring to touch code :smile: ) but you did it already.

1 Like

Still nothing stopping you @kowisoft :wink:

Have a great Christmas… doing any dungeons and dragons over the festive season? :slight_smile:

1 Like

Yeah, maybe I do it :wink:

Right now I am more into solo’ing for which is D&D a bit too crunchy. I played a lot “4 Against Darkness” recently :smile:

Great Christmas to you and your loved ones as well, Brian! :slight_smile:

1 Like

hi Takbb, thanks for your help.

I try your component and after run, I want to share the time results obtained after run, each component run first day or last day.

The blue is with my created process for fist day and yellow is with the new node.

Hi @frepez, is this related to a different thread?

I can’t recall what this was about and it seems unrelated to this thread.

I’m also not sure if you are asking a question, or just providing a demonstration of something.

hi @takbb, is the same thread.

I can´t understand why this new node take a long time to process the same data.
your previous proposal “string manipulation” using (join(substr($$CURRENTCOLUMN$$, 0,6),“01”)) and after “string to date” take 9 minutes and with the new node take 2,5 hour.

I do something wrong? or do you other options?

Thanks in advance

Hi @frepez, it’s difficult to say without being able to see anything :wink:

The component is going to be doing more work because it contains many more nodes and I’ve had to make it generic whereas the String Manipulation has been configured for exactly one purpose. I don’t know though if that is the cause.

Maybe you can upload a subset of the data as a workflow showing both methods as a comparison. Upload sufficient data to demonstrate a time difference, e.g. presumably if its a data size thing, then maybe upload about 1/100th of the data, and we should still be able to see a measurable difference.

Have you tried opening the component and executing it to see where it appears to have the biggest delay?

Hi @takbb, after open component is showed this advise: “this is a linked component and therefore cannot be edited” then I can´t put a time monitor to see subnodes duration (or I dont´ know how to, I´m newbie!!)
I share a workflow to comparing