Extracting lines from multiline cell content using a reference table

Hello. I need to extract specific patent numbers from a list of patent numbers (PNs) held in a single cell. That is, I have this:
DE 102005039579
CA 2619278
AU 2006281783
WO 2007019845
KR 2008034925
EP 1917004
CN 101299998
JP 2009504688
JP 5512968
EP 2266544
KR 2011094222
IL 189079
CN 102716068
BR 2006014835
RU 2480201
ES 2635493
IN 2008MN00298
IN 247903
US 20080187595
AU 2011236018
US 20120225128
US 20160067338
US 9827312

and I want a new column containing a single patent number, chosen according to a kind of priority table: if a WO number is found, extract it; if not, take the US number; if that isn’t found, take the EP number, and so on.

Is that possible?
Thank you.

Hi @lafringuella, do you consider the patent number to be the whole string, made up of what I assume is the country code plus the numeric part, or is the patent number just the numeric part?

You would obviously have to define all the priority codes. I would assume you’d have a lookup table such as this:

I just used the priority numbers you have and then arbitrarily assigned the remainder down the table.

You can split the cell into rows by using Cell Splitter with its output set to a list and then Ungroup to generate one row per entry. You can then split each row into its letter and numeric parts using another Cell Splitter. The priority can be assigned using Value Lookup (KNIME 5.2).
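For anyone who finds it easier to read the logic as code, the steps above can be sketched in Python. This is only an illustration, not the KNIME workflow itself. It assumes the cell holds one entry per line and that each entry is a code and a number separated by a space. The names `PRIORITY`, `UNRANKED` and `rank_patents` are made up for this sketch, and only the WO, US, EP order comes from the question.

```python
# Hypothetical priority table: lower number = higher priority.
# Only the WO > US > EP order comes from the question; extend as needed.
PRIORITY = {"WO": 1, "US": 2, "EP": 3}
UNRANKED = 999  # assumed fallback for codes missing from the table


def rank_patents(cell: str) -> list[tuple[int, str, str]]:
    """Split a multiline cell into rows and attach a priority to each row."""
    rows = []
    for line in cell.splitlines():  # first split: cell -> one row per entry
        line = line.strip()
        if not line:
            continue
        # second split: country code / numeric part
        code, _, number = line.partition(" ")
        # lookup step: priority for this code, or the fallback
        rows.append((PRIORITY.get(code, UNRANKED), code, number))
    # sorted() is stable, so rows with equal priority keep their input order
    return sorted(rows, key=lambda row: row[0])


cell = "DE 102005039579\nCA 2619278\nWO 2007019845\nEP 1917004\nUS 9827312"
ranked = rank_patents(cell)
```

The first element of `ranked` is then the highest-priority entry, and codes that are not in the table end up at the bottom in their original order.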

What is the expected output from the sample list that you have given? I couldn’t quite understand what you meant. Do you just want the whole list sorted in priority order? The numerics don’t repeat themselves in your sample data, so what did you mean by “if a WO number is found, extract this, if not take the US…”? Maybe you just want a single value as output (the first WO, or US, or EP…)?

btw, welcome to the KNIME community.

I couldn’t help but notice in both this post and your other recent post you mention that your data is in a single cell. This to me is quite unusual. Normally data is already occupying many rows in a table, which is more straightforward to deal with. Is there any reason why it is in a single cell?


Hi @takbb, the patent number is the whole string. Thank you for your suggestion, I will try that. I’m quite new to KNIME and have to find my way around.

The purpose of this action is to create a link to the fulltext for the customer. They have preferred countries, so I need the WO publication in case there is one. In case there’s no WO, I need the US. Just one for creating the link.

The reason why the values are in a single cell is that they come this way from the database. All patent numbers and also the dates belong to a single patent family, which is one record in the database.
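To make the requirement concrete, here is a rough Python sketch of "one patent number per family, by preferred country". It is an assumption-laden illustration rather than a KNIME solution: `PREFERRED` and `pick_patent` are hypothetical names, the order beyond WO, US, EP is left open, and returning `None` when nothing matches is just one possible choice.

```python
from typing import Optional

# Preference order from the question; anything after EP is left open here.
PREFERRED = ["WO", "US", "EP"]


def pick_patent(cell: str, preferred: list[str] = PREFERRED) -> Optional[str]:
    """Return the whole patent string for the most preferred code found."""
    entries = [line.strip() for line in cell.splitlines() if line.strip()]
    for code in preferred:  # walk the preference list in order
        for entry in entries:
            # the code is the part before the first whitespace
            if entry.split(maxsplit=1)[0] == code:
                return entry  # first entry with this code wins
    return None  # assumed behaviour when no preferred code is present


with_wo = "DE 102005039579\nWO 2007019845\nEP 1917004\nUS 20080187595"
without_wo = "DE 102005039579\nEP 1917004\nUS 20080187595\nUS 9827312"

best_with_wo = pick_patent(with_wo)
best_without_wo = pick_patent(without_wo)
```

With the first sample the WO entry is picked. With the second, which has no WO entry, the first US entry in the cell is picked.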
