Extract http-link from Excel-content

Hi together,

I would like to read the content of an excel sheet.

One of the column contains a text information with a http-link behind.

I would like to extract only the http-link of the cell.

Is there a way to do this?

The cell looks like this:

image

And the result should look like this:

image

Thanks a lot and kind regards,

Andreas

Interesting challenge. I don’t think this can be solved using the Excel Reader. I had a quick play with a Python Script Node assisted by K-AI - however had to manually debug a little as well:

Here’s an image incl. those topics that I had to manually modify:

  1. Path to file - K-AI tried to load the file from the root of the knime workspace so I had to adjust it - the “r” ahead of the string is important to avoid escaped charcters etc.
  2. row / column: K-AI tried to start from row 1 col 1 - either way if you have a longer list you’ll have to iterate over each row…

So far for a quick and very dirty solution attempt - maybe someone else has a slightly easier solution at hand :slight_smile:

Here’s the code:

import knime.scripting.io as knio
import pandas as pd
import openpyxl

# Get the current workspace path
workspace_path = knio.flow_variables["knime.workspace"]

# Define the path to the Excel file
excel_file_path = workspace_path + r"\forum1\ExtractLink\data\test.xlsx"

# Load the workbook
workbook = openpyxl.load_workbook(excel_file_path)

# Get the first sheet in the workbook
sheet = workbook.worksheets[0]

# Get the hyperlink from the first cell
hyperlink = sheet.cell(row=2, column=1).hyperlink.target

# Load the data from the Excel file into a pandas DataFrame
df = pd.read_excel(excel_file_path)

# Add the hyperlink to a new column in the DataFrame
df['Hyperlink'] = hyperlink

# Output the DataFrame as a KNIME table
knio.output_tables[0] = knio.Table.from_pandas(df)
6 Likes

The excellent method from @MartinDDDD is the way to go I’d say. As mentioned, the Excel Reader cannot handle this.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.