I am trying to filter a database table column via a Python script, but when I try to execute it I get the error: output_table = input_table.copy() ^ SyntaxError: invalid syntax

gnime · October 15, 2019, 7:56am

Here is the code:

input_table[input_table.Document.contains("<span", regex = true, na = False)

So it seems the problem is not with my code but rather with the output_table?

Can somebody help?

mlauber71 · October 15, 2019, 8:34am

Not exactly sure what you want to do, but in Python dropping a column might look something like this:

import pandas
# https://stackoverflow.com/questions/21285380/find-column-whose-name-contains-a-specific-string
v_colums_to_filter = [col for col in input_table.columns if '<span' in col]

# https://thispointer.com/python-pandas-drop-columns-in-dataframe-by-label-names-or-by-index-positions/

input_table = input_table.drop(v_colums_to_filter, axis='columns')
output_table = input_table.copy()
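
If the goal is instead to keep only the rows whose Document column contains "<span", a minimal sketch could look like the one below. The small DataFrame is made up purely for illustration; in the KNIME Python node, input_table is provided for you. Note that the line in the question never closes its opening square bracket, which is most likely why Python reports the SyntaxError on the following output_table line. In addition, string methods such as contains are reached through the .str accessor, and the boolean is spelled True, not true.

```python
import pandas as pd

# Hypothetical stand-in for the table KNIME passes in as input_table;
# only the column name "Document" comes from the question.
input_table = pd.DataFrame(
    {"Document": ["<span>first</span>", "plain text", None]}
)

# Build a boolean mask: True where the cell contains the literal text "<span".
# regex=False treats the pattern as plain text; na=False maps missing cells to False.
mask = input_table["Document"].str.contains("<span", regex=False, na=False)

# Keep only the matching rows; every bracket that is opened is closed again.
output_table = input_table[mask].copy()
print(output_table)
```

The same mask can be inverted with ~mask if the matching rows should be dropped rather than kept.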

kn_example_python_filter_columns.knwf (8.1 KB)

Edit: KNIME Snippets (5) — Python Overview | by Markus Lauber | Low Code for Data Science | Medium

mohitanandagarwal · October 15, 2019, 10:45am

Are you trying to select a few columns from a table? If so:

Use the ‘Column Selector’ node to take out the columns. Hope this helps, thanks.

gnime · October 15, 2019, 11:15am

Hi @mlauber71,

Thanks for your answer.

My bad for not being clear enough about my problem.
I am not getting the error with the output table anymore.
I have a database table with 33,000 rows which contain the contents of HTML files.

I am trying to iterate over every row to extract the actual text in the files, which is between tags.
But when I use this code, the rows are not altered.

import re

for index, row in input_table.iterrows():
    text = str("Document")
    f = re.findall("^<span.*?", text)
    print(f)
output_table = input_table.copy()

mlauber71 · October 15, 2019, 12:04pm

No, they are not, because you just copy the input table to the output. If you want to extract content from HTML files, there might be nodes (like Palladian) more suitable for that, but I am not an expert in that field.
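
To illustrate why the rows stay unchanged: in the loop above, str("Document") is just the literal text "Document" rather than the content of the cell, and the result of re.findall is printed but never written back to the table. A rough sketch of one way to write the change back is shown below. The sample data, the strip_tags helper and the TAG_PATTERN regex are assumptions made for illustration, and a regex is only a crude way to strip tags; a proper HTML parser or a dedicated node would be more robust for real HTML.

```python
import re
import pandas as pd

# Hypothetical stand-in for the KNIME input_table with HTML in "Document".
input_table = pd.DataFrame(
    {"Document": ['<span class="a">Hello</span> <b>world</b>', "<p>Second row</p>"]}
)

# Crude pattern for anything that looks like an HTML tag: "<" up to the next ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(html):
    """Remove tag-like substrings and trim surrounding whitespace."""
    return TAG_PATTERN.sub("", str(html)).strip()


# Copy first, then overwrite the column on the copy so the change
# actually ends up in output_table.
output_table = input_table.copy()
output_table["Document"] = output_table["Document"].apply(strip_tags)
print(output_table)
```

Applying a function to the whole column avoids iterrows entirely, which also tends to be faster on a table with tens of thousands of rows.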
