I am trying to filter a database table column via a Python script, but when I try to execute it, I get this error:

```
output_table = input_table.copy()
^
SyntaxError: invalid syntax
```

Here is the code:

```python
input_table[input_table.Document.contains("<span", regex = true, na = False)
```

So it seems the problem is not with this code but rather with the output_table line?

Can somebody help?
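A likely cause, offered as a sketch rather than a definitive answer: the filter line never closes its opening `[`, so Python only notices the problem on the following line (`output_table = input_table.copy()`) and reports the SyntaxError there. Two further issues would surface once the bracket is closed: Python's boolean literal is `True`, not `true`, and a pandas Series has no bare `.contains` method, so substring matching goes through the `.str` accessor. The example below assumes `input_table` is a pandas DataFrame with a text column named `Document`, as the KNIME Python Script node provides; the sample rows are hypothetical and only there so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical sample data standing in for the KNIME input table;
# inside the Python Script node, input_table is already provided.
input_table = pd.DataFrame({
    "Document": ['<span class="a">Hello</span>', "plain text", None],
})

# Keep only the rows whose Document cell contains "<span":
# - .str.contains is the pandas string method for a Series
# - regex=True (capital T) matches the original intent; "<span" has no
#   regex metacharacters, so regex=False would behave the same here
# - na=False treats missing cells as non-matching
# - note the closing ] that was missing in the original line
mask = input_table["Document"].str.contains("<span", regex=True, na=False)
output_table = input_table[mask].copy()
```

With the sample data, only the first row survives the filter, because the other two cells either lack `<span` or are missing.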

I am not exactly sure what you want to do, but in Python, dropping columns might look something like this:

```python
import pandas

# https://stackoverflow.com/questions/21285380/find-column-whose-name-contains-a-specific-string
# Collect the names of all columns whose header contains '<span'
v_columns_to_filter = [col for col in input_table.columns if '<span' in col]

# https://thispointer.com/python-pandas-drop-columns-in-dataframe-by-label-names-or-by-index-positions/
# Drop those columns and pass the result on as the node's output
input_table = input_table.drop(v_columns_to_filter, axis='columns')
output_table = input_table.copy()
```

kn_example_python_filter_columns.knwf (8.1 KB)


Are you trying to select a few columns from a table? If so, use the ‘Column Selector’ node to take out columns. Hope this helps, thanks.


Hi @mlauber71,

Thanks for your answer.

My bad for not being clear enough with my problem.
I am not getting the error with the output table anymore.
I have a database table with 33,000 rows that contain the contents of HTML files.

I am trying to iterate over every row to extract the actual text in the files, which sits between tags.
But when I use this code, the rows are not altered.

```python
import re

for index, row in input_table.iterrows():
    text = str("Document")
    f = re.findall("^<span.*?", text)

output_table = input_table.copy()
```

No, they are not, because you just copy the input table to the output. If you want to extract content from HTML files, there might be nodes (like Palladian) that are more suitable for that, but I am not an expert in that field.
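For staying inside the Python Script node, here is a minimal sketch under stated assumptions. Besides the copy, the loop above has a few other reasons for leaving the rows unchanged: `str("Document")` is the literal string `"Document"`, not the cell value (that would be `row["Document"]`); the pattern `^<span.*?` only matches at the very start of the string, and the lazy `.*?` captures as little as possible; and the result `f` is never written back to the table. The sketch below assumes `input_table` is a pandas DataFrame with a `Document` column of HTML strings and that `<span>` elements are not nested. `extract_span_text` and the output column `SpanText` are hypothetical names. A regular expression is not a real HTML parser, so for messy HTML the nodes mentioned above (or a proper parser) are likely more robust.

```python
import re

import pandas as pd

# Hypothetical sample data; in KNIME, input_table comes from the node input.
input_table = pd.DataFrame({
    "Document": [
        '<p><span class="x">First</span> and <span>second</span></p>',
        "<div>no span here</div>",
    ],
})

# Capture the text between an opening <span ...> tag and its </span>.
# Assumption: spans are not nested. DOTALL lets a span cross line breaks.
SPAN_TEXT = re.compile(r"<span[^>]*>(.*?)</span>", re.DOTALL)


def extract_span_text(html):
    """Return the text inside all <span> elements, joined by spaces."""
    if not isinstance(html, str):  # missing cells arrive as None/NaN
        return ""
    return " ".join(SPAN_TEXT.findall(html))


# Work on a copy and store the result in a new column, so the extracted
# text actually ends up in output_table instead of being discarded.
output_table = input_table.copy()
output_table["SpanText"] = output_table["Document"].apply(extract_span_text)
```

Applying a function per cell with `.apply` replaces the explicit `iterrows()` loop and writes every result back in one step, which is usually simpler and faster than modifying rows inside a loop.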
