Python Integration working with ArrowSourceTables

mlauber71 · November 20, 2023, 5:23pm

Hmm, from my experience openpyxl will add significant overhead to every operation, since it has to heavily manipulate the Excel structures depending on what you want to do (good news though: the package is integrated into the KNIME Python extensions).

If you use openpyxl to import or export Excel files you might end up with a (well) pandas DataFrame or Arrow table nonetheless …

Another currently hyped DataFrame library is Polars. Another format is Feather, which I am not really familiar with … but they all require additional packages.

As I said: if you work with KNIME and Excel and Python you will face some sort of data transfer in any case. Otherwise this might very well be a Python question.

But since we are at it, I put your question to ChatGPT:

data = [
    {"Name": "Alice", "Age": 30, "City": "New York"},
    {"Name": "Bob", "Age": None, "City": "Paris"},
    {"Name": "Charlie", "Age": 25, "City": None},
    {"Name": "David", "Age": None, "City": "London"}
]

# Dropping a Column

def drop_column(data, column):
    for row in data:
        row.pop(column, None)
    return data

# Example: Drop the 'City' column
data = drop_column(data, 'City')

# Filling NA Values

def fill_na(data, column, fill_value):
    for row in data:
        if row.get(column) is None:
            row[column] = fill_value
    return data

# Example: Fill NA in 'Age' with 0
data = fill_na(data, 'Age', 0)

# Forward Fill

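def forward_fill(data, column):
    # Carry the last non-None value forward into later rows that lack one
    last_valid = None
    for row in data:
        if row.get(column) is not None:
            last_valid = row[column]
        elif last_valid is not None:
            row[column] = last_valid
    return data

# Example: Forward fill the 'City' column
# (run this on the original data: after drop_column above, 'City' no longer
# exists and this call would leave the rows unchanged)
data = forward_fill(data, 'City')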
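
One thing to watch with the helpers above: they change the row dictionaries in place, so each example depends on what ran before it. A minimal sketch of non-mutating variants follows, assuming you would rather get new rows back and keep the input untouched. The names `without_column`, `with_filled_na` and `with_forward_fill` are made up for illustration and are not part of any library.

```python
def without_column(rows, column):
    """Return new rows without `column`; the input rows are not modified."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def with_filled_na(rows, column, fill_value):
    """Return new rows where a missing/None `column` is set to `fill_value`."""
    return [
        {**row, column: fill_value} if row.get(column) is None else dict(row)
        for row in rows
    ]


def with_forward_fill(rows, column):
    """Return new rows where None in `column` takes the last non-None value."""
    result = []
    last_valid = None
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's row stays as it was
        if new_row.get(column) is not None:
            last_valid = new_row[column]
        elif last_valid is not None:
            new_row[column] = last_valid
        result.append(new_row)
    return result


data = [
    {"Name": "Alice", "Age": 30, "City": "New York"},
    {"Name": "Bob", "Age": None, "City": "Paris"},
    {"Name": "Charlie", "Age": 25, "City": None},
    {"Name": "David", "Age": None, "City": "London"},
]

# Each call starts from the same untouched `data`
no_city = without_column(data, "City")
ages_filled = with_filled_na(data, "Age", 0)
cities_filled = with_forward_fill(data, "City")
```

Since every call returns fresh dictionaries, the three results can be computed in any order. The price is one extra copy of the rows per step, which may matter for large tables.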