Python node: More than one type in column. Found int and float

Hi all,

I run a little script in the Python node. It runs smoothly in de editor mode but if I run the script in the flow it generates an error (see subject line).

It looks it’s not a Python error but a KNIME one. If I run it without “output_table_1 = all_df.copy()” it runs smoothly (of course not sending no data to the output port)

Soneone an idea what’s going wrong?

This is the script I run
= = = = = = =

Copy input to output

#output_table_1 = input_table_1.copy()

import pandas as pd

from yahooquery import Ticker

funds = [‘fb’, ‘aapl’, ‘amzn’, ‘nflx’, ‘goog’]

ticker_def = Ticker(funds, asynchronous=True)
print(type(ticker_def))

all = ticker_def.price
all_df = pd.DataFrame(all)

print(all_df)

output_table_1 = all_df.copy()
= = = = = = =

@sanderlenselink pandas can have different data types within one column but KNIME and most other data files cannot. You might have to convert all your columns to strings and deal with the different formats later (maybe transposing the information)

all_df = all_df.astype(str)

@sanderlenselink

The issue with your code is that the function ticker_def.price returns a dictionary of dictionaries that cannot be directly converted into a Pandas DataFrame. For each fund symbol entry in the dictionary, there is another dictionary containing a wide variety of price information. The contents of the dictionary returned are defined in the documentation under price:

In order to create a dataFrame you need to extract the values you require from the returned dictionary for each ticker symbol and construct a dictionary that can be converted in to a Pandas dataFrame. The following code shows how that can be done for the value ‘regularMarketPrice’. You can add additional column by repeating the line "regularMarketPrice: ticker... replacing the column name in both positions in the row; or pick other variables as you need.

import knime_io as knio
import pandas as pd
from yahooquery import Ticker

funds = ["fb", "aapl", "amzn", "nflx", "goog"]

ticker_def = Ticker(funds, asynchronous=True)

ticker_responses = ticker_def.price

prices_from_response = [
        {
        "symbol" : symbol,
        "regularMarketPrice": ticker_responses.get(symbol).get("regularMarketPrice"),
        }
    for symbol in funds
    if isinstance(ticker_responses.get(symbol), dict)
]

output_table = pd.DataFrame(prices_from_response)

knio.output_tables[0] = knio.write_table(output_table)

For those not familiar with list comprehension in Python the price_from_response is generated as follows:

  1. The square brackets denote a list.
  2. The curly braces denote we are creating a dictionary for each list entry.
  3. Within the dictionary we have two keys(that will become column headers in the dataFrame): the key is titled symbol and returns the value of the symbol variable defined in the for statement; the second key is titled “regularMarketPrice” and gets the dictionary entry from the ticker_responses for the symbol, and then from that returned dictionary gets the value access under the key regularMarketPrice.
  4. The list is created by iterating over the symbols defined infunds list, each fund is stored in the variable symbol that is to generate the dictionary for each list entry.
  5. The if statement is important. Some of the symbols are incorrect (fb for example). Therefore, when ticker_responses.get(symbol) is called it returns a string (saying that the symbol does not exist) rather than a dictionary. As a string does not have a get method, the attempt to get regularMarketPrice fails and an error is thrown and no response is generated. To prevent errors being generated the if statement checks if the ticker_response.get(symbol) is a dictionary: if it is then it generates a line in the dataFrame for the symbol, if it is not then it skips that symbol.

Hopefully, this provides a bit of help with your application.

DiaAzul

2 Likes

Hi @mlauber71 and @DiaAzul

thnx for your suggestions. Both solutions work !!!

The solution of @mlauber71 is fantastic very practical but need some little editing afterwards in KNIME.

@DiaAzul extended the solution by checking if a symbol exists. Also your explanation is very clear and helpful.

In my case with about 500 symbols @DiaAzul is somewhat faster (takes 5 seconds compared to 15 seconds)

A remainng question to @DiaAzul . . . do you know if there are any developments regarding yahooquery? The latest update is from 2020. Or do you know fast alternatives (yfinance is an alternative but not that fast and handy). Otherwise I will ask the developer Doug Guthrie (douglas.p.guthrie@gmail.com)

2 Likes

@DiaAzul . . . you added a helpful piece of code to check whether a symbol/ticker exists or not. That works fine for the module “price” of yahooquery.

However, I need also some historical prices and therefore I have to use “history” (see: Historical Prices - yahooquery).

So I changed your code a bit in order to use the “history” function; see below: ticker_responses = ticker_def.history(period=‘1mo’, interval=‘1wk’)

It appears that now “prices_from_response” produces an empty list. I checked many things but have no idea what’s wrong.

Do you have any suggestion for a solution?

= = =
import knime_io as knio
import pandas as pd
from yahooquery import Ticker

funds = [“fb”, “aapl”, “amzn”, “nflx”, “goog”]

ticker_def = Ticker(funds, asynchronous=True)

#ticker_responses = ticker_def.price
ticker_responses = ticker_def.history(period=‘1mo’, interval=‘1wk’)
print(type(ticker_responses))

prices_from_response = [
{
“symbol” : symbol,
#“regularMarketPrice”: ticker_responses.get(symbol).get(“regularMarketPrice”),
“close”: ticker_responses.get(symbol).get(“close”),
}
for symbol in funds
if isinstance(ticker_responses.get(symbol), dict)
]

print(type(prices_from_response))

output_table = pd.DataFrame(prices_from_response)

knio.output_tables[0] = knio.write_table(output_table)

@sanderlenselink

You can try the following. I’ve tested it in Visual Studio Code, but don’t have KNIME with me at the moment to test it in the scripting node.

You need to read the documentation closely. Each function returns different data types and, therefore, they each require different checks and approaches for processing data into a format you can return to KNIME. In the case of the history method the returned object is a Pandas DataFrame, rather than a dictionary, containing the data requested. This object contains the open-close-high-low pricing for each data within the requested range. To return this data to knime you need to extract the data for each symbol, the closing prices, and the value of the price for each data in the data set. You need to create a triplet of symbol-date-value for each of the values returned to KNIME.

import pandas as pd
from yahooquery import Ticker

funds = ['fb', 'aapl', 'amzn', 'nflx', 'goog']

ticker_def = Ticker(funds, asynchronous=True)

ticker_responses = ticker_def.history(period='1mo', interval='1wk')

prices_from_response = list()

for symbol in funds:
    ticker_data = ticker_responses.get(symbol)
    if isinstance(ticker_data, pd.DataFrame):
        ticker_close = ticker_data["close"]
        for date, value in ticker_close.iteritems():
            record = {
                "symbol": symbol,
                "data": str(date),
                "value": value
            }
            prices_from_response.append(record)

output_table = pd.DataFrame(prices_from_response)

This should point you in the right direction.

DiaAzul

@DiaAzul . . . thnx for your great hints.

It works fine. But there’s one “little” thing . . . it presumes that there’s at least one wrong ticker (like “fb”).
If all the tickers are OK the code productes an empty dataframe .

I tried to fix that with try/except but that gives an error (e.g. NONE type). My best solution is as below but than I have to split the “Row ID” for the symbol and date.

Splitting in KNIME is not such a big problem . . . but I think it is better / more elegant to solve in Python.

I hesitate you to ask you after your great hint but have you any suggestion how to solve it in Python (like screenshot)

Many thnx in advance

image

= = = =
#Copy input to output
#output_table_1 = input_table_1.copy()

import pandas as pd
from yahooquery import Ticker

funds = [‘aapl’, ‘amzn’, ‘nflx’, ‘goog’]
#funds = [‘fb’, ‘aapl’, ‘amzn’, ‘nflx’, ‘goog’]

ticker_def = Ticker(funds, asynchronous=True)

ticker_responses = ticker_def.history(period=‘1mo’, interval=‘1wk’)
##print(ticker_responses)

prices_from_response = list()

for symbol in funds:
ticker_data = ticker_responses.get(symbol)
if isinstance(ticker_data, pd.DataFrame):
ticker_close = ticker_data[“close”]
for date, value in ticker_close.iteritems():
record = {
“symbol”: symbol,
“data”: str(date),
“value”: value
}
prices_from_response.append(record)

output = pd.DataFrame(prices_from_response)

if output.empty == True:
** print(‘DataFrame is empty’)**
** output = ticker_responses[[“close”]]**
else:
** print(‘DataFrame is not empty’)**

output_table_1 = pd.DataFrame(output)

@sanderlenselink ,

There is no reason the code provided would require a faulty ticker code in order to work.

I would suggest that you try running the code outside of KNIME using something like Visual Studio Code where you can use a debugger to step through the code, observe the workflow and inspect variables. It will give you a far greater insight into how the program works and what the problems are than asking on a forum.

This issue is more Python/Yahoo library related rather than an issue with KNIME. The problems are not so much about returning data to KNIME You may get a more detailed and faster response if you ask on the Yahoo or a Python forum.

DiaAzul.

@DiaAzul . . . the same behavior in Jupyter . . .

I will try to get a bit smarter by diving into yahooquery and contact the developer of YQ

For the time being I will add a faulty ticker in the underlying database. Quick and (very) dirty but for the time being that’s the easiest “solution” :wink:

Last but not least . . . your help and this forum are FANTASTIC !!!

I learned from the yahooquery-forum that that the “faulty ticker” problem is an issue to be solved by yahooquery itself . . .

see history: faulty ticker needed otherwise empty DataFrame · Issue #125 · dpguthrie/yahooquery · GitHub
and
Only return dataframes, omit bad symbols by dpguthrie · Pull Request #118 · dpguthrie/yahooquery · GitHub

2 Likes