Python Script Problem (Downloading Data Files)

Hello knimers,

I have some code that downloads XML files; it is shown below.
I run it in the workflow shown in this picture:
image

When I run the code in PyCharm with Python 3.9, the XML files download without any problem.
When I run the same code inside the Python Script node, it executes, but only very few data files are downloaded.
Could this be a timeout problem, or can you think of other reasons why it happens?

Thank you for your help in advance!

BR Bastian

import pandas as pd
import epo_ops
import requests

from epo_ops.models import Docdb, Epodoc, Original



def getpublisheddata(patentnummer, client):
    try:
        response = client.published_data(  # Retrieve bibliography data
            reference_type='publication',  # publication, application, priority
            input=epo_ops.models.Epodoc(patentnummer),  # original, docdb, epodoc
            endpoint='biblio',  # optional, defaults to biblio in case of published_data
            constituents=[]  # optional, list of constituents
        )
        print(response.text)
        with open('data' + patentnummer + '.xml', 'w', encoding='utf-8') as f:
            f.write(response.text)

        response2 = client.family(
            reference_type='publication',
            input=epo_ops.models.Epodoc(patentnummer),  # original, docdb, epodoc
            endpoint='biblio',  # optional, defaults to biblio in case of published_data
            constituents=[]  # optional, list of constituents
        )
        print(response2.text)
        with open('family' + patentnummer + '.xml', 'w', encoding='utf-8') as g:
            g.write(response2.text)
    except:
        pass

# Inside the function, the data is retrieved and saved as XML.

client = epo_ops.Client(key='XXXX', secret='XXXX')

# The list (hit list) must be read in here --> pd.read_csv. Caution: the comma/semicolon separator is wrong for German CSV files! Set delimiter or similar.

df = pd.read_csv('Output_H01M8_2404_ALL.csv')
for column in df[['publication_number']]:
    columnSeriesObj = df[column]  # build the list of publication numbers
    for i in columnSeriesObj.values:
        print(i)
        patentnumberlength = len(i)
        patentoffice = i[0:2]
        patentnummer = i[2:patentnumberlength - 1]
        print(patentnummer)
        patentnummer = patentnummer.lstrip('0')  # patent number without leading zeros
        print(patentnummer)
        numeric_filter = filter(str.isdigit, patentnummer)  # remove all non-numeric characters
        numeric_string = "".join(numeric_filter)
        print(numeric_string)
        patentnummer = patentoffice + numeric_string
        print(patentnummer)

        print("Patentnummer: " + patentnummer)

        getpublisheddata(patentnummer, client)
      
        print(epo_ops.__version__)

Hi @8bastian8,

Not really an answer but there’s a big

try:
    ...
except:
    pass

around the body of your getpublisheddata function. Any chance the body throws a helpful error during its execution that is then swallowed? Could you try printing the error during each of the function’s invocations (or don’t try to catch it in the first place, just for the purpose of debugging)?
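To make the idea concrete, here is a minimal sketch of reporting the exception instead of discarding it. `run_and_report` and `fake_download` are hypothetical names used only for illustration; `fake_download` stands in for your `getpublisheddata` so the snippet runs without EPO credentials.

```python
import traceback


def run_and_report(func, *args, **kwargs):
    """Call func and print any exception instead of silently discarding it.

    Hypothetical debugging helper; returns a (result, error) pair.
    """
    try:
        return func(*args, **kwargs), None
    except Exception as exc:  # unlike a bare except, this lets KeyboardInterrupt through
        print(f"{func.__name__} failed: {type(exc).__name__}: {exc}")
        traceback.print_exc()  # full stack trace, written to stderr
        return None, exc


def fake_download(number):
    # Stand-in for getpublisheddata; fails on input it does not expect.
    if not number.startswith("EP"):
        raise ValueError(f"unexpected number format: {number}")
    return "<xml/>"


ok_result, ok_error = run_and_report(fake_download, "EP1000000")
bad_result, bad_error = run_and_report(fake_download, "1000000")
```

In your script the equivalent change would be replacing `except: pass` with an `except Exception as exc:` branch that prints `exc`, at least while debugging.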

Marcel


I tried it, and now I get an error like this:

Do you know what the problem could be?

I think I have it:

I just changed
print(response.text) to print(response.text.encode("utf-8"))
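My guess at why this helps, assuming the error was a UnicodeEncodeError raised by print: if the output stream uses an encoding that cannot represent some character in the XML, printing the string fails, while printing the encoded bytes only emits their ASCII representation. The sketch below imitates such a stream with an ASCII-only `io.TextIOWrapper`; which encoding the Python Script node's console really uses is an assumption, and the sample text is made up.

```python
import io

# Stand-in for a console whose encoding cannot represent every character.
# ASCII is an assumption chosen to make the failure easy to reproduce.
narrow_console = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

text = "Brennstoffzelle für Fahrzeuge"  # made-up sample with a non-ASCII character

try:
    print(text, file=narrow_console)
    plain_print_failed = False
except UnicodeEncodeError as exc:
    plain_print_failed = True
    print(f"plain print failed: {exc}")

# Printing bytes writes their repr (b'...'), which contains only ASCII characters.
print(text.encode("utf-8"), file=narrow_console)
narrow_console.flush()
written = narrow_console.buffer.getvalue()
print(written)
```

Since the failing print sat inside the try block before the file was written, the exception would have skipped the download for every record containing such a character, which would explain why only a few files appeared.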

