Whois library in Python and KNIME

Dear all,
I read the topic below and tried to run it, but I get the error below, and the root cause is not clear.

Story:
I have 25K URLs (e.g. www.example.com) and I need to get the expiry date of each domain with KNIME and Python.

Does anybody have advice?

ERROR Python Script (1⇒1) (deprecated) 0:359 Execute failed: [Errno 2] No such file or directory: 'C:\ProgramData\Anaconda3\lib\site-packages\whois\…/data\public_suffix_list.dat'
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\whois\__init__.py", line 56, in whois
    domain = extract_domain(url)
  File "C:\ProgramData\Anaconda3\lib\site-packages\whois\__init__.py", line 102, in extract_domain
    with open(tlds_path, encoding='utf-8') as tlds_fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\ProgramData\Anaconda3\lib\site-packages\whois\…/data\public_suffix_list.dat'

Hi @natanzi,

I’d advise using the most recent Python scripting nodes together with the Conda Environment Propagation node. In your case, it seems that the file
C:\ProgramData\Anaconda3\lib\site-packages\whois\…/data\public_suffix_list.dat
is missing, although it should come with the whois installation. With the Conda Environment Propagation node, you can make sure that anyone you share the workflow with uses the same environment (i.e. the same Python modules) as you do.
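
To confirm whether a data file really is missing from an installed package, a small check like the one below can be run in the same Python environment the node uses. This is only a sketch: `package_data_exists` is a hypothetical helper, and the relative path `data/public_suffix_list.dat` is an assumption read off the traceback (the elided part of the path is left unresolved here).

```python
import importlib.util
from pathlib import Path


def package_data_exists(package: str, relative_path: str) -> bool:
    """Return True if relative_path is a file inside the installed package directory."""
    # find_spec locates a top-level package without importing it
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return False  # package not found in this interpreter's environment
    package_dir = Path(spec.origin).parent
    return (package_dir / relative_path).is_file()


if __name__ == "__main__":
    # Package name and file name are taken from the traceback in this thread;
    # the exact sub-directory is an assumption.
    print(package_data_exists("whois", "data/public_suffix_list.dat"))
```

If this prints False in the environment the node runs in, reinstalling the package there (or switching the node to an environment where the file exists) is the likely next step.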

This example workflow should solve the issue:

I modified the Python code slightly so that it processes all the rows in the input column.

Kind Regards,
Lukas

Dear Lukas,
Many thanks. Could you please advise on the error below?

ERROR Python Script        3:4        Execute failed: module 'whois' has no attribute 'whois'
Traceback (most recent call last):
  File "<string>", line 9, in <module>
AttributeError: module 'whois' has no attribute 'whois'

Dear Lukas,

I have tried installing both whois and python-whois, but neither worked for me.
I found that the issue is quite common, but I could not find a fix for it.
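
One way to narrow down an error like "module 'whois' has no attribute 'whois'" is to inspect which `whois` module the node actually imports. The sketch below rests on assumptions the thread does not confirm: more than one distribution installs a top-level module called `whois`, and to my knowledge some of them expose `whois.query()` rather than `whois.whois()`. A local file named `whois.py` or a different Python environment could have the same effect. `describe_whois_module` is a hypothetical helper.

```python
import types


def describe_whois_module(module: types.ModuleType) -> dict:
    """Report where a module was loaded from and which entry points it offers."""
    return {
        # The file path reveals which environment (or local file) was imported
        "file": getattr(module, "__file__", None),
        "has_whois_function": callable(getattr(module, "whois", None)),
        "has_query_function": callable(getattr(module, "query", None)),
    }


if __name__ == "__main__":
    try:
        import whois
    except ImportError:
        print("No module named 'whois' in this environment")
    else:
        print(describe_whois_module(whois))
```

Running this inside the Python Script node shows whether the imported file lives in the expected conda environment.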

Dear @natanzi

Please find below an alternative solution based on a different Python library

https://anaconda.org/rapidsai/python-whois

which can be installed using Conda as follows:

conda install -c rapidsai python-whois

The Python code in the Python Script node should be robust to timeout errors (or others) and lets you submit a table with several queries at once:

import pandas as pd
import whois

# Default output, overwritten below once the rows have been processed
output_table_1 = pd.DataFrame()

rowID = 0

for ind in input_table_1.index:
    print(ind)

    # Copy the row so that new fields can be added without touching the input
    current_row_table = input_table_1.iloc[rowID].copy()

    URL = str(current_row_table['URL'])
    print(URL)

    try:
        w = whois.whois(URL)

        # Store every WHOIS field as a string
        for key, value in w.items():
            current_row_table[key] = str(value)

        current_row_table["Response"] = w.text
        current_row_table["Error"] = ""

    except OSError as err:
        print("OS error: {0}".format(err))
        # w is not assigned if whois.whois() raised; corrected later in this thread
        current_row_table["Response"] = w.text
        current_row_table["Error"] = "OS error: {0}".format(err)

    # Each row becomes one column of frames; the result is transposed at the end
    if rowID == 0:
        frames = pd.DataFrame(current_row_table)
    else:
        frames = pd.concat([frames, pd.DataFrame(current_row_table)], axis=1)

    rowID = rowID + 1

output_table_1 = frames.T
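
As a possible variation on the loop above, the rows can be collected as plain dictionaries and turned into a table once at the end, which avoids concatenating a growing DataFrame on every row. This is a sketch, not the code from the attached workflow: `lookup_all` is a hypothetical helper, and the lookup function is passed in as a parameter so the logic can be tried without network access.

```python
from typing import Any, Callable, Dict, Iterable, List


def lookup_all(
    urls: Iterable[str],
    lookup: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, str]]:
    """Run lookup on every URL and return one dictionary per URL."""
    rows = []
    for url in urls:
        row = {"URL": url, "Error": ""}
        try:
            record = lookup(url)
            # Store every returned field as a string, as in the loop above
            for key, value in record.items():
                row[key] = str(value)
        except OSError as err:
            row["Error"] = "OS error: {0}".format(err)
        rows.append(row)
    return rows
```

In the Python Script node this could be used as `output_table_1 = pd.DataFrame(lookup_all(input_table_1["URL"].astype(str), whois.whois))`, assuming the input column is called URL as in the post.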

The workflow is available here below:

20220104 Pikairos Whois lib in python and Knime.knwf (95.2 KB)

Hope this helps.

Best

Ael

Hey @natanzi,

sorry, my bad: I forgot to tell the Python Script node to actually use the conda environment. I reuploaded the fixed version:

If you are interested, the setting is in the Executable Selection Tab:

Kind Regards,
Lukas

Dear Lukas,
Many thanks for letting me know.

BR,
Milad

Dear Ael,
You made my day, much appreciated.

Dear Milad,

My pleasure :smiley: !

Just a couple of comments. I thought @LukasS’s solution was using a different library, but we are actually using the same one. Secondly, there is a minor error in my Python implementation which should be corrected. Where it says

except OSError as err:
    print("OS error: {0}".format(err))
    current_row_table["Response"] = w.text

it should be:

except OSError as err:
    print("OS error: {0}".format(err))
    current_row_table["Response"] = ""
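
Why this matters: when the lookup raises, `w` is not assigned in that iteration, so `w.text` either fails with a NameError (first row) or silently reuses the previous row's response (later rows). The sketch below reproduces both behaviours with a stand-in lookup; `collect_responses` and its `reuse_w_on_error` flag are hypothetical and exist only for the demonstration.

```python
def collect_responses(urls, lookup, reuse_w_on_error):
    """Collect one response per URL, mimicking the original or the corrected handler."""
    responses = []
    for url in urls:
        try:
            w = lookup(url)
            responses.append(w)
        except OSError:
            # True mimics the original line (reads w), False mimics the correction ("")
            responses.append(w if reuse_w_on_error else "")
    return responses
```

With a lookup that fails on the second URL, the original variant repeats the first response, while the corrected variant stores an empty string.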

I have corrected it and it is available now in this new version here below:

20220105 Pikairos Whois lib in python and Knime.knwf (98.9 KB)

Best wishes,

Ael

Dear @aworker,
If an error occurs, the flow stops. How can I continue the process and skip errors like the one below?

ERROR Python Script        0:464      Execute failed: No match for "VALIAMR.COM".
>>> Last update of whois database: 2022-01-05T07:49:51Z <<<

NOTICE: The expiration date displayed in this record is the date the
registrar's sponsorship of the domain name registration in the registry is
currently set to expire. This date does not necessarily reflect the expiration
date of the domain name registrant's agreement with the sponsoring
registrar.  Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.

TERMS OF USE: You are not authorized to access or query our Whois
database through the use of electronic processes that are high-volume and
automated except as reasonably necessary to register domain names or
modify existing registrations; the Data in VeriSign Global Registry
Services' ("VeriSign") Whois database is provided by VeriSign for
information purposes only, and to assist persons in obtaining information
about or related to a domain name registration record. VeriSign does not
guarantee its accuracy. By submitting a Whois query, you agree to abide
by the following terms of use: You agree that you may use this Data only
for lawful purposes and that under no circumstances will you use this Data
to: (1) allow, enable, or otherwise support the transmission of mass
unsolicited, commercial advertising or solicitations via e-mail, telephone,
or facsimile; or (2) enable high volume, automated, electronic processes
that apply to VeriSign (or its computer systems). The compilation,
repackaging, dissemination or other use of this Data is expressly
prohibited without the prior written consent of VeriSign. You agree not to
use electronic processes that are automated and high-volume to access or
query the Whois database except as reasonably necessary to register
domain names or modify existing registrations. VeriSign reserves the right
to restrict your access to the Whois database in its sole discretion to ensure
operational stability.  VeriSign may restrict or terminate your access to the
Whois database for failure to abide by these terms of use. VeriSign
reserves the right to modify these terms at any time.

The Registry database contains ONLY .COM, .NET, .EDU domains and
Registrars.

Traceback (most recent call last):
  File "<string>", line 22, in <module>
  File "C:\Users\bagher.h\.conda\envs\py3_knime\lib\site-packages\whois\__init__.py", line 44, in whois
    return WhoisEntry.load(domain, text)
  File "C:\Users\bagher.h\.conda\envs\py3_knime\lib\site-packages\whois\parser.py", line 188, in load
    return WhoisCom(domain, text)
  File "C:\Users\bagher.h\.conda\envs\py3_knime\lib\site-packages\whois\parser.py", line 399, in __init__
    raise PywhoisError(text)
whois.parser.PywhoisError: No match for "VALIAMR.COM".
>>> Last update of whois database: 2022-01-05T07:49:51Z <<<


Dear @natanzi

“VALIAMR.COM” does not exist as a registered domain. To cover any possible error without distinction, I have added the following piece of code:

[image: screenshot of the added code]
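
The code in the screenshot is not reproduced in the text. A minimal sketch of the idea described above (catching any error without distinction) could look like the following; it is not Ael's actual code. The traceback shows that the failure is a `whois.parser.PywhoisError`, which an `except OSError` clause does not catch. `LookupFailed` is a stand-in for that class, `safe_lookup` is a hypothetical helper, and the sketch assumes the library error derives from `Exception`.

```python
class LookupFailed(Exception):
    """Stand-in for a library-specific error such as whois.parser.PywhoisError."""


def safe_lookup(url, lookup):
    """Return a result row for url; never raise for ordinary lookup errors."""
    try:
        return {"URL": url, "Response": str(lookup(url)), "Error": ""}
    except Exception as err:  # deliberately broad: any failure becomes a row
        message = str(err)
        # Keep only the first line; registry errors can carry pages of legal text
        first_line = message.splitlines()[0] if message else type(err).__name__
        return {"URL": url, "Response": "", "Error": first_line}
```

Because every failure is turned into a row with a filled Error column, the node keeps running and the failed domains can be filtered out afterwards.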

The amended version of the workflow is available here below:

20220105 Pikairos Whois lib in python and Knime.knwf (104.2 KB)

Hope this helps

Ael

PS: @natanzi, please be aware that you are accessing a service, “whois.com”, which does not allow intensive querying. This workflow is fine if your aim is to automatically gather the information for just a few URLs :slightly_smiling_face:
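
If more than a handful of domains have to be queried, one option is to pause between lookups. The sketch below is only an illustration: `throttled` is a hypothetical helper, and the delay value is an arbitrary assumption, not a limit documented by any registry. Pausing does not by itself make high-volume automated querying permitted under the terms of use quoted in the error message above.

```python
import time


def throttled(items, delay_seconds, sleep=time.sleep):
    """Yield items one by one, pausing delay_seconds between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first item
        yield item
```

For example, `for url in throttled(urls, 2.0): ...` would wait two seconds between lookups; at that rate, 25K URLs take roughly 14 hours.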

Thanks @natanzi for validating the solution.

Best wishes

Ael