AWS DynamoDB Scan lost first 200 rows

anguslou · April 15, 2021, 10:13am

I have been using the DynamoDB Scan node for a few months and it has always worked fine. Recently I connected to a new table, and the node only returns the rows from 200 onward, so the first row is 200, then 201, 202… and so on. It has the same settings as my other projects and this is the first time this has happened. What are the possible reasons?

I also checked on the AWS side, and this table is set up the same as the others (this table has only 3xx rows, whereas the others have around 3xx,xxx rows).

Why are the first 200 rows not returned?

julian.bunzel · April 22, 2021, 8:18am

Hi @anguslou,

does the Row ID in the original database start with 0 or 1? Or does the Row ID start with 200, so that there are really no rows before that?

In case the rows are really missing, did you enter a filter expression in the node configuration?

Best,
Julian

anguslou · April 22, 2021, 1:11pm

The row ID starts with 200 and I have no filter set up, so I don’t understand what the problem is.

anguslou · April 23, 2021, 3:24am

As the DynamoDB table grows, the returned data now starts from row 300. This is quite strange, as I don’t have this problem in other flows.

[Image: Screenshot 2021-04-23 at 11.21.13 AM]

julian.bunzel · May 17, 2021, 9:04am

Hi @anguslou,

Sorry for the late response.
This is strange indeed.
We are trying to reproduce this issue and check if we can find a solution.

Best,

Julian

julian.bunzel · June 2, 2021, 7:34am

Hi @anguslou,

one possibility could be that the rows are locked due to access by another session.
Could you check if this is the case?

Best,
Julian

anguslou · June 2, 2021, 8:19am

Hi Julian,

It is quite unlikely that the DB is locked due to access by another session.

I just checked again: there are now 8647 rows in DynamoDB, and KNIME returns data starting from row 6700. As far as I know, I have no filter that limits the data retrieval.

julian.bunzel · June 4, 2021, 9:40am

Hi @anguslou,

we found a bug in the code of the node and created a ticket for it.
Thank you for reporting this issue, I will get back to you as soon as there is an update from our side.
(Ticket AP-16825 - for internal reference)

Cheers and have a great weekend,
Julian

anguslou · June 5, 2021, 4:31am

Great, thank you for your help, Julian. Have a great weekend.

szawadski · June 9, 2021, 5:11pm

Hello!

I have discovered the same problem! When I load the data from one table in DynamoDB with the connector, I get 13035 records. Using a Python script based on boto3, I get 15085 records.
Unfortunately, this has led to some errors in our calculations…
Is the fix going to be included in release 4.4? I would just like to know whether we need to modify all our workflows to use the Python node, or whether we can expect a fix in the next few days…
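
For anyone who wants to cross-check the row count in the same way, a paginated boto3 scan can be sketched roughly as below. This is a minimal sketch, not the exact script used here: the function name `scan_all_items` and the table name `my-table` are hypothetical. It relies on the fact that a DynamoDB `scan` call returns at most 1 MB of data per page and includes `LastEvaluatedKey` in the response when more pages remain.

```python
from typing import Any, Dict, List


def scan_all_items(table: Any) -> List[Dict[str, Any]]:
    """Collect every item from a table by following scan pagination.

    `table` is assumed to behave like a boto3 DynamoDB `Table` resource:
    `scan(**kwargs)` returns a dict with "Items" and, when more pages
    remain, a "LastEvaluatedKey" entry.
    """
    items: List[Dict[str, Any]] = []
    scan_kwargs: Dict[str, Any] = {}
    while True:
        page = table.scan(**scan_kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No further pages: the whole table has been read.
            break
        # Resume the next scan call right after the last key of this page.
        scan_kwargs["ExclusiveStartKey"] = last_key
    return items


# Usage against a real table (requires boto3 and AWS credentials;
# "my-table" is a placeholder name):
#
#   import boto3
#   table = boto3.resource("dynamodb").Table("my-table")
#   print(len(scan_all_items(table)))
```

Comparing the length of the returned list with the row count of the DynamoDB Scan node output shows whether any rows are missing.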
Regards,

Sébastien

szawadski · June 9, 2021, 5:15pm

Could you also take this opportunity to fix the filters? At least on my side, I need to try multiple times before they persist in the config tab. The documentation could also clearly be improved.

szawadski · June 14, 2021, 11:09am

Hi @julian.bunzel!
Any news on the release date or the progress of AP-16825?

Thanks in advance,

Sébastien

julian.bunzel · June 14, 2021, 3:53pm

Hi @szawadski,

the issue with missing rows will most likely be fixed in 4.4.0.
I’ll check back with the developers regarding the filter issues.

Best,
Julian

szawadski · June 24, 2021, 11:33am

I see the fix for AP-16825 is indeed in version 4.4.0!
Looking forward to being able to use it in production with the stable release!

szawadski · July 7, 2021, 11:03am

I wanted to let you know that the fix for AP-16825 corrects the problem of losing rows when reading a DynamoDB table. Thanks to the KNIME team for the prompt reaction!
