
Pandas to_sql repeats insert indefinitely

Hello everyone

I have a problem regarding pandas and the MySQL-Database of pythonanywhere.

I append data into an existing table with the following code:

        from sqlalchemy import create_engine
        import pandas

        host = "dancingqueens.mysql.eu.pythonanywhere-services.com"
        user = user      # credentials elided
        passwd = passwd  # credentials elided
        db = db          # database name elided

        conn_addr = 'mysql://' + user + ':' + passwd + '@' + host + '/' + db
        # pool_recycle=280 keeps connections younger than PythonAnywhere's timeout
        engine = create_engine(conn_addr, pool_recycle=280)

        data.to_sql(name=data_name, con=engine, if_exists="append", index=False)

This works flawlessly for small tables, but for "bigger" tables (30'000 rows with 50 columns) it seems to keep appending indefinitely until I break it off. For example, if I start with an empty table and leave it running for 2 minutes, I end up with 240'000 rows and multiple duplicates. It seems that pandas doesn't realize it has already appended the data and does it again.
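
One quick way to watch the duplication happen (a minimal sketch; `engine` and `data_name` are the variables from the snippet above) is to poll the row count while the insert runs:

        from sqlalchemy import text

        # If the count grows past len(data), rows are being re-inserted
        with engine.connect() as conn:
            count = conn.execute(text("SELECT COUNT(*) FROM " + data_name)).scalar()
        print(count)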

What I have tried so far:

  • Changing from "append" to "replace", but it replaces the data again once the table has been filled -> no change
  • Using a connection instead of the engine -> no change
  • Setting poolclass to NullPool -> no change
  • Setting method="multi" and chunksize=1000 within to_sql -> no change (see the combined sketch after this list)
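
For completeness, here is how those attempts combine into one call (a sketch only, not my exact code; `conn_addr`, `data`, and `data_name` are the variables from the original snippet):

        from sqlalchemy import create_engine
        from sqlalchemy.pool import NullPool

        # No connection pooling, explicit transaction, batched multi-row INSERTs
        engine = create_engine(conn_addr, poolclass=NullPool)
        with engine.begin() as conn:
            data.to_sql(name=data_name, con=conn, if_exists="append",
                        index=False, method="multi", chunksize=1000)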

Nothing seems to help and I am out of ideas. Any help is appreciated. Thank you

Have you tried 'append' combined with chunksize?

So I found the solution:

I used the code above inside a custom function, but I didn't include a "return" statement. Adding it solved my problem. I still don't understand why the problem would only occur with bigger tables, but at least it is solved for now.
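
For reference, a minimal sketch of what the fixed wrapper looks like (the function name and signature are placeholders, not my original code; only the added return reflects the actual fix):

        def append_to_db(data, data_name, engine):
            # hypothetical wrapper around the snippet above
            data.to_sql(name=data_name, con=engine, if_exists="append", index=False)
            return True  # the return statement that was missing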

Glad to hear that you made it work!