I want to copy my MySQL data (>200 million rows) to BigQuery, so I created a Python script that uses this library. At the moment it streams 1000 rows per request and achieves about 1.1 requests/second. That is not very fast, and transferring the whole dataset would take days. I am sure this can be optimized, but I don't know how. Do you have any suggestions? You can find my source code here
I thought about the following points:
- Each request contains 1000 rows; should I choose a bigger number?
- Does this library use gzip by default?
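To make the batch-size question concrete, here is a minimal sketch of how rows from any iterable (such as a MySQL cursor) could be cut into larger batches before each streaming request. The `chunked` helper, the batch size of 5000, and the table name are my own assumptions, not taken from the actual script; also note that the BigQuery streaming API enforces per-request row-count and payload-size quotas, so the batch size cannot be raised arbitrarily.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable,
    e.g. a MySQL cursor, without loading everything into memory."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage with the google-cloud-bigquery client
# (requires credentials; table name and batch size are assumptions):
#
# from google.cloud import bigquery
# client = bigquery.Client()
# for batch in chunked(cursor, 5000):
#     errors = client.insert_rows_json("project.dataset.table", batch)
#     if errors:
#         print(errors)
```

Since each request is independent, several such batches could also be sent concurrently (e.g. from a thread pool) instead of sequentially, which is usually where the 1.1 requests/second bottleneck comes from.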