Hi,
what is the best way to copy 100 Million records from one table to another having multiple partition.
Tables are there within 6 nodes having 3 GB of heap size for each node.
Thanks in advance.
INSERT INTO new_table (col_1, col_2) SELECT col_1, col_2 FROM old_table;
should work fine.
Do you think this single query without any where clause will copy 100 millions of records from one table to second table ? I think we also need to consider time as there 100 millions of records size in 2 TB. I just want to know that this might create impact on my hard-disk and CPU both and also may impact my running processes.
Yes, this is typically no problem.It will take some time for sure with 2TB
Also this operation is typically throttled (maybe even too much right now).
There is also already an optimisation on the way to improve the performance:
crate:master
← crate:j/insert-into-congestion-control
opened 04:28PM - 15 Mar 21 UTC
## Summary of the changes / Why this improves CrateDB
This improves the cur… rent throttling mechanism by dynamically adapting
the concurency limit and by taking the round trip time into
considerating when pausing operations.
The mechanism so far was too aggressive in some cases and led to a
mostly idle cluster taking a long time to process insert-from-query
operations.
The behavior so far:
n1:
q1 -> inserts to n2
q2 -> inserts to n2
q3 -> inserts to n2
q4 -> inserts to n2
q5 -> inserts to n2; concurrency limit reached, pause
... now q1 to q5 start competing and all but one go to sleep (starts at 1000ms, with exponential backoff)
New behavior:
- The concurrency limit is dynamic. (Currently between 20 - 200, we might
want to tune these numbers)
- We track the round-trip time for the requests by target node.
This round-trip time is used to adjust the concurrency limit and also
as sleep intervall in case the number of inflight requests exceeds the
concurreny limit.
---
```
[2021-03-16T10:47:58,868][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=84 shortRtt=883.037 ms longRtt=926.912 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:58,892][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=85 shortRtt=710.5 ms longRtt=926.192 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:59,131][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=86 shortRtt=645.26 ms longRtt=920.761 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:59,289][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=87 shortRtt=545.017 ms longRtt=916.101 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:59,352][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=88 shortRtt=566.895 ms longRtt=914.939 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:59,505][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=88 shortRtt=946.852 ms longRtt=913.289 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:59,507][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=89 shortRtt=638.089 ms longRtt=912.374 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:59,679][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=90 shortRtt=665.027 ms longRtt=863.758 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:47:59,789][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=91 shortRtt=636.575 ms longRtt=862.481 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:48:00,671][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=92 shortRtt=750.06 ms longRtt=849.38 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:48:02,504][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=92 shortRtt=773.498 ms longRtt=669.04 ms queueSize=4.0 gradient=1.0
[2021-03-16T10:48:02,548][DEBUG][i.c.c.l.Gradient2Limit ] [Grand Som] New limit=93 shortRtt=658.408 ms longRtt=669.005 ms queueSize=4.0 gradient=1.0
```
## Checklist
- [x] Added an entry in `CHANGES.txt` for user facing changes
- [x] Updated documentation & `sql_features` table for user facing changes
- [x] Touched code is covered by tests
- [x] [CLA](https://crate.io/community/contribute/cla/) is signed
- [x] This does not contain breaking changes, or if it does:
- It is released within a major release
- It is recorded in ``CHANGES.txt``
- It was marked as deprecated in an earlier release if possible
- You've thought about the consequences and other components are adapted
(E.g. AdminUI)
An alternative would be to use COPY FROM / TO
1 Like
You can improve the performance by setting the number_of_replicas='0'
and the refresh_interval=0
temporarily.
1 Like