Hi
Due to a bug in Crate (see post: Adding column generates error) we are unable to add a column to our production database.
While waiting for the pull request to be accepted and included in the next release, I wanted to do some testing with renaming tables. (Creating a new table which includes the new columns and then switching the names.). See what the impact would be and if it would result in downtime.
All tests were done on a development cluster with the same structure as our production database but less data
As a fist step, I created a copy of a table with the same structure. Next, I wanted to copy the data of the original table into the test table.
The table contains 29 million records as is 7GB in size.
To copy the data I used the naive
insert into test.table (select * from test.table)
This query took a while to complete so I left it overnight.
When I returned in the morning, the query seemed to have completed.
But when trying to query the database (any table) everything was unresponsive and the admin panel reported /usr/share/crate/lib/site/index.html: Too many open files
I Rebooted al the nodes one by one in the 3-node cluster after which the admin panel reported that a lot of shards were missing!
The original table from which the data was copied, was completely missing in the shards panel as no shards were available.
After rebooting a second time, the shards were still not up.
by running alter cluster reroute retry failed
3 of the 6 shards came online.
select * from sys.allocations
reports
cannot allocate because all found copies of the shard are either stale or corrupt
- or
cannot allocate because allocation is not permitted to any of the nodes
for the 3 missing shards for the primary and secondary shards respectively.
Querying data form the table gives
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [XX000]: ERROR: Couldn't create execution plan from logical plan because of: the shard 1 of table [data.meterreading/EXqQQPdfSsm_CV5qD0ylAg] is not available:
Can somebody give some insight into why this happened?
How can I get the table back online? (even with data loss?)
How can this be prevented?
Is it a bad idea to copy data with a simple insert into table2 (select * from table 1)
What is an alternative to efficiently copy data between tables?