Due to a bug in Crate (see post: Adding column generates error) we are unable to add a column to our production database.
While waiting for the pull request to be accepted and included in the next release, I wanted to do some testing with renaming tables. (Creating a new table which includes the new columns and then switching the names.). See what the impact would be and if it would result in downtime.
All tests were done on a development cluster with the same structure as our production database but less data
As a fist step, I created a copy of a table with the same structure. Next, I wanted to copy the data of the original table into the test table.
The table contains 29 million records as is 7GB in size.
To copy the data I used the naive
insert into test.table (select * from test.table)
This query took a while to complete so I left it overnight.
When I returned in the morning, the query seemed to have completed.
But when trying to query the database (any table) everything was unresponsive and the admin panel reported
/usr/share/crate/lib/site/index.html: Too many open files
I Rebooted al the nodes one by one in the 3-node cluster after which the admin panel reported that a lot of shards were missing!
The original table from which the data was copied, was completely missing in the shards panel as no shards were available.
After rebooting a second time, the shards were still not up.
alter cluster reroute retry failed 3 of the 6 shards came online.
select * from sys.allocations reports
cannot allocate because all found copies of the shard are either stale or corrupt
cannot allocate because allocation is not permitted to any of the nodes
for the 3 missing shards for the primary and secondary shards respectively.
Querying data form the table gives
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [XX000]: ERROR: Couldn't create execution plan from logical plan because of: the shard 1 of table [data.meterreading/EXqQQPdfSsm_CV5qD0ylAg] is not available:
Can somebody give some insight into why this happened?
How can I get the table back online? (even with data loss?)
How can this be prevented?
Is it a bad idea to copy data with a simple
insert into table2 (select * from table 1)
What is an alternative to efficiently copy data between tables?