I am trying to export a table to an Azure blob store.
The query looks like this:
COPY "remoteserviceassistant"."faq_raw"
TO DIRECTORY 'az://myblobstore.blob.core.windows.net/faqdata/faq'
WITH (
key = 'awnefaofawenfowfoawe'
);
For a small table this worked well and created a couple of JSON files. However, for a larger table (~5 GB in size), I get the following error:
NotSerializableExceptionWrapper[open_d_a_l_exception: Unexpected (permanent) at Writer::write => AzblobError
{ code: "BlockCountExceedsLimit", message: "The uncommitted block count cannot exceed the maximum limit of 100,000 blocks.
RequestId:2ae2b97d-e01e-0080-2fa7-4c05ac000000 Time:2024-12-12T15:08:12.4616782Z" }
Context: uri: myblobstore.blob.core.windows.net/faqdata/faq/faq_raw_3_.json?comp=block&blockid=gaegrserger/m5txTV0/L0rw%3D%3D
response: Parts { status: 409, version: HTTP/1.1, headers: {"content-length": "269", "content-type": "application/xml", "server": "Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0", "x-ms-request-id": "2ae2b97d-e01e-0080-2fa7-4c05ac000000", "x-ms-version": "2022-11-02", "x-ms-error-code": "BlockCountExceedsLimit", "date": "Thu, 12 Dec 2024 15:08:12 GMT"} } service: azblob path: faq_raw/faq_raw_3_.json size: 16384 written: 1268318208 ]
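As a rough sanity check on the numbers in that error, here is a small sketch of the arithmetic. It assumes (I have not verified this) that every write of the size shown in the error is staged as its own uncommitted block, and that the failing file alone had reached the 100,000-block limit. The constants are copied from the error message; the function names are only illustrative.

```python
# Values copied from the error message above.
BLOCK_LIMIT = 100_000          # "maximum limit of 100,000 blocks"
LAST_WRITE_SIZE = 16_384       # "size: 16384" (bytes in the failing write)
BYTES_WRITTEN = 1_268_318_208  # "written: 1268318208"


def max_bytes_before_limit(block_size: int, block_limit: int = BLOCK_LIMIT) -> int:
    """Upper bound on uncommitted bytes if every block holds block_size bytes."""
    return block_size * block_limit


def average_block_size(bytes_written: int, block_count: int = BLOCK_LIMIT) -> float:
    """Average block size, assuming block_count blocks were staged (an assumption)."""
    return bytes_written / block_count


ceiling = max_bytes_before_limit(LAST_WRITE_SIZE)
avg = average_block_size(BYTES_WRITTEN)

print(f"ceiling with 16 KiB blocks: {ceiling / 1e9:.2f} GB")
print(f"average block size if the limit was hit here: {avg:.0f} bytes")
```

Under these assumptions a single output file could not grow beyond roughly 1.6 GB with 16 KiB blocks, which would be consistent with the small table exporting fine and the ~5 GB table failing.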
I found this Microsoft Q&A post, which suggests that there might be a bug in the upload code: "How to list / delete uncommitted blob blocks, to avoid: 'The uncommitted block count cannot exceed the maximum limit of 100,000 blocks.'"
This is the schema of the problematic table. Maybe the embedding vectors cause the issue:
Schema:
Name                 Type
uuid                 TEXT
question             TEXT
answer               TEXT
ticket_object_id     TEXT
ticket_number        TEXT
plant_id             TEXT
plant_uuid           TEXT
plant_name           TEXT
path                 TEXT
title                TEXT
question_embedding   FLOAT_VECTOR(2048)
created_at           TIMESTAMP WITHOUT TIME ZONE