Export data to Azure blob storage

I am trying to export a table to Azure Blob Storage.

The query looks like this:

COPY "remoteserviceassistant"."faq_raw"
TO DIRECTORY 'az://myblobstore.blob.core.windows.net/faqdata/faq'
WITH (
    key = 'awnefaofawenfowfoawe'
);

For a small table this worked well and created a couple of JSON files. For a larger table (~5 GB in size), however, I get the following error:

NotSerializableExceptionWrapper[open_d_a_l_exception: Unexpected (permanent) at Writer::write => AzblobError 
{ code: "BlockCountExceedsLimit", message: "The uncommitted block count cannot exceed the maximum limit of 100,000 blocks. 
RequestId:2ae2b97d-e01e-0080-2fa7-4c05ac000000 Time:2024-12-12T15:08:12.4616782Z" } 
Context: uri: myblobstore.blob.core.windows.net/faqdata/faq/faq_raw_3_.json?comp=block&blockid=gaegrserger/m5txTV0/L0rw%3D%3D 
response: Parts { status: 409, version: HTTP/1.1, headers: {"content-length": "269", "content-type": "application/xml", "server": "Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0", "x-ms-request-id": "2ae2b97d-e01e-0080-2fa7-4c05ac000000", "x-ms-version": "2022-11-02", "x-ms-error-code": "BlockCountExceedsLimit", "date": "Thu, 12 Dec 2024 15:08:12 GMT"} } service: azblob path: faq_raw/faq_raw_3_.json size: 16384 written: 1268318208 ]
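As a rough sanity check on the numbers in this error, here is a minimal sketch of the arithmetic. It assumes that every 16384-byte write (the `size` field) is staged as one uncommitted block. That is a guess based on the message, not confirmed behaviour of the upload library. The helper names `blocks_needed` and `max_blob_bytes` are made up for illustration.

```python
# Values copied from the error message above.
BLOCK_SIZE_BYTES = 16_384         # "size: 16384"
BYTES_WRITTEN = 1_268_318_208     # "written: 1268318208"
MAX_UNCOMMITTED_BLOCKS = 100_000  # limit quoted in the Azure error text


def blocks_needed(total_bytes: int, block_size: int) -> int:
    """Number of blocks needed for total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)


def max_blob_bytes(block_size: int, max_blocks: int = MAX_UNCOMMITTED_BLOCKS) -> int:
    """Largest blob that fits under the block limit at a fixed block size."""
    return block_size * max_blocks


# Assumption: one staged block per 16 KiB write.
print("blocks for bytes written so far:", blocks_needed(BYTES_WRITTEN, BLOCK_SIZE_BYTES))
print("per-file ceiling in bytes:", max_blob_bytes(BLOCK_SIZE_BYTES))
```

Under that assumption, a single output file could not grow much beyond about 1.6 GB before hitting the block limit, which would fit with small tables exporting fine while a ~5 GB table spread over a few files fails. The `written` counter alone works out to fewer than 100,000 blocks, so it does not fully explain the error; other uncommitted blocks may be counted as well.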

I found this post, which suggests that there might be a bug in the upload code: "How to list / delete uncommitted blob blocks, to avoid: 'The uncommitted block count cannot exceed the maximum limit of 100,000 blocks.'" (Microsoft Q&A)

This is the schema of the problematic table. Maybe the embedding vectors cause the issue:

Schema (Name: Type):

  1. uuid: TEXT
  2. question: TEXT
  3. answer: TEXT
  4. ticket_object_id: TEXT
  5. ticket_number: TEXT
  6. plant_id: TEXT
  7. plant_uuid: TEXT
  8. plant_name: TEXT
  9. path: TEXT
  10. title: TEXT
  11. question_embedding: FLOAT_VECTOR(2048)
  12. created_at: TIMESTAMP WITHOUT TIME ZONE

Hi @thunderbug1, thanks for the report.

Could you please create an issue in the crate repository?

I created a GitHub issue here: Azure Blob store COPY TO gives BlockCountExceedsLimit · Issue #17136 · crate/crate

Thanks for creating an issue!

We will report this upstream with a library-specific reproduction.