OpenAI Vector Stores API Explained
1. What the Vector Stores API is
A vector store is a hosted knowledge base that OpenAI manages for you. You upload files, and OpenAI automatically parses the content, splits it into chunks, creates embeddings, and stores everything in a system that supports both semantic search and keyword search. The stored content can then be searched directly, or used by higher level tools like the file_search tool in the Responses API and the Assistants API.
This makes vector stores a core building block for retrieval augmented generation. Instead of forcing a model to rely only on its built in knowledge, you give it access to your own documents and let it retrieve the most relevant passages at the moment it needs them.
2. The most important capabilities
2.1 Store and manage searchable knowledge bases
With the Vector Stores API, you can create and manage vector_store objects that track ingestion status, file counts, usage, and expiration policy. A vector store becomes usable when its status is completed.
2.2 Automatic ingestion pipeline for files
When you attach a file to a vector store, ingestion is automatic: parsing, chunking, embedding, and indexing happen without you implementing that pipeline yourself. File ingestion is asynchronous, so you typically poll until it is finished, or use official SDK helpers that upload and poll for you.
2.3 Semantic search directly on the store
You can query a vector store using client.vector_stores.search(...) with a natural language query and get back matching chunks and related metadata, including similarity scores and the file each chunk came from.
2.4 Power the file_search tool for Responses and Assistants
Vector stores are the backing index for the hosted file_search tool. In the Responses API, you pass vector_store_ids to the file_search tool so the model can retrieve relevant passages before answering.
In the Assistants API, you attach vector stores to assistants and threads via tool_resources, and the file search tool can query them during a run.
2.5 Metadata, filtering, batching, chunking strategy, and expiration
You can attach attributes to vector store files and use filtering in file search calls, for example filtering by a category attribute.
You can attach many files at once using file batches, with a maximum batch size of 2000 files, and optionally override metadata or chunking strategy per file.
You can also set an expiration policy using expires_after so the store is deleted after a period of inactivity, and you are no longer charged after expiration.
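For example, here is a minimal sketch of creating a store with an expiration policy; the store name and the seven day window are illustrative:

from openai import OpenAI

client = OpenAI()

# Illustrative policy: delete the store 7 days after it was last used
vector_store = client.vector_stores.create(
    name="Temporary Docs",
    expires_after={"anchor": "last_active_at", "days": 7},
)
print(vector_store.expires_at)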
3. How to use it in practice, detailed workflow
3.1 Choose the interaction style
You typically choose one of these approaches:
- Direct retrieval workflow: you create a vector store, upload files, then call client.vector_stores.search(...) yourself and decide what to do with the results.
- Hosted tool workflow: you create a vector store, upload files, then let the model call the hosted file_search tool from within the Responses API or Assistants API.
3.2 Create a vector store
Creating a store is straightforward:
from openai import OpenAI
client = OpenAI()
vector_store = client.vector_stores.create(name="Support FAQ")
print(vector_store.id, vector_store.status)
A vector store tracks ingestion progress through fields like status and file_counts.
3.3 Upload files and wait for ingestion to complete
The easiest path in Python is to use the SDK helper that uploads and polls:
from openai import OpenAI
client = OpenAI()
vector_store = client.vector_stores.create(name="Support FAQ")
client.vector_stores.files.upload_and_poll(
vector_store_id=vector_store.id,
file=open("customer_policies.txt", "rb"),
)
vs = client.vector_stores.retrieve(vector_store.id)
print(vs.status, vs.file_counts)
OpenAI notes that ingestion is asynchronous and recommends using the polling helpers, or monitoring file_counts until processing completes.
Important limits and defaults to keep in mind:
- Maximum file size is 512 MB
- Each file should contain no more than 5,000,000 tokens
- Default chunking is 800-token chunks with 400-token overlap
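If the defaults do not suit your documents, you can override the chunking strategy when attaching a file. A minimal sketch, continuing from the code above; the file ID and token counts are illustrative:

client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id="file_123",  # hypothetical ID of a file already uploaded via the Files API
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 400,  # smaller than the 800-token default
            "chunk_overlap_tokens": 100,
        },
    },
)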
If you have many files, use the file batch helper shown in the Assistants docs:
from openai import OpenAI
client = OpenAI()
vector_store = client.vector_stores.create(name="Financial Statements")
file_paths = ["edgar/goog-10k.pdf", "edgar/brka-10k.txt"]
file_streams = [open(p, "rb") for p in file_paths]
file_batch = client.vector_stores.file_batches.upload_and_poll(
vector_store_id=vector_store.id,
files=file_streams,
)
print(file_batch.status)
print(file_batch.file_counts)
This pattern is designed for multi file ingestion with polling.
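After a batch finishes, it is worth checking whether any files failed to process. A small sketch, continuing from the code above, that uses the status filter on the file list endpoint:

# List only the files that failed ingestion, if any
failed = client.vector_stores.files.list(
    vector_store_id=vector_store.id,
    filter="failed",
)
for f in failed.data:
    print(f.id, f.last_error)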
3.4 Run semantic search directly
Once ingestion is complete, you can search:
results = client.vector_stores.search(
vector_store_id=vector_store.id,
query="What is the return policy?",
)
for r in results.data:
    print("score:", r.score)
    print("file_id:", r.file_id)
    # Each result carries a list of content parts; print the first text part
    print("text:", r.content[0].text[:200])
    print()
OpenAI describes this as semantic search that can find matches even when few or no keywords overlap.
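The search call also accepts options such as max_num_results and attribute filters. A sketch, assuming your files carry a category attribute as described in section 2.5:

results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="What is the return policy?",
    max_num_results=5,  # cap the number of returned chunks
    filters={
        "type": "eq",
        "key": "category",
        "value": "policy",  # hypothetical attribute value
    },
)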
4. Combining vector stores with other OpenAI APIs, detailed patterns
4.1 Vector stores plus Responses API
This is the most direct way to get RAG behavior with minimal orchestration. You pass the vector store into the file_search tool, and the model can choose to call it.
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
model="gpt-4.1",
input="Answer using the internal docs. What is deep research by OpenAI?",
tools=[{
"type": "file_search",
"vector_store_ids": [vector_store.id],
}],
include=["file_search_call.results"],
)
print(response.output_text)
This usage, including the include=["file_search_call.results"] option to inspect retrieved results, is shown in the File Search guide.
If you store attributes on files, you can filter retrieval:
response = client.responses.create(
model="gpt-4.1",
input="Summarize announcements only.",
tools=[{
"type": "file_search",
"vector_store_ids": [vector_store.id],
"filters": {
"type": "in",
"key": "category",
"value": ["announcement"],
},
}],
)
print(response.output_text)
The File Search guide shows a filters object with an in filter on a metadata key.
4.2 Vector stores plus Assistants API
This is useful when you want a persistent assistant that can keep using the same knowledge base across many user conversations.
Key ideas from the docs:
- You attach at most one vector store to an assistant and at most one vector store to a thread.
- If the thread also has a vector store created through message attachments, the file search tool can query both the assistant store and the thread store during a run.
A typical attachment step looks like this:
assistant = client.beta.assistants.update(
assistant_id=assistant.id,
tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
That exact tool_resources pattern is shown in the Assistants File Search documentation.
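Threads accept the same tool_resources shape, so you can also scope a store to a single conversation. A minimal sketch, reusing the vector store from above:

thread = client.beta.threads.create(
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)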
The Assistants docs also document default retrieval configuration used by the file search tool, including chunk size, overlap, embedding model, maximum chunks added to context, and ranking behavior.
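If those defaults do not fit your use case, the file search tool accepts overrides. A sketch, assuming the documented max_num_results setting; the value here is illustrative:

assistant = client.beta.assistants.update(
    assistant_id=assistant.id,
    tools=[{
        "type": "file_search",
        "file_search": {"max_num_results": 10},  # cap chunks added to context
    }],
)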
4.3 Vector stores plus Retrieval API style workflows
Sometimes you do not want the model to decide when to retrieve. You want a deterministic pipeline:
- Run client.vector_stores.search(...)
- Take the top results
- Feed them into a model call as context
The Retrieval guide describes retrieval as useful on its own and especially powerful when combined with models to synthesize responses.
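A minimal sketch of that deterministic pipeline, reusing the client and vector store from section 3; the prompt format and the top-5 cutoff are illustrative choices:

question = "What is the return policy?"

# 1. Retrieve the most relevant chunks yourself
results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=question,
)

# 2. Take the top results and join their text into one context block
context = "\n\n".join(
    part.text
    for r in results.data[:5]
    for part in r.content
)

# 3. Feed them into a model call as explicit context
answer = client.responses.create(
    model="gpt-4.1",
    input=f"Answer using only this context:\n\n{context}\n\nQuestion: {question}",
)
print(answer.output_text)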
4.4 Vector stores plus tool calling for structured actions
A common production pattern is:
- Use file search for unstructured knowledge such as policies and manuals
- Use function tools for structured actions such as database lookup, order status, scheduling, or calculations
- Let the model combine retrieved passages with fresh structured data
Vector stores handle your document knowledge base, while function tools handle operations and up to date system data. The File Search guide emphasizes that file search is a hosted tool that the model can call automatically.
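A sketch of that combination in the Responses API; the get_order_status function is hypothetical and stands in for whatever structured lookup your system provides:

response = client.responses.create(
    model="gpt-4.1",
    input="Is order 1042 still returnable under our policy?",
    tools=[
        {
            # Hosted tool: retrieves policy passages from the vector store
            "type": "file_search",
            "vector_store_ids": [vector_store.id],
        },
        {
            # Hypothetical function tool: fetches live order data from your system
            "type": "function",
            "name": "get_order_status",
            "description": "Look up the current status of an order by ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
                "additionalProperties": False,
            },
        },
    ],
)

When the model emits a get_order_status call, your code executes it and sends the result back in a follow-up request, while file_search runs entirely on OpenAI's side.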
5. A complete Python example
This example shows:
- Creating a vector store
- Uploading files with polling
- Adding per file attributes
- Using the Responses API with filtered file search
from openai import OpenAI
client = OpenAI()
# 1. Create a vector store
vector_store = client.vector_stores.create(name="Company Knowledge Base")
# 2. Upload multiple files and poll until processed
file_paths = ["docs/policies.txt", "docs/announcements.txt"]
streams = [open(p, "rb") for p in file_paths]
client.vector_stores.file_batches.upload_and_poll(
vector_store_id=vector_store.id,
files=streams,
)
# 3. Attach attributes to a specific file already uploaded to the OpenAI Files API
# If you already have a file_id, you can add it with attributes like this:
# client.vector_stores.files.create(vector_store_id=vector_store.id, file_id="file_123", attributes={...})
# 4. Ask a question, restricting retrieval to a category
resp = client.responses.create(
model="gpt-4.1",
input="Summarize only the announcements in two paragraphs.",
tools=[{
"type": "file_search",
"vector_store_ids": [vector_store.id],
"filters": {
"type": "in",
"key": "category",
"value": ["announcement"],
},
}],
)
print(resp.output_text)
The ingestion helpers, per file attributes, and filters used above are all documented in OpenAI’s Retrieval and File Search guides.