OpenAI Vector Stores API Explained
1. What the Vector Stores API is
A vector store is a hosted knowledge base that OpenAI manages for you. You upload files, and OpenAI automatically parses the content, splits it into chunks, creates embeddings, and stores everything in a system that supports both semantic search and keyword search. The stored content can then be searched directly, or used by higher level tools like the file_search tool in the Responses API and the Assistants API.
This makes vector stores a core building block for retrieval augmented generation. Instead of forcing a model to rely only on its built in knowledge, you give it access to your own documents and let it retrieve the most relevant passages at the moment it needs them.
2. The most important capabilities
2.1 Store and manage searchable knowledge bases
With the Vector Stores API, you can create and manage vector_store objects that track ingestion status, file counts, usage, and expiration policy. A vector store becomes usable when its status is completed.
2.2 Automatic ingestion pipeline for files
When you attach a file to a vector store, ingestion is automatic: parsing, chunking, embedding, and indexing happen without you implementing that pipeline yourself. File ingestion is asynchronous, so you typically poll until it is finished, or use official SDK helpers that upload and poll for you.
2.3 Semantic search directly on the store
You can query a vector store using client.vector_stores.search(...) with a natural language query and get back matching chunks and related metadata, including similarity scores and the file each chunk came from.
2.4 Power the file_search tool for Responses and Assistants
Vector stores are the backing index for the hosted file_search tool. In the Responses API, you pass vector_store_ids to the file_search tool so the model can retrieve relevant passages before answering.
In the Assistants API, you attach vector stores to assistants and threads via tool_resources, and the file search tool can query them during a run.
2.5 Metadata, filtering, batching, chunking strategy, and expiration
You can attach attributes to vector store files and use filtering in file search calls, for example filtering by a category attribute.
You can attach many files at once using file batches, with a maximum batch size of 2000 files, and optionally override metadata or chunking strategy per file.
You can also set an expiration policy using expires_after so the store is deleted after a period of inactivity, and you are no longer charged after expiration.
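For example, here is a minimal sketch of creating a store with an expiration policy; the store name and the seven day window are illustrative:

from openai import OpenAI

client = OpenAI()

# Illustrative policy: delete the store 7 days after it was last used
vector_store = client.vector_stores.create(
    name="Temporary Docs",
    expires_after={"anchor": "last_active_at", "days": 7},
)
print(vector_store.expires_at)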
3. How to use it in practice, detailed workflow
3.1 Choose the interaction style
You typically choose one of these approaches:
- Direct retrieval workflow: you create a vector store, upload files, then call client.vector_stores.search(...) yourself and decide what to do with the results.
- Hosted tool workflow: you create a vector store, upload files, then let the model call the hosted file_search tool from within the Responses API or Assistants API.
3.2 Create a vector store
Creating a store is straightforward:
from openai import OpenAI
client = OpenAI()
vector_store = client.vector_stores.create(name="Support FAQ")
print(vector_store.id, vector_store.status)
A vector store tracks ingestion progress through fields like status and file_counts.
3.3 Upload files and wait for ingestion to complete
The easiest path in Python is to use the SDK helper that uploads and polls:
from openai import OpenAI
client = OpenAI()
vector_store = client.vector_stores.create(name="Support FAQ")
client.vector_stores.files.upload_and_poll(
vector_store_id=vector_store.id,
file=open("customer_policies.txt", "rb"),
)
vs = client.vector_stores.retrieve(vector_store.id)
print(vs.status, vs.file_counts)
OpenAI notes that ingestion is asynchronous and recommends using the polling helpers, or monitoring file_counts until processing completes.
Important limits and defaults to keep in mind:
- Maximum file size is 512 MB
- Each file should contain no more than 5,000,000 tokens
- Default chunking is 800-token chunks with 400-token overlap
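If the defaults do not suit your documents, you can override the chunking strategy when attaching a file. A minimal sketch, continuing from the code above; the file ID and token counts are illustrative:

client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id="file_123",  # hypothetical ID of a file already uploaded via the Files API
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 400,  # smaller than the 800-token default
            "chunk_overlap_tokens": 100,
        },
    },
)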
If you have many files, use the file batch helper shown in the Assistants docs:
from openai import OpenAI
client = OpenAI()
vector_store = client.vector_stores.create(name="Financial Statements")
file_paths = ["edgar/goog-10k.pdf", "edgar/brka-10k.txt"]
file_streams = [open(p, "rb") for p in file_paths]
file_batch = client.vector_stores.file_batches.upload_and_poll(
vector_store_id=vector_store.id,
files=file_streams,
)
print(file_batch.status)
print(file_batch.file_counts)
This pattern is designed for multi file ingestion with polling.
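After a batch finishes, it is worth checking whether any files failed to process. A small sketch, continuing from the code above, that uses the status filter on the file list endpoint:

# List only the files that failed ingestion, if any
failed = client.vector_stores.files.list(
    vector_store_id=vector_store.id,
    filter="failed",
)
for f in failed.data:
    print(f.id, f.last_error)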
3.4 Run semantic search directly
Once ingestion is complete, you can search:
results = client.vector_stores.search(
vector_store_id=vector_store.id,
query="What is the return policy?",
)
for r in results.data:
    print("score:", r.score)
    print("file_id:", r.file_id)
    # Each result carries a list of content parts; print the first text part
    print("text:", r.content[0].text[:200])
    print()
OpenAI describes this as semantic search that can find matches even when few or no keywords overlap.
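The search call also accepts options such as max_num_results and attribute filters. A sketch, assuming your files carry a category attribute as described in section 2.5:

results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query="What is the return policy?",
    max_num_results=5,  # cap the number of returned chunks
    filters={
        "type": "eq",
        "key": "category",
        "value": "policy",  # hypothetical attribute value
    },
)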
4. Combining vector stores with other OpenAI APIs, detailed patterns
4.1 Vector stores plus Responses API
This is the most direct way to get RAG behavior with minimal orchestration. You pass the vector store into the file_search tool, and the model can choose to call it.
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
model="gpt-4.1",
input="Answer using the internal docs. What is deep research by OpenAI?",
tools=[{
"type": "file_search",
"vector_store_ids": [vector_store.id],
}],
include=["file_search_call.results"],
)
print(response.output_text)
This usage, including the include=["file_search_call.results"] option to inspect retrieved results, is shown in the File Search guide.
If you store attributes on files, you can filter retrieval:
response = client.responses.create(
model="gpt-4.1",
input="Summarize announcements only.",
tools=[{
"type": "file_search",
"vector_store_ids": [vector_store.id],
"filters": {
"type": "in",
"key": "category",
"value": ["announcement"],
},
}],
)
print(response.output_text)
The File Search guide shows a filters object with an in filter on a metadata key.
4.2 Vector stores plus Assistants API
This is useful when you want a persistent assistant that can keep using the same knowledge base across many user conversations.
Key ideas from the docs:
- You attach at most one vector store to an assistant and at most one vector store to a thread.
- If the thread also has a vector store created through message attachments, the file search tool can query both the assistant store and the thread store during a run.
A typical attachment step looks like this:
assistant = client.beta.assistants.update(
assistant_id=assistant.id,
tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
That exact tool_resources pattern is shown in the Assistants File Search documentation.
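Threads accept the same tool_resources shape, so you can also scope a store to a single conversation. A minimal sketch, reusing the vector store from above:

thread = client.beta.threads.create(
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)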
The Assistants docs also document default retrieval configuration used by the file search tool, including chunk size, overlap, embedding model, maximum chunks added to context, and ranking behavior.
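If those defaults do not fit your use case, the file search tool accepts overrides. A sketch, assuming the documented max_num_results setting; the value here is illustrative:

assistant = client.beta.assistants.update(
    assistant_id=assistant.id,
    tools=[{
        "type": "file_search",
        "file_search": {"max_num_results": 10},  # cap chunks added to context
    }],
)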
4.3 Vector stores plus Retrieval API style workflows
Sometimes you do not want the model to decide when to retrieve. You want a deterministic pipeline:
- Run client.vector_stores.search(...)
- Take the top results
- Feed them into a model call as context
The Retrieval guide describes retrieval as useful on its own and especially powerful when combined with models to synthesize responses.
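A minimal sketch of that deterministic pipeline, reusing the client and vector store from section 3; the prompt format and the top-5 cutoff are illustrative choices:

question = "What is the return policy?"

# 1. Retrieve the most relevant chunks yourself
results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=question,
)

# 2. Take the top results and join their text into one context block
context = "\n\n".join(
    part.text
    for r in results.data[:5]
    for part in r.content
)

# 3. Feed them into a model call as explicit context
answer = client.responses.create(
    model="gpt-4.1",
    input=f"Answer using only this context:\n\n{context}\n\nQuestion: {question}",
)
print(answer.output_text)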
4.4 Vector stores plus tool calling for structured actions
A common production pattern is:
- Use file search for unstructured knowledge such as policies and manuals
- Use function tools for structured actions such as database lookup, order status, scheduling, or calculations
- Let the model combine retrieved passages with fresh structured data
Vector stores handle your document knowledge base, while function tools handle operations and up to date system data. The File Search guide emphasizes that file search is a hosted tool that the model can call automatically.
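A sketch of that combination in the Responses API; the get_order_status function is hypothetical and stands in for whatever structured lookup your system provides:

response = client.responses.create(
    model="gpt-4.1",
    input="Is order 1042 still returnable under our policy?",
    tools=[
        {
            # Hosted tool: retrieves policy passages from the vector store
            "type": "file_search",
            "vector_store_ids": [vector_store.id],
        },
        {
            # Hypothetical function tool: fetches live order data from your system
            "type": "function",
            "name": "get_order_status",
            "description": "Look up the current status of an order by ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
                "additionalProperties": False,
            },
        },
    ],
)

When the model emits a get_order_status call, your code executes it and sends the result back in a follow-up request, while file_search runs entirely on OpenAI's side.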
5. A complete Python example
This example shows:
- Creating a vector store
- Uploading files with polling
- Adding per file attributes
- Using the Responses API with filtered file search
from openai import OpenAI
client = OpenAI()
# 1. Create a vector store
vector_store = client.vector_stores.create(name="Company Knowledge Base")
# 2. Upload multiple files and poll until processed
file_paths = ["docs/policies.txt", "docs/announcements.txt"]
streams = [open(p, "rb") for p in file_paths]
client.vector_stores.file_batches.upload_and_poll(
vector_store_id=vector_store.id,
files=streams,
)
# 3. Attach attributes to a specific file already uploaded to the OpenAI Files API
# If you already have a file_id, you can add it with attributes like this:
# client.vector_stores.files.create(vector_store_id=vector_store.id, file_id="file_123", attributes={...})
# 4. Ask a question, restricting retrieval to a category
resp = client.responses.create(
model="gpt-4.1",
input="Summarize only the announcements in two paragraphs.",
tools=[{
"type": "file_search",
"vector_store_ids": [vector_store.id],
"filters": {
"type": "in",
"key": "category",
"value": ["announcement"],
},
}],
)
print(resp.output_text)
The ingestion helpers, per file attributes, and filters used above are all documented in OpenAI’s Retrieval and File Search guides.