December 16, 2025

OpenAI Hosted File Search and Vector Stores in Python


This document explains what the OpenAI Python client supports for file ingestion, vector stores, semantic search, and attribute filtering, and how those pieces connect to "talk to your documents" experiences. The focus stays on what the API supports.

What OpenAI is solving here

OpenAI's platform covers both parts of the problem:

  • Hosted search: you upload files into a vector store, then run semantic search over chunked content and get back matching chunks plus scores.

  • Hosted retrieval plus generation: you can enable the file_search tool in the Responses API, so the model can automatically retrieve from your vector stores before generating an answer.

If you only want retrieval, use vector store search directly. If you want “chat with docs”, use Responses with file_search.

Core objects and the workflow

Files

You upload a document and get back a file_id. For file search workflows, the docs create the file with purpose="assistants".

Simple file upload example (local file):

from openai import OpenAI

client = OpenAI()

file_obj = client.files.create(
    file=open("docs/policy.pdf", "rb"),
    purpose="assistants",
)
print(file_obj.id)  # Example output: "file_abc123"

Vector stores

A vector store is a hosted index used by the Retrieval guide and by the file_search tool.

Create a vector store:

from openai import OpenAI

client = OpenAI()

vs = client.vector_stores.create(name="knowledge_base")
print(vs.id)  # Example output: "vs_abc123"

Attach files to a vector store and wait for indexing

You can attach a file by file_id, then poll status until it is completed.

from openai import OpenAI

client = OpenAI()

result = client.vector_stores.files.create(
    vector_store_id=vs.id,
    file_id=file_obj.id,
)
print(result.status)  # Often starts as "in_progress"

Polling style check:

import time

from openai import OpenAI

client = OpenAI()

while True:
    files = client.vector_stores.files.list(vector_store_id=vs.id)
    statuses = [f.status for f in files.data]
    print(statuses)
    if all(s == "completed" for s in statuses):
        break
    time.sleep(2)

SDK helper option for local files, one shot upload plus poll:

from openai import OpenAI

client = OpenAI()

vs = client.vector_stores.create(name="Support FAQ")
client.vector_stores.files.upload_and_poll(
    vector_store_id=vs.id,
    file=open("customer_policies.txt", "rb"),
)

Searching a vector store

Vector store search returns relevant chunks, similarity scores, and file info.

Minimal search:

from openai import OpenAI

client = OpenAI()

query = "What is the return policy?"
results = client.vector_stores.search(
    vector_store_id=vs.id,
    query=query,
)
print(results.object)                   # "vector_store.search_results.page"
print(results.data[0].score)            # Example: 0.85
print(results.data[0].content[0].text)  # One matching chunk

Search request options include filters, max_num_results (1 to 50), and rewrite_query.

from openai import OpenAI

client = OpenAI()

results = client.vector_stores.search(
    vector_store_id=vs.id,
    query="Summarize the refund rules for EU customers",
    rewrite_query=True,
    max_num_results=5,
)
print(results.search_query)  # The query actually used

If you want a custom ordering, sort client side using score:

sorted_hits = sorted(results.data, key=lambda r: r.score, reverse=True)
top = sorted_hits[0]
print(top.filename, top.score)

Attribute metadata, filtering, and update

Each vector store file can carry up to 16 key-value pairs as attributes, and search filters can reference them.

Attach a file with attributes:

from openai import OpenAI

client = OpenAI()

client.vector_stores.files.create(
    vector_store_id=vs.id,
    file_id=file_obj.id,
    attributes={
        "region": "US",
        "category": "Marketing",
        "date": 1672531200,  # Example Unix timestamp
    },
)

Filter search by attributes:

from openai import OpenAI

client = OpenAI()

results = client.vector_stores.search(
    vector_store_id=vs.id,
    query="What is the campaign approval process?",
    filters={
        "type": "and",
        "filters": [
            {"type": "eq", "key": "category", "value": "Marketing"},
            {"type": "eq", "key": "region", "value": "US"},
        ],
    },
    max_num_results=10,
)
for hit in results.data:
    print(hit.filename, hit.attributes, hit.score)

Update attributes later:

from openai import OpenAI

client = OpenAI()

updated = client.vector_stores.files.update(
    vector_store_id=vs.id,
    file_id=file_obj.id,
    attributes={"category": "Marketing", "region": "US", "priority": 2},
)
print(updated.attributes)

Chunking, limits, and supported file types

For Retrieval indexing, docs state:

  • Max file size is 512 MB, and max 5,000,000 tokens per file when attached.

  • Default chunking is 800 tokens with 400 overlap, and you can set a chunking_strategy with constraints.

  • A list of supported file types is provided, including pdf, docx, pptx, md, txt, and various code formats.
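The default chunking (800-token chunks with a 400-token overlap) can be overridden per file with a chunking_strategy. As a minimal sketch, the helper below builds a static strategy payload and sanity-checks it against the constraints as I understand them from the docs (chunk size between 100 and 4096 tokens, overlap at most half the chunk size); the function name is mine, not part of the SDK:

```python
def static_chunking(max_chunk_size_tokens: int = 800,
                    chunk_overlap_tokens: int = 400) -> dict:
    """Build a static chunking_strategy dict, validating the documented limits."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens > max_chunk_size_tokens // 2:
        raise ValueError("overlap must not exceed half of max_chunk_size_tokens")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }

# Smaller chunks, less overlap than the defaults
strategy = static_chunking(400, 100)
print(strategy)
```

Pass the result as chunking_strategy=strategy when attaching the file with client.vector_stores.files.create.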

Getting an AI answer using file search in Responses

File search is a built-in hosted tool in the Responses API. The model can automatically call it, retrieve from your vector store IDs, and then generate a response.

Basic Responses call with file_search enabled:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input="What are the key points of our return policy?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vs.id],
    }],
)
print(response.output_text)  # The model's answer

If you want the raw retrieval hits returned in the API response, use include=["file_search_call.results"].

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input="List the refund conditions and cite the sources.",
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)
# You can inspect tool results inside response.output items
print(response.output_text)

You can also guide tool usage using tool_choice in the request.
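As a sketch of what that looks like: naming the hosted tool in tool_choice forces a retrieval call instead of leaving the decision to the model. The build_request helper below only assembles the keyword arguments (the helper name and the vs_abc123 ID are placeholders of mine); you would pass them on with client.responses.create(**kwargs) as in the earlier examples:

```python
def build_request(question: str, vector_store_id: str) -> dict:
    """Assemble Responses API kwargs that force a file_search tool call."""
    return {
        "model": "gpt-4.1",
        "input": question,
        "tools": [{"type": "file_search", "vector_store_ids": [vector_store_id]}],
        # Omitting tool_choice lets the model decide; naming the tool forces it.
        "tool_choice": {"type": "file_search"},
    }

kwargs = build_request("What is the return policy?", "vs_abc123")
print(kwargs["tool_choice"])
```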

Manual RAG style synthesis from search results

The Retrieval guide also shows a manual pattern: call vector store search, format the chunks, then send them to a model to synthesize a grounded answer.

A simplified version:

from openai import OpenAI

client = OpenAI()

query = "What is the return policy?"
results = client.vector_stores.search(vector_store_id=vs.id, query=query)

sources_text = "\n\n".join(
    "\n".join(part.text for part in hit.content) for hit in results.data
)

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "developer", "content": "Answer using only the sources."},
        {"role": "user", "content": f"Sources:\n{sources_text}\n\nQuestion:\n{query}"},
    ],
)
print(completion.choices[0].message.content)

Operational details worth knowing

Costs

Pricing includes:

  • File Search storage: 0.10 USD per GB of vector storage per day, first GB free

  • File Search tool call cost (Responses API): priced per tool call
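The storage arithmetic is simple enough to sketch directly. Assuming the rates above (0.10 USD per GB per day, first GB free), a quick estimator looks like this; tool-call pricing is separate and not modeled here:

```python
def daily_storage_cost_usd(gb_stored: float,
                           rate_per_gb: float = 0.10,
                           free_gb: float = 1.0) -> float:
    """Estimate daily File Search storage cost: first GB free, then per-GB rate."""
    billable_gb = max(0.0, gb_stored - free_gb)
    return billable_gb * rate_per_gb

print(daily_storage_cost_usd(0.5))   # 0.0  (within the free gigabyte)
print(daily_storage_cost_usd(11.0))  # 1.0  (10 billable GB * 0.10 USD)
```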

Expiration and cost control

Vector stores support expires_after, and the Retrieval guide notes that once expired, associated vector store files are deleted and you stop being charged for them.

from openai import OpenAI

client = OpenAI()

client.vector_stores.update(
    vector_store_id=vs.id,
    expires_after={"anchor": "last_active_at", "days": 7},
)

Multipart uploads

Uploads can accept up to 8 GB total and expire about an hour after creation, producing a normal File when completed.

This matters if your ingestion pipeline needs multipart transport, even though Retrieval indexing has its own file size limits.
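A rough sketch of that flow: split the payload into parts, send each part, then complete the upload to get a normal File. The split_into_parts helper is plain Python of mine; the client calls mirror the Uploads API endpoints (uploads.create, uploads.parts.create, uploads.complete), and the 64 MB part size reflects the per-part cap as I understand it, so verify both against the current docs:

```python
PART_SIZE = 64 * 1024 * 1024  # assumed 64 MB cap per part

def split_into_parts(payload: bytes, part_size: int = PART_SIZE) -> list[bytes]:
    """Slice a byte payload into consecutive parts of at most part_size bytes."""
    return [payload[i:i + part_size] for i in range(0, len(payload), part_size)]

def multipart_upload(client, path: str, mime_type: str = "application/pdf"):
    """Sketch of the Uploads flow: create, send parts, complete (no error handling)."""
    data = open(path, "rb").read()
    upload = client.uploads.create(
        bytes=len(data), filename=path, mime_type=mime_type, purpose="assistants",
    )
    part_ids = []
    for chunk in split_into_parts(data):
        part = client.uploads.parts.create(upload_id=upload.id, data=chunk)
        part_ids.append(part.id)
    # Completing yields a normal File object on the returned upload
    return client.uploads.complete(upload_id=upload.id, part_ids=part_ids)
```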

Inspect parsed content

You can retrieve the parsed content of a vector store file, which is useful for debugging chunking and extraction.

from openai import OpenAI

client = OpenAI()

content = client.vector_stores.files.content(
    vector_store_id=vs.id,
    file_id=file_obj.id,
)
print(content.filename)
print(content.content[0].text)

Removing a file from a vector store is not the same as deleting the file

Deleting a vector store file removes it from the vector store but does not delete the underlying file object.
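To make the distinction concrete, here is a minimal sketch: vector_stores.files.delete only removes the index entry, while files.delete removes the underlying File object. Both endpoints are real; the helper names are mine:

```python
def detach_only(client, vector_store_id: str, file_id: str):
    """Remove the file from the vector store index; the File object survives."""
    return client.vector_stores.files.delete(
        vector_store_id=vector_store_id, file_id=file_id,
    )

def delete_everywhere(client, vector_store_id: str, file_id: str):
    """Detach from the vector store, then delete the File object itself."""
    detach_only(client, vector_store_id, file_id)
    return client.files.delete(file_id=file_id)
```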

Listing and sorting vector stores

The list endpoint supports sorting by created_at via the order query parameter.

from openai import OpenAI

client = OpenAI()

recent_first = client.vector_stores.list(order="desc", limit=5)
oldest_first = client.vector_stores.list(order="asc", limit=5)
print([x.id for x in recent_first.data])

Data controls

OpenAI’s platform docs say API data is not used to train or improve models by default unless you explicitly opt in.

Assistants deprecation timeline

OpenAI’s deprecations page states the Assistants API is deprecated, with a shutdown date of Aug 26, 2026 (one year after the Aug 26, 2025 deprecation announcement), and recommends migrating to the Responses API. https://platform.openai.com/docs/deprecations

Example

End to end search plus an AI answer

This example shows: input document, create vector store, upload, search, then ask for an answer using Responses file_search.

from openai import OpenAI

client = OpenAI()

# Input: a local file that contains your policies
path = "docs/customer_policies.txt"

# Create a vector store
vs = client.vector_stores.create(name="Support FAQ")

# Upload and wait until indexed
client.vector_stores.files.upload_and_poll(
    vector_store_id=vs.id,
    file=open(path, "rb"),
)

# Input: a user question
question = "What is the return policy?"

# Direct search for debugging and transparency
hits = client.vector_stores.search(
    vector_store_id=vs.id,
    query=question,
    max_num_results=3,
)
print("Top search hits:")
for h in hits.data:
    # Output shape: h.filename, h.score, h.content is a list of text chunks
    print(h.filename, h.score)
    print(h.content[0].text[:200], "...")
    print()

# Now let the model retrieve and answer using file_search
response = client.responses.create(
    model="gpt-4.1",
    input=question,
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)
print("Answer:")
print(response.output_text)

# Output shape note:
# response.output includes tool call items plus message items.
# When include is set, you can inspect file_search_call.results for retrieved chunks.
