February 25, 2026

OpenAI Files API

How It Works, What It Is For, and How to Use It with Other APIs

The OpenAI Files API is the platform’s shared file storage layer. You upload a file once, receive a file_id, and then reuse that file across other OpenAI endpoints such as Responses file inputs, Batch jobs, vector stores for file search, and fine tuning. The Files API itself is simple, but it becomes powerful because many other APIs accept file_id as a reference.


1. What You Can Do with the Files API

The Files API provides a small set of core operations.

  1. Upload a file
    Endpoint: POST /v1/files
    Inputs include the file bytes, a required purpose, and an optional expiration policy. Individual files can be up to 512 MB, and a project can store up to 2.5 TB.

  2. List files
    Endpoint: GET /v1/files
    Supports pagination and filtering by purpose.

  3. Retrieve file metadata
    Endpoint: GET /v1/files/{file_id}
    Returns info such as filename, size, purpose, created time, and optional expiration time.

  4. Download file content
    Endpoint: GET /v1/files/{file_id}/content
    Returns the original file contents.

  5. Delete a file
    Endpoint: DELETE /v1/files/{file_id}
    Deletes the file and removes it from all vector stores that reference it.

File purposes matter

When you upload, you must set a purpose. Typical values include assistants, batch, fine-tune, vision, user_data, and evals.

The purpose determines which downstream APIs will accept the file, and sometimes the allowed formats. For example, Batch input must be .jsonl and is limited to 200 MB. Fine tuning also requires .jsonl.

Expiration policy

You can optionally set expires_after with an anchor of created_at and a number of seconds. By default, files uploaded with purpose=batch expire after 30 days, while other purposes persist until manually deleted.


2. The Most Important Function of the Files API

The single most important function is: turning a raw document into a reusable identifier.

Once you have a file_id, you can:

  1. Feed the file directly into the Responses API as an input_file item.

  2. Put the file into a vector store so the model can search it via the file_search tool.

  3. Use the file as the input for a Batch job, where each line of a .jsonl file represents a request.

  4. Use the file as training data for fine tuning workflows that require .jsonl.

That is why the Files API is usually the first step in any workflow that involves documents at scale.


3. Using the Files API Step by Step (Detailed)

3.1 Install and authenticate (Python)

The official OpenAI Python library supports Python 3.9 or newer.

# pip install openai

import os
from openai import OpenAI

# Recommended: export OPENAI_API_KEY in your environment
client = OpenAI()

3.2 Upload a file

Key inputs:

  1. file: a binary stream, or a tuple for filename plus content

  2. purpose: required

  3. expires_after: optional expiration policy

from openai import OpenAI

client = OpenAI()

uploaded = client.files.create(
    file=open("example.pdf", "rb"),
    purpose="user_data",
    expires_after={
        "anchor": "created_at",
        "seconds": 7 * 24 * 60 * 60,  # 7 days
    },
)

print(uploaded.id)
print(uploaded.filename)
print(uploaded.purpose)

Limits to keep in mind:

  1. One file can be up to 512 MB

  2. A project can store up to 2.5 TB total

3.3 List files

You can paginate with after, set limit, sort by created_at, and filter by purpose.

files_page = client.files.list(
    purpose="user_data",
    limit=20,
    order="desc",
)

for f in files_page.data:
    print(f.id, f.filename, f.bytes, f.created_at, f.purpose)
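The example above fetches a single page. To walk the entire listing, you can follow the `after` cursor until the response reports no more pages. A minimal sketch (the `has_more` field and last-id cursor convention follow the standard list-response shape; the helper name is ours):

```python
def list_all_files(client, purpose=None, page_size=100):
    """Collect every file by following the `after` cursor page by page."""
    collected = []
    after = None
    while True:
        kwargs = {"limit": page_size}
        if purpose:
            kwargs["purpose"] = purpose
        if after:
            kwargs["after"] = after
        page = client.files.list(**kwargs)
        collected.extend(page.data)
        if not getattr(page, "has_more", False) or not page.data:
            break
        after = page.data[-1].id  # cursor: id of the last item on this page
    return collected

# Usage:
# from openai import OpenAI
# client = OpenAI()
# for f in list_all_files(client, purpose="user_data"):
#     print(f.id, f.filename)
```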

3.4 Retrieve metadata for one file

info = client.files.retrieve("YOUR_FILE_ID")
print(info.id, info.filename, info.bytes, info.created_at, info.purpose, info.expires_at)

This maps to GET /v1/files/{file_id}.

3.5 Download file content

You can use the SDK method client.files.content(file_id) which returns a binary response object.
If you want a fully explicit approach, you can directly call the REST endpoint:

import os
import requests

api_key = os.environ["OPENAI_API_KEY"]
file_id = "YOUR_FILE_ID"

resp = requests.get(
    f"https://api.openai.com/v1/files/{file_id}/content",
    headers={"Authorization": f"Bearer {api_key}"},
    stream=True,
)
resp.raise_for_status()

with open("downloaded_file.bin", "wb") as out:
    for chunk in resp.iter_content(chunk_size=1024 * 1024):
        if chunk:
            out.write(chunk)

print("Saved downloaded_file.bin")

The endpoint is GET /v1/files/{file_id}/content.
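The SDK route mentioned above can replace the raw REST call. A sketch, assuming the openai-python `client.files.content` method, which returns a binary response object you can `read()`:

```python
from pathlib import Path

def download_file(client, file_id, dest):
    """Fetch file bytes via GET /v1/files/{file_id}/content using the SDK."""
    content = client.files.content(file_id)  # binary response object
    Path(dest).write_bytes(content.read())
    return dest

# Usage:
# from openai import OpenAI
# client = OpenAI()
# download_file(client, "YOUR_FILE_ID", "downloaded_file.bin")
```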

3.6 Delete a file

Deleting removes the file and also removes it from any vector stores that include it.

deleted = client.files.delete("YOUR_FILE_ID")
print(deleted.deleted)

4. Combining the Files API with Other OpenAI APIs (Detailed)

4.1 Use files as direct input to the Responses API

The Responses API can accept a file in three common ways:

  1. By file_id returned from the Files API

  2. By Base64 encoded data

  3. By an external URL

Different file types are also processed differently: on vision-capable models, PDFs contribute both extracted text and page images, while most other document types are text-extracted only. Spreadsheet files go through a special augmentation flow.

Example: upload, then ask a question about the file

from openai import OpenAI

client = OpenAI()

file_obj = client.files.create(
    file=open("document.pdf", "rb"),
    purpose="user_data",
)

resp = client.responses.create(
    model="gpt-5",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_file", "file_id": file_obj.id},
                {"type": "input_text", "text": "Summarize the main points and list key risks."},
            ],
        }
    ],
)

print(resp.output_text)

This pattern is best when you have a small number of files and the question is scoped enough that the model can consume the file content directly.
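For the other two input styles mentioned above, the `input_file` content part carries a URL or inline Base64 data instead of a `file_id`. A sketch; the `file_url`, `filename`, and `file_data` field names follow the Responses API file-input documentation, so treat them as assumptions if your SDK version differs:

```python
import base64

def file_part_from_url(url):
    """input_file content part that points at an externally hosted file."""
    return {"type": "input_file", "file_url": url}

def file_part_from_bytes(filename, raw_bytes):
    """input_file content part with inline Base64-encoded PDF data."""
    b64 = base64.b64encode(raw_bytes).decode("utf-8")
    return {
        "type": "input_file",
        "filename": filename,
        "file_data": f"data:application/pdf;base64,{b64}",
    }

# Usage inside a responses.create call:
# resp = client.responses.create(
#     model="gpt-5",
#     input=[{
#         "role": "user",
#         "content": [
#             file_part_from_url("https://example.com/report.pdf"),
#             {"type": "input_text", "text": "Summarize this."},
#         ],
#     }],
# )
```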

4.2 Use files for retrieval with vector stores and the file_search tool

If your files are large or numerous, or you want retrieval-augmented generation, use file search. File search lets the model retrieve relevant passages from a knowledge base using semantic and keyword search.

Workflow overview:

  1. Upload file with the Files API

  2. Create a vector store

  3. Add the file to the vector store

  4. Query using the Responses API with the file_search tool and your vector store id

Python example (end to end):

import time
import requests
from io import BytesIO
from openai import OpenAI

client = OpenAI()

def upload_from_url(url: str) -> str:
    r = requests.get(url)
    r.raise_for_status()
    file_content = BytesIO(r.content)
    filename = url.split("/")[-1]

    uploaded = client.files.create(
        file=(filename, file_content),
        purpose="assistants",
    )
    return uploaded.id

file_id = upload_from_url("https://cdn.openai.com/API/docs/deep_research_blog.pdf")

vector_store = client.vector_stores.create(name="knowledge_base")

client.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file_id,
)

# Poll until the file has finished processing in the vector store.
# (Some SDK versions also offer create_and_poll helpers that do this for you.)
for _ in range(30):
    vs_files = client.vector_stores.files.list(vector_store_id=vector_store.id)
    if all(f.status == "completed" for f in vs_files.data):
        break
    time.sleep(2)

response = client.responses.create(
    model="gpt-4.1",
    input="What is deep research by OpenAI?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store.id],
        "max_num_results": 3,
    }],
    include=["file_search_call.results"],
)

print(response.output_text)

This uses the documented vector store creation, adding files, and invoking file search from the Responses API.

Important lifecycle note:

  1. Removing a file from a vector store is not the same as deleting the file itself. Removing only detaches it from that vector store.

  2. Deleting the file removes it from all vector stores.
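In code, the two operations look like this. A sketch, assuming the openai-python method names `client.vector_stores.files.delete` and `client.files.delete`:

```python
def detach_from_store(client, vector_store_id, file_id):
    """Remove the file from one vector store; the File object itself survives."""
    return client.vector_stores.files.delete(file_id, vector_store_id=vector_store_id)

def delete_everywhere(client, file_id):
    """Delete the File object; it disappears from every vector store that used it."""
    return client.files.delete(file_id)
```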

Also note: the Assistants API is deprecated and scheduled to shut down on August 26, 2026, with Responses positioned as the preferred API going forward.

4.3 Use the Files API with the Batch API

Batch processing starts with a .jsonl file where each line is one request. The Batch guide lists supported endpoints including /v1/responses.

Step A: Create a .jsonl batch input file

Here is a simple example that targets /v1/responses. The body fields match the normal request body you would send to that endpoint.

import json

requests_list = [
    {
        "custom_id": "job_001",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-5",
            "input": "Classify the sentiment: I love this product.",
        },
    },
    {
        "custom_id": "job_002",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-5",
            "input": "Classify the sentiment: This was a disappointing purchase.",
        },
    },
]

with open("batchinput.jsonl", "w", encoding="utf-8") as f:
    for item in requests_list:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")

Step B: Upload the .jsonl file via Files API with purpose="batch"

from openai import OpenAI

client = OpenAI()

batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch",
)

print(batch_input_file.id)

Step C: Create the batch job using the file id

batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/responses",
    completion_window="24h",
    metadata={"description": "sentiment classification"},
)

print(batch.id, batch.status)

Batch input files have size constraints, and the Files API upload docs call out that Batch input supports .jsonl up to 200 MB.
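Once the batch finishes, the results come back through the Files API as well: a completed batch carries an `output_file_id` (and possibly an `error_file_id`), which you download with the same content endpoint. A polling sketch, assuming `client.batches.retrieve` and `client.files.content` from the openai-python SDK:

```python
import json
import time

def wait_and_fetch_results(client, batch_id, poll_seconds=30):
    """Poll until the batch reaches a terminal status, then parse the output .jsonl."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            break
        time.sleep(poll_seconds)
    if batch.status != "completed" or not batch.output_file_id:
        raise RuntimeError(f"batch ended with status {batch.status}")
    raw = client.files.content(batch.output_file_id).read()
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line]

# Usage:
# results = wait_and_fetch_results(client, batch.id)
# for row in results:
#     print(row["custom_id"])
```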

4.4 Use the Files API for fine tuning data

Fine tuning training files are uploaded via Files API and must be .jsonl.

A minimal first step is just the file upload:

from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("training.jsonl", "rb"),
    purpose="fine-tune",
)

print(training_file.id)

You then pass that file_id into the fine tuning job creation API for your chosen fine tuning workflow.
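A sketch of that next step, assuming the supervised fine-tuning job API in the openai-python SDK (the default model name here is illustrative; pick one that is actually fine-tunable in your account):

```python
def start_fine_tune(client, training_file_id, model="gpt-4.1-mini-2025-04-14"):
    """Kick off a fine-tuning job from an uploaded .jsonl training file."""
    return client.fine_tuning.jobs.create(
        training_file=training_file_id,
        model=model,
    )

# Usage:
# from openai import OpenAI
# client = OpenAI()
# job = start_fine_tune(client, training_file.id)
# print(job.id, job.status)
```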


5. When to Use Files API vs Uploads API

If your file fits within the standard Files API limit, upload directly with POST /v1/files. Files can be up to 512 MB.

If you need to upload larger data, OpenAI also provides an Uploads API that supports multipart uploading up to 8 GB total and produces a normal File object when completed.
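A sketch of that multipart flow, assuming the `uploads` endpoints in the openai-python SDK: create an Upload with the total size, send the bytes in parts, then complete to get back a normal File object. The 64 MB part size and method signatures are assumptions from the Uploads API docs; verify against your SDK version.

```python
import os

def upload_large_file(client, path, purpose="user_data",
                      mime_type="application/pdf", part_size=64 * 1024 * 1024):
    """Multipart upload via /v1/uploads: create, send parts, complete."""
    upload = client.uploads.create(
        filename=os.path.basename(path),
        purpose=purpose,
        bytes=os.path.getsize(path),
        mime_type=mime_type,
    )
    part_ids = []
    with open(path, "rb") as fh:
        while chunk := fh.read(part_size):
            part = client.uploads.parts.create(upload_id=upload.id, data=chunk)
            part_ids.append(part.id)
    # Completing returns an Upload whose .file is a normal File object
    return client.uploads.complete(upload_id=upload.id, part_ids=part_ids)
```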
