OpenAI APIs: Stream It for a Faster Response

Amarjit Jha
Oct 4, 2023


If you have tried the OpenAI chat completion API, you must have noticed the delay in its response.

It is easy to overlook the {stream: true} parameter in the OpenAI API reference, yet it goes a long way toward easing this problem.


The OpenAI documentation describes this parameter. In plain English: if you make the call with {stream: true}, you receive the response as a series of chunks until it is complete, which gives you a progressive rendering experience.
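To make the idea concrete, here is a minimal sketch that fakes a stream with an async generator, so no real API call is involved. The chunk shape (choices[0].delta.content) mirrors what the API code later in this post reads; fakeStream, collectAnswer and the sample pieces are made up for illustration.

```javascript
// Hypothetical stand-in for a streamed completion: yields chunks shaped like
// the ones read later in this post (choices[0].delta.content).
async function* fakeStream(pieces) {
  for (const piece of pieces) {
    yield { choices: [{ delta: { content: piece } }] };
  }
  // Assumption: the last chunk carries an empty delta, marking the end.
  yield { choices: [{ delta: {} }] };
}

// Join the deltas, in arrival order, into the full answer.
async function collectAnswer(stream) {
  let answer = "";
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content != null) answer += content; // skip the empty final delta
  }
  return answer;
}

collectAnswer(fakeStream(["Hel", "lo", " wor", "ld"])).then((answer) => {
  console.log(answer); // prints "Hello world"
});
```

A UI does not have to wait for the joined answer: it can render each piece as soon as it arrives, which is the whole point of streaming.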

But hang on: to pull this off, three things have to line up:

1. OpenAI call: pass the stream parameter and read each delta.
2. API: stream the response back to the front-end.
3. Front-end: receive the deltas sent by the server/API.

1. OpenAI Call :

// NodeJS package.json

"dependencies": {
  "express": "^4.17.1",
  "openai": "^4.0.0"
}

// NodeJs
// openai-gpt.js

// Initialise OpenAI
const OpenAI = require("openai");

const getOpenAI = () => {
  // Assumption: the API key is read from the OPENAI_API_KEY environment variable
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  return openai;
};

// Define chat completion method
const withChatCompletion = async (user, userPrompt) => {
  try {
    const openai = getOpenAI();
    // With stream: true the call resolves to an async iterable of chunks
    const completion = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [{
        role: "system",
        content: `When I request a movie list, assume it's for Indian films`
      }, userPrompt],
      temperature: 0.2,
      stream: true
    });
    return completion;
  } catch (error) {
    console.error(error);
    return null;
  }
};

module.exports = { withChatCompletion };

2. API to send data :

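// NodeJS
// index.js

const express = require('express');
const { withChatCompletion } = require("./openai-gpt");

const app = express();
const port = 3000;

// GET call to pull a sample streamed response from GPT
app.get('/gpt-response', async (req, res) => {
  // Headers that mark the response as a server-sent event stream
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  // Assumption: the page is served from another origin, so allow cross-origin reads
  res.setHeader('Access-Control-Allow-Origin', '*');

  try {
    const userId = "thisisasimpleuserguid";
    const prompt = {
      role: "user",
      content: `Give me the most popular Rajnikant movies list`
    };

    // GPT call
    const completion = await withChatCompletion(userId, prompt);
    if (!completion) throw new Error("Chat completion failed");

    for await (const chunk of completion) {
      if (chunk.choices.length > 0) {
        const content = chunk.choices[0].delta.content;
        if (content != undefined) {
          res.write(`data: ${content}\n\n`);
        } else {
          // No content in the delta: tell the client the answer is complete
          res.write(`data: break\n\n`);
        }
      }
    }
  } catch (error) {
    console.error(error);
    // Send the end marker so the client closes instead of reconnecting
    res.write(`data: break\n\n`);
  } finally {
    res.end();
  }
});

// Start the server
app.listen(port, () => {
  console.log(`Server is listening on port ${port}`);
});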

Note the loop over the completion: each chunk's delta content is written to the res object as a server-sent event.
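One caveat with writing `data: ${content}\n\n` directly: in the server-sent events format, every line of a payload needs its own data: prefix, so a delta that contains a newline would lose everything after it. A small sketch of a helper that handles this follows; formatSseData is a hypothetical name, not part of Express or the openai package.

```javascript
// Hypothetical helper: frame a text payload as one server-sent event.
// Each line of the payload gets its own "data:" field, and a blank line
// terminates the event.
function formatSseData(payload) {
  const lines = String(payload).split(/\r\n|\r|\n/);
  return lines.map((line) => `data: ${line}`).join("\n") + "\n\n";
}

// JSON.stringify makes the newlines visible in the console.
console.log(JSON.stringify(formatSseData("one line")));
console.log(JSON.stringify(formatSseData("line one\nline two")));
```

In the route above, res.write(formatSseData(content)) could stand in for the template string. The browser's EventSource joins the data lines back together with a newline, so event.data arrives unchanged.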

Now the last thing left to do is listen for these server-sent events.

3. Front-end consumption :

// JavaScript
// request.js

function getRajniMovies() {
  // callback method for each delta response
  const streamResponse = (event) => {
    const delta = event.data;
    // Stop on the end marker before rendering, so "break" never reaches the page
    if (delta === "break") {
      newSearch.close();
      return;
    }
    elOutput.textContent += delta;
  };

  // callback upon error
  const streamError = () => {
    // Close the connection; otherwise EventSource reconnects automatically
    newSearch.close();
  };

  // Making a call.
  const newSearch = new EventSource(`http://localhost:3000/gpt-response`);
  newSearch.onmessage = streamResponse;
  newSearch.onerror = streamError;
}

// Capturing DOM elements
const elOutput = document.getElementById("output");
const elButton = document.getElementById("appendButton");
elButton.addEventListener("click", getRajniMovies);

<!DOCTYPE html>
<title>Indian Films : Rajni</title>
<p id="output"></p>
<button id="appendButton">Get Rajnikant Movies</button>
<script src="./request.js"></script>

If you look at the request.js file, it calls the API on button click. Every time a delta is received, it is appended to the <p id="output"></p> element.
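The message-handling logic can also be kept separate from the DOM, which makes it easy to try out without a browser. A minimal sketch, assuming the same break marker the server sends; applyDelta and END_SENTINEL are hypothetical names.

```javascript
// Hypothetical pure version of the onmessage logic: given the text shown so
// far and one event payload, return the new text and whether the stream ended.
const END_SENTINEL = "break"; // same end marker the server writes

function applyDelta(shownText, data) {
  if (data === END_SENTINEL) {
    return { text: shownText, done: true }; // never render the marker itself
  }
  return { text: shownText + data, done: false };
}

// Replay a few payloads in the order EventSource would deliver them.
let state = { text: "", done: false };
for (const data of ["Hel", "lo", "break"]) {
  state = applyDelta(state.text, data);
}
console.log(state); // prints { text: 'Hello', done: true }
```

In request.js, streamResponse could call applyDelta(elOutput.textContent, event.data), assign the text back to the element, and close the EventSource once done is true. One thing to keep in mind: a delta that happens to be exactly "break" would end the stream early, so a dedicated SSE event name for the end signal may be a safer choice.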

Conclusion :

GPT calls with {stream: true} in most cases return the first token/delta in just 0.1 seconds, and subsequent deltas every ~0.01–0.02 seconds.
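These figures are easy to check for your own setup. The sketch below times the first delta and the whole stream. measureStream, slowStream and the injectable now clock are hypothetical names, and the demo uses a stand-in stream rather than a real API call.

```javascript
// Hypothetical helper: time the first delta and the whole stream, in milliseconds.
// "now" is injectable so that the clock can be replaced in tests.
async function measureStream(stream, now = () => Date.now()) {
  const start = now();
  let firstDeltaMs = null;
  let chunks = 0;
  for await (const _chunk of stream) {
    if (firstDeltaMs === null) firstDeltaMs = now() - start;
    chunks += 1;
  }
  return { firstDeltaMs, totalMs: now() - start, chunks };
}

// Stand-in stream that waits a little before each chunk (no API call involved).
async function* slowStream(count, delayMs) {
  for (let i = 0; i < count; i++) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield { choices: [{ delta: { content: `t${i}` } }] };
  }
}

measureStream(slowStream(5, 20)).then((timing) => console.log(timing));
```

With a real call, the value returned by withChatCompletion would be passed to measureStream in place of slowStream.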

The OpenAI documentation says that calls with and without streaming take almost the same time to complete fully; with streaming, however, the experience is far better.




