OpenAI APIs : Stream it for a faster response
If you have tried the OpenAI chat completion API openai.chat.completions.create(), you must have noticed the delay before the response arrives.
It is easy to overlook the {stream: true} parameter in the OpenAI API reference, which goes a long way toward solving this.
Excerpt from the OpenAI documentation :
In plain English: if you make the call with {stream: true}, you will receive multiple callbacks with chunks of the response until the response fully completes, giving you a progressive rendering experience.
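For reference, each streamed chunk the SDK hands back is a small object whose delta field carries only the newly generated tokens. Below is a representative sketch of a Chat Completions streaming chunk; the values are illustrative, not real output:
// JavaScript
// A representative streamed chunk (illustrative values)
const sampleChunk = {
  id: "chatcmpl-abc123",
  object: "chat.completion.chunk",
  model: "gpt-3.5-turbo",
  choices: [{
    index: 0,
    delta: { content: " Sivaji" }, // only the new tokens
    finish_reason: null // "stop" arrives with the final chunk
  }]
};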
But hang on, to pull this off a couple of things have to line up :
1. OpenAI call : make the request with the stream parameter and read the deltas.
2. API : stream the response back to the front-end.
3. Front-end : receive the deltas posted by the server/API.
1. OpenAI Call :
// NodeJS package.json
"dependencies": {
  ...
  "express": "^4.17.1",
  "openai": "^4.0.0"
},
// NodeJS
// openai-gpt.js
// Initialise the OpenAI client
const OpenAI = require("openai");
const getOpenAI = () => {
  const openai = new OpenAI({
    // better kept in an environment variable, e.g. process.env.OPENAI_API_KEY
    apiKey: "YOUR-OPEN-AI-API-KEY-GOES-HERE",
  });
  return openai;
};
// Define the chat completion method. With stream: true the call resolves to an
// async-iterable stream of chunks instead of a single response object.
const withChatCompletion = async (user, userPrompt) => {
  try {
    const openai = getOpenAI();
    const completion = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [{
        role: "system",
        content: `When I request a movie list, assume it's for Indian films`
      }, userPrompt],
      user,
      temperature: 0.2,
      stream: true
    });
    return completion;
  } catch (error) {
    // callers must handle a null return
    return null;
  }
};
module.exports = { withChatCompletion };
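Before wiring this into Express, you can sanity-check the stream from the command line. This is a minimal sketch; the file name try-completion.js and the sample prompt are my own, and it assumes the module above sits in the same folder:
// NodeJS
// try-completion.js (hypothetical test file, not part of the app)
const { withChatCompletion } = require("./openai-gpt");
const main = async () => {
  const prompt = { role: "user", content: "Give me the most popular Rajnikant movies list" };
  const completion = await withChatCompletion("thisisasimpleuserguid", prompt);
  if (!completion) return console.error("GPT call failed");
  // the stream is async-iterable; print each delta as it arrives
  for await (const chunk of completion) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
};
main();
Run it with node try-completion.js and the movie list should appear token by token.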
2. API to send data :
// NodeJS
// index.js
const express = require('express');
const { withChatCompletion } = require("./openai-gpt");
const app = express();
const port = 3000;
// GET call to pull a sample streamed response from GPT
app.get('/gpt-response', async (req, res) => {
  try {
    // SSE headers so the browser's EventSource can consume the stream
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      "Connection": "keep-alive",
      // needed only if the page is served from a different origin
      "Access-Control-Allow-Origin": "*",
    });
    const userId = "thisisasimpleuserguid";
    const prompt = {
      role: "user",
      content: `Give me the most popular Rajnikant movies list`
    };
    // GPT call
    const completion = await withChatCompletion(userId, prompt);
    for await (const chunk of completion) {
      if (chunk.choices.length > 0) {
        const content = chunk.choices[0].delta.content;
        if (content !== undefined) {
          // one SSE event per delta
          res.write(`data: ${content}\n\n`);
        } else {
          // sentinel event telling the front-end to stop listening
          res.write(`data: break\n\n`);
          res.end();
        }
      }
    }
  } catch (error) {
    res.end();
  }
});
// Start the server
app.listen(port, () => {
  console.log(`Server is listening on port ${port}`);
});
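To smoke-test the endpoint without a browser, a few lines of Node will do. This sketch assumes Node 18+, where fetch is built in; the file name test-stream.js is my own:
// NodeJS
// test-stream.js (hypothetical test file)
const main = async () => {
  const res = await fetch("http://localhost:3000/gpt-response");
  const decoder = new TextDecoder();
  // res.body is a web ReadableStream; iterate it chunk by chunk
  for await (const chunk of res.body) {
    process.stdout.write(decoder.decode(chunk, { stream: true }));
  }
};
main();
You should see raw data: frames scroll past as the model generates tokens.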
Observe the iteration over the completion for chunks: each delta's content is written onto the res object as a server-sent event.
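For clarity, this is roughly what those writes look like on the wire. Each res.write becomes one data: line followed by a blank line, one event per delta; the tokens here are illustrative:
data:  The

data:  most

data:  popular

data: break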
Now the last thing we have to accomplish is to listen to these server-sent events on the front-end.
3. Front-end consumption :
// JavaScript
// request.js
function getRajniMovies() {
  // callback method for each delta response
  const streamResponse = (event) => {
    const delta = event.data;
    // close on the sentinel before it gets rendered
    if (delta === "break") {
      newSearch.close();
      return;
    }
    elOutput.textContent += delta;
  };
  // callback upon error
  const streamError = () => {
    newSearch.close();
  };
  // Making the call.
  const newSearch = new EventSource(`http://localhost:3000/gpt-response`);
  newSearch.onmessage = streamResponse;
  newSearch.onerror = streamError;
}
// Capturing DOM elements
const elOutput = document.getElementById("output");
const elButton = document.getElementById("appendButton");
elButton.addEventListener("click", getRajniMovies);
<!DOCTYPE html>
<html>
<head>
<title>Indian Films : Rajni</title>
</head>
<body>
<p id="output"></p>
<button id="appendButton">Get Rajnikant Movies</button>
<script src="./request.js"></script>
</body>
</html>
If you look at the request.js file, it makes the call to the API on button click. Every time a delta is received, it is appended to the <p id="output"></p> element. Note the explicit newSearch.close() on the "break" sentinel: an EventSource reconnects automatically when the server closes the connection, so without it the browser would keep re-requesting /gpt-response.
Conclusion :
GPT calls with {stream: true} in most cases return the first token/delta in about 0.1 seconds, with subsequent deltas arriving every ~0.01–0.02 seconds.
While the OpenAI documentation says that calls with and without streaming take almost the same time to fully complete, the streamed version is a far better user experience.
Hey, care to give a clap before you go? I would love to respond to any question, comment, or concern you may have.
Cheers