Improved Streaming Support and Async Stream Parser
We’ve made significant improvements to our streaming functionality with two key updates:
Stream Fixes
We’ve resolved several issues with stream handling across different LLM providers, ensuring more reliable and consistent streaming experiences. These fixes address edge cases and improve compatibility with various streaming implementations, including:
- Better handling of stream interruptions and reconnections
- Improved error handling for streaming responses
- Enhanced compatibility with different LLM provider streaming formats
- Fixed timing calculations for streamed responses
New Streaming Methods
The `HeliconeManualLogger` class now includes enhanced methods for working with streams:
- `logStream`: Logs a streaming operation with full control over stream handling
- `logSingleStream`: Simplified method for logging a single `ReadableStream`
- `logSingleRequest`: Logs a single request with a response body
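The full streaming example below uses `logStream`; for a non-streaming call, `logSingleRequest` is the lighter option. As a rough sketch (the shape of the options argument here is an assumption based on the description above, not a confirmed signature):

```typescript
import Together from "together-ai";
import { HeliconeManualLogger } from "@helicone/helpers";

const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });
const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
});

export async function askOnce(question: string) {
  const body = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user", content: question }],
  } as Together.Chat.CompletionCreateParamsNonStreaming;

  const completion = await together.chat.completions.create(body);

  // logSingleRequest: record a completed, non-streaming call.
  // NOTE: the options shape below is an assumption for illustration.
  await helicone.logSingleRequest(body, JSON.stringify(completion), {
    additionalHeaders: { "Helicone-User-Id": "123" },
  });

  return completion;
}
```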
Example Usage with Together AI
```typescript
import Together from "together-ai";
import { HeliconeManualLogger } from "@helicone/helpers";

// Initialize the Together client
const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });

// Initialize the Helicone logger with properties
const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
  loggingEndpoint: "https://api.worker.helicone.ai/oai/v1/log",
  headers: {
    "Helicone-Property-Environment": "production",
  },
});

export async function POST(request: Request) {
  const { question } = await request.json();

  const body = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user", content: question }],
    stream: true,
  } as Together.Chat.CompletionCreateParamsStreaming & { stream: true };

  const response = await together.chat.completions.create(body);

  // Split the stream: one copy goes back to the client, the other is logged
  const [stream1, stream2] = response.tee();

  // Log the stream without blocking the response to the client
  helicone.logStream(
    body,
    async (resultRecorder) => {
      resultRecorder.attachStream(stream2.toReadableStream());
    },
    {
      "Helicone-User-Id": "123",
    }
  );

  return new Response(stream1.toReadableStream());
}
```
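Note the `response.tee()` call: it splits the provider stream into two copies so one can be returned to the client immediately while the other is consumed by the logger, keeping logging off the user-facing path.

If you don't need the `resultRecorder` callback, `logSingleStream` can stand in for the `logStream` call above. A minimal sketch, assuming it accepts the request body, a `ReadableStream`, and optional Helicone headers:

```typescript
// Drop-in replacement for the logStream call in the POST handler above.
// Assumption: logSingleStream(body, stream, headers) — this changelog
// does not show the exact signature.
helicone.logSingleStream(body, stream2.toReadableStream(), {
  "Helicone-User-Id": "123",
});
```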
These improvements make working with streaming LLMs more reliable and efficient, especially for applications that require real-time responses.