Running Local LLMs with Ollama and Node.js

Run LLMs locally without API costs using Ollama:

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama2
ollama pull codellama
```

Node.js Integration

```bash
npm install ollama
```

```javascript
import { Ollama } from 'ollama';

const ollama = new Ollama();

// Simple completion
const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: …
```
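The excerpt above is cut off mid-call. A minimal complete version might look like the following sketch, assuming the `ollama` npm package's `chat` API; the prompt text and the `main` wrapper are illustrative placeholders, not part of the original:

```javascript
import { Ollama } from 'ollama';

// Connects to the local Ollama server (default: http://127.0.0.1:11434)
const ollama = new Ollama();

async function main() {
  // Simple, non-streaming chat completion against a locally pulled model
  const response = await ollama.chat({
    model: 'llama2',
    messages: [{ role: 'user', content: 'Explain the Node.js event loop in one paragraph.' }],
  });

  // The reply text is on response.message.content
  console.log(response.message.content);
}

main().catch(console.error);
```

For token-by-token output, the same library accepts `stream: true`, in which case `chat` returns an async iterable of partial responses you can `for await` over.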

Rate Limiting AI API Calls in Node.js with Bottleneck

Rate limiting is critical for AI APIs. Here's a robust implementation:

```javascript
import Bottleneck from 'bottleneck';

const limiter = new Bottleneck({
  reservoir: 60,                        // 60 requests
  reservoirRefreshAmount: 60,
  reservoirRefreshInterval: 60 * 1000,  // per minute
  maxConcurrent: 5,
  minTime: 100                          // 100ms between requests
});

// Wrap OpenAI calls
const rateLimitedChat = limiter.wrap(async (prompt) => {
  return …
```
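The wrapped function above is truncated at the `return`. One way to finish it, sketched here with the official `openai` package; the model name and the example prompts are assumptions, not from the original excerpt:

```javascript
import Bottleneck from 'bottleneck';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const limiter = new Bottleneck({
  reservoir: 60,                        // start with 60 request "tokens"
  reservoirRefreshAmount: 60,           // refill back to 60...
  reservoirRefreshInterval: 60 * 1000,  // ...every minute
  maxConcurrent: 5,                     // at most 5 requests in flight
  minTime: 100,                         // and at least 100ms between starts
});

// limiter.wrap returns a function with the same signature that
// queues each call until the limiter allows it to run
const rateLimitedChat = limiter.wrap(async (prompt) => {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // hypothetical choice; any chat model works
    messages: [{ role: 'user', content: prompt }],
  });
  return completion.choices[0].message.content;
});

// Fire off more calls than the limit allows; Bottleneck queues the overflow
const answers = await Promise.all(
  ['What is a reservoir?', 'What is backpressure?'].map(rateLimitedChat)
);
console.log(answers);
```

The reservoir options model a classic token bucket: each call consumes one unit, and the bucket refills to 60 every minute, while `maxConcurrent` and `minTime` smooth out bursts within that budget.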