Overview
Master your AI workspace with intelligent agents, seamless integrations, and powerful tools
Cleo centralizes intelligent agents, connected tools, and productive workflows in a single premium experience. Use the side menu to explore each area.
Quick Start
Launch in minutes and build momentum
- Create your account / sign in. Open the app and go to Settings → API & Keys.
- Configure your model keys. Enter at least one key (OpenAI, Anthropic, Groq, or OpenRouter). Cleo will auto-detect availability and latency.

```bash
# Example (.env.local)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
GROQ_API_KEY=...
OPENROUTER_API_KEY=...
```
- Create your first agent. Go to Agents and click "New Agent". Choose the specialist role for focused tasks.

Config JSON (UI equivalent):

```json
{
  "name": "Research Scout",
  "description": "Searches for and summarizes current information",
  "role": "specialist",
  "model": "gpt-4o-mini",
  "temperature": 0.4,
  "tools": ["web_search", "web_fetch"],
  "prompt": "You are an agent that verifies, cross-checks and synthesizes credible sources.",
  "memoryEnabled": true,
  "memoryType": "short_term"
}
```

Creation via API (POST):

```bash
curl -X POST https://api.tu-dominio.com/api/agents/create \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "name": "Research Scout",
    "description": "Searches for and summarizes current information",
    "role": "specialist",
    "model": "gpt-4o-mini",
    "tools": ["web_search", "web_fetch"],
    "prompt": "You are an agent that verifies and synthesizes reliable sources",
    "memoryEnabled": true,
    "memoryType": "short_term"
  }'
```

- Run a test prompt. Select the newly created agent in the conversation panel and ask: "Summarize the current trends in edge-computing AI in 5 bullet points."

```bash
curl -X POST https://api.tu-dominio.com/api/agents/execute \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "agentId": "<AGENT_ID>",
    "input": "Summarize the current trends in edge-computing AI in 5 bullet points"
  }'
```
- Create a mini chain (workflow). Add a second evaluator agent (role evaluator) to refine quality; the supervisor can delegate automatically (see the sketch after this list).
  1. Specialist: collects and synthesizes raw information.
  2. Evaluator: verifies, removes bias, adds structure.
  3. Final Output: the supervisor integrates and delivers.
- Save and reuse. Export the agent configuration or clone it for new variants (low temperature for data work, high for ideation).
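The same chain can be driven programmatically. A minimal sketch, assuming the `/api/agents/execute` endpoint from the curl examples above, a `CLEO_TOKEN` environment variable, and a response body with an `output` field (all deployment-specific assumptions):

```ts
// Sketch: drive the specialist → evaluator chain via the execute API.
// The endpoint and payload follow the curl examples above; CLEO_TOKEN,
// the agent IDs and the `output` response field are assumptions.
async function runChain(input: string): Promise<string> {
  const execute = async (agentId: string, text: string): Promise<string> => {
    const res = await fetch("https://api.tu-dominio.com/api/agents/execute", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.CLEO_TOKEN}`,
      },
      body: JSON.stringify({ agentId, input: text }),
    });
    if (!res.ok) throw new Error(`execute failed: ${res.status}`);
    return (await res.json()).output as string;
  };

  // 1. Specialist collects and synthesizes raw information.
  const draft = await execute("<SPECIALIST_ID>", input);
  // 2. Evaluator verifies, removes bias and structures the draft.
  return execute("<EVALUATOR_ID>", `Review and refine:\n${draft}`);
}
```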
Validation checklist
- Valid model key
- First agent created
- Successful execution
- Delegation configured
- Workflow saved
- Temperature tuning tested
Quick tips
- Temperature 0.2–0.4: stable / factual answers. 0.7–0.9: ideation / creativity.
- Include a clear objective in the prompt: it improves delegation.
- Enable short-term memory for session context; avoid long-term memory unless you need persistence.
- Limit tools: 2–3 per agent max for precision.
Agents
Design, specialize and orchestrate autonomous assistants
Agents in Cleo are modular, typed entities with a defined `role`, `model`, `prompt`, and an allowed tool set. The multi‑agent graph routes tasks between them via the supervisor.
Core Roles
- Supervisor: Routes tasks, decides delegation, aggregates the final response.
- Specialist: Domain‑focused (research, code, analysis, planning, data).
- Worker: Executes atomic sub‑tasks (fetch, transform, extract).
- Evaluator: Reviews quality, bias and structure; can request rewrites.
Minimal Config

```json
{
  "name": "Data Analyst",
  "role": "specialist",
  "model": "gpt-4o-mini",
  "temperature": 0.2,
  "tools": ["python_runner", "chart_builder"],
  "prompt": "You are a data analyst. Return concise, verifiable analyses.",
  "memoryEnabled": false
}
```
Expanded Config

```json
{
  "name": "Research Planner",
  "description": "Breaks down broad objectives into structured research tasks",
  "role": "specialist",
  "model": "claude-3-5-sonnet",
  "temperature": 0.4,
  "tools": ["web_search", "web_fetch", "notion_write"],
  "prompt": "Act as a strategic planner. Break complex objectives into clear, prioritized steps.",
  "objective": "Transform vague goals into actionable research sequences",
  "customInstructions": "Always ask clarifying questions if scope is ambiguous.",
  "memoryEnabled": true,
  "memoryType": "short_term",
  "stopConditions": ["[FINAL]"],
  "toolSchemas": {
    "notion_write": {
      "properties": { "page": { "type": "string" } }
    }
  }
}
```
Lifecycle
- Registration: Agent definition stored; supervisor graph updated.
- Invocation: User or supervisor dispatches request.
- Reasoning / Tooling: Model generates intermediate thoughts & tool calls.
- Delegation (optional): Supervisor re-routes if another agent is better suited.
- Evaluation (optional): Evaluator reviews & refines.
- Finalization: Response aggregated and returned.
Specialization Patterns
- Splitter: Breaks tasks → sub-prompts (planner)
- Researcher: Multi-source synthesis + credibility scoring
- Extractor: Structured JSON output from messy text
- Synthesizer: Combines multi-agent outputs
- Reviewer: Style, tone & factual QA
Delegation Heuristics
- Detect domain keywords ("analyze", "plan", "buscar")
- Check tool availability match
- Fallback to generalist if confidence < threshold
- Escalate to evaluator on low coherence
- Stop chain if cost/time limit exceeded
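These heuristics compose into a small routing function. A minimal sketch, assuming a hypothetical agent registry shape and an illustrative 0.5 confidence threshold (Cleo's internal routing is not exposed here):

```ts
// Sketch: keyword + tool-match routing with a generalist fallback.
// The registry shape and the 0.5 threshold are illustrative.
interface AgentDef {
  id: string;
  keywords: string[]; // domain triggers, e.g. ["analyze", "plan", "buscar"]
  tools: string[];
}

function route(query: string, agents: AgentDef[], requiredTools: string[]): string {
  const q = query.toLowerCase();
  const scored = agents.map((agent) => {
    const hits = agent.keywords.filter((k) => q.includes(k)).length;
    const toolsOk = requiredTools.every((t) => agent.tools.includes(t));
    // Confidence = keyword coverage, zeroed when required tools are missing.
    const confidence = toolsOk ? hits / Math.max(agent.keywords.length, 1) : 0;
    return { agent, confidence };
  });
  scored.sort((a, b) => b.confidence - a.confidence);
  const best = scored[0];
  // Fall back to a generalist when confidence stays below threshold.
  return best && best.confidence >= 0.5 ? best.agent.id : "generalist";
}
```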
Best Practices
- One primary objective per agent
- 2–5 tools max; avoid overloading
- Lower temperature for evaluators (0–0.2)
- Use explicit stop tokens in multi-step outputs
- Tag agents (e.g. `research`)
When to Create a New Agent?
- Recurring task with distinct style or constraints
- Needs unique tool combo (e.g. Notion + Web + Python)
- Different temperature / risk tolerance required
- Output format radically different (JSON vs narrative)
- Separate audit / logging channel needed
Prompt Examples
High‑quality prompt patterns for reliable outputs
A curated set of production‑grade prompt archetypes covering system conditioning, structured extraction, reasoning, delegation, and evaluation. All outputs are designed for deterministic parsing and multi‑agent chaining.
Structured Research Synthesizer
Reliable multi-source synthesis with explicit output schema.
```text
You are a senior research synthesis agent.
Goal: Produce a concise, unbiased summary.
Rules:
- Validate each claim with at least 2 sources.
- If contradiction exists, surface it explicitly.
- Output strict JSON with keys: summary, key_points[], risks[], sources[].
- Do NOT hallucinate. Return only JSON.
```
Planner Decomposition
Break down vague objective into prioritized task plan.
```text
You are a strategic planning agent.
Input: A vague objective.
Transform into:
{ objective, clarifying_questions[], tasks[ {id, title, rationale, dependencies[]} ], risks[], success_criteria[] }
Always ask questions first if scope ambiguous.
Return JSON only.
```
Constrained Reasoning Steps
Encourages explicit internal reasoning with bounded length.
```text
You will solve the problem using structured reasoning.
Format:
THOUGHT[1]: ...
THOUGHT[2]: ...
FINAL: <answer>
Keep each THOUGHT under 25 tokens.
If uncertain, state assumptions.
```
Robust Field Extraction
Turns messy text into typed structured record.
```text
Extract fields from input text.
Output strictly JSON:
{
  company: string|null,
  country: string|null,
  employees: number|null,
  funding_stage: enum[seed,series_a,series_b,growth]|null
}
If missing, set null. Never guess. Return ONLY JSON.
```
Supervisor Delegation Pattern
Supervisor decides whether to route to research or analysis agent.
```text
You are SUPERVISOR.
Agents:
  research_agent (web_search, web_fetch)
  analysis_agent (python_runner, chart_builder)
User query: <INSERT>
Evaluate intent:
  IF requires external info -> delegate:research_agent with objective
  ELSE IF numeric / data transformation -> delegate:analysis_agent
  ELSE respond directly.
Return JSON:
{ mode: direct|delegate, target_agent?: string, rationale: string, objective?: string }
```
Quality & Fact Reviewer
Evaluator that flags factual uncertainty and style issues.
```text
You are an evaluator.
Input: draft_response + original_request.
Tasks:
1. Score factuality (0-1)
2. List potential hallucinations (if any)
3. Suggest style improvements
4. If rewrite needed, provide improved_response.
Return JSON:
{ factuality: number, hallucinations: string[], improvements: string[], improved_response?: string }
```
Guidelines
- Prefer explicit JSON schemas for extraction & handoff.
- Bound reasoning tokens: reduces drift + cost.
- Separate evaluation from generation for higher factuality.
- Use lower temperature for system / evaluator prompts.
- Never mix natural language + JSON in machine‑consumable outputs.
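To make the "JSON only" rule enforceable downstream, pair these prompts with a parse-and-retry guard. A sketch, assuming a hypothetical `callModel` helper and a lower-temperature retry as the only recovery step:

```ts
// Sketch: enforce a JSON-only contract, retrying once at temperature 0
// on parse failure. `callModel` is a hypothetical helper.
declare function callModel(prompt: string, opts: { temperature: number }): Promise<string>;

async function extractJSON<T>(prompt: string): Promise<T> {
  for (const temperature of [0.3, 0]) {
    const raw = await callModel(prompt, { temperature });
    try {
      return JSON.parse(raw) as T; // add schema validation (e.g. zod) on top
    } catch {
      // parse failed: fall through to the stricter retry
    }
  }
  throw new Error("model did not return parseable JSON");
}
```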
Model Strategy
Choose the optimal model per intent, cost and latency
Model selection in Cleo balances latency, determinism, reasoning depth and cost. Use fast tiers for routing & control loops, balanced for planning & synthesis, and escalate only when confidence or structure thresholds fail.
| Tier | Models | Latency | Cost | Ideal For |
|---|---|---|---|---|
| Ultra Fast | gpt-4o-mini, claude-haiku, mistral-small | 50–250 ms | Low | Routing, delegation heuristics, light classification |
| Balanced | gpt-4o, claude-sonnet, gemini-1.5-pro | 300–1200 ms | Medium | General reasoning, planning, structured synthesis |
| Heavy Reasoning | claude-opus, oatmega-70b (open) | 1.5–4 s | High | Complex multi-hop reasoning, deep evaluation passes |
| Specialized | embedding-small, vision-model, audio-large | Varies | Variable | Vector search, OCR, multimodal enrichment |
Selection Heuristics
- Extraction (strict JSON): Small deterministic (gpt-4o-mini) → escalate only on parse failure
- Multi-hop reasoning: Start Balanced (gpt-4o / sonnet), escalate to opus only if reasoning depth score < threshold
- Cost sensitive batch tasks: Use open smaller models + caching + batch API
- Delegation routing: Ultra Fast tier for low latency control loop
- Evaluation / Fact QA: Balanced model at low temperature (0–0.3) for consistency
- Creative ideation: Increase temperature 0.7–0.9 on Balanced tier before using Heavy
Fallback Cascade Pattern
```js
// Pseudocode
async function smartInvoke(task) {
  // Tier 1: fast attempt
  const fast = await callModel('gpt-4o-mini', task, { timeout: 1800 })
  if (fast.parsed && fast.confidence >= 0.82) return fast

  // Tier 2: balanced refinement
  const balanced = await callModel('gpt-4o', enhance(fast, task), { temperature: 0.4 })
  if (balanced.confidence >= 0.9) return balanced

  // Tier 3: heavy reasoning escalation
  return await callModel('claude-opus', enrichWithCritique(balanced, task), { maxTokens: 1200 })
}
```
- Escalate only when parse fails or confidence < threshold.
- Propagate critique context instead of raw hallucinated text.
- Track token + cost metrics per tier for optimization.
Caching & Cost Control
- Deduplicate identical structured extraction prompts via hash cache.
- Use temperature 0–0.3 for parse‑critical tasks to reduce retries.
- Persist intermediate balanced-tier outputs for heavy escalation reuse.
- Track token usage per agent role to spot misalignment.
- Batch low priority tasks during off-peak windows.
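The hash-cache deduplication above can be as small as a content-addressed map in front of the model call. A sketch using Node's `crypto`; the in-memory `Map` stands in for whatever store (Redis, LRU) you actually use:

```ts
import { createHash } from "node:crypto";

// Sketch: content-addressed cache in front of the model call.
// The Map stands in for your real store (Redis, LRU, ...).
const cache = new Map<string, string>();

async function cachedInvoke(
  prompt: string,
  invoke: (p: string) => Promise<string>,
): Promise<string> {
  const key = createHash("sha256").update(prompt).digest("hex");
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: zero tokens spent
  const result = await invoke(prompt);
  cache.set(key, result);
  return result;
}
```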
Confidence Signals
- Structural: JSON schema validation pass/fail.
- Self-estimated certainty: Model returns numeric confidence (sanity bound).
- Evaluator score: Independent pass for factuality & coherence.
- Time budget: Abort escalation if nearing SLA limit.
- Cost guardrail: Hard ceiling per user/session triggers degrade mode.
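A sketch of how these signals might fold into one escalation decision; the weights, the 0.82 cut-off, and the 500 ms floor are illustrative assumptions, not Cleo defaults:

```ts
// Sketch: fold the confidence signals into one escalation decision.
// Weights, the 0.82 cut-off and the 500 ms floor are illustrative.
interface Signals {
  schemaValid: boolean;    // structural: JSON schema pass/fail
  selfConfidence: number;  // model-reported, sanity-bounded below
  evaluatorScore?: number; // independent evaluator pass, if run
  msRemaining: number;     // time budget left before the SLA limit
}

function shouldEscalate(s: Signals): boolean {
  if (s.msRemaining < 500) return false; // near SLA: abort escalation
  const self = Math.min(Math.max(s.selfConfidence, 0), 1);
  const score =
    s.evaluatorScore !== undefined ? 0.5 * self + 0.5 * s.evaluatorScore : self;
  return !s.schemaValid || score < 0.82;
}
```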
Tool Safety
Approval workflows and secure execution model
Tool execution is governed by scoped permissions, real‑time policy checks, human approval escalation and immutable audit trails. Minimize blast radius by constraining agents to least privilege.
Permission Scopes
| Scope | Description |
|---|---|
| read | Non‑destructive retrieval (fetch, search, list) |
| write | Create or modify content (notion_write, file_save) |
| execute | Run code or transformations (python_runner, script_exec) |
| network | Outbound web requests (web_fetch, api_call) |
| sensitive | Access to PII / internal systems; requires explicit approval |
Approval Workflow
```text
Agent tool call → policy check
  ├─ pass (auto) if scope ∈ allowed && risk < threshold
  └─ queue if scope = sensitive OR confidence < 0.75
Queue item → human approve/deny → audit log entry → continue/abort
```
- Human queue stored with TTL; stale requests auto‑expire.
- UI shows diff / requested arguments for clarity.
- Denied calls propagate structured error to agent for graceful fallback.
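The gate itself reduces to a small pure function. A sketch, with scope names taken from the table above and the risk threshold left as an assumed parameter:

```ts
// Sketch of the policy gate. Scope names match the table above;
// risk scoring and queue storage are deliberately left abstract.
type Scope = "read" | "write" | "execute" | "network" | "sensitive";

interface ToolCall {
  tool: string;
  scope: Scope;
  risk: number;       // 0..1, from your risk classifier
  confidence: number; // agent's confidence in the call
}

function policyCheck(
  call: ToolCall,
  allowed: Scope[],
  riskThreshold = 0.6,
): "pass" | "queue" | "deny" {
  if (!allowed.includes(call.scope)) return "deny";
  // Sensitive scope or shaky confidence goes to the human queue.
  if (call.scope === "sensitive" || call.confidence < 0.75) return "queue";
  return call.risk < riskThreshold ? "pass" : "queue";
}
```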
Risk Classification
| Level | Scopes | Examples |
|---|---|---|
| Low | read | Public info retrieval, static asset fetch |
| Medium | write, execute | Content mutation, sandboxed code runs |
| High | network, sensitive | External exfiltration vectors, PII reads |
| Critical | sensitive + execute | Potential lateral movement or data leakage |
Rate Limits
| Tool | Quota | Burst | Notes |
|---|---|---|---|
| web_fetch | 60 / 5 min | 5 | Exponential backoff after 429 |
| python_runner | 20 / 10 min | Serialized | Workspace CPU guard |
| notion_write | 40 / 10 min | 3 | Queue + retry jitter |
| email_send | 100 / 1 h | 10 | DMARC compliance + delay |
| vector_search | 200 / 5 min | Parallel | Cache layer w/ LRU |
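The "exponential backoff after 429" note for web_fetch might look like this. A sketch that assumes errors carry a numeric `status` field; the base delay and retry cap are illustrative:

```ts
// Sketch: exponential backoff with jitter after a 429, per the
// web_fetch row. Assumes errors expose a numeric `status` field;
// base delay and retry cap are illustrative.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status;
      if (status !== 429 || attempt >= maxRetries) throw err;
      const delayMs = 500 * 2 ** attempt + Math.random() * 250; // jitter
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```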
Audit Log Schema
| Event | Fields |
|---|---|
| Tool Call Start | timestamp, agentId, tool, argsHash, scope |
| Tool Call End | duration, success, errorType, tokensUsed |
| Escalation | previousTool, rationale, newScope |
| Approval Decision | approverId, decision, latency, justification |
| Anomaly Flag | patternType, severity, correlationId |
- All entries carry a correlationId for tracing cross-agent flows.
- High severity anomalies trigger webhook + optional Slack alert.
- Logs are immutable append-only; retention tiered (hot → warm → archive).
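For consumers of the log, the schema table translates naturally into a discriminated union. A sketch; field names follow the table, while the `kind` tags and union shape are assumptions:

```ts
// Sketch: the audit schema as a discriminated union. Field names
// follow the table; the `kind` tags and union shape are assumptions.
interface AuditBase {
  timestamp: string;
  correlationId: string; // traces cross-agent flows
}

type AuditEntry =
  | (AuditBase & { kind: "tool_call_start"; agentId: string; tool: string; argsHash: string; scope: string })
  | (AuditBase & { kind: "tool_call_end"; duration: number; success: boolean; errorType?: string; tokensUsed: number })
  | (AuditBase & { kind: "escalation"; previousTool: string; rationale: string; newScope: string })
  | (AuditBase & { kind: "approval_decision"; approverId: string; decision: "approve" | "deny"; latency: number; justification: string })
  | (AuditBase & { kind: "anomaly_flag"; patternType: string; severity: "low" | "medium" | "high" });
```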
Best Practices
- Create separate agents for high‑risk tools (isolate scope).
- Hash + diff arguments for write operations to make intent clear.
- Enable the human queue only for sensitive + execute scopes, not routine writes.
- Alert on unusual burst patterns (entropy of tool sequence).
- Rotate API keys & enforce per‑agent tokens when possible.
Multi-Agent
Delegation, supervision and collaboration patterns
Cleo orchestrates agents through an adaptive supervisor that performs intent routing, delegation, arbitration and evaluation. The system emphasizes minimal escalation, deterministic structure, and explicit confidence signals.
Conceptual Flow
```text
User Input
    ↓
[ Supervisor ] -- intent classification --> ( route )
    |
    |--> direct answer (low complexity)
    |--> Specialist A (research)
    |--> Specialist B (analysis)
           ↓
      Worker agents (extraction, transform)
           ↓
    <-- Aggregated partial outputs --
           ↓
    --> Evaluator (quality / factuality / style)
           ↓ (approve / request revision)
Final Response --> User
```
Graph edges represent potential delegation; actual path chosen by heuristics (intent, tool availability, confidence, cost budget).
Orchestration Phases
- Intake: Normalize user input; detect language; strip PII if required.
- Intent Classification: Light fast model or rules to map to domain + complexity level.
- Routing Decision: Select direct response vs delegation; choose specialist set.
- Task Decomposition: Optional planner expansion into structured sub‑tasks.
- Execution: Specialists + workers perform reasoning + tool calls.
- Synthesis: Combine multi‑agent outputs (order, conflict resolution).
- Evaluation: Quality, factuality, coherence, style normalization.
- Finalization: Formatting, safe content filters, response packaging.
Routing Strategies
- Keyword + Tool Match: Map intent tokens to agents whose tool set intersects required capability.
- Confidence Threshold: If classifier confidence < τ → escalate to generalist or ask clarification.
- Cost-Aware Routing: Prefer cheapest capable agent unless complexity score > threshold.
- Adaptive Feedback: Evaluator signals misrouting; update routing weights incrementally.
- Composite Voting: Sample 2 light models for classification; use consensus or escalate.
Arbitration Patterns
- Evaluator Gate: Evaluator must approve if risk score > R or novelty flag set.
- Dual Response Compare: Two specialists produce outputs → evaluator chooses or merges.
- Progressive Refinement: Draft → critique → improved draft (limit N cycles).
- Conflict Resolution: If contradictory claims → request sources or escalate to higher tier model.
- Time Budget Abort: If cumulative execution time > SLA threshold → degrade gracefully.
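Progressive refinement with a hard cycle cap (N ≤ 2) can be expressed compactly. A sketch, with `generate` and `critique` as placeholders for model calls:

```ts
// Sketch: progressive refinement with a hard cycle cap.
// `generate` and `critique` are placeholders for model calls.
declare function generate(task: string, critiqueNotes?: string): Promise<string>;
declare function critique(draft: string): Promise<{ approved: boolean; notes: string }>;

async function refine(task: string, maxCycles = 2): Promise<string> {
  let draft = await generate(task);
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const review = await critique(draft);
    if (review.approved) break;                 // evaluator gate passed
    draft = await generate(task, review.notes); // bounded rework
  }
  return draft;
}
```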
Supervision Loops
- Light Supervision: Supervisor only delegates & aggregates; no evaluator unless uncertainty flagged.
- Inline Evaluation: Evaluator reviews each intermediate artifact before next stage.
- Periodic Audit: Every N tasks, sample outputs for deeper factual QA.
- Escalation Ladder: Uncertain → evaluator → heavy model → human (optional).
- Self-Critique Injection: Agent produces THOUGHT + CRITIQUE internally before FINAL output.
Optimization Tips
- Cache classification & routing decisions by normalized query signature.
- Short‑circuit evaluator when structural parse already passes high confidence.
- Limit refinement loops (N ≤ 2) to prevent cost spirals.
- Track per‑role token + latency metrics to prune underperforming agents.
- Fallback to single‑agent mode in degraded / high load states.
Image Generation
Creative rendering with model selection & limits
Troubleshooting
Common issues, diagnostics and recovery steps
Use this guide to quickly isolate issues across routing, delegation, tooling, memory and cost. Patterns are designed for rapid triage with structured remediation.
| Area | Symptom | Likely Cause | Action |
|---|---|---|---|
| Connection | Intermittent 504 / timeouts | Model provider latency spike | Fail over to a secondary key; check the status page |
| Agents | Delegation never triggers | Routing confidence threshold too strict | Lower the threshold or add domain keywords |
| Tools | Frequent 429 on web_fetch | Rate limit exceeded | Introduce jitter & batch queries |
| Memory | Context truncated early | Max tokens too low | Increase maxTokens or enable the streaming summarizer |
| Costs | Sudden token usage spikes | Escalation loop / evaluator recursion | Cap refinement cycles; add a safeguard counter |
| Output | Invalid JSON parse | Temperature too high or missing schema framing | Add an explicit JSON schema + reduce temperature |
API Diagnostics
- List agents: `GET /api/agents`
- Recreate orchestrator: `POST /api/agents/register?recreate=true`
- Execute agent: `POST /api/agents/execute { agentId, input }`
- List tasks: `GET /api/agent-tasks`
- Check metrics: `GET /api/agents/metrics`
- Reset thread: `POST /api/threads/reset { threadId }`
Error Taxonomy
| Code | Meaning |
|---|---|
| routing.miss | Supervisor selected a suboptimal agent; adjust thresholds |
| delegation.timeout | Worker exceeded its execution window; raise the timeout or optimize the task |
| tool.rate_limited | 429 from provider; apply backoff + queue |
| model.hallucination | Low factual confidence; trigger an evaluator rewrite |
| parse.failure | Invalid JSON; enforce the schema & retry at lower temperature |
| memory.overflow | Too many tokens; compress older context |
| cost.guardrail | Budget exceeded; degrade to fast tier + reduce depth |
Error codes are structured to allow automated remediation triggers.
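One way to act on that structure is a code-to-handler dispatch table. A sketch; the handler bodies are placeholders for the playbooks below:

```ts
// Sketch: dispatch taxonomy codes to remediation handlers.
// Handler bodies are placeholders for the playbooks below.
type Handler = () => Promise<void>;

const remediation: Record<string, Handler> = {
  "routing.miss":      async () => { /* lower threshold, rebuild orchestrator */ },
  "tool.rate_limited": async () => { /* backoff + queue */ },
  "parse.failure":     async () => { /* enforce schema, retry at lower temp */ },
  "memory.overflow":   async () => { /* summarize / compress older context */ },
  "cost.guardrail":    async () => { /* degrade to fast tier, reduce depth */ },
};

async function remediate(code: string): Promise<void> {
  const handler = remediation[code];
  if (!handler) throw new Error(`no automated playbook for ${code}`); // surface to a human
  await handler();
}
```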
Recovery Playbooks
Routing Broken
- Enable debug routing logs
- Lower confidence threshold 0.85 → 0.7
- Add explicit keyword mapping
- Rebuild orchestrator
Escalation Loop
- Set max refinement cycles = 2
- Add token guard
- Log evaluator triggers
- Fallback to balanced tier
High Latency
- Activate streaming
- Switch to fast tier
- Enable partial synthesis
- Batch similar requests
JSON Failures
- Wrap schema in fenced block
- Remove narrative instructions
- Lower temperature
- Add validator + retry
Tool Flood
- Apply per-agent rate limiter
- Throttle high-frequency tool
- Introduce queue + jitter
- Alert on anomaly
Memory Drift
- Shorten conversation window
- Enable summarizer
- Disable long_term memory temporarily
- Reset thread context
Preventative Monitoring
- Alert on escalation chain length > 2.
- Track JSON parse failure rate; auto‑lower temperature if spike detected.
- Log per‑tool p95 latency & throttle anomalies.
- Capture evaluator disagreement rate as drift signal.
- Budget guard: emit event at 80% daily cost threshold.
Frequently Asked Questions
Answers to recurring questions