Event-Triggered VLM
The core pattern: Atriva detects an event on-device → your backend receives the event webhook → you call a VLM with the event snapshot → the VLM provides richer analysis.
This keeps VLM costs proportional to actual incidents, not camera uptime.
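To make that concrete, here is a rough back-of-envelope comparison. All numbers below are illustrative assumptions for the sketch, not real Atriva or VLM pricing:

```python
# Illustrative: per-frame VLM analysis vs event-triggered calls.
FPS = 10
SECONDS_PER_DAY = 86_400
COST_PER_VLM_CALL = 0.01   # assumed $/call

frames_per_day = FPS * SECONDS_PER_DAY   # what "analyze every frame" would bill against
events_per_day = 40                      # assumed on-device detections on a busy day

print(f"per-frame:       ${frames_per_day * COST_PER_VLM_CALL:,.2f}/day")
print(f"event-triggered: ${events_per_day * COST_PER_VLM_CALL:,.2f}/day")
```

Even with generous event counts, the event-triggered bill tracks incidents, while the per-frame bill tracks uptime.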
Architecture
Camera
 └─▶ Atriva Pipeline (on-device)
      └─▶ Detection: PPE violation (confidence 0.91)
           └─▶ POST /webhook → Your Backend
                ├─▶ Fetch snapshot from Atriva
                └─▶ Call VLM API with snapshot + prompt
                     └─▶ VLM response → Store / Alert / Report
Event Payload
When Atriva fires a webhook, the payload looks like:
{
  "event_id": "evt_01J9X...",
  "type": "ppe_violation",
  "confidence": 0.91,
  "timestamp": "2025-04-26T08:32:11Z",
  "camera_id": "cam_warehouse_north",
  "zone": "loading_dock",
  "snapshot_url": "http://edge-device/snapshots/evt_01J9X.jpg",
  "metadata": {
    "missing_ppe": ["helmet", "high_vis_vest"],
    "worker_count": 1
  }
}
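Before acting on a payload, it helps to validate it into a typed object so malformed webhooks fail fast. A minimal stdlib sketch (field names mirror the JSON above; a validation library such as pydantic would work equally well):

```python
from dataclasses import dataclass


@dataclass
class AtrivaEvent:
    """Typed view of the webhook payload; fields mirror the JSON above."""
    event_id: str
    type: str
    confidence: float
    timestamp: str
    camera_id: str
    zone: str
    snapshot_url: str
    metadata: dict

    @classmethod
    def from_payload(cls, payload: dict) -> "AtrivaEvent":
        # Raises KeyError on missing fields, so a bad payload is
        # rejected before any VLM call is made.
        keys = ("event_id", "type", "confidence", "timestamp",
                "camera_id", "zone", "snapshot_url", "metadata")
        return cls(**{k: payload[k] for k in keys})
```

Usage: `evt = AtrivaEvent.from_payload(payload)`, then access e.g. `evt.metadata["missing_ppe"]` with attribute-level safety on the outer fields.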
Calling a VLM
Python example (Anthropic Claude)
import anthropic
import base64
import httpx

def handle_atriva_event(event: dict) -> str:
    # Fetch the snapshot from the edge device; the URL may be short-lived,
    # so time out and fail loudly rather than sending a bad image to the VLM.
    resp = httpx.get(event["snapshot_url"], timeout=10)
    resp.raise_for_status()
    img_b64 = base64.standard_b64encode(resp.content).decode()

    client = anthropic.Anthropic()

    prompt = f"""
You are a workplace safety analyst. An AI system detected a PPE violation.

Event details:
- Type: {event['type']}
- Zone: {event['zone']}
- Missing PPE: {event['metadata']['missing_ppe']}
- Confidence: {event['confidence']}

Analyze the image and:
1. Confirm whether the violation is visible
2. Describe the worker's location and activity
3. Assess the immediate safety risk (low / medium / high)
4. Write one sentence for the incident log
"""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/jpeg",
                            "data": img_b64,
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    )
    return response.content[0].text
Python example (OpenAI GPT-4o)
import base64
import httpx
import openai

def handle_atriva_event(event: dict) -> str:
    resp = httpx.get(event["snapshot_url"], timeout=10)
    resp.raise_for_status()
    img_b64 = base64.standard_b64encode(resp.content).decode()

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": f"Analyze this PPE violation event: {event}"},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
                ],
            }
        ],
        max_tokens=512,
    )
    return response.choices[0].message.content
Webhook Receiver (FastAPI)
from fastapi import FastAPI, Request
import asyncio

app = FastAPI()

@app.post("/atriva/webhook")
async def receive_event(request: Request):
    event = await request.json()

    # Only call the VLM for high-confidence, actionable events
    if event["confidence"] >= 0.85 and event["type"] in ("ppe_violation", "fall", "intrusion"):
        # handle_atriva_event does blocking I/O (snapshot fetch + VLM call),
        # so run it off the event loop to avoid stalling other deliveries.
        analysis = await asyncio.to_thread(handle_atriva_event, event)
        await store_incident(event, analysis)   # app-specific persistence
        await route_alert(event, analysis)      # app-specific alerting

    return {"status": "ok"}
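The receiver above calls `store_incident` and `route_alert`, which are application-specific. A minimal sketch of what they might look like — the file-based store and print-based alert here are placeholders for your real database and notification channel:

```python
import json
import pathlib

INCIDENT_DIR = pathlib.Path("incidents")  # hypothetical local store; swap for your DB


async def store_incident(event: dict, analysis: str) -> None:
    # Persist the raw event alongside the VLM analysis for audits and reports.
    INCIDENT_DIR.mkdir(exist_ok=True)
    record = {"event": event, "analysis": analysis}
    path = INCIDENT_DIR / f"{event['event_id']}.json"
    path.write_text(json.dumps(record, indent=2))


async def route_alert(event: dict, analysis: str) -> None:
    # Placeholder: wire this to Slack, PagerDuty, email, etc.
    print(f"[ALERT] {event['type']} in {event['zone']}: {analysis}")
```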
Tips
- Cache snapshots locally before passing to the VLM — the edge device snapshot URL may expire.
- Batch low-urgency events (e.g., queue depth analytics) and send to the VLM hourly rather than in real time.
- Gate by confidence — only invoke the VLM when Atriva’s confidence exceeds your threshold (e.g., 0.85+).
- Use structured output — ask the VLM to respond in JSON so downstream systems can parse risk level and log entry programmatically.
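Following the structured-output tip, you can append a JSON instruction to the analysis prompt and parse the reply defensively. A sketch — the response shape and helper below are assumptions for illustration, not part of Atriva's or any VLM vendor's API:

```python
import json

# Appended to the analysis prompt; the schema is an illustrative assumption.
JSON_INSTRUCTION = """
Respond ONLY with a JSON object of this exact shape:
{"violation_confirmed": true, "risk": "low" | "medium" | "high", "log_entry": "one sentence"}
"""


def parse_vlm_json(raw: str) -> dict:
    # Models occasionally wrap JSON in prose or markdown fences,
    # so extract the outermost {...} span before parsing.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in VLM response")
    return json.loads(raw[start : end + 1])
```

Run every VLM reply through `parse_vlm_json` before storing or alerting, so downstream systems read `risk` and `log_entry` as fields rather than scraping free text.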