Infrastructure and Security
Support and monitoring
Real monitoring is more than a “green ping in Grafana”: it is knowing which metric matters for the business, who wakes up when it breaks, and what to do in the first fifteen minutes. Good support does not let tickets pile up under “awaiting customer reply” without stating what was already tried. Viscale treats operations like a product: a visible queue, priorities agreed with you, and retrospectives that change configuration instead of only closing tickets.
Observability layers that talk to product. API latency, error rate, growing queues, disk, and DB connections: each alert has a justified threshold and the right recipient. We guard against alerts that “never fire” and alerts that “fire every night for no reason” with the same care, tuning from real incidents, not guesses.
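As a rough illustration of an alert that carries its own threshold, justification, and recipient, here is a minimal Python sketch. The metric names, threshold values, rationales, and recipient names are all hypothetical placeholders, not Viscale's actual configuration; real thresholds would come from incident history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str       # hypothetical metric name
    threshold: float  # fire when the observed value exceeds this
    rationale: str    # why this threshold exists (placeholder text here)
    recipient: str    # hypothetical on-call group that gets notified


# Illustrative rules only; every number below is an assumption.
RULES = [
    AlertRule("api_p95_latency_ms", 800.0, "users notice slowness above this", "oncall-backend"),
    AlertRule("error_rate_pct", 2.0, "well above the normal baseline", "oncall-backend"),
    AlertRule("queue_depth", 5000.0, "workers cannot drain this quickly", "oncall-platform"),
    AlertRule("disk_used_pct", 85.0, "leaves headroom for log growth", "oncall-platform"),
    AlertRule("db_connections_pct", 90.0, "close to pool exhaustion", "oncall-dba"),
]


def evaluate(rules, observed):
    """Return (recipient, message) for every rule whose threshold is exceeded."""
    fired = []
    for rule in rules:
        value = observed.get(rule.metric)
        # Metrics with no current reading are skipped rather than treated as zero.
        if value is not None and value > rule.threshold:
            message = f"{rule.metric}={value} > {rule.threshold} ({rule.rationale})"
            fired.append((rule.recipient, message))
    return fired
```

Keeping the rationale next to the threshold means that anyone reviewing a noisy or silent alert can see why the number was chosen before changing it.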
What we can deliver
Business-hours or 24/7 on-call
Scope and channel (Slack, email, phone) with first-response SLA.
Environment health dashboards
One page leadership understands: green, yellow, red with context.
Alerts with one-page runbooks
Symptom → check → action or escalation.
Patch management and windows
Critical CVEs with apply plan and rollback.
Monthly cost and performance review
What went up, why, and what can be trimmed safely.
Deploy and release support
Someone on hand during the agreed window with a smoke-test checklist.
ITSM integration
Jira, Freshdesk, or a spreadsheet: a single ticket history.
Postmortems and improvement backlog
Incidents become three owned actions with dates.
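The one-page runbook shape listed above (symptom, then check, then action or escalation) can be sketched as a small data structure. This is only an illustration under assumptions: the class names, the example checks, the size limits, and the escalation contact are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunbookStep:
    check: str                          # what the on-call looks at
    is_healthy: Callable[[dict], bool]  # predicate over collected facts
    action: str                         # what to do when the check fails


@dataclass
class Runbook:
    symptom: str
    steps: list
    escalate_to: str

    def run(self, facts: dict) -> str:
        """Walk the checks in order; act on the first failure, else escalate."""
        for step in self.steps:
            if not step.is_healthy(facts):
                return f"ACTION: {step.action}"
        # Every known check passed, so the cause is outside this runbook.
        return f"ESCALATE: {self.escalate_to}"


# Hypothetical example runbook for a disk alert.
disk_full = Runbook(
    symptom="disk usage alert on an app server",
    steps=[
        RunbookStep("log directory size", lambda f: f.get("log_dir_gb", 0) < 20,
                    "rotate and compress old logs"),
        RunbookStep("temp files", lambda f: f.get("tmp_gb", 0) < 5,
                    "clear temp files older than 24h"),
    ],
    escalate_to="platform lead",
)
```

Calling `disk_full.run({"log_dir_gb": 35})` returns the first matching action, while a set of facts that passes every check returns the escalation target.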
Support with context and no hot potato. When a ticket opens, we log what we already checked, which environment, which version, and the working hypothesis, so the next on-call does not restart from scratch. If the cause is code, we loop in engineering with logs and repro steps; if it is infra, we act within the contract without hiding risk.
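The ticket context described above (what was checked, environment, version, hypothesis) could be captured in a record like the one below. It is a sketch, not a prescribed ticket schema: the field names and the note layout are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TicketContext:
    environment: str
    version: str
    hypothesis: str
    checked: list = field(default_factory=list)  # what was already tried

    def handoff_note(self) -> str:
        """Render a short note the next on-call can read without re-investigating."""
        lines = [
            f"Environment: {self.environment}",
            f"Version: {self.version}",
            f"Hypothesis: {self.hypothesis}",
            "Already checked:",
        ]
        # Fall back to an explicit marker so an empty list is never ambiguous.
        lines += [f"  - {item}" for item in self.checked] or ["  - (nothing yet)"]
        return "\n".join(lines)
```

A note built this way can be pasted into whichever ticket tool is in use, so the history stays in one place.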
Continuous improvement that shows in time spent. After meaningful incidents we run a lean postmortem with owned actions (patch, new threshold, documentation). If cloud cost climbed, we name the noisy service and the options: autoscaling, log lifecycle, or fixing a heavy query. Talk to us to align on-call, maintenance windows, and what P0 vs P2 means in your world.
Portfolio of Support and monitoring
Deliverables
Service agreement (SLA)
Hours, channels, severities signed off.
Monitoring dashboards
Links and plain-language legend.
Runbook library
PDF or wiki with short steps.
Incident log
Searchable history with root cause when known.
Contact and escalation list
Who calls whom in each shift.
Periodic reports
Email or PDF with agreed metrics.
Technical improvement backlog
Prioritized items with rough sizing.
Patch evidence
Date, package, environment.
Maintenance calendar
Planned windows visible to the team.
Handoff session
When switching vendors or handing over to an internal team.
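To make the signed-off severities in the service agreement concrete, here is one way they might be encoded and checked. The four levels P0 to P3 and every duration below are placeholder assumptions; the real values are whatever the agreement states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SeverityTarget:
    first_response: timedelta
    target_resolution: timedelta


# Placeholder numbers for illustration only, not contractual values.
POLICY = {
    "P0": SeverityTarget(timedelta(minutes=15), timedelta(hours=4)),
    "P1": SeverityTarget(timedelta(hours=1), timedelta(hours=8)),
    "P2": SeverityTarget(timedelta(hours=4), timedelta(days=3)),
    "P3": SeverityTarget(timedelta(days=1), timedelta(days=10)),
}


def first_response_breached(severity: str, opened_at: datetime, responded_at: datetime) -> bool:
    """True when the first response took longer than the target for that severity."""
    return responded_at - opened_at > POLICY[severity].first_response
```

With these placeholder values, a response 20 minutes after opening breaches the P0 target but is well within the P2 target, which is the kind of distinction the severity definitions exist to settle.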
Execution methodology
-
Technical onboarding
Access, environments, contacts, and what is business-critical.
-
SLAs and severities
P0–P3 with first-response and target resolution times.
-
Minimum instrumentation
Metrics, logs, and agreed alerts.
-
Initial runbooks
One-pager per critical service.
-
On-call drill
We simulate failure and measure time to action.
-
Steady-state operations
Queues, changes, weekly or biweekly comms.
-
Change management
Windows, rollback, product notice.
-
Alert review
Cut noise, add what incidents showed was missing.
-
Monthly report
Incidents, MTTR, cost, open risks.
-
Quarterly retro
What to change in contract or architecture.
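For the MTTR figure in the monthly report, a simple calculation might look like the sketch below. It reads MTTR as mean time to resolve and represents each incident as an (opened, resolved) pair; both are assumptions, since the exact definition would be agreed as part of the reported metrics.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Average of (resolved - opened) over resolved incidents.

    `incidents` is an iterable of (opened, resolved) datetime pairs, where
    `resolved` is None for incidents that are still open. Returns None when
    nothing has been resolved yet.
    """
    durations = [resolved - opened for opened, resolved in incidents if resolved is not None]
    if not durations:
        return None
    # sum() needs a timedelta start value; dividing a timedelta by an int is supported.
    return sum(durations, timedelta()) / len(durations)
```

Open incidents are left out of the average here, so they would need to appear separately in the report, for example under open risks.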