Task-Boundary Mode: How task-instance boundaries are drawn from the event stream. Applies to every Task SoP, Step SoP, and Variants view.

Data Foundations

Data Sources
Inside-out Pyze JS + outside-in Celonis Task Mining
Pyze JS — Veeva Safety
Inside-out instrumentation. JavaScript deployed inside Veeva Safety captures every click, form interaction, section navigation, and case action at DOM level with direct case-ID linkage from the URL/DOM.
23,723 events (42.4%)
Pyze JS — Phobos & Vaults
Same JS technology deployed in Merck's Phobos intake system, RIM Vault, and Quality Vault. Case linkage via swivel-chair detection.
14,664 events (26.2%)
Celonis Task Mining
Outside-in desktop monitoring captures application switching, window titles, and interaction events across all desktop and web apps. Case linkage via swivel-chair detection.
17,625 events (31.5%)
Pyze captures 68.5% of all signal — 2.2x the coverage of traditional task mining alone. Inside-out instrumentation captures granular in-application behavior (section-level navigation, field-level edits, button clicks) that desktop monitoring cannot see. Combined with Celonis Task Mining, this creates a complete Digital Twin of how work actually happens.
📈 Confidence Model
How we filter events before computing metrics

Every event is assigned a confidence level indicating the reliability of its case association:

| Level | Meaning | Events |
|---|---|---|
| High | Case ID extracted directly from URL/DOM (Pyze JS) | 28,341 |
| Medium-High | Strong swivel-chair attribution (short gap, same user) | 3,554 |
| Medium | Probable swivel-chair attribution | 3,693 |
| Low | Weak attribution (long gap or ambiguous context) | 20,424 |

All agent analyses include High, Medium-High, and Medium confidence events (35,588 events, 63.5%). Low confidence events are excluded to ensure metric accuracy. Exception: the AI Effectiveness (gpteal) analysis includes all confidence levels because we're measuring tool adoption behavior, not case-level accuracy.
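
The filtering rule is simple enough to express directly. A minimal Python sketch, assuming each event record carries a confidence field with the levels listed above (the field name and the event-list shape are illustrative, not the production schema):

```python
# Illustrative confidence filter; the field name and level labels are assumptions.
INCLUDED_LEVELS = {"high", "medium-high", "medium"}

def filter_for_agents(events, include_low=False):
    """Return the event subset the agents analyze.

    include_low=True mirrors the AI Effectiveness (gpteal) exception, which
    keeps Low-confidence events because it measures tool adoption rather than
    case-level accuracy.
    """
    if include_low:
        return list(events)
    return [e for e in events if e["confidence"].lower() in INCLUDED_LEVELS]
```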

Handling Time & Cycle Time Definitions
How we separate active work from wait time

Handling time = sum of active dwell time per case per user. Any gap exceeding 5 minutes between consecutive events on the same case is classified as idle/wait time (not active touch). This threshold treats longer context-switching breaks as wait time while keeping brief pauses for reading or thinking inside active work.

Cycle time = wall-clock duration from first to last event on a case.

The ratio handling / cycle measures "touch efficiency" — what percentage of elapsed time involves active work.
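
The two definitions reduce to a single pass over a case's time-ordered events. A minimal Python sketch, assuming the timestamps are already grouped per case (and per user for the handling-time sum); the 5-minute threshold and the ratio match the definitions above:

```python
from datetime import timedelta

IDLE_THRESHOLD = timedelta(minutes=5)   # gaps longer than this count as wait, not touch

def handling_and_cycle(timestamps):
    """Return (handling, cycle, touch_efficiency) for one case.

    Sketch only: timestamps is a list of datetime objects for a single case,
    assumed to be the per-user event stream described above.
    """
    ts = sorted(timestamps)
    cycle = ts[-1] - ts[0]                       # wall-clock: first to last event
    handling = timedelta()
    for prev, cur in zip(ts, ts[1:]):
        gap = cur - prev
        if gap <= IDLE_THRESHOLD:                # active dwell
            handling += gap
        # gaps above the threshold are idle/wait time and excluded from handling
    touch_efficiency = handling / cycle if cycle else 0.0
    return handling, cycle, touch_efficiency
```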

📋 Savings Projection & Data Hygiene
How we extrapolate pilot observations

Three Savings Tiers

  1. Pilot Period: Hours observed during the 19-day pilot.
  2. Annualized (17 users): Pilot hours × (250 working days / 19 pilot days). Assumes representative volume.
  3. Projected (1,000 users): Annualized hours × (1,000 / 17). Linear projection — does not account for economies/diseconomies of scale.
Caveat: The 1,000-user projection is a directional estimate for business case sizing. Actual savings depend on role distribution, case mix, and process maturity at scale.
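
As a worked illustration of the three tiers, a short Python sketch using the pilot constants above (19 pilot days, 250 working days, 17 pilot users); the scaling is deliberately linear, per the caveat:

```python
WORKING_DAYS_PER_YEAR = 250
PILOT_DAYS = 19
PILOT_USERS = 17

def savings_tiers(pilot_hours, target_users=1000):
    """Project the three savings tiers from observed pilot hours (directional only)."""
    annualized = pilot_hours * (WORKING_DAYS_PER_YEAR / PILOT_DAYS)   # tier 2
    projected = annualized * (target_users / PILOT_USERS)             # tier 3, linear
    return {
        "pilot": pilot_hours,
        "annualized_17_users": annualized,
        f"projected_{target_users}_users": projected,
    }

# Example: 100 observed pilot hours annualize to ~1,316 hours and project
# to ~77,400 hours at 1,000 users under these assumptions.
```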

Data Hygiene

Events from personal browsing domains (social media, shopping, personal finance) are filtered from all analyses. All user identities are anonymized to sequential labels (Analyst 01 through Analyst 17). No personally identifiable information is included.

Opportunities vs Benchmarks

Every finding surfaced by an agent is classified as either an Opportunity or a Benchmark. The distinction drives how it flows through the platform — savings totals, lifecycle workflow, and what the team should do with it.

Opportunity

An actionable finding that proposes a specific remediation and has measurable savings in hours and dollars.

Treatment on the platform

  • Shows projected savings at pilot, 17-user, and 1,000-user scale
  • Receives a composite risk factor and Automation Readiness Score
  • Enters the four-state lifecycle (Surfaced → Accepted → Remediating → Remediated)
  • Counts toward dashboard savings totals

Example findings

  • High-Latency Handoffs — fix: automated task notifications
  • Delete-Confirm Loop — fix: bulk delete UX
  • gpteal Productivity Uplift — fix: close adoption gap

Benchmark

A measurement or observation that informs strategy but doesn't propose a concrete fix on its own. Provides the context that makes opportunities credible.

Treatment on the platform

  • Displays a headline metric instead of a savings number
  • Does not enter the lifecycle (no Accept / Remediate workflow)
  • Does not count toward dashboard savings totals
  • Still carries validation context and ties to the theme it informs

Example findings

  • gpteal Adoption Dashboard — 53% adoption rate
  • Touch Efficiency Ratio by Case Type — handling/cycle ratio
  • Stage-Level Handling Time — effort concentration map

How we classify

An agent's output becomes an Opportunity only if it passes all three tests:

  1. Specific remediation — the finding points to a concrete fix (RPA bot, UI change, integration, AI agent deployment)
  2. Measurable savings — we can project hours and dollars recovered at pilot, annual, and scaled volume
  3. Implementation clarity — a delivery team could take the finding and translate it into a scoped project

If any test fails, the finding is classified as a Benchmark. Benchmarks are intentionally kept out of savings totals so the commitment-ready numbers stay honest — but they're prominently displayed alongside opportunities because the measurement context is what makes the opportunity credible. An opportunity saying "automate narrative drafting" is much stronger paired with a benchmark showing "users spend 22.8s per narrative interaction" — the two are designed to work together.

Why we draw the line this way: Discovery tools that treat every observation as a savings opportunity produce inflated business cases that don't survive scrutiny. The Opportunity/Benchmark split keeps the conversation honest: what can we commit to, what do we know, and what's the difference.

Autonomous Agents

Each agent is a self-contained analyzer with its own detection logic, signals, and scoring approach. The sections below give the full methodology for each agent.

Rework
Detects cross-application round-trip patterns

What It Detects

Case-bound round-trip patterns where an analyst leaves the system of record, performs work in a supporting app, and returns — with the round trip repeating multiple times on the same case. These patterns signal missing integrations, UI gaps, or workflow habits that create hidden friction.

Signals It Analyzes

case_id, source_app, event_timestamp, activity, pyzeClick (drilldown)

Events are ordered by timestamp within each case, then scanned for A→B→A sequences where A is an instrumented app and B is the "detour" app. Transitions with <2 second dwell in the detour are filtered as navigation artifacts rather than real work.

How It Scores

Each rework pattern is scored by: frequency × dwell_in_detour_app × cases_affected. Patterns that repeat across many analysts and cases rank highest.
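
A minimal sketch of the scan and the scoring rule, assuming each event is a dict with source_app and an epoch-seconds timestamp (the schema and grouping are illustrative, not the pipeline's actual structures):

```python
from collections import defaultdict

MIN_DETOUR_DWELL_S = 2.0   # shorter detours are filtered as navigation artifacts

def rework_scores(events_by_case, instrumented_apps):
    """Detect and score A->B->A round trips across cases.

    events_by_case: {case_id: [event, ...]} with illustrative event dicts.
    Score = frequency x mean detour dwell x cases affected, as described above.
    """
    dwells = defaultdict(list)    # (A, B) -> detour dwells in seconds
    cases = defaultdict(set)      # (A, B) -> cases where the pattern occurs
    for case_id, case_events in events_by_case.items():
        ev = sorted(case_events, key=lambda e: e["timestamp"])
        for prev, mid, nxt in zip(ev, ev[1:], ev[2:]):
            a, b = prev["source_app"], mid["source_app"]
            if a == nxt["source_app"] and a != b and a in instrumented_apps:
                dwell = nxt["timestamp"] - mid["timestamp"]   # time spent in the detour app
                if dwell >= MIN_DETOUR_DWELL_S:
                    dwells[(a, b)].append(dwell)
                    cases[(a, b)].add(case_id)
    return {
        pattern: len(d) * (sum(d) / len(d)) * len(cases[pattern])
        for pattern, d in dwells.items()
    }
```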

Example Pattern from the Merck Pilot

Acrobat-Word Narrative Drafting Loop: Users switched between Adobe Acrobat and Microsoft Word every 6-8 seconds during narrative drafting, with 1,261 consecutive transitions detected across 77 cases. This indicates the built-in Veeva Narrative editor isn't meeting drafting needs — a candidate for an integrated side-by-side PDF viewer or AI-assisted entity extraction.

View Rework agent →

Cycle Time
Measures wall-clock case duration and handoff latency

What It Detects

Cases that take longer than they should — through stage-level wait time, inter-user handoff latency, or extended "background" sessions. Distinguishes active work from idle time to separate "we're working on it slowly" from "it's sitting in a queue."

Signals It Analyzes

event_timestamp, end_timestamp, handoff_wait_hours, case_id_source, page_title (case type)

For each case, computes: total wall-clock duration, per-stage span (using PV stage mapping), and inter-user wait time (from the handoff analysis table). Case types are parsed from page_title to segment by SUSAR / SAE / AE and Initial / Follow-up.
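
The inter-user wait component can be sketched as the gap at each point where a case changes hands. The field names below are illustrative, and the real computation draws on the handoff analysis table rather than raw events:

```python
def handoff_waits(case_events):
    """Return (from_user, to_user, wait) for each analyst-to-analyst handoff on one case.

    case_events: list of dicts with 'user' and 'timestamp' (datetime); sketch only.
    """
    ev = sorted(case_events, key=lambda e: e["timestamp"])
    waits = []
    for prev, cur in zip(ev, ev[1:]):
        if prev["user"] != cur["user"]:   # case passed to a new analyst
            waits.append((prev["user"], cur["user"], cur["timestamp"] - prev["timestamp"]))
    return waits
```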

How It Scores

Findings are ranked by: median wait time in the problematic stage, frequency of cases affected, and variance relative to peer cases of the same type. Outlier cases (e.g., 300+ hours idle) are surfaced as specific examples.

Example Pattern from the Merck Pilot

Follow-up SUSAR Handoff Latency: Cases awaiting 'Select Documents' after 'View Combined Review' sit idle for a median of 131 hours (5.4 days) before the next analyst picks them up. 41% of sequential handoffs exceed 3 days. Root cause appears to be missing task-assignment notifications, not missing work.

View Cycle Time agent →

Handling Time
Computes active touch time and peer-benchmarks analysts

What It Detects

Where effort concentrates across PV stages and which analysts handle similar work fastest. Unlike cycle time (wall-clock), handling time captures only active, focused interaction — the time a keyboard or mouse is actually engaged with the case.

Signals It Analyzes

total_dwell_ms, edit_count, click_count, analyst_name, activity (stage map)

Sums total_dwell_ms per case per user, grouped by PV stage. The 5-minute idle threshold separates active touch from wait time. Compares analysts on the same case types to identify peer benchmarks.

How It Scores

Findings surface: touch efficiency (handling ÷ cycle) by case type, analyst variance (best vs median on like cases), and stage-level effort concentration. A 25th-percentile target is used to project savings from bringing slower analysts to peer pace.
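
A hedged sketch of the peer-benchmark projection: hours freed if every analyst above the 25th-percentile handling time moved to that pace. The inputs (per-analyst average handling minutes per case and annual case volume for one case type) are illustrative:

```python
import statistics

def peer_benchmark_savings(handling_min_by_analyst, annual_cases_by_analyst):
    """Annual hours freed if slower analysts matched the 25th-percentile handling time."""
    p25 = statistics.quantiles(sorted(handling_min_by_analyst.values()), n=4)[0]
    saved_minutes = 0.0
    for analyst, avg_min in handling_min_by_analyst.items():
        if avg_min > p25:   # slower than the peer benchmark
            saved_minutes += (avg_min - p25) * annual_cases_by_analyst[analyst]
    return saved_minutes / 60.0
```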

Example Pattern from the Merck Pilot

Touch Efficiency Variance: For SUSAR cases, analysts range from 0.2 min to 0.9 min of active handling per case — a 4x spread. If the 8 slowest analysts matched the median, ~46K hours of capacity would be freed annually across 17 users.

View Handling Time agent →

Automation
Mines repetitive, deterministic click sequences

What It Detects

Click sequences that recur across many cases and analysts with deterministic outcomes — the clearest RPA candidates. Focuses on patterns where the same sequence always produces the same result, distinguishing them from judgment-heavy work (which is the AI Discovery agent's territory).

Signals It Analyzes

activity (n-gram), pyzeClick sequence, element_tag, inner_text, action_sequence

Extracts activity n-grams (length 2-6) within cases from the event log, plus fine-grained pyzeClick sequences from the drilldown. Sequences are scored by frequency, estimated time per execution, and breadth across analysts.
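
The n-gram extraction itself is a straightforward sliding window over each case's time-ordered activities. A minimal sketch (the input shape is an assumption):

```python
from collections import Counter

def activity_ngrams(case_activity_sequences, n_min=2, n_max=6):
    """Count activity n-grams of length 2-6 within cases.

    case_activity_sequences: iterable of per-case, time-ordered activity lists.
    Returns a Counter keyed by the n-gram tuple.
    """
    counts = Counter()
    for activities in case_activity_sequences:
        for n in range(n_min, n_max + 1):
            for i in range(len(activities) - n + 1):
                counts[tuple(activities[i:i + n])] += 1
    return counts
```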

How It Scores

Each candidate is assigned an Automation Readiness Score (0-100) along four factors:

  • Pattern Frequency (30%) — volume of repetition
  • Decision Complexity (30%) — RPA-simple vs AI-hard
  • Data Structure (20%) — structured forms vs unstructured content
  • Cross-App Scope (20%) — single app (easy) vs multi-system (harder)

See Automation Readiness Score for full scoring detail.

Example Pattern from the Merck Pilot

Veeva-Phobos Synchronous Handshake: 99% of 'Complete Action' events in Veeva Safety are followed within 25 seconds by a 'Proceed' or 'Advance' click in Phobos on the same Case ID. 87/88 sequences in the pilot show identical pattern — prime candidate for a Veeva Web Action that triggers the Phobos advance automatically.

View Automation agent →

AI Discovery
Identifies judgment-heavy work where AI agents can assist

What It Detects

Work that is not a deterministic loop but is judgment-heavy — drafting, classification, coding, translation. The complement to the Automation agent: these are the patterns where RPA fails but AI agents can augment human analysts.

Signals It Analyzes

total_dwell_ms (high), edit_count (high), click_count (low), source_app (cross-app), gpteal domain hits

Flags activities with high dwell + high edits + low clicks (signature of thinking/writing work). Detects cross-app patterns involving Word, Outlook, or Acrobat tied to narrative or assessment stages. Surfaces existing gpteal usage as proof that users are already self-serving AI.
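
The flagging rule can be expressed as a simple threshold filter. The cut-offs below are illustrative placeholders, not the pilot's tuned values, and the field names assume the aggregated per-activity stats described above:

```python
def judgment_heavy(activity_stats, dwell_ms_min=15_000, edits_min=5, clicks_max=3):
    """Flag activities whose signature looks like thinking/writing work.

    activity_stats: iterable of dicts with 'activity', 'total_dwell_ms',
    'edit_count', and 'click_count'.
    """
    return [
        s["activity"] for s in activity_stats
        if s["total_dwell_ms"] >= dwell_ms_min    # long dwell
        and s["edit_count"] >= edits_min          # heavy editing
        and s["click_count"] <= clicks_max        # few clicks
    ]
```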

How It Scores

Candidates are classified by AI capability type: Summarization (source documents), Drafting (narratives), Classification (case triage), Translation (localized cases), Coding (MedDRA lookups). Each gets its own Automation Readiness Score tuned for AI Agent remediation (lower structure, higher complexity).

Example Pattern from the Merck Pilot

GenAI Narrative Drafting: Users spend an average of 22.8 seconds per interaction on the description field in Veeva Safety. The 'Generate Narrative from Outline' feature triggers at 0ms dwell (background operation), after which users spend significant time editing the output. Quality gap in the auto-generated draft is the real bottleneck — not the drafting workflow itself.

View AI Discovery agent →

AI Effectiveness
Measures adoption and productivity uplift from GenAI tools

What It Detects

How Merck's existing GenAI tool (gpteal) is being used — who adopts it, how often, on which case types, and whether adoption correlates with measurable productivity gains. Answers "is our AI investment actually landing?" before expanding rollout.

Signals It Analyzes

source_app (gpteal domains), user_id, case_id, event_timestamp (daily trend), cases/day (throughput)

Filters events where source_app matches gpteal domains (gpteal.merck.com, dtgpteal.merck.com, talkgpteal.merck.com). Uses the events_all view (includes Low confidence) because gpteal usage often appears as swivel-chair events with weaker case linkage. Computes per-analyst adoption tiers, cohort comparisons, and retention signals.

How It Scores

Per-user tier classification (Power / Regular / Light / Minimal / Non-Adopter) based on event count and active days. Cohort comparison between adopters and non-adopters on cases-per-day productivity — the savings opportunity quantifies the gap if non-adopters matched adopter throughput.
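
A minimal sketch of the tier classification; the gpteal domains come from the signal list above, while the tier cut-offs are illustrative placeholders since this section does not document the exact thresholds:

```python
GPTEAL_DOMAINS = ("gpteal.merck.com", "dtgpteal.merck.com", "talkgpteal.merck.com")

def adoption_tier(event_count, active_days):
    """Classify one analyst's gpteal usage into an adoption tier (placeholder cut-offs)."""
    if event_count == 0:
        return "Non-Adopter"
    if event_count >= 100 and active_days >= 10:
        return "Power"
    if event_count >= 30 and active_days >= 5:
        return "Regular"
    if event_count >= 10:
        return "Light"
    return "Minimal"
```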

Example Pattern from the Merck Pilot

gpteal Productivity Uplift: Adopters process 9.14 cases/day on average vs 5.14 for non-adopters — a 78% throughput advantage. Handling time per case is similar, so the gain comes from reduced between-case friction. Closing the adoption gap for the 8 current non-adopters would recover ~4,376 hours annually at 17 users, ~312K hours at 1,000 users.

View AI Effectiveness agent →

Scoring Frameworks

Automation Readiness Score
0-100 score quantifying how automation-ready each opportunity is

Every opportunity surfaced by the Automation and AI Discovery agents receives a 0-100 Automation Readiness Score that combines four independently measured factors via a weighted average.

| Factor | Weight | What It Measures | How It's Computed |
|---|---|---|---|
| Pattern Frequency | 30% | How often the pattern repeats | Bucketed by annualized volume: >1,000 hrs/yr = 95, >500 = 80, >100 = 60, else 40 |
| Decision Complexity | 30% | Deterministic vs judgment-heavy | RPA = 90, UX = 75, Integration = 60, AI Agent = 30 |
| Data Structure | 20% | Structured vs unstructured inputs | RPA/UX = 90, Integration = 70, AI Agent = 40 |
| Cross-App Scope | 20% | Single app vs multi-system | Single = 90, cross-app = 60, multi-system = 40 (from finding text) |

Score Bands

  • Very High (80-100) — ready to automate with high confidence
  • High (60-79) — strong candidate; minor discovery needed
  • Medium (40-59) — partial automation possible
  • Low (<40) — AI-assisted, not fully automated
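
A minimal sketch of the weighted average, using the factor weights and bucket values from the table above (function and argument names are illustrative):

```python
WEIGHTS = {"frequency": 0.30, "complexity": 0.30, "structure": 0.20, "scope": 0.20}

COMPLEXITY = {"RPA": 90, "UX": 75, "Integration": 60, "AI Agent": 30}
STRUCTURE  = {"RPA": 90, "UX": 90, "Integration": 70, "AI Agent": 40}
SCOPE      = {"single": 90, "cross-app": 60, "multi-system": 40}

def frequency_score(annual_hours):
    """Bucket annualized volume per the Pattern Frequency row."""
    if annual_hours > 1000:
        return 95
    if annual_hours > 500:
        return 80
    if annual_hours > 100:
        return 60
    return 40

def readiness_score(annual_hours, remediation_type, scope):
    """0-100 Automation Readiness Score as a weighted average of the four factors."""
    return (WEIGHTS["frequency"]  * frequency_score(annual_hours)
          + WEIGHTS["complexity"] * COMPLEXITY[remediation_type]
          + WEIGHTS["structure"]  * STRUCTURE[remediation_type]
          + WEIGHTS["scope"]      * SCOPE[scope])

# Example: readiness_score(1200, "RPA", "cross-app") = 85.5, landing in the Very High band.
```
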
Risk Adjustment — Confidence-Weighted Savings
Four-dimension weighting that converts unadjusted savings into business-case-ready numbers

Unadjusted savings estimates answer "what is the theoretical maximum if every opportunity is fully realized?" Risk-adjusted savings answer a more honest question: "what should we reasonably expect given real-world constraints?"

Each opportunity is scored High (1.0) / Medium (0.8) / Low (0.5) across four dimensions. The composite factor multiplies the unadjusted savings.

| Dimension | Weight | High (1.0) | Medium (0.8) | Low (0.5) |
|---|---|---|---|---|
| Detection Confidence | 40% | Strong statistical signal | Clear pattern, limited sample | Suggestive only |
| Implementation Feasibility | 25% | Proven approach | Custom integration work | Novel AI/ML build |
| Adoption Readiness | 20% | Invisible to user | Similar workflow | Significant behavior change |
| Compliance Path | 15% | Light validation | Standard CSV | Full re-validation |

Composite factor = (Detection × 0.40) + (Feasibility × 0.25) + (Adoption × 0.20) + (Compliance × 0.15)

Risk-adjusted annual savings = Unadjusted annual savings × Composite factor
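
A minimal sketch of the composite computation, with the ratings and weights from the table above (argument names are illustrative):

```python
RISK_WEIGHTS = {"detection": 0.40, "feasibility": 0.25, "adoption": 0.20, "compliance": 0.15}
RATING = {"High": 1.0, "Medium": 0.8, "Low": 0.5}

def risk_adjusted_savings(unadjusted_annual, detection, feasibility, adoption, compliance):
    """Return (composite_factor, risk_adjusted_annual_savings)."""
    composite = (RATING[detection]   * RISK_WEIGHTS["detection"]
               + RATING[feasibility] * RISK_WEIGHTS["feasibility"]
               + RATING[adoption]    * RISK_WEIGHTS["adoption"]
               + RATING[compliance]  * RISK_WEIGHTS["compliance"])
    return composite, unadjusted_annual * composite

# Example: High / Medium / High / Medium ratings give a composite of 0.92,
# so a $1.0M unadjusted estimate becomes $920K of committable savings.
```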

How to read the number: The risk-adjusted total represents the savings to commit in a business case today. As opportunities progress through the lifecycle, factors update with actual implementation data and the forecast converges toward realized value.

Operational Framework

🔄 Opportunity Lifecycle & Value Realization
Surfaced → Accepted → Remediating → Remediated → Monitored

Discovery is only the first step. Every opportunity tracks through a four-state lifecycle — from initial detection through measured value realization — so teams see what's been acted on, what's in flight, and whether expected savings are actually being captured.

| State | Who Owns It | What the Platform Does |
|---|---|---|
| Surfaced | BA triage | Continues collecting evidence; readiness score updates as data arrives |
| Accepted | BA / Ops lead | Snapshots baseline metrics for later comparison |
| Remediating | Implementation team | Monitors for early behavioral change pre-deployment |
| Remediated | Ops lead / finance | Continuously measures actual hours saved vs projected |
| Declined | Governance | Pattern stays monitored; re-surfaced if material growth |

Value Realization Monitoring

Every Remediated opportunity enters continuous post-implementation monitoring. The platform compares three measurements against the locked baseline:

  • Throughput delta — cases per day per analyst. Expected to rise after remediation.
  • Handling time delta — active touch time per case. Expected to fall for the targeted pattern.
  • Pattern recurrence — does the original rework/loop/handoff pattern still appear?

Actual savings are reported weekly against the projected estimate. If realized value is below forecast after 90 days, the opportunity is flagged — the agent re-analyzes and surfaces any secondary patterns blocking full realization.
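
A hedged sketch of the weekly comparison against the locked baseline; the metric names and the shape of the baseline snapshot are assumptions for illustration:

```python
def realization_check(baseline, current, projected_hours_saved, realized_hours_saved):
    """Compare post-remediation measurements to the locked baseline.

    baseline/current: dicts with 'cases_per_day', 'handling_min_per_case',
    and 'pattern_occurrences' (illustrative names). Returns the three deltas
    plus the below-forecast flag that triggers re-analysis after 90 days.
    """
    return {
        # expected to rise after remediation
        "throughput_delta": current["cases_per_day"] - baseline["cases_per_day"],
        # expected to fall for the targeted pattern
        "handling_delta": current["handling_min_per_case"] - baseline["handling_min_per_case"],
        # ideally False: the original rework/loop/handoff pattern no longer appears
        "pattern_recurring": current["pattern_occurrences"] > 0,
        # 90-day flag: realized value trails the forecast
        "below_forecast": realized_hours_saved < projected_hours_saved,
    }
```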

Why the full loop matters: Discovery tools create backlogs that go stale. Pyze closes the loop — every opportunity tracks from detection through implementation to measured realization. Teams verify what they actually saved, not just what they could.