Expert demonstrating interface use to naive user
Sixty-two patterns across ten session formats. Every entry is a real session captured with a named, paid, consenting practitioner. Filter by format, mode, participants, or the kind of human signal you need.
Expert drives end-to-end while narrating. The user asks clarifying questions but never takes the controls.
User drives the task; the expert intervenes, narrates over the top, or validates choices.
An expert reviews a captured user session and produces judgement, rewriting, ranking, or prediction.
Three or more experts surface, debate, decompose, or converge on a contested question.
Work moves across stages between participants — brief writer, executor, reviewer.
One practitioner completes a real task end-to-end across multiple tools, narrating reasoning continuously.
Two practitioners share a task with role asymmetry, genuine disagreement, or live co-creation.
Two participants with unequal expertise — explanation under misunderstanding, second-opinion consult, reframed second attempt.
AI moderates the session; a human expert reviews and annotates the capture afterwards.
Sessions captured in the participant's native language, with optional translation and code-switching annotation layers.
An experienced practitioner drives a complex multi-tool task from start to finish while narrating their reasoning. The learner watches and asks clarifying questions but never touches the interface. The expert's mental model is captured cleanly, anchored against learner questions that surface what the expert would otherwise leave implicit.
This is the sample to commission when the question is what an expert's uncontaminated workflow looks like, with the *why* layer attached. Ideal for training models that need to demonstrate expert-mode reasoning in a domain rather than learn it from scratch.
An expert performs the task once with the learner observing in silence, then the learner attempts the same task while the expert watches and only corrects at the end. The two-pass structure isolates the gap between watching competence and reproducing it, and reveals which steps experts execute without conscious thought.
This is the sample to commission when the research question is about the gap between observing skill and exercising it. Useful for instruction-tuning models that need to convert a worked example into a usable procedure.
An expert demonstrates the wrong way to perform a task first, lets the learner see the failure mode, then re-runs the workflow correctly. Comparative reasoning is the central artefact: *the right way* is articulated explicitly against *why the wrong way is wrong*.
This is the sample to commission when the research question is about contrastive reasoning. The two-version structure provides natural preference pairs grounded in real expert judgement rather than synthesised pairings.
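As a rough illustration of how the two-version structure could map onto a preference pair, here is a minimal sketch. The `DemoPass` and `PreferencePair` types and their field names are hypothetical, introduced only for this example; they are not a documented delivery schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoPass:
    """One pass of the demonstration (hypothetical schema)."""
    transcript: str  # what the expert did and said during this pass
    rationale: str   # the expert's stated reasoning about this pass


@dataclass(frozen=True)
class PreferencePair:
    """A chosen/rejected pair grounded in the expert's own comparison."""
    prompt: str
    chosen: str
    rejected: str
    justification: str


def to_preference_pair(task: str, wrong: DemoPass, right: DemoPass) -> PreferencePair:
    # The correct re-run becomes the preferred sample; the deliberate
    # failure becomes the rejected one. Both rationales are kept so the
    # "why the wrong way is wrong" layer travels with the pair.
    return PreferencePair(
        prompt=task,
        chosen=right.transcript,
        rejected=wrong.transcript,
        justification=f"{right.rationale} Rejected because: {wrong.rationale}",
    )
```

Keeping the justification alongside the pair is a design choice in this sketch: it preserves the comparative reasoning rather than reducing the session to a bare label.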
The expert lays out a verbal plan up front, *here's how I'd tackle this*, before any tool is touched. The learner then attempts to execute. The structure separates strategy from tactics, capturing the planning layer that gets compressed or skipped in most session formats.
This is the sample to commission when the research question is about expert planning behaviour. Particularly useful for agentic-reasoning training where the gold path needs an explicit plan attached, not just the trajectory.
The user starts the session already in a broken or confused state. The expert must first diagnose what went wrong, then fix it, then explain how to avoid the same failure in future. Three distinct cognitive moves are captured: diagnosis, repair, and prevention.
This is the sample to commission when the research question is about debugging reasoning, not just task reasoning. Models trained on clean trajectories often fail when handed a broken state; this captures the step they miss, where an expert reads the broken state correctly.
The learner is moderately skilled, not naive. The expert deliberately pushes them with harder variants and edge cases they would not normally encounter. Expert reasoning at the upper end of difficulty rather than the entry level.
This is the sample to commission when the research question is about expert behaviour at the difficult end of a domain. Synthetic data tends to model entry-level competence; this captures the part most other corpora miss.
The learner knows how to do the task in their usual environment but now needs to do it in an unfamiliar one: a new tool, a new jurisdiction, a new constraint. The expert bridges the gap. Transfer reasoning rather than first-time learning.
This is the sample to commission when the research question is about adaptation and generalisation. Trains models on the specific cognitive work of mapping a known method to an unfamiliar context.
The user makes an error partway through the task and must decide whether to undo, repair, or start over. The expert helps them triage. Sunk-cost reasoning is captured under real time pressure, against real consequences for getting the call wrong.
This is the sample to commission when the research question is about recovery and triage reasoning. The decision shape itself, not the eventual outcome, is the signal.
The framing shifts from *let's get this done* to *let's make sure you can do this next time without me*. The expert deliberately holds back, prompts the learner to recall, and builds in reinforcement loops. Pedagogical reasoning rather than task reasoning.
This is the sample to commission when the research question is about teaching behaviour. Captures the specific cognitive shape of scaffolding, intentionally distinct from just completing the work efficiently.
The learner attempts the task and only calls on the expert when they get stuck. The expert stays silent the rest of the time. The session captures the natural confusion-articulation-intervention-resolution loop that drives most real-world help-seeking behaviour.
This is the sample to commission when the research question is about how naive users actually ask for help, in their own words, when they hit a wall. Particularly valuable for help-system and assistant-behaviour training.
The expert watches the learner work and intervenes *before* failure rather than after, predicting where the learner is about to go wrong. The expert's predictive model of novice mistakes is made explicit through pre-emptive correction.
This is the sample to commission when the research question is about how experts anticipate user error. The intervention shape, what triggers it and what gets corrected, is the signal.
The learner performs the task. The expert provides continuous running commentary on what the learner is doing right and wrong, like a coach calling a game. High-density signal because the expert is articulating judgements continuously rather than waiting for explicit decision points.
This is the sample to commission when the research question requires a near-continuous expert reasoning track. Most session formats produce sparse reasoning; this format produces it densely across the whole session.
The learner explains their intended approach before doing anything. The expert challenges, refines, or approves. Then the learner executes. The format captures expert reasoning about whether a plan will work *before* it is tested against reality.
This is the sample to commission when the research question is about plan critique. The judgement shape is distinct from execution shape, and most data conflates them.
The user completes the task alone, then submits the recording and any artefacts produced. An expert reviews asynchronously, walking through the finished output, scrubbing the recording with running commentary, or reverse-engineering intent from the artefact alone. The user's behaviour is uncontaminated by the expert's presence.
This is the sample to commission when the research question requires both an unmoderated trace of user behaviour *and* expert evaluative reasoning over it. The async structure means neither contaminates the other.
Several independent users attempt the same task. An expert reviews all of them and compares them. Cross-session reasoning rather than single-session review. The format surfaces what experts treat as common pitfalls versus what they treat as idiosyncratic mistakes.
This is the sample to commission when the research question is about the *distribution* of user behaviour on a fixed task, viewed through an expert's pattern-recognition lens.
An expert applies an explicit rubric to user attempts and produces a ranking with justifications per criterion. Explicit rubric articulation is a distinct cognitive task from open-ended critique: the expert must surface and defend the standards they would normally apply implicitly.
This is the sample to commission when the research question is about graded judgement against explicit criteria. Particularly relevant for reward-model training where the rubric structure matters as much as the rating.
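One way to picture a rubric-based ranking with per-criterion justifications is the sketch below. The `CriterionScore` type, the numeric scale, and the weighting scheme are assumptions made for illustration; a real rubric may combine criteria differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionScore:
    """One expert judgement against one rubric criterion (hypothetical schema)."""
    criterion: str
    score: int           # assumed numeric scale, e.g. 1 to 5
    justification: str   # the expert's stated reason for the score


def rank_attempts(
    scores: dict[str, list[CriterionScore]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Return (attempt_id, weighted_total) pairs, best first."""
    totals: dict[str, float] = {}
    for attempt_id, criterion_scores in scores.items():
        # Weighted sum across criteria; an unknown criterion raises KeyError
        # so that rubric and scores cannot silently drift apart.
        totals[attempt_id] = sum(
            weights[c.criterion] * c.score for c in criterion_scores
        )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The justifications are not used in the arithmetic; they are carried in the records so that each rank can be traced back to the expert's stated standard.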
The expert produces their own version of the task rather than just commenting on the user's attempt. Expert output, not just expert critique, is captured, anchored against a real user attempt as the reference point.
This is the sample to commission when the research question needs both a reference output and a parallel expert output on the same task. The pairing is the signal.
Two-party asynchronous dialogue. The expert leaves review notes; the user responds in writing or video with clarifications, defences, or revisions. Captures the back-and-forth that single-expert review misses, including the rejection of expert feedback when the user has good reason.
This is the sample to commission when the research question is about dialogic review, where the user's defence of their choices is part of the signal.
Multiple experts independently review one user session, then their reviews are compared. Multi-expert disagreement is the point. Reveals where the field itself does not agree on what good practice looks like.
This is the sample to commission when the research question is about contested expert judgement, where labelling a single answer would mislead the model.
The expert reviews the first half of a user session and predicts what the user will do next, then the prediction is revealed against what the user actually did. The expert's predictive model is captured explicitly rather than inferred from post-hoc commentary.
This is the sample to commission when the research question is about expert anticipation rather than expert critique. The prediction-then-reveal mechanic produces structured, ground-truthed predictions of human behaviour.
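The prediction-then-reveal mechanic can be sketched as a comparison between the expert's predicted steps and the steps the user actually took. The function name and the exact-match comparison are assumptions for illustration; real sessions would likely need a looser notion of whether a prediction was right.

```python
def score_predictions(predicted: list[str], actual: list[str]) -> dict:
    """Compare predicted user actions with observed ones, step by step."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same steps")
    # Exact string match is a simplification (an assumption of this sketch).
    hits = [p == a for p, a in zip(predicted, actual)]
    return {
        "hit_rate": sum(hits) / len(hits) if hits else 0.0,
        # Indices of the steps where the expert's prediction was wrong.
        "misses": [i for i, hit in enumerate(hits) if not hit],
    }
```

The missed steps are returned explicitly because the points where the expert's model of the user breaks down are as informative as the overall hit rate.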
The same user completes the same (or carefully matched) task twice: once without AI assistance, once with. The pair captures where AI changes the workflow, what it replaces, and what it leaves unchanged. Mode A captures both sessions; Mode B captures the AI interactions inside the second one.
This is the sample to commission when the research question is about the *effect* of AI on workflow rather than about AI use in the wild. The matched-pair structure gives controlled comparison rather than naturalistic distribution.
Two to five experts in the same domain work from a list of options and argue toward a ranked priority order. Disagreement is expected and the resolution process is the point. Run as two stages: independent ranking first with think-aloud reasoning, then group discussion focused on divergence points, with optional revision at the end.
This is the sample to commission when the research question is about how experts weigh contested options against each other. The two-stage structure separates individual judgement from group dynamics.
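To make "divergence points" concrete, the sketch below takes each expert's independent ranking and reports the options whose positions differ most. The spread measure (highest position minus lowest position) is an assumption chosen for simplicity, not a description of how sessions are actually analysed.

```python
def divergence_points(rankings: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Return (option, position_spread) pairs, most contested first.

    `rankings` maps an expert identifier to that expert's ordered list of
    options, best first. All experts are assumed to rank the same options.
    """
    if not rankings:
        return []
    option_sets = [set(r) for r in rankings.values()]
    if any(s != option_sets[0] for s in option_sets):
        raise ValueError("every expert must rank the same set of options")
    spreads: dict[str, int] = {}
    for option in option_sets[0]:
        positions = [ranking.index(option) for ranking in rankings.values()]
        spreads[option] = max(positions) - min(positions)
    # Largest spread first; ties broken alphabetically for stable output.
    return sorted(spreads.items(), key=lambda item: (-item[1], item[0]))
```

Options at the top of the returned list are the natural candidates for the group discussion stage.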
Two or more experts who hold genuinely different positions on a contested question argue the case for their view. No resolution is required; the disagreement itself is the output. Recruitment screens for opinion strength to ensure the disagreement is real, not staged.
This is the sample to commission when the research question is about contested terrain where the field disagrees. The absence of a 'correct' answer is the feature, not the bug.
A group of experts must reach a single agreed answer or recommendation on a complex question. The negotiation, the concessions, and the reasoning shifts along the way are captured. Two-stage: independent positions first, then group negotiation toward consensus.
This is the sample to commission when the research question is about consensus-building behaviour. The reasoning shifts under social pressure are part of the signal, not noise.
A group of experts independently estimate something (cost, timeline, probability, severity), then compare and discuss the differences. In effect a simplified, discussion-based variant of the Delphi method. Two-stage mandatory: independent estimates first, then reveal and discuss, then optional revised estimates.
This is the sample to commission when the research question is about calibration of expert estimates and the social dynamics of revision. The structured comparison reveals where experts cluster and where they spread.
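Where experts cluster and where they spread can be summarised with a couple of simple statistics, as in this sketch. The choice of median and range, and the function names, are assumptions for illustration only.

```python
from statistics import median


def summarise(estimates: list[float]) -> dict[str, float]:
    """Centre and spread of one round of independent estimates."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    return {
        "median": median(estimates),
        "spread": max(estimates) - min(estimates),
    }


def convergence(first_round: list[float], revised_round: list[float]) -> float:
    """How much the spread shrank between the two rounds.

    A positive value means the experts moved closer together after
    discussion; a negative value means they moved further apart.
    """
    return summarise(first_round)["spread"] - summarise(revised_round)["spread"]
```

Comparing the two rounds separates the calibration question (how far apart were the independent estimates) from the social one (how much did discussion pull them together).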
Two or more experts are presented with the same case, problem, or artefact and asked to independently diagnose, evaluate, or recommend, then compare. A common pattern in medicine, law, and engineering review. Two-stage: independent diagnosis first with reasoning narrated, then group comparison focused on divergence.
This is the sample to commission when the research question is about reasoning convergence and divergence on the same input. Tightly comparable expert opinions on identical material.
A group of experts brainstorms approaches to a problem with the goal of producing as many distinct viable options as possible. The opposite of consensus building. Two-stage: independent option generation first, then group review to deduplicate and add hybrids.
This is the sample to commission when the research question is about creative range rather than judgement quality. Captures the breadth of expert imagination on an open problem.
Two experts are deliberately assigned opposing positions (one defending, one attacking) on a proposal, plan, or decision. The format captures the strongest counter-arguments experts can construct, even against positions they might privately hold.
This is the sample to commission when the research question is about adversarial reasoning quality. The assigned positions force expression of arguments that might otherwise stay implicit.
A group of experts collectively reviews someone else's work, plan, or decision and discusses what is good and bad about it. Disagreements between reviewers are where the signal lives. Two-stage: independent critique first with written notes, then group discussion that surfaces where critiques agreed and where they diverged.
This is the sample to commission when the research question is about evaluative consensus and divergence on a third party's output. Distinct from peer-to-peer review because the reviewers are not the work's authors.
One or more naive users ask questions of a panel of experts on a topic. Experts may agree, disagree, or build on each other's answers. Captures how experts handle being challenged by someone outside their field, and how they coordinate (or fail to coordinate) on what to tell a non-expert.
This is the sample to commission when the research question is about expert-to-novice communication under panel dynamics. Different signal from one-to-one expert teaching because the experts watch each other.
The first participant produces a written brief or specification for a task. The second participant receives the brief and executes against it without further contact with the first. Captures how well intent transfers through written documentation alone, and where executors fill gaps, misinterpret, or push back.
This is the sample to commission when the research question is about specification quality and execution fidelity. The handoff gap is the signal: what the brief said versus what the executor did with it.
The first participant develops a strategy or approach for a problem. The second participant has to operationalise it. The format captures the translation step from strategic intent to operational action, where most workflows fail in practice.
This is the sample to commission when the research question is about strategy-to-execution translation. Particularly relevant for any model that needs to convert high-level plans into concrete steps.
The first participant produces a plan or design. The second participant builds or implements from the plan. The format captures the gap between design intent and implementation reality, where the builder must reason about what the planner left ambiguous.
This is the sample to commission when the research question is about specification interpretation under genuine ambiguity. The builder's gap-filling decisions are the signal.
The first participant analyses information and produces findings or recommendations. The second participant uses those findings to make a decision. The format separates analysis from action and captures the judgement layer that converts findings into commitment.
This is the sample to commission when the research question is about decision-making downstream of analysis. The decision is not the analysis; the gap between them is structurally distinct.
The first participant gathers raw information from multiple sources. The second participant synthesises it into a usable output. The format captures the structural choices a synthesiser makes against raw evidence, often quite different from what the gatherer assumed they would do.
This is the sample to commission when the research question is about information synthesis under multiple sources. The synthesiser's selection and weighting decisions are the signal.
The first participant produces a draft. The second participant reviews and returns it with feedback. The format captures both the draft as a real artefact and the review as a structured response to it. Optional back-and-forth where the drafter responds to the review.
This is the sample to commission when the research question is about review behaviour grounded in real draft material. Distinct from open critique because the reviewer is responding to specific drafted choices.
Three or more participants pass an artefact through sequential roles, each adding, transforming, or gating. Captures realistic cross-functional workflows where each handoff is a potential failure point, and the final artefact is the cumulative product of multiple decision-makers.
This is the sample to commission when the research question is about multi-hop workflows. The compound effect of sequential handoffs cannot be reconstructed from single-handoff data.
A single expert completes a real task end-to-end across multiple tools, narrating reasoning continuously. No second party present. Captures uncontaminated expert workflow at full speed with the reasoning layer attached.
This is the sample to commission when the research question is about how an expert actually works when nobody is watching. The reasoning narration is the signal; the absence of an audience is the production condition that makes it real.
A single participant gathers information across multiple sources and synthesises it into a deliverable, narrating choices of tool and source. The format captures both the search behaviour and the synthesis layer that converts search results into a usable artefact.
This is the sample to commission when the research question is about research-and-synthesis workflows. Distinct from search behaviour because the synthesis is the point.
A single participant completes a real workflow that spans a customer database, a productivity suite, and supporting tools. Includes the data lookups, the decisions, and the final action. The format captures the realistic shape of CRM-driven work where context lives in one tool and action lives in another.
This is the sample to commission when the research question is about workflows that span structured data and unstructured action. The cross-application reasoning is the signal.
A single participant coordinates a real-world plan across booking systems, calendars, and communication tools, with realistic constraints and trade-offs. The format captures multi-system orchestration where the participant must hold competing constraints in mind simultaneously.
This is the sample to commission when the research question is about real-world planning under constraint, where the cognitive load of juggling multiple systems is part of the work.
A developer completes a real coding task that spans local development tools, version control, browser-based documentation, and deployment interfaces. The format captures the full developer workflow rather than just the code, including the reasoning about when to consult docs, when to test, when to deploy.
This is the sample to commission when the research question is about developer behaviour at the workflow level rather than the code level. The tool-switching reasoning is part of the signal.
A designer takes a brief through to a working artefact, moving between reference gathering, design tools, and feedback channels. The format captures the realistic shape of design work where the artefact emerges through iteration against reference material, not in a single forward pass.
This is the sample to commission when the research question is about iterative design work. The reference-to-artefact loop is structurally distinct from linear task completion.
A real practitioner completes a real piece of their own work using the AI tools they normally use, narrating their reasoning as they go. The AI use is not staged. The participant decides, in the moment, when to ask the AI and when not to. The session captures both the work and the AI interactions inside it.
This is the sample to commission when the question is not *can an AI tool do this task* but *what does a competent professional actually do with an AI tool in their hands while doing their real job.*
The participant pursues a task approach that does not work, recognises the failure, and recovers by choosing a different path. The format captures the moment of detection and the reasoning about how to recover, both of which are usually invisible in clean task completion data.
This is the sample to commission when the research question is about self-correction at the workflow level. The decision shape, when to abandon an approach versus push through, is the signal.
Two practitioners with complementary expertise work on a shared problem. Each brings a different perspective; the resolution requires both. The asymmetry is structural rather than incidental.
This is the sample to commission when the research question is about complementary-expertise collaboration. The handoff and integration between perspectives is structurally distinct from peer collaboration.
Two or more participants surface a genuine disagreement on a shared problem and work through to a resolution. The disagreement is real, screened for during recruitment, not staged.
This is the sample to commission when the research question is about live disagreement resolution in collaborative work. The resolution path is the signal; the disagreement is the precondition.
A group decomposes a shared problem into sub-problems, divides the work, executes in parallel, and reconvenes to integrate. Captures realistic distributed work where the decomposition decision shapes the eventual outcome.
This is the sample to commission when the research question is about distributed parallel work. The decomposition and integration steps are structurally distinct from the parallel execution itself.
Two or more participants build a single artefact together in real time, with active hand-off, negotiation of choices, and live revision. Synchronous co-creation rather than asynchronous handoff.
This is the sample to commission when the research question is about real-time collaborative production. The negotiation embedded in the act of creating is the signal.
An expert explains a concept to a learner who has a real, pre-existing misunderstanding, and adjusts the explanation as the misunderstanding surfaces. Recruitment screens to ensure the misunderstanding is genuine, not performed. Conversation only, no interface used.
This is the sample to commission when the research question is about how experts diagnose and correct learner mental models in real time, in dialogue alone. Particularly relevant for AI assistant tuning on explanation behaviour.
A domain expert and a novice work together verbally to define and frame a problem the novice has brought to the session. The expert narrates choice-making throughout: which questions to ask first, what to defer, what to ignore, why. No tool driving, just the framing layer.
This is the sample to commission when the research question is about expert problem-framing reasoning, the cognitive layer that sits before any tool is touched. Most data captures expert execution; this captures expert orientation.
A senior practitioner critiques a junior's work in progress, with the work owner defending decisions and incorporating feedback live. Conversation-focused; the artefact is referenced but not actively built.
This is the sample to commission when the research question is about evaluative dialogue rather than evaluative monologue. The defence layer from the work's author is part of the signal.
A primary practitioner consults a peer or senior expert mid-task for a second opinion. Both participants engage with the same artefact verbally, but the consult itself is conversational rather than action-driving.
This is the sample to commission when the research question is about consultation dynamics, the specific shape of asking for and giving a second opinion under time pressure.
The participant tackles a problem once, then is given a reframed version of the same problem and tackles it again. Captures whether and how the framing changes the approach.
This is the sample to commission when the research question is about how problem framing shapes expert reasoning. The paired structure isolates the framing effect from the underlying problem.
An AI moderator conducts a structured 45-minute interview with a participant on a defined topic. Conversation only, no interface driven. After the session, a separately recruited expert annotates the transcript, marking where the participant's answers reveal real reasoning versus rehearsed surface responses.
This is the sample to commission when the research question is about high-volume structured qualitative data without a human moderator's overhead. The expert annotation layer compensates for the lack of live human judgement.
An AI moderator conducts a structured session while the participant shares their screen and demonstrates work. The AI prompts for narration and clarification across the workflow. After the session, an expert annotates the resulting recording.
This is the sample to commission when the research question requires both a real workflow capture and structured AI-led prompting to surface reasoning, with expert annotation for quality control.
An AI moderator debriefs a participant after they have completed a real task, structured around the artefact they produced. An expert then annotates the AI-led debrief, marking where the debrief missed important reasoning the participant skipped past.
This is the sample to commission when the research question is about retrospective reasoning surfacing. The AI conducts the debrief at scale; the expert annotation reveals where the AI's questioning was insufficient.
A participant completes a task entirely in their native (non-English) language across native-language tools. The format captures language-specific workflows, terminology, and tool behaviour that English-only sessions miss. Optional aligned English translation is delivered as a separate output.
This is the sample to commission when the research question is about how work actually happens in a target language, not how it is described post-hoc in English. The language-specific tool behaviour is part of the signal.
Two or more participants conduct a discussion (debate, prioritisation, consensus building, etc.) entirely in a non-English language. Captures multi-party reasoning in the target language with native participants on native ground.
This is the sample to commission when the research question is about multi-party reasoning in a non-English language. Translation of English transcripts is not equivalent.
Participants who naturally code-switch between English and another language during real work do so during the session. Captures realistic mixed-language professional reasoning rather than enforced monolingual sessions.
This is the sample to commission when the research question is about realistic code-switching behaviour. The switching itself is the signal, and forced monolingualism would erase it.
An identical (or carefully matched) task is run with one participant in English and another in a target language. The outputs are paired for translation and localisation research, with the matching structure preserved across both sessions.
This is the sample to commission when the research question is about cross-language comparison on controlled tasks. The matching is what enables translation-quality and localisation work.
Any expert demonstration format (training a learner, deliberate failure, plan-then-execute) is run end-to-end in a non-English language with bilingual capture. Standard expert demonstration session structure with language overlay.
This is the sample to commission when the research question is about expert-mode reasoning in a target language. Distinct from translation: the demonstration was never in English to begin with.
Every sample in the catalogue is produced through Askable’s audited platform — the same SaaS that 10,000+ research teams use daily, repurposed for LLM training-data extraction. Not consultants with spreadsheets. Code paths.
Recruitment, consent, capture, moderation, transcription, tagging, review, and delivery all happen inside one system, with one audit trail. That’s what makes the work productised — and what makes it hold up under scrutiny at scale.
Millions of identity-verified humans across 50+ countries. Triple-verified, paid, consenting. Any demographic, industry, or geography — including the niche audiences others can’t reach.
Scheduling, calendar sync, SMS & email reminders, identity verification, brief-specific consent, incentive payments, and automated replacements. The entire participant lifecycle, productised.
LiveKit-powered real-time A/V, screen-share, and multi-party rooms. Sessions are recorded against the consented brief and encrypted in transit and at rest, with multi-region routing for low-latency capture.
Moderated interviews, AI-moderated interviews, remote and live mobile/website testing, prototype tasks, surveys, diary studies. Every modality is a code path on the same platform — mixable per brief.
Card sorting, tree tests, first-click tests, five-second tests, stack ranking, prioritisation. Structured, unmoderated tasks run at scale — producing clean, quantifiable behavioural signal alongside qualitative capture.
Real-time transcription, speaker diarisation, semantic embedding, AI annotation, and evidence-linked output structuring. Every finding traces back to a timestamp, a quote, and a participant.
Every layer above runs inside the same audited platform. One identity system. One consent ledger. One audit log. One delivery pipeline. That’s productisation — not a description of how we work.
The public site shows the front of the card. The lab partnership unlocks the underlying capture: video, audio, transcripts, structured logs, attribution chains.