CAPABILITIES & SAMPLE LIBRARY

Explore session formats, outputs, samples and data categories.

Sixty-two patterns across ten session formats. Every entry is a real session captured with a named, paid, consenting practitioner. Filter by format, mode, participants, or the kind of human signal you need.

Delivery modes: A + B

Formats engineered to capture the human experience.

Every pattern in the library belongs to one of these ten formats. Each format defines the shape of the session — who is present, who leads, and what kind of work is being captured.

CAT 01

Expert demonstrating interface use to naive user

Expert drives end-to-end while narrating. The user asks clarifying questions but never takes the controls.

Patterns: 9

Modes: A×9

— Expert demonstrates, user watches

— Expert demonstrates, user replicates after

— Expert teaches by deliberate failure

CAT 02

Naive user learning from expert

User drives the task; the expert intervenes, narrates over the top, or validates choices.

Patterns: 4

Modes: A×4

— User drives, expert guides on request

— User drives, expert intervenes proactively

— User drives, expert narrates over the top

CAT 03

Expert reviews user's unmoderated session

An expert reviews a captured user session and produces judgement, rewriting, ranking, or prediction.

Patterns: 8

Modes: A×7 · A+B×1

— Expert reviews user's unmoderated session

— Expert reviews multiple users tackling the same task

— Expert grades and ranks user attempts

CAT 04

Multi-party discussion

Three or more experts surface, debate, decompose, or converge on a contested question.

Patterns: 9

Modes: A×9

— Multi-expert prioritisation

— Multi-expert debate on a contested question

— Multi-expert consensus building

CAT 05

Multi-stage handoff

Work moves across stages between participants — brief writer, executor, reviewer.

Patterns: 7

Modes: A×7

— Brief writer to executor

— Strategist to operator

— Planner to builder

CAT 06

Solo expert thinking aloud

One practitioner completes a real task end-to-end across multiple tools, narrating reasoning continuously.

Patterns: 8

Modes: A×7 · A+B×1

— Expert solo task with think-aloud

— Cross-application research and synthesis

— CRM-to-output workflow

CAT 07

Collaborative problem-solving

Two practitioners share a task with role asymmetry, genuine disagreement, or live co-creation.

Patterns: 4

Modes: A×4

— Two-practitioner shared task with role asymmetry

— Genuine disagreement and live resolution

— Distributed problem decomposition across roles

CAT 08

Asymmetric expertise dialogue

Two participants with unequal expertise — explanation under misunderstanding, second-opinion consult, reframed second attempt.

Patterns: 5

Modes: A×5

— Conversational explanation under genuine misunderstanding

— Practitioner-led problem framing

— Asymmetric design or work critique

CAT 09

AI-moderated session with post-task review

AI moderates the session; a human expert reviews and annotates the capture afterwards.

Patterns: 3

Modes: A+B×3

— AI-moderated interview

— AI-moderated interview with screen share

— AI-moderated post-task review with annotation

CAT 10

Multilingual / translation sessions

Sessions captured in the participant's native language, with optional translation and code-switching annotation layers.

Patterns: 5

Modes: A×5

— Native-language solo task

— Native-language multi-party discussion

— Bilingual session with code-switching
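
The per-format counts listed above can be tallied to check them against the headline figure of sixty-two patterns. This is a minimal sketch in Python: the counts are transcribed from the ten format entries, while the dictionary layout and variable names are illustrative assumptions rather than anything the library itself exposes.

```python
# Pattern counts per format and delivery mode, transcribed from the
# ten format entries above. The data layout itself is an assumption.
FORMAT_MODES = {
    "CAT 01": {"A": 9},
    "CAT 02": {"A": 4},
    "CAT 03": {"A": 7, "A+B": 1},
    "CAT 04": {"A": 9},
    "CAT 05": {"A": 7},
    "CAT 06": {"A": 7, "A+B": 1},
    "CAT 07": {"A": 4},
    "CAT 08": {"A": 5},
    "CAT 09": {"A+B": 3},
    "CAT 10": {"A": 5},
}

# Total patterns per format, then across the whole library.
patterns_per_format = {
    cat: sum(modes.values()) for cat, modes in FORMAT_MODES.items()
}
total_patterns = sum(patterns_per_format.values())

# Breakdown by delivery mode across all formats.
mode_totals = {}
for modes in FORMAT_MODES.values():
    for mode, count in modes.items():
        mode_totals[mode] = mode_totals.get(mode, 0) + count

print(total_patterns)  # 62
print(mode_totals)     # {'A': 57, 'A+B': 5}
```

The total matches the sixty-two patterns stated at the top of the page, split into 57 Mode A patterns and 5 A + B patterns.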

The sample library.

Each row is a session pattern. The front of the card shows title, format, and modality emphasis. Click a row for the full pattern record — pattern, illustrative scenario, outputs, configuration, capabilities served, visual artefact, and adjacent patterns in the graph.
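
The card and record fields described above could be modelled roughly as follows. This is a sketch only: the class name, field names, and types are hypothetical, chosen to mirror the wording of the paragraph, and the page does not document an actual schema or export format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatternRecord:
    # Card front: title, format, and modality emphasis.
    pattern_id: str
    title: str
    session_format: str
    modalities: List[str]
    mode: str
    # Parts of the full record named in the text; types are assumed.
    scenario: str = ""
    outputs: List[str] = field(default_factory=list)
    adjacent_patterns: List[str] = field(default_factory=list)


def filter_patterns(
    records: List[PatternRecord],
    session_format: Optional[str] = None,
    mode: Optional[str] = None,
) -> List[PatternRecord]:
    """Return the records that match every filter that is set."""
    return [
        r for r in records
        if (session_format is None or r.session_format == session_format)
        and (mode is None or r.mode == mode)
    ]


# Two rows transcribed from the library for illustration.
records = [
    PatternRecord("S-001", "Expert demonstrates, user watches",
                  "CAT 01", ["Audio", "Screen"], "A"),
    PatternRecord("S-021", "Without-AI versus with-AI comparison",
                  "CAT 03", ["Audio", "Screen", "Text submissions"], "A+B"),
]

with_ai_capture = filter_patterns(records, mode="A+B")
print([r.pattern_id for r in with_ai_capture])  # ['S-021']
```

Leaving a filter unset returns every record, which mirrors a filter panel with nothing selected.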

62 of 62 patterns

S-001 · Expert demonstrates, user watches · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced home cook training a beginner to plan and shop for a week of dinners across Google Sheets (meal plan), Woolworths online ordering, AllRecipes, YouTube cooking videos, and a notes app for the running shopping list.

An experienced practitioner drives a complex multi-tool task from start to finish while narrating their reasoning. The learner watches and asks clarifying questions but never touches the interface. The expert's mental model is captured cleanly, anchored against learner questions that surface what the expert would otherwise leave implicit.

This is the sample to commission when the question is what an expert's uncontaminated workflow looks like, with the *why* layer attached. Ideal for training models that need to demonstrate expert-mode reasoning in a domain rather than learn it from scratch.

S-002 · Expert demonstrates, user replicates after · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced eBay seller listing a high-value collectible across eBay, Google Lens for comps, Canva for photo edits, PayPal, and Gmail, then watching a first-time seller list a comparable item.

An expert performs the task once with the learner observing in silence, then the learner attempts the same task while the expert watches and only corrects at the end. The two-pass structure isolates the gap between watching competence and reproducing it, and reveals which steps experts execute without conscious thought.

This is the sample to commission when the research question is about the gap between observing skill and exercising it. Useful for instruction-tuning models that need to convert a worked example into a usable procedure.

S-003 · Expert teaches by deliberate failure · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced Reddit moderator deliberately running a broken sub-launch workflow across Reddit, Discord, Imgur, Google Forms, and Canva, watching it fail, then re-running it correctly so the user sees both outcomes.

An expert demonstrates the wrong way to perform a task first, lets the learner see the failure mode, then re-runs the workflow correctly. Comparative reasoning is the central artefact: *the right way* is articulated explicitly against *why the wrong way is wrong*.

This is the sample to commission when the research question is about contrastive reasoning. The two-version structure provides natural preference pairs grounded in real expert judgement rather than synthesised pairings.
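
One way to read "natural preference pairs" is that the corrected run becomes the preferred example and the deliberately broken run the rejected one, with the expert's explanation kept as the rationale. The sketch below assumes that reading. The class, function, and field names are hypothetical, the strings are placeholders, and the page does not specify how pairs are actually packaged.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    task: str
    chosen: str     # the corrected run
    rejected: str   # the deliberately failed run
    rationale: str  # the expert's stated reason the rejected run is wrong


def pair_from_session(task, failed_run, corrected_run, expert_rationale):
    """Build one preference pair from a two-version session."""
    return PreferencePair(
        task=task,
        chosen=corrected_run,
        rejected=failed_run,
        rationale=expert_rationale,
    )


# Placeholder strings stand in for real session material.
pair = pair_from_session(
    task="<task description>",
    failed_run="<transcript of the broken workflow>",
    corrected_run="<transcript of the corrected workflow>",
    expert_rationale="<expert's explanation of the failure>",
)
print(pair.chosen)
```

Keeping the rationale alongside the pair preserves the expert's account of why the wrong way is wrong, which the pattern treats as the central artefact.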

S-004 · Expert plans, user executes · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced renter verbally mapping out how to find and apply for a competitive rental across Realestate.com.au, Domain, Google Maps, Gmail, and a budgeting spreadsheet, then the user executes the plan.

The expert lays out a verbal plan up front, *here's how I'd tackle this*, before any tool is touched. The learner then attempts to execute. The structure separates strategy from tactics, capturing the planning layer that gets compressed or skipped in most session formats.

This is the sample to commission when the research question is about expert planning behaviour. Particularly useful for agentic-reasoning training where the gold path needs an explicit plan attached, not just the trajectory.

S-005 · Expert troubleshoots a user who is already stuck · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Tech-savvy friend joining a session where a user has a broken self-hosted blog across WordPress.com, Cloudflare, Namecheap DNS, Gmail, and a free SSL provider, diagnosing and fixing the misconfiguration.

The user starts the session already in a broken or confused state. The expert must first diagnose what went wrong, then fix it, then explain how to avoid the same failure in future. Three distinct cognitive moves are captured: diagnosis, repair, and prevention.

This is the sample to commission when the research question is about debugging reasoning, not just task reasoning. Models trained on clean trajectories often fail when handed a broken state; this captures the step they miss, in which an expert reads the broken state correctly.

S-006 · Expert challenges a competent user · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced budget traveller challenging a competent traveller to replan a trip after a missed connection across Skyscanner, Google Flights, Booking.com, Gmail, and Google Maps, with hard time and cost constraints.

The learner is moderately skilled, not naive. The expert deliberately pushes them with harder variants and edge cases they would not normally encounter. The session captures expert reasoning at the upper end of difficulty rather than the entry level.

This is the sample to commission when the research question is about expert behaviour at the difficult end of a domain. Synthetic data tends to model entry-level competence; this captures the part most other corpora miss.

S-007 · Expert helps user adapt a known process to a new context · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced eBay seller helping a confident Facebook Marketplace seller move their workflow to eBay across eBay, PayPal, Google Photos, Gmail, and a shipping calculator.

The learner knows how to do the task in their usual environment but now needs to do it in an unfamiliar one: a new tool, a new jurisdiction, a new constraint. The expert bridges the gap. Transfer reasoning rather than first-time learning.

This is the sample to commission when the research question is about adaptation and generalisation. Trains models on the specific cognitive work of mapping a known method to an unfamiliar context.

S-008 · Expert helps user recover from a mistake mid-task · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced eBay seller helping a user who has just listed 30 items with the wrong shipping settings across eBay, PayPal, Gmail, a spreadsheet of listings, and a shipping calculator.

The user makes an error partway through the task and must decide whether to undo, repair, or start over. The expert helps them triage. Sunk-cost reasoning is captured under real time pressure, against real consequences for getting the call wrong.

This is the sample to commission when the research question is about recovery and triage reasoning. The decision shape itself, not the eventual outcome, is the signal.

S-009 · Expert helps user complete a task they will need to repeat · Audio · Screen · CAT 01 · expert demo · Mode A

Illustrative scenario

Experienced tax filer walking a first-time sole trader through their MyGov / ATO online tax return alongside a Google Sheet of expenses, bank statement PDFs, a receipts folder in Google Drive, and Gmail, framed so they can do the next four returns alone.

The framing shifts from *let's get this done* to *let's make sure you can do this next time without me*. The expert deliberately holds back, prompts the learner to recall, and builds in reinforcement loops. Pedagogical reasoning rather than task reasoning.

This is the sample to commission when the research question is about teaching behaviour. Captures the specific cognitive shape of scaffolding, intentionally distinct from just completing the work efficiently.

S-010 · User drives, expert guides on request · Audio · Screen · CAT 02 · naive learning · Mode A

Illustrative scenario

First-time renter applying for a competitive rental property across Realestate.com.au, Domain, Gmail, a payslips PDF, and an ID verification portal, pulling in a long-time renter only when stuck.

The learner attempts the task and only calls on the expert when they get stuck. The expert stays silent the rest of the time. The session captures the natural confusion-articulation-intervention-resolution loop that drives most real-world help-seeking behaviour.

This is the sample to commission when the research question is about how naive users actually ask for help, in their own words, when they hit a wall. Particularly valuable for help-system and assistant-behaviour training.

S-011 · User drives, expert intervenes proactively · Audio · Screen · CAT 02 · naive learning · Mode A

Illustrative scenario

First-time investor opening a brokerage account and placing initial trades across CommSec / Robinhood, a banking app, an ID verification portal, Google for research, and Gmail, with an experienced investor intervening before each common slip.

The expert watches the learner work and intervenes *before* failure rather than after, predicting where the learner is about to go wrong. The expert's predictive model of novice mistakes is made explicit through pre-emptive correction.

This is the sample to commission when the research question is about how experts anticipate user error. The intervention shape, what triggers it and what gets corrected, is the signal.

S-012 · User drives, expert narrates over the top · Audio · Screen · CAT 02 · naive learning · Mode A

Illustrative scenario

Experienced recruiter narrating live as a first-time hiring manager sources candidates across LinkedIn (free), Gmail, Google Calendar, a Google Docs job description, and Google Forms for screening.

The learner performs the task. The expert provides continuous running commentary on what the learner is doing right and wrong, like a coach calling a game. High-density signal because the expert is articulating judgements continuously rather than waiting for explicit decision points.

This is the sample to commission when the research question requires a near-continuous expert reasoning track. Most session formats produce sparse reasoning; this format produces it densely across the whole session.

S-013 · User plans, expert validates or corrects · Audio · Screen · CAT 02 · naive learning · Mode A

Illustrative scenario

First-time investor talking through their intended ETF rebalancing plan across CommSec / Vanguard's free portal, a Google Sheets portfolio tracker, the ATO website for CGT, ChatGPT, and Gmail, with an experienced investor stress-testing the plan before any trades.

The learner explains their intended approach before doing anything. The expert challenges, refines, or approves. Then the learner executes. The format captures expert reasoning about whether a plan will work *before* it is tested against reality.

This is the sample to commission when the research question is about plan critique. The judgement shape is distinct from execution shape, and most data conflates them.

S-014 · Expert reviews user's unmoderated session · Audio · Screen · CAT 03 · expert review · Mode A

Illustrative scenario

First-time investor completing an ETF portfolio setup alone across CommSec, Vanguard's free portal, a Google Sheets tracker, the ATO website, and Gmail, then a financial advisor reviewing the recording and output asynchronously.

The user completes the task alone, then submits the recording and any artefacts produced. An expert reviews asynchronously, walking through the finished output, scrubbing the recording with running commentary, or reverse-engineering intent from the artefact alone. The user's behaviour is uncontaminated by the expert's presence.

This is the sample to commission when the research question requires both an unmoderated trace of user behaviour *and* expert evaluative reasoning over it. The async structure means neither contaminates the other.

S-015 · Expert reviews multiple users tackling the same task · Audio · Screen · CAT 03 · expert review · Mode A

Illustrative scenario

Senior copywriter reviewing five independent users each writing a cold outreach email across Gmail, Google Docs, ChatGPT, LinkedIn (free), and a Google Sheets prospect list.

Several independent users attempt the same task. An expert reviews all of them and compares them. Cross-session reasoning rather than single-session review. The format surfaces what experts treat as common pitfalls versus what they treat as idiosyncratic mistakes.

This is the sample to commission when the research question is about the *distribution* of user behaviour on a fixed task, viewed through an expert's pattern-recognition lens.

S-016 · Expert grades and ranks user attempts · Audio · Screen · CAT 03 · expert review · Mode A

Illustrative scenario

Experienced editor grading and ranking three users' independent attempts at a short essay across Google Docs, ChatGPT, Grammarly free, and a research sources list in Google Drive.

An expert applies an explicit rubric to user attempts and produces a ranking with justifications per criterion. Explicit rubric articulation is a distinct cognitive task from open-ended critique: the expert must surface and defend the standards they would normally apply implicitly.

This is the sample to commission when the research question is about graded judgement against explicit criteria. Particularly relevant for reward-model training where the rubric structure matters as much as the rating.
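
A rubric-structured grading record might keep the per-criterion scores rather than only the final rank, so the structure survives into training data. The sketch below assumes a simple weighted sum. The criteria, weights, and scores are invented placeholders, not values from any real session, and the function name is hypothetical.

```python
def rank_attempts(scores, weights):
    """Rank attempts by weighted rubric score.

    scores:  {attempt: {criterion: score}}
    weights: {criterion: weight}
    Returns (ranking, totals), best attempt first.
    """
    totals = {
        attempt: sum(weights[c] * s for c, s in per_criterion.items())
        for attempt, per_criterion in scores.items()
    }
    ranking = sorted(totals, key=totals.get, reverse=True)
    return ranking, totals


# Invented rubric and scores, for illustration only.
weights = {"argument": 0.5, "evidence": 0.3, "style": 0.2}
scores = {
    "attempt_a": {"argument": 4, "evidence": 3, "style": 5},
    "attempt_b": {"argument": 5, "evidence": 4, "style": 3},
    "attempt_c": {"argument": 2, "evidence": 5, "style": 4},
}

ranking, totals = rank_attempts(scores, weights)
print(ranking)  # ['attempt_b', 'attempt_a', 'attempt_c']
```

A weighted sum is only one possible aggregation. The per-criterion justifications the pattern describes would sit alongside each score rather than being collapsed into the total.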

S-017 · Expert rewrites the user's approach · Audio · Screen · CAT 03 · expert review · Mode A

Illustrative scenario

Senior copywriter rewriting a user's email newsletter draft across Gmail, Google Docs, Mailchimp free tier, Canva, and ChatGPT, alongside the user's original version.

The expert produces their own version of the task rather than just commenting on the user's attempt. Expert output, not just expert critique, is captured, anchored against a real user attempt as the reference point.

This is the sample to commission when the research question needs both a reference output and a parallel expert output on the same task. The pairing is the signal.

S-018 · Expert reviews and the user responds · Audio · Screen · CAT 03 · expert review · Mode A

Illustrative scenario

Senior designer reviewing a junior's logo concepts across Canva, Figma free, a Google Slides moodboard, Pinterest, and Gmail, with the junior responding to each note before resubmission.

Two-party asynchronous dialogue. The expert leaves review notes; the user responds in writing or video with clarifications, defences, or revisions. Captures the back-and-forth that single-expert review misses, including the rejection of expert feedback when the user has good reason.

This is the sample to commission when the research question is about dialogic review, where the user's defence of their choices is part of the signal.

S-019 · Multiple experts review the same user session · Audio · Screen · CAT 03 · expert review · Mode A

Illustrative scenario

Three experienced freelance writers independently reviewing one new freelancer's pitch package across Gmail, Google Docs, LinkedIn (free), a portfolio site on Wix, and Calendly.

Multiple experts independently review one user session, then their reviews are compared. Multi-expert disagreement is the point. Reveals where the field itself does not agree on what good practice looks like.

This is the sample to commission when the research question is about contested expert judgement, where labelling a single answer would mislead the model.

S-020 · Expert reviews and predicts what happens next · Audio · Screen · CAT 03 · expert review · Mode A

Illustrative scenario

Experienced eBay seller reviewing the first half of a new seller's listing session across eBay, PayPal, Google Photos, Canva, and Gmail, predicting the next set of decisions before the reveal.

The expert reviews the first half of a user session and predicts what the user will do next, then the prediction is revealed against what the user actually did. The expert's predictive model is captured explicitly rather than inferred from post-hoc commentary.

This is the sample to commission when the research question is about expert anticipation rather than expert critique. The prediction-then-reveal mechanic produces structured, ground-truthed predictions of human behaviour.

S-021 · Without-AI versus with-AI comparison · Audio · Screen · Text submissions · CAT 03 · expert review · Mode A + B

Illustrative scenario

User writing a long-form blog post across Google Docs, Grammarly free, Unsplash, WordPress.com, and Gmail, first unassisted, then re-doing a matched post with ChatGPT integrated into the same toolchain.

The same user completes the same (or carefully matched) task twice: once without AI assistance, once with. The pair captures where AI changes the workflow, what it replaces, and what it leaves unchanged. Mode A captures both sessions; Mode B captures the AI interactions inside the second one.

This is the sample to commission when the research question is about the *effect* of AI on workflow rather than about AI use in the wild. The matched-pair structure gives controlled comparison rather than naturalistic distribution.

S-022 · Multi-expert prioritisation · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Three experienced indie game developers ranking 10 candidate features for their next game across a shared Google Doc, a public Trello board, YouTube competitor footage, Steam reviews, and Discord.

Two to five experts in the same domain work from a list of options and argue toward a ranked priority order. Disagreement is expected and the resolution process is the point. Run as two stages: independent ranking first with think-aloud reasoning, then group discussion focused on divergence points, with optional revision at the end.

This is the sample to commission when the research question is about how experts weigh contested options against each other. The two-stage structure separates individual judgement from group dynamics.

S-023 · Multi-expert debate on a contested question · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Three experienced personal finance creators debating whether buying or renting is currently the better decision in their city across Realestate.com.au, the RBA's public rates page, a Google Sheets model, YouTube comparison videos, and a shared Google Doc of assumptions.

Two or more experts who hold genuinely different positions on a contested question argue the case for their view. No resolution is required; the disagreement itself is the output. Recruitment screens for opinion strength to ensure the disagreement is real, not staged.

This is the sample to commission when the research question is about contested terrain where the field disagrees. The absence of a 'correct' answer is the feature, not the bug.

S-024 · Multi-expert consensus building · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Four experienced parents reaching a single agreed recommendation for choosing between two primary schools across the schools' public websites, MySchool.edu.au, Google Maps, a shared Google Doc of criteria, and Facebook parent groups.

A group of experts must reach a single agreed answer or recommendation on a complex question. The negotiation, the concessions, and the reasoning shifts along the way are captured. Two-stage: independent positions first, then group negotiation toward consensus.

This is the sample to commission when the research question is about consensus-building behaviour. The reasoning shifts under social pressure are part of the signal, not noise.

S-025 · Multi-expert estimation under uncertainty · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Five experienced home renovators independently estimating the cost and timeline of a kitchen reno from a brief and photos, using Bunnings' public pricing, a Google Sheets cost calculator, Houzz references, Pinterest, and a shared Google Doc.

A group of experts independently estimate something (cost, timeline, probability, severity), then compare and discuss the differences. Essentially the Delphi method. Two-stage mandatory: independent estimates first, then reveal and discuss, then optional revised estimates.

This is the sample to commission when the research question is about calibration of expert estimates and the social dynamics of revision. The structured comparison reveals where experts cluster and where they spread.
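
The clustering and spread mentioned above can be summarised per round once independent and revised estimates are both on record. This is a minimal sketch assuming one numeric estimate per expert per round. The figures are invented placeholders, not data from a real session, and the function and variable names are hypothetical.

```python
from statistics import median


def summarise(estimates):
    """Median and range of one round of estimates."""
    values = list(estimates.values())
    return {"median": median(values), "spread": max(values) - min(values)}


# Invented cost estimates for illustration only.
round_one = {"expert_1": 18000, "expert_2": 25000, "expert_3": 22000,
             "expert_4": 40000, "expert_5": 21000}
round_two = {"expert_1": 20000, "expert_2": 24000, "expert_3": 22000,
             "expert_4": 30000, "expert_5": 22000}

before = summarise(round_one)
after = summarise(round_two)

# How far each expert moved between the independent and revised rounds.
revisions = {name: round_two[name] - round_one[name] for name in round_one}

print(before)  # {'median': 22000, 'spread': 22000}
print(after)   # {'median': 22000, 'spread': 10000}
```

In this made-up case the median holds steady while the range narrows, and the per-expert revisions show who moved and by how much after the reveal.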

S-026 · Multi-expert case review · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Three small business mentors independently reviewing the same struggling local cafe's situation, presented as a Google Doc brief plus the cafe's public Instagram, Google Reviews, the cafe's website, and an Uber Eats listing, then comparing recommendations.

Two or more experts are presented with the same case, problem, or artefact and asked to independently diagnose, evaluate, or recommend, then compare. Common pattern in medicine, law, engineering review. Two-stage: independent diagnosis first with reasoning narrated, then group comparison focused on divergence.

This is the sample to commission when the research question is about reasoning convergence and divergence on the same input. Tightly comparable expert opinions on identical material.

S-027 · Multi-expert option generation · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Four experienced freelancers independently generating ways for a friend to monetise an existing personal audience, using public competitor pages, a shared Google Doc, ChatGPT, Stripe / Substack public pricing, and a Google Sheets revenue model.

A group of experts brainstorms approaches to a problem with the goal of producing as many distinct viable options as possible. The opposite of consensus building. Two-stage: independent option generation first, then group review to deduplicate and add hybrids.

This is the sample to commission when the research question is about creative range rather than judgement quality. Captures the breadth of expert imagination on an open problem.

S-028 · Adversarial expert pairing (red team / blue team) · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Two experienced investors assigned to attack and defend the decision to invest a lump sum in a single ETF right now, across a Google Sheets model, the RBA's public rates page, historical returns data on a free finance site, a shared Google Doc, and YouTube commentary.

Two experts are deliberately assigned opposing positions (one defending, one attacking) on a proposal, plan, or decision. The format captures the strongest counter-arguments experts can construct, even against positions they might privately hold.

This is the sample to commission when the research question is about adversarial reasoning quality. The assigned positions force expression of arguments that might otherwise stay implicit.

S-029 · Multi-expert critique of a third party's work · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

Three experienced indie founders critiquing a real public startup pitch (from Y Combinator's public demo day) across the public deck, the company's website, their public Twitter / LinkedIn presence, Crunchbase free tier, and a shared Google Doc.

A group of experts collectively reviews someone else's work, plan, or decision and discusses what is good and bad about it. Disagreements between reviewers are where the signal lives. Two-stage: independent critique first with written notes, then group discussion that surfaces where critiques agreed and where they diverged.

This is the sample to commission when the research question is about evaluative consensus and divergence on a third party's output. Distinct from peer-to-peer review because the reviewers are not the work's authors.

S-030 · Expert panel answering naive user's questions · Audio · Screen · CAT 04 · multi party · Mode A

Illustrative scenario

First-time home buyer asking a panel of three experienced buyers questions ahead of an upcoming auction, with shared access to the property listing on Realestate.com.au, Domain, Google Maps, comparable sales on a free property data site, and the auctioneer's public Instagram.

One or more naive users ask questions of a panel of experts on a topic. Experts may agree, disagree, or build on each other's answers. Captures how experts handle being challenged by someone outside their field, and how they coordinate (or fail to coordinate) on what to tell a non-expert.

This is the sample to commission when the research question is about expert-to-novice communication under panel dynamics. Different signal from one-to-one expert teaching because the experts watch each other.

S-031 · Brief writer to executor · Audio · Screen · CAT 05 · handoff · Mode A

Illustrative scenario

Experienced YouTuber writing a detailed video brief in Google Docs, then a freelance video editor producing the final cut across DaVinci Resolve free, YouTube Studio, Google Drive for assets, a stock music site, and Gmail, without further contact.

The first participant produces a written brief or specification for a task. The second participant receives the brief and executes against it without further contact with the first. Captures how well intent transfers through written documentation alone, and where executors fill gaps, misinterpret, or push back.

This is the sample to commission when the research question is about specification quality and execution fidelity. The handoff gap is the signal: what the brief said versus what the executor did with it.

S-032 · Strategist to operator · Audio · Screen · CAT 05 · handoff · Mode A

Illustrative scenario

Experienced ecommerce operator writing a quarterly marketing strategy across Google Docs, Google Trends, a Google Sheets model, and competitor public sites, then a junior marketer operationalising across Shopify, Meta Ads Manager, Klaviyo free tier, Canva, and Gmail.

The first participant develops a strategy or approach for a problem. The second participant has to operationalise it. The format captures the translation step from strategic intent to operational action, where most workflows fail in practice.

This is the sample to commission when the research question is about strategy-to-execution translation. Particularly relevant for any model that needs to convert high-level plans into concrete steps.

S-033 · Planner to builder · Audio · Screen · CAT 05 · handoff · Mode A

Illustrative scenario

Experienced renovator producing a kitchen reno plan across Pinterest, a Google Sheets budget, Bunnings' public site, Houzz, and Google Docs, then a builder implementing across Google Calendar, Bunnings online ordering, supplier websites, Gmail, and a photo log in Google Drive.

The first participant produces a plan or design. The second participant builds or implements from the plan. The format captures the gap between design intent and implementation reality, where the builder must reason about what the planner left ambiguous.

This is the sample to commission when the research question is about specification interpretation under genuine ambiguity. The builder's gap-filling decisions are the signal.

S-034 · Analyst to decision-maker · Audio · Screen · CAT 05 · handoff · Mode A

Illustrative scenario

Experienced retail investor producing a buy / hold / sell memo across a Google Sheets model, free broker research reports, Yahoo Finance, the ASX public announcements page, and a Google Doc, then another retail investor making and logging the actual decision in a brokerage app and a personal investment journal.

The first participant analyses information and produces findings or recommendations. The second participant uses those findings to make a decision. The format separates analysis from action and captures the judgement layer that converts findings into commitment.

This is the sample to commission when the research question is about decision-making downstream of analysis. The decision is not the analysis; the gap between them is structurally distinct.

S-035 · Researcher to synthesiser · Audio · Screen · CAT 05 · handoff · Mode A

Illustrative scenario

Research assistant gathering raw evidence across Google Scholar, Wikipedia, public government statistics portals, Reddit, and YouTube primary-source clips into a Google Drive folder, then a senior policy commentator synthesising a public op-ed in Google Docs.

The first participant gathers raw information from multiple sources. The second participant synthesises it into a usable output. The format captures the structural choices a synthesiser makes against raw evidence, often quite different from what the gatherer assumed they would do.

This is the sample to commission when the research question is about information synthesis under multiple sources. The synthesiser's selection and weighting decisions are the signal.

S-036 · Drafter to reviewer · Audio · Screen · CAT 05 · handoff · Mode A

Illustrative scenario

Freelance writer drafting a long-form article across Google Docs, ChatGPT, Grammarly free, a research sources folder in Google Drive, and Gmail, then an editor reviewing in Google Docs with suggested edits and comments.

The first participant produces a draft. The second participant reviews and returns it with feedback. The format captures both the draft as a real artefact and the review as a structured response to it. An optional back-and-forth lets the drafter respond to the review.

This is the sample to commission when the research question is about review behaviour grounded in real draft material. Distinct from open critique because the reviewer is responding to specific drafted choices.

S-037 · Multi-role pipeline (3+ participants) · Audio · Screen · CAT 05 · handoff · Mode A

Illustrative scenario

Creator scoping a video idea in a Google Doc, scriptwriter writing the script in Google Docs, editor cutting in DaVinci Resolve free with assets in Google Drive, and a thumbnail designer producing in Canva, with the artefact moving through all four.

Three or more participants pass an artefact through sequential roles, each adding, transforming, or gating. Captures realistic cross-functional workflows where each handoff is a potential failure point, and the final artefact is the cumulative product of multiple decision-makers.

This is the sample to commission when the research question is about multi-hop workflows. The compound effect of sequential handoffs cannot be reconstructed from single-handoff data.

S-038 · Expert solo task with think-aloud · Audio · Screen · CAT 06 · solo aloud · Mode A

Illustrative scenario

Experienced freelancer doing their quarterly bookkeeping across a personal bank export CSV, a Google Sheets ledger, Stripe / PayPal dashboards, the ATO Business Portal, and Gmail receipts, narrating each decision.

A single expert completes a real task end-to-end across multiple tools, narrating reasoning continuously. No second party present. Captures uncontaminated expert workflow at full speed with the reasoning layer attached.

This is the sample to commission when the research question is about how an expert actually works when nobody is watching. The reasoning narration is the signal; the absence of an audience is the production condition that makes it real.

S-039 · Cross-application research and synthesis · Audio · Screen · CAT 06 · solo aloud · Mode A

Illustrative scenario

Independent journalist researching a feature article across Google Search, Wikipedia, public court records, X / Twitter for primary sources, and a Google Doc for synthesis.

A single participant gathers information across multiple sources and synthesises it into a deliverable, narrating choices of tool and source. The format captures both the search behaviour and the synthesis layer that converts search results into a usable artefact.

This is the sample to commission when the research question is about research-and-synthesis workflows. Distinct from search behaviour because the synthesis is the point.

S-040 · CRM-to-output workflow · Audio · Screen · CAT 06 · solo aloud · Mode A

Illustrative scenario

Freelance consultant preparing a renewal proposal across a Notion CRM, Gmail thread history, a Stripe dashboard for billing context, a Google Slides proposal template, and DocuSign free tier.

A single participant completes a real workflow that spans a customer database, a productivity suite, and supporting tools. Includes the data lookups, the decisions, and the final action. The format captures the realistic shape of CRM-driven work where context lives in one tool and action lives in another.

This is the sample to commission when the research question is about workflows that span structured data and unstructured action. The cross-application reasoning is the signal.

S-041 · Complex booking and coordination across systems · Audio · Screen · CAT 06 · solo aloud · Mode A

Illustrative scenario

Frequent traveller booking a complex multi-leg personal trip across Skyscanner, Booking.com, Google Calendar, a visa application portal, and a hotel loyalty program, balancing real constraints.

A single participant coordinates a real-world plan across booking systems, calendars, and communication tools, with realistic constraints and trade-offs. The format captures multi-system orchestration where the participant must hold competing constraints in mind simultaneously.

This is the sample to commission when the research question is about real-world planning under constraint, where the cognitive load of juggling multiple systems is part of the work.

S-042 · Developer workflow across IDE, terminal, and browser · Audio · Screen · CAT 06 · solo aloud · Mode A

Illustrative scenario

Indie developer adding a new feature to their open source side project across VS Code, the terminal, a public GitHub repo, public API docs, and a free Vercel deployment.

A developer completes a real coding task that spans local development tools, version control, browser-based documentation, and deployment interfaces. The format captures the full developer workflow rather than just the code, including the reasoning about when to consult docs, when to test, when to deploy.

This is the sample to commission when the research question is about developer behaviour at the workflow level rather than the code level. The tool-switching reasoning is part of the signal.

S-043 · Designer workflow from brief to artefact · Audio · Screen · CAT 06 · solo aloud · Mode A

Illustrative scenario

Freelance designer taking a client brief through to a clickable prototype across Google Docs, Figma free, Pinterest, Unsplash, and Gmail feedback threads.

A designer takes a brief through to a working artefact, moving between reference gathering, design tools, and feedback channels. The format captures the realistic shape of design work where the artefact emerges through iteration against reference material, not in a single forward pass.

This is the sample to commission when the research question is about iterative design work. The reference-to-artefact loop is structurally distinct from linear task completion.

S-044 · Real human use of AI tools embedded in workflow · Audio · Screen · Text submissions · CAT 06 · solo aloud · A + B

Illustrative scenario

Freelance writer building a long-form article across Google Docs, ChatGPT, Perplexity, Grammarly free, and Gmail, with natural decisions about when to ask the AI and when not to.

A real practitioner completes a real piece of their own work using the AI tools they normally use, narrating their reasoning as they go. The AI use is not staged. The participant decides, in the moment, when to ask the AI and when not to. The session captures both the work and the AI interactions inside it.

This is the sample to commission when the question is not *can an AI tool do this task* but *what does a competent professional actually do with an AI tool in their hands while doing their real job.*

S-045 · Recovery from a wrong path mid-task · Audio · Screen · CAT 06 · solo aloud · Mode A

Illustrative scenario

Hobbyist data analyst attempting to answer a personal question (e.g. spending patterns) first with a pivot table in Google Sheets, hitting a wall, and switching to a Python notebook in Colab with the same CSV.

The participant pursues a task approach that does not work, recognises the failure, and recovers by choosing a different path. The format captures the moment of detection and the reasoning about how to recover, both of which are usually invisible in clean task completion data.

This is the sample to commission when the research question is about self-correction at the workflow level. The decision shape, when to abandon an approach versus push through, is the signal.

S-046 · Two-practitioner shared task with role asymmetry · Audio · Screen · CAT 07 · collab problem · Mode A

Illustrative scenario

A freelance copywriter and a freelance designer jointly producing a landing page for a small business across Google Docs, Figma free, Canva, Unsplash, and Gmail.

Two practitioners with complementary expertise work on a shared problem. Each brings a different perspective; the resolution requires both. The asymmetry is structural rather than incidental.

This is the sample to commission when the research question is about complementary-expertise collaboration. The handoff and integration between perspectives is structurally distinct from peer collaboration.

S-047 · Genuine disagreement and live resolution · Audio · Screen · CAT 07 · collab problem · Mode A

Illustrative scenario

Two experienced indie developers genuinely disagreeing on the right architecture for a shared open source side project, working through to a single decision across a public GitHub repo, a Google Doc RFC, a free whiteboard tool (Excalidraw), Stack Overflow, and Discord.

Two or more participants surface a genuine disagreement on a shared problem and work through to a resolution. The disagreement is real, screened for during recruitment, not staged.

This is the sample to commission when the research question is about live disagreement resolution in collaborative work. The resolution path is the signal; the disagreement is the precondition.

S-048 · Distributed problem decomposition across roles · Audio · Screen · CAT 07 · collab problem · Mode A

Illustrative scenario

Three friends decomposing the planning of a group overseas trip across a shared Google Doc, Google Sheets budget, Skyscanner, Google Maps, and WhatsApp, each owning a sub-area and reconvening to integrate.

A group decomposes a shared problem into sub-problems, divides the work, executes in parallel, and reconvenes to integrate. Captures realistic distributed work where the decomposition decision shapes the eventual outcome.

This is the sample to commission when the research question is about distributed parallel work. The decomposition and integration steps are structurally distinct from the parallel execution itself.

S-049 · Synchronous co-creation of a shared artefact · Audio · Screen · CAT 07 · collab problem · Mode A

Illustrative scenario

Two hobbyist designers pair-designing a personal portfolio website in Figma free, with shared access to public reference sites, Unsplash, Google Fonts, and a Google Doc of feedback.

Two or more participants build a single artefact together in real time, with active hand-off, negotiation of choices, and live revision. Synchronous co-creation rather than asynchronous handoff.

This is the sample to commission when the research question is about real-time collaborative production. The negotiation embedded in the act of creating is the signal.

S-050 · Conversational explanation under genuine misunderstanding · Audio · Screen · CAT 08 · asymmetric · Mode A

Illustrative scenario

Experienced personal finance creator explaining how superannuation co-contributions actually work to a young earner who has internalised a wrong mental model from TikTok, adjusting in real time as the misunderstanding surfaces.

An expert explains a concept to a learner who has a real, pre-existing misunderstanding, and adjusts the explanation as the misunderstanding surfaces. Recruitment screens to ensure the misunderstanding is genuine, not performed. Conversation only, no interface used.

This is the sample to commission when the research question is about how experts diagnose and correct learner mental models in real time, in dialogue alone. Particularly relevant for AI assistant tuning on explanation behaviour.

S-051 · Practitioner-led problem framing · Audio · Screen · CAT 08 · asymmetric · Mode A

Illustrative scenario

Experienced indie developer verbally framing how to think about a beginner's idea for a side project, narrating which questions to ask first, what to defer, and why.

A domain expert and a novice work together verbally to define and frame a problem the novice has brought to the session. The expert narrates choice-making throughout: which questions to ask first, what to defer, what to ignore, why. No tool driving, just the framing layer.

This is the sample to commission when the research question is about expert problem-framing reasoning, the cognitive layer that sits before any tool is touched. Most data captures expert execution; this captures expert orientation.

S-052 · Asymmetric design or work critique · Audio · Screen · CAT 08 · asymmetric · Mode A

Illustrative scenario

Experienced freelance writer critiquing a junior writer's draft article (referenced from a shared Google Doc, but not edited live), with the writer defending choices and incorporating feedback verbally.

A senior practitioner critiques a junior's work in progress, with the work owner defending decisions and incorporating feedback live. Conversation-focused; the artefact is referenced but not actively built.

This is the sample to commission when the research question is about evaluative dialogue rather than evaluative monologue. The defence layer from the work's author is part of the signal.

S-053 · Real-time second-opinion consult · Audio · Screen · CAT 08 · asymmetric · Mode A

Illustrative scenario

Experienced home renovator consulting a builder friend by phone mid-decision on a structural change, both referencing the same photos and rough sketch shared via WhatsApp.

A primary practitioner consults a peer or senior expert mid-task for a second opinion. Both participants engage with the same artefact verbally, but the consult itself is conversational rather than action-driving.

This is the sample to commission when the research question is about consultation dynamics, the specific shape of asking for and giving a second opinion under time pressure.

S-054 · Same-question, reframed second attempt · Audio · Screen · CAT 08 · asymmetric · Mode A

Illustrative scenario

Experienced personal finance creator talking through advice for a couple framed as a 'how do we save for a house' question, then approaching the same underlying situation framed as a 'when should we have kids' question.

The participant tackles a problem once, then is given a reframed version of the same problem and tackles it again. Captures whether and how the framing changes the approach.

This is the sample to commission when the research question is about how problem framing shapes expert reasoning. The paired structure isolates the framing effect from the underlying problem.

S-055 · AI-moderated interview · Audio · Screen · Text submissions · CAT 09 · ai moderated · A + B

Illustrative scenario

AI moderator running a 45-minute structured interview with an experienced YouTuber about how they make thumbnail decisions, followed by another experienced creator annotating the transcript to mark where the participant's answers reveal real reasoning and where they are rehearsed.

An AI moderator conducts a structured 45-minute interview with a participant on a defined topic. Conversation only, no interface driven. After the session, a separately recruited expert annotates the transcript, marking where the participant's answers reveal real reasoning versus rehearsed surface responses.

This is the sample to commission when the research question is about high-volume structured qualitative data without a human moderator's overhead. The expert annotation layer compensates for the lack of live human judgement.

S-056 · AI-moderated interview with screen share · Audio · Screen · Text submissions · CAT 09 · ai moderated · A + B

Illustrative scenario

AI moderator running a think-aloud session with an experienced freelance bookkeeper doing their own personal bookkeeping across a personal bank CSV, Google Sheets, the ATO Business Portal, and Gmail, followed by another experienced bookkeeper annotating the recording.

An AI moderator conducts a structured session while the participant shares their screen and demonstrates work. The AI prompts for narration and clarification across the workflow. After the session, an expert annotates the resulting recording.

This is the sample to commission when the research question requires both a real workflow capture and structured AI-led prompting to surface reasoning, with expert annotation for quality control.

S-057 · AI-moderated post-task review with annotation · Audio · Screen · Text submissions · CAT 09 · ai moderated · A + B

Illustrative scenario

AI debriefing a freelance writer after they finished a real article draft across Google Docs, ChatGPT, and Gmail, then a senior editor annotating where the debrief missed important reasoning the writer skipped past.

An AI moderator debriefs a participant after they have completed a real task, structured around the artefact they produced. An expert then annotates the AI-led debrief, marking where the debrief missed important reasoning the participant skipped past.

This is the sample to commission when the research question is about retrospective reasoning surfacing. The AI conducts the debrief at scale; the expert annotation reveals where the AI's questioning was insufficient.

S-058 · Native-language solo task · Audio · Screen · CAT 10 · multilingual · Mode A

Illustrative scenario

Native Portuguese-speaking freelance writer in Brazil completing a real article draft in Portuguese across Google Docs in Portuguese, ChatGPT in Portuguese, Wikipedia in Portuguese, Gmail, and a Portuguese-language research site.

A participant completes a task entirely in their native (non-English) language across native-language tools. The format captures language-specific workflows, terminology, and tool behaviour that English-only sessions miss. Optional aligned English translation is delivered as a separate output.

This is the sample to commission when the research question is about how work actually happens in a target language, not how it is described post-hoc in English. The language-specific tool behaviour is part of the signal.

S-059 · Native-language multi-party discussion · Audio · Screen · CAT 10 · multilingual · Mode A

Illustrative scenario

Three native Mandarin-speaking personal finance content creators in Shanghai debating saving versus investing strategies in Mandarin across a shared Tencent Docs doc, Weibo, Bilibili, a Chinese-language brokerage public site, and a Mandarin-language Google Sheet.

Two or more participants conduct a discussion (debate, prioritisation, consensus building, etc.) entirely in a non-English language. Captures multi-party reasoning in the target language with native participants on native ground.

This is the sample to commission when the research question is about multi-party reasoning in a non-English language. Translation of English transcripts is not equivalent.

S-060 · Bilingual session with code-switching · Audio · Screen · CAT 10 · multilingual · Mode A

Illustrative scenario

Two bilingual Tagalog-English freelancers in Manila working together on a client project across Google Docs, Canva, Gmail, a public Trello board, and Messenger, code-switching naturally between Tagalog and English.

Participants who naturally code-switch between English and another language during real work do so during the session. Captures realistic mixed-language professional reasoning rather than enforced monolingual sessions.

This is the sample to commission when the research question is about realistic code-switching behaviour. The switching itself is the signal, and forced monolingualism would erase it.

S-061 · Translation-paired session · Audio · Screen · CAT 10 · multilingual · Mode A

Illustrative scenario

One French-speaking and one English-speaking freelance writer each completing a matched short article in their own language across Google Docs, Wikipedia, Gmail, and free image sites, with paired outputs.

An identical (or carefully matched) task is run with one participant in English and another in a target language. The outputs are paired for translation and localisation research, with the matching structure preserved across both sessions.

This is the sample to commission when the research question is about cross-language comparison on controlled tasks. The matching is what enables translation-quality and localisation work.

S-062 · Expert demonstration in non-English language · Audio · Screen · CAT 10 · multilingual · Mode A

Illustrative scenario

Native Spanish-speaking experienced freelance writer in Mexico training a new freelancer in Spanish on how to land first clients across LinkedIn (free, in Spanish), Gmail, Google Docs, Canva, and a Spanish-language freelance platform.

Any expert demonstration format (training a learner, deliberate failure, plan-then-execute) is run end-to-end in a non-English language with bilingual capture. Standard expert demonstration session structure with language overlay.

This is the sample to commission when the research question is about expert-mode reasoning in a target language. Distinct from translation: the demonstration was never in English to begin with.

PROPRIETARY TOOLING

The extraction stack, built over eight years.

Every sample in the catalogue is produced through Askable’s audited platform — the same SaaS that 10,000+ research teams use daily, repurposed for LLM training-data extraction. Not consultants with spreadsheets. Code paths.

Recruitment, consent, capture, moderation, transcription, tagging, review, and delivery all happen inside one system, with one audit trail. That’s what makes the work productised — and what makes it hold up under scrutiny at scale.

AI-moderated sessions

230,121 & counting

Research teams served

10,000+ across enterprise & startup

Participant panel

50+ countries · 5,300+ cities · triple-verified

Show rate

97.8% · 63% above industry average

Operating since

2017 · 8 yrs of production engineering

T01 · Sourcing

Participant panel

Millions of identity-verified humans across 50+ countries. Triple-verified, paid, consenting. Any demographic, industry, or geography — including the niche audiences others can’t reach.

97.8% show rate · 5,300+ cities

T02 · Operations

Participant management engine

Scheduling, calendar sync, SMS & email reminders, identity verification, brief-specific consent, incentive payments, and automated replacements. The entire participant lifecycle, productised.

Global payouts · auto-replacements · consent ledger

T03 · Session

Video call platform

LiveKit-powered real-time A/V, screen-share, and multi-party rooms. Sessions recorded against the consented brief, encrypted in transit and at rest, multi-region routing for low-latency capture.

Multi-region A/V · screen capture · recording-native

T04 · Capture

Multi-method capture suite

Moderated interviews, AI-moderated interviews, remote and live mobile/website testing, prototype tasks, surveys, diary studies. Every modality is a code path on the same platform — mixable per brief.

10 capture methods · one platform

T05 · Unmoderated

Human-behaviour testing suite

Card sorting, tree tests, first-click tests, five-second tests, stack ranking, prioritisation. Structured, unmoderated tasks run at scale — producing clean, quantifiable behavioural signal alongside qualitative capture.

Unmoderated at scale · structured outputs

T06 · Processing

Data processing engine

Real-time transcription, speaker diarisation, semantic embedding, AI annotation, and evidence-linked output structuring. Every finding traces back to a timestamp, a quote, and a participant.

Multi-vendor pipeline · evidence-linked outputs
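
As a rough illustration of what "evidence-linked" could mean in practice, the sketch below models a finding that keeps a pointer to a participant, a timestamp, and a verbatim quote. Every class name, field name, and sample value here is a hypothetical assumption for illustration, not the platform's actual schema or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One traceable span of a session transcript (hypothetical fields)."""
    session_id: str       # assumed: a catalogue-style ID such as "S-038"
    participant_id: str   # assumed: a pseudonymous participant reference
    start_seconds: float  # offset into the recording where the quote begins
    quote: str            # verbatim transcript text


@dataclass(frozen=True)
class Finding:
    """A structured claim plus the evidence it rests on (hypothetical)."""
    summary: str
    evidence: tuple[Evidence, ...]

    def is_traceable(self) -> bool:
        # Evidence-linked means: at least one span is cited, and every span
        # carries a participant, a non-negative timestamp, and a quote.
        return bool(self.evidence) and all(
            bool(e.participant_id) and e.start_seconds >= 0 and bool(e.quote.strip())
            for e in self.evidence
        )


# Made-up sample values, purely to show the shape of a record.
finding = Finding(
    summary="Matches payouts before editing the ledger",
    evidence=(Evidence("S-038", "P-001", 312.5, "I always match the payout first."),),
)
print(finding.is_traceable())  # prints True
```

Keeping the evidence on the finding itself, rather than in a separate lookup, is one simple way to make an untraceable finding detectable at validation time.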

Every layer above runs inside the same audited platform. One identity system. One consent ledger. One audit log. One delivery pipeline. That’s productisation — not a description of how we work.

See the audit posture

WANT THE WHOLE LIBRARY?

The full catalogue, with raw artefacts, is delivered under partnership.

The public site shows the front of the card. The lab partnership unlocks the underlying capture: video, audio, transcripts, structured logs, attribution chains.

Talk to the lab → · How we work