Interviews with synthetic users. There’s been a lot of buzz around this subject on social media platforms like LinkedIn and Reddit, especially about its pros and cons.
But is it really the snake oil it’s been made out to be for UX and marketing professionals? Or is there actually some substance behind the idea? After all, the whole point of UX research is observing and interviewing real users, learning about their pain points, and spotting factors that make their experiences unique.
Traditional interviews are expensive, yes, but ultimately, they allow one to discover the nuances and complexities behind human interactions and emotions.
Generative AI tools like ChatGPT, which run on highly advanced large language models, can only create AI participants that mirror the collective average, lacking the diverse personalities and idiosyncrasies found in any real demographic.
Yet, for researchers who want to get past the blank page problem, or for MNCs facing organizational friction, synthetic research can indeed be the catalyst that gets things moving. Compared to traditional alternatives, it also requires minimal cost and time.
So, does this mean you can replace real people with AI-generated users in your market research studies? Or rely on synthetic interviews to make important decisions, ones that could have a big impact on your business? Let’s look into it.
Before we get into synthetic interviews, or synthetic personas for that matter, we need to talk about synthetic data. Synthetic data is nothing new; it is already widely used to train large language models once the original training data has been exhausted.
As is apparent from its name, synthetic data is artificial data generated using artificial intelligence (AI). It isn’t collected from real sources, nor does it represent real-world data, but is rather derived from patterns found in the real world.
Now, what’s the appeal behind it? Well, it’s cheap and relatively easy to generate.
Real-world data is often prone to errors and biases, but synthetic data can be designed in a way that minimizes them. You can generate data that resembles your original dataset without sharing any sensitive or personally identifiable information, which makes it much easier to comply with user privacy laws.
Also, it comes in handy when you’re dealing with niche audiences or in cases where the data available is incomplete or limited.
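To make the idea concrete, here is a deliberately simple sketch of how synthetic records can be derived from patterns in real data. It only fits summary statistics and samples from them; production tools use far more sophisticated generative models, and the toy dataset below is invented for illustration.

```python
# A deliberately simple illustration of the idea behind synthetic data:
# learn summary statistics from a (toy) real dataset, then sample new
# rows that follow the same patterns without copying any real record.
import numpy as np

rng = np.random.default_rng(42)

# Toy "real" data: ages and plan choices of a handful of users (hypothetical).
real_ages = np.array([24, 31, 37, 29, 45, 52, 33, 27])
real_plans = np.array(["free", "pro", "pro", "free", "team", "pro", "free", "pro"])

# Learn the patterns: mean/std for the numeric column, frequencies for the categorical one.
age_mean, age_std = real_ages.mean(), real_ages.std()
plans, counts = np.unique(real_plans, return_counts=True)
plan_probs = counts / counts.sum()

# Sample 1,000 synthetic users that mirror those patterns but contain no real person.
synthetic_ages = rng.normal(age_mean, age_std, size=1000).round().clip(18, 90)
synthetic_plans = rng.choice(plans, size=1000, p=plan_probs)

print(synthetic_ages[:5], synthetic_plans[:5])
```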
Everyone knows that it takes a lot of money and time to compile original market research from users and customers. This is one of the major reasons why big brands and small businesses are using synthetic data to gather valuable insights about their target audience today.
Synthetic respondents are trending in marketing and design research. Generative AI is being used everywhere, so it’s not a surprise that people in these fields (though not all) are leveraging it to create synthetic users.
Put simply, synthetic personas are digital avatars or virtual representations of your customers or users. They are not like the traditional personas developed by folks in marketing or design, but are built with AI and machine learning technologies.
The easiest way to create one is through LLMs like GPT-4o: start by describing your target customers, then specify their goals, challenges, and needs, and finally, add your product or service idea, which will serve as a solution to those problems. Once that’s done, prompt it to adopt the “persona” of this audience.
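If you prefer scripting this over the chat window, the same recipe can be expressed with the OpenAI Python SDK. The sketch below is only an illustration of the pattern: the persona description, product framing, and interview question are placeholders you would replace with your own.

```python
# Minimal sketch: asking an LLM to adopt a persona and answer one interview
# question via the OpenAI Python SDK (requires OPENAI_API_KEY to be set).
# The persona description, product framing, and question are placeholders.
from openai import OpenAI

client = OpenAI()

persona_prompt = (
    "Adopt the persona of a mid-career marketing manager at a mid-sized "
    "ecommerce company. Goals: grow repeat purchases. Challenges: limited "
    "budget, fragmented customer data. You are evaluating a persona-generation "
    "tool as a possible solution. Answer interview questions in character."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": persona_prompt},
        {"role": "user", "content": "What would stop you from adopting a tool like this?"},
    ],
)
print(response.choices[0].message.content)
```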
You can generate hundreds or even thousands of AI-generated personas that represent certain demographic groups and can accurately respond to your surveys and interviews. Since tools like ChatGPT are trained on diverse datasets that include written and spoken conversations from multiple sources and industries, they can mimic the behaviors and preferences of a range of users.
As such, you can use synthetic users to simulate various scenarios and run trials and tests before releasing your products or services to the market. They can process large volumes of data and provide feedback to support design and campaign iterations.
However, for all their uses, synthetic personas do come with a fundamental limitation: if you're dealing with a novel concept or trying to understand a new type of customer, like plantation workers in Colombia, they are likely to fall short.
As Christopher Roosen, an advocate for Human-Centered Design, explains:
“Generative AI is based on a capture of the internet… it stores the structure of all the internet’s biases, which includes biases away from richly presented people in all their colours, shapes and ideas and towards a very stereotyped presentation of people.”
AI struggles to represent people who are underrepresented online, so when you ask it to generate personas for workers in Colombia, it may create generic stereotypes because its training data contains only superficial portrayals of such groups. Your tools might miss key points, fall back on clichés, and lack the depth required to answer questions about new or marginalized groups.
Synthetic interviews are similar to user interviews; however, instead of posing questions to real people, you interact with synthetic participants (aka synthetic users), usually through text-based dialogue or survey-based formats, to gather user insights on various topics.
Now, anyone who has ever done research, whether it’s interviews or focus groups, knows how time-consuming and expensive it can be. After all, it takes a lot of planning, coordination, and trained professionals just to prepare the right questions. Plus, you need to talk to a lot of people to pinpoint the right goals and pain points.
Compared to this, synthetic interviews seem to offer a faster, more cost-effective alternative to carrying out conversations with your users or customers.
According to John Whalen in his lesson "Hands on with Synthetic Users: Customer Research's Future," synthetic interviews are ideal for:
No matter what kind of questions you ask, synthetic users never hesitate.
But the generated data or output you get is not always perfect. These interview responses may answer the “what,” but not necessarily the “why.”
You miss the subtle context that only a human face or voice can provide: the microexpressions, intonations, and body language. Synthetic interviews also struggle to capture the depth of human emotions, as written words often fail to convey the contextual richness found in the traditional research process.
That’s why you must regularly evaluate and validate synthetic interviews against human responses. Why the hybrid approach? Because the interview data may be AI-generated, but the decisions you make based on them will have a real-world impact.
Scott Stevens and Michael Christel discussed the topic of synthetic interviews in 1998, in their paper “Synthetic Interviews: The Art of Creating a ‘Dyad’ Between Humans and Machine-Based Characters.” Naturally, the Carnegie Mellon researchers focused on Synthetic Interviews, the technology developed at the university.
Their paper defined Synthetic Interviews as:
“A means of conversing in-depth with an individual or character, permitting users to ask questions in a conversational manner… and receive relevant, pertinent answers to the questions asked.”
The technology was supposed to be “life-like” in relaying information, in a way that reflects human thought processes. Users could ask questions of computer-generated (CG) personas via spoken or typed interfaces to understand their preferences, behavior, and values.
To make this happen, thousands of video clips of human actors were recorded and stored in a database. These actors were videotaped answering potential questions, along with extra non-verbal clips, like drinking coffee, scratching their head, or smiling, to make the characters feel more human-like.
These responses were then presented in a talking head format, so the users could feel like they were face-to-face with the personality. The CG personas also included habits unique to the personalities being portrayed, like Albert Einstein’s, to make them more authentic.
The interface allowed users to speak normally into a microphone and set the flow of the interview. The speech recognition system would analyze their questions and provide accurate results for relevant questions and acceptable results for unexpected ones (the paper explains this distinction).
Back then, this technology was seen as a way to offer interactive experiences with actors, religious leaders, and other public figures.
This was before generative AI tools came around. Now, you can simply ask a model to "act like" a particular person, or in our case, a specific customer or user group, and it will instantly generate lifelike responses, minus the talking heads, of course.
Conducting user research without people sounds tough, but it’s not. Really. All you need is a generative AI tool, a good idea of who your target audience is, and the right set of prompts and questions.
Besides this, outline your research needs. What are your goals and use cases? What do you want to learn from this interview session? This will help you frame your interview questions. Decide how many synthetic users you want, in what format (text, survey responses), and how diverse their responses should be.
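One lightweight way to pin these decisions down is to write them out as structured data before you start prompting. The plan below is purely illustrative; every field and value is an example, not a requirement of any particular tool.

```python
# A hypothetical research plan written out as plain data, so the decisions
# (goals, number of users, format, diversity) are explicit before any prompting.
research_plan = {
    "goal": "Understand objections to an AI persona-generation product",
    "audience": "Marketing and UX leads at mid-sized ecommerce companies",
    "num_synthetic_users": 7,
    "format": "text interview",          # could also be "survey responses"
    "diversity": ["seniority", "company size", "region"],
    "questions": [
        "What does your current persona process look like?",
        "What concerns would you have about AI-generated personas?",
    ],
}
```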
For demonstration purposes, we will create seven synthetic users for our latest product, Research Persona by Delve AI, using ChatGPT’s free version.
The first step is to upload a document containing details about our ideal user group – this helps limit respondents to a certain demographic. Then, we add the product description and specify the number of synthetic participants we want the system to generate using a well-worded prompt.
Once prompted, ChatGPT instantly generated diverse synthetic personas. Here’s an example of one named Dan Miller, VP of Marketing in an ecommerce company.
We ran the following interview questions (involving a mix of open-ended and survey-style questions) through each of these personas.
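We did this step through the ChatGPT interface, but the same loop can be scripted against the API. The sketch below is illustrative only: the persona summaries and questions are placeholders, not the actual set we used (those are linked at the end of this walkthrough).

```python
# Sketch of scripting the interview step with the API instead of the ChatGPT UI:
# run every question past every synthetic persona and collect the answers.
# Personas and questions below are placeholders, not the real set.
from openai import OpenAI

client = OpenAI()

personas = [
    "Dan Miller, VP of Marketing at an ecommerce company (details invented here).",
    "A UX researcher at a SaaS startup who is skeptical of AI shortcuts (placeholder).",
]
questions = [
    "How do you build customer personas today?",
    "Rate your interest in AI-generated personas on a scale of 1-5 and explain why.",
]

transcripts = {}
for persona in personas:
    answers = []
    for question in questions:
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"Answer in character as: {persona}"},
                {"role": "user", "content": question},
            ],
        )
        answers.append(reply.choices[0].message.content)
    transcripts[persona] = answers
```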
We asked the chatbot to summarize the interview responses into a clean, professional report, highlighting the problems or concerns they have with the product, what they like about it, and supporting quotes.
As expected, the system got most of the common problems users would associate with this kind of persona creation – generic output, privacy concerns, etc. This was followed by a summary of what they liked about the product, which, again, was not that bad.
This interview process left us with things that our potential users may care about, but nothing out of the ordinary. The responses were fast – the entire exercise took about 15 minutes – however, this data can only be used to create better, more specific questions for your actual users.
You can view the complete results, along with the prompts and questions used, here.
Your interview data is only as good as the synthetic users the system builds. So if you want to generate synthetic interviews in this way, thoroughly review the customer data you’ve fed into the model for quality. Are there any biases or gaps? Are you including the right data to begin with? You’re training the system to mimic something, so make sure that “something” is actually what you want.
For LLMs to generate data that mirrors the complexity and nuance of real human responses, you need good prompts.
Remember: Great input data + well-crafted prompts = Quality interview output.
Once the synthetic data is generated, validate it before using it. Run manual checks and leverage validation tools to assess quality, consistency, and accuracy. Stakeholders are skeptical about synthetic data (and rightly so), so be transparent if you’re using AI-generated data in your user research reports.
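Validation can start very simply: for instance, by comparing basic properties of synthetic answers against a handful of real answers to the same question. The sketch below only checks response length and vocabulary overlap, which is nowhere near a full validation, but it illustrates the kind of sanity check worth automating; the sample answers are made up.

```python
# A very rough sanity check, not full validation: compare response length and
# vocabulary overlap between synthetic and real answers to the same question.
# The sample answers below are made up for illustration.
from statistics import mean

synthetic_answers = [
    "I value seamless integration and clear ROI metrics above all else.",
    "Pricing transparency and integration with existing tools matter most.",
]
human_answers = [
    "Honestly... price. And whether it plays nice with the tools we already pay for.",
    "I'd need to see it work on our own messy data before I trust it.",
]

def words(texts):
    return {w.lower().strip(".,!?") for t in texts for w in t.split()}

avg_len_synth = mean(len(a.split()) for a in synthetic_answers)
avg_len_human = mean(len(a.split()) for a in human_answers)
overlap = len(words(synthetic_answers) & words(human_answers)) / len(words(human_answers))

print(f"Avg length (synthetic vs human): {avg_len_synth:.1f} vs {avg_len_human:.1f}")
print(f"Vocabulary overlap with human answers: {overlap:.0%}")
```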
Michael Mace compares Synthetic Users, a tool which uses generative AI to simulate customer interviews, to interviews with real people in his article “Can AI replace discovery interviews? A competitive comparison.”
If we have to discuss the cons of synthetic respondents and, in turn, synthetic interviews, his post provides a great starting point.
Mace pitted Synthetic Users against UserTesting (which allows you to ask questions of real people from an online panel) using a novel idea: a rideshare service that uses flying cars.
The results of this study were not in favor of synthetic users.
Synthetic Users was definitely fast. You create the users, add the questions, and the responses are generated in minutes. In contrast, using UserTesting required an entire day – writing a plan, setting up screeners, waiting for responses, then analyzing the results. And even then, finding the right people with the traits you wanted was difficult.
The thing with Synthetic Users is, if you define a user with a specific problem, it won’t question whether that problem exists. It just assumes it does. So, you end up with hypothetical users who don’t exist in reality.
Which is a problem.
Even the transcripts generated were a little too perfect, with no filler words, repetitions, or incomplete sentences. They lacked the emotional cues UserTesting offered, since there were no videos or expressions to analyze.
Further, the synthetic participants sounded similar to one another, with near-identical answers. They felt basic, average, and disconnected from real life. Real users, though their responses were unpolished, brought more variety and detail.
Other than speed and a basic overview of your audience, the tool did little to help you learn how your customers actually think and react. Which, unfortunately, is the problem with most synthetic research methodologies.
We’ve looked at the manual ways of conducting synthetic research; now, let’s check out the AI-powered ways. Delve AI’s Synthetic Research Software is one of the tools that helps you create AI personas for your users or customers, and leverage them to run surveys and interviews.
Currently, the research software includes three functionalities:
Built from first-party and public data sources, these virtual personas are scalable and diverse. You can generate any number of users and conduct as many quantitative and qualitative interviews as you’d like.
You need to create base personas before you develop simulated users. To do that, you have to subscribe to one of our persona products.
Currently, our AI-persona generator offers six of them:
In this case, we’ll build personas using the Research Persona tool mentioned earlier.
Sign up or log in to Delve AI, go to Research Persona, and upload your documents. You can include anything relevant, like interview transcripts, survey reports, industry news, or past user profiles.
Hit “Create Personas,” and Delve AI will develop personas based on the data you’ve provided. Don’t worry if you don’t have any research material; just add a brief description of your target audience along with some details about your product or business (optional), and we’ll take it from there.
In addition to your inputs, our platform draws on learnings from thousands of personas previously generated to create unique customer profiles. The output can be one or multiple personas, depending on your audience and use case. Each of these segments contains persona details, distribution, and journey maps.
Click PERSONA DETAILS, and you’ll see user demographics, lifestyle, career status, aspirations, factors influencing purchase decisions, psychological drivers (goals, motivations, needs), and core challenges.
This is followed by information about their preferred communication channels, social networks, brands, shopping websites, music, TV shows, movies, YouTube channels, podcasts, subreddits, influential resources, and more.
The DISTRIBUTION tab shows how your audience is distributed within a particular segment, broken down by channel, social network, age, gender, language, location, activity levels, and topics (both resonating and generic).
The last tab, CUSTOMER JOURNEYS, contains user journey maps divided into distinct phases, exploring your users’ goals, actions, problems, and thought processes.
Note: Your personas are automatically updated with fresh data each month, and you can also supplement them with additional research data.
Now that we’ve built the personas, let’s move on to the next step, i.e., generating synthetic users. Why is this necessary? Because you cannot conduct interviews without them.
To get started, go to the Synthetic Research dashboard and purchase the number of users you need – for example, 100. Then, create a panel with this group and give it a name, like Marketing Insights Group. This will help you organize and manage your audiences later.
You’ll be guided to a screen where you can select the persona(s) or persona product you'd like to use to generate synthetic users. Here, we’ll pick a segment from Research Persona named Edward Collins.
As shown below, the software has generated 100 synthetic users based on that specific user persona segment. Each user includes a “Start Chat” option, which you can use to interact with them (see Digital Twin for more information).
Our simulated personas are ready; it’s time to run user research surveys and interviews.
In the sidebar, click on Marketing Insights Group, then select Surveys from the dropdown. On the dashboard, click the “Create Survey” button. You’ll be prompted to enter your survey name (e.g., Product market fit survey), specify the number of users (e.g., 100), and upload a CSV file that contains your survey questions.
Your file can include various types of questions, such as multiple-choice, rating scale, Likert scale, open-ended, and ranking questions. It will take only a short amount of time for your responses to be generated.
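If you want to assemble that file programmatically, a plain CSV built with Python’s standard library is enough. Note that the column names below are an assumption made for illustration, not Delve AI’s required format; check the dashboard for the expected layout.

```python
# Illustrative only: writing a survey-question CSV with Python's csv module.
# The column names here are an assumption, not Delve AI's required format.
import csv

rows = [
    {"question": "How do you currently research your customers?",
     "type": "open-ended", "options": ""},
    {"question": "How satisfied are you with your current persona process?",
     "type": "rating scale", "options": "1-5"},
    {"question": "Which channel drives most of your revenue?",
     "type": "multiple-choice", "options": "Email;Paid ads;SEO;Social"},
]

with open("survey_questions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "type", "options"])
    writer.writeheader()
    writer.writerows(rows)
```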
Along with the results, you’ll be able to drill down into which respondents gave specific answers to each question and even ask them the reasons behind their choices using the chat functionality.
You can also ask them for feedback on other marketing and product-related subjects. In the sample above, we’ve asked a user what factors influenced their decision to complete a purchase when shopping online.
A major criticism of AI-generated research is that it lacks the authenticity, depth, and emotional nuance one typically gets from real interviews. This, and the concern that biases in the training data can potentially skew your research findings.
As Niloufar Salehi writes in her article on synthetic users:
“The whole point of spending the time to interview people and then… analyzing the large amounts of data gathered is the ability to connect with them, build trust, dig deeper, ask them to share stories, and learn about their feelings and emotions. Pattern synthesis engines have none of those.”
However, for people or teams just starting out, especially those without the time or budget for extensive user research, synthetic interviews can be an attractive and accessible alternative.
Delve AI’s Synthetic Research Software does not recruit people. Or interview them.
But it does use the information you’ve gathered about your users to create look-alike audiences. These virtual users, built from personas generated using your first-party (CRM, web analytics), second-party (social audience data, competitor intelligence), and Voice of Customer data, are not something generic that ChatGPT whipped up.
They’re customized to your business use case and would likely give you a better understanding of your users and their core needs.
Synthetic interviews can be problematic, but only if the simulated users you’re running them on are generic, inaccurate, or biased. The quality of synthetic personas is directly related to the quality of responses you get; the better the quality, the better the responses.
So the onus is on you to select, or, if you’re up to it, build reliable synthetic research software (which will be resource-intensive).
Once you find one, you can leverage it to identify the obvious problems before you conduct actual interviews. Synthetic users are also good for brainstorming and for surfacing perspectives or opinions that were already out there but that you didn’t know about.
It goes without saying that at the stage they are right now, synthetic interviews should only supplement your research studies. Simulated users should never take precedence over real users. You can use them to test multiple scenarios, but always validate those results with your real research findings.
At the end of the day, it doesn’t matter the kind of users you choose; what matters are the questions you ask, or don't ask.
A synthetic user is a digital profile created to test out software, websites, or services. It acts like a real user, doing things like clicking, searching, or even buying stuff. This lets you see how things hold up under different conditions and spot any issues before real people use it, making sure everything runs smoothly for the actual users.
Synthetic interviews allow you to have in-depth conversations with the virtual avatars of your target audience. These characters are built with the help of AI and ML technologies and can mimic real customer behaviors and thought processes. So, you can simply upload a questionnaire or use an interactive dashboard to get relevant answers to your questions without involving real users.