Market research

ART Forum 2012

One of my favorite conferences is the Advanced Research Techniques Forum (ART Forum) from the American Marketing Association (disclosure: I’m chairing the conference for 2012). At this year’s conference I was happy to announce that the 2012 conference will be:

June 24-27, 2012 in Seattle!

If you’re a researcher interested in the latest customer/marketing research innovations, please consider submitting your work. The CFP will be out soon, with abstracts due in late Fall. For ideas, check out the 2011 ART Forum program.

We also welcome suggestions for the conference: topics to include, tutorials you would like, or suggestions for speakers or future locations. You can find my contact info on the “about me” page, or on LinkedIn. We hope to see you in June in Seattle!

Tips on Learning R

Colleagues often ask me: “How can I learn R?”  Recently, I helped teach an “Introduction to R” class for the Advanced Research Techniques Forum. So that’s one answer.  Here’s another:

Find an R-suitable project and force yourself to use it!  R is really a programming language, not a “statistics package” … and like any programming language, you can only learn it by using it to accomplish something.

What makes a project R-suitable?  I divide that into three groups:

1. Projects that need cutting-edge or custom statistical methods. R quite simply is the tool where new methods are developed first. If you need to try the latest in Bayesian, machine learning, classification, genomics, or similar areas: do it in R.

2. Processes that benefit from R’s language and object structure. This is why I started with the S language back in 1997: I needed to run hundreds of models and extract key information from them. If you need to bootstrap a process, or compare or iterate models, R is the place.

3. Something that you know quite well.  This is where R offers little attraction, but where you can leverage your knowledge. A frequency analysis you do every day; a regression model you run every month; a chart that you can make in 5 seconds in Excel – those are great places to replicate the work in R just to force yourself up the learning curve.

Note that groups #1 and #2 are the easiest and luckiest places to be: if nothing else does what you want (except complete custom code), then R is an obvious answer.  Group #3, choosing a problem you could solve elsewhere, is the most frustrating and requires enormous discipline. You’ll be questioning R every step of the way (“why can’t I just point and click?!”) … until something clicks and you discover the answer for yourself.  On the other hand, #3 is the easiest place to start from the perspective of finding specific help for your task; if it can be done easily somewhere else, then a recipe has likely been developed for R.
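As a minimal sketch of what a group #3 exercise and a group #2 process might look like, the following uses R's built-in mtcars data. The dataset and the particular models are assumptions chosen only for illustration; substitute the analysis you already know well.

```r
# Group #3: replicate everyday tasks you already know how to do elsewhere.
# mtcars ships with R and stands in here for your own data.

# A frequency analysis: counts of cars by number of cylinders
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# A routine regression: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Group #2: fit several models in one pass and extract a key number from each
formulas <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)
models <- lapply(formulas, lm, data = mtcars)
r_squared <- sapply(models, function(m) summary(m)$r.squared)
print(round(r_squared, 3))
```

The last three lines are the kind of thing that is tedious by point-and-click: the same pattern scales from three models to hundreds without changing shape.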

There are scores if not hundreds of R books that can help you. If you use R for long, eventually you’ll own a shelf of them. Meanwhile, a great first book to learn and get stuff done is Paul Teetor’s R Cookbook.

But again, and most importantly: Pick a problem, use R to solve it, and stay with it until you’re done. Then repeat. R undoubtedly will frustrate you. It may take hours or even days for something that seems like it should be simple. Remember that you’re learning a new language, so expect progress to be slow at first. Yet every time you go through the process (choose, use, stick with it) you’ll know more and will work faster and better. Good luck!

Rcbc utilities for discrete choice models in R

I just posted my “Rcbc” code (Rcbc R scripts), which demonstrates some core functionality for choice-based conjoint analysis (CBC) in R. This code is in development, and represents a package-in-progress. [For those new to CBC, it allows one to determine user preference and tradeoffs among products or product features using a form of logistic regression (multinomial logit).]

The Rcbc code is primarily useful for didactic purposes to show how conjoint models work and to show a relatively easy-to-understand gradient descent method for aggregate multinomial logit model estimation (MNL). It may help supplement commercial CBC software (e.g., Sawtooth Software) for some analytic tasks such as MNL estimation on subsamples, or determining attribute importance, or for getting data and design matrices into a simple format. Note that more complete R functionality for conjoint models is provided in the “bayesm” and “mlogit” packages, and in the clogit() function of the “survival” package.
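To make the idea of gradient-based aggregate MNL estimation concrete, here is a minimal sketch on simulated data. It is not the Rcbc implementation: the simulated design, the fixed step size, the iteration count, and all variable names are assumptions for illustration. It climbs the log-likelihood (equivalently, descends the negative log-likelihood) using the gradient X'(y - p).

```r
set.seed(42)
n_tasks <- 2000              # number of choice tasks (assumed)
n_alt <- 3                   # alternatives shown per task (assumed)
true_beta <- c(1.0, -0.5)    # part-worths used to simulate choices (assumed)
n_attr <- length(true_beta)

# Design matrix: one row per alternative, stacked task by task
X <- matrix(rnorm(n_tasks * n_alt * n_attr), ncol = n_attr)
task <- rep(seq_len(n_tasks), each = n_alt)

# MNL choice probabilities: softmax of utilities within each task
choice_probs <- function(beta) {
  expu <- exp(as.vector(X %*% beta))
  expu / ave(expu, task, FUN = sum)
}

# Simulate one chosen alternative per task; y is 1 for the chosen row, else 0
P <- matrix(choice_probs(true_beta), nrow = n_alt)   # one column per task
chosen_alt <- apply(P, 2, function(p) sample(n_alt, 1, prob = p))
y <- as.numeric(rep(seq_len(n_alt), times = n_tasks) == rep(chosen_alt, each = n_alt))

# Log-likelihood: sum of log probabilities of the chosen alternatives
loglik <- function(beta) sum(log(choice_probs(beta)[y == 1]))

# Fixed-step gradient ascent on the mean log-likelihood
beta_hat <- rep(0, n_attr)
step <- 0.5
for (iter in 1:500) {
  gradient <- as.vector(t(X) %*% (y - choice_probs(beta_hat)))
  beta_hat <- beta_hat + step * gradient / n_tasks
}

print(round(beta_hat, 3))
print(loglik(beta_hat))
```

With enough simulated tasks the estimates should land close to true_beta. A production estimator would normally add a convergence check and standard errors, which this sketch omits to keep the mechanics visible.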

One novel and useful feature in my Rcbc package is a new attribute importance estimation function (cf. ART Forum poster on attribute importance).
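For context, the conventional way to summarize attribute importance is by the range of part-worths within each attribute. The sketch below shows that conventional calculation only; it is not the new estimation function described above, and the part-worth values and attribute names are made up for illustration.

```r
# Hypothetical part-worth utilities for three attributes (illustrative values only)
partworths <- list(
  brand = c(A = 0.6, B = 0.1, C = -0.7),
  price = c(low = 1.2, mid = 0.0, high = -1.2),
  color = c(red = 0.2, blue = -0.2)
)

# Range of utilities within each attribute
ranges <- sapply(partworths, function(pw) diff(range(pw)))

# Conventional importance: each attribute's share of the total range
importance <- ranges / sum(ranges)
print(round(100 * importance, 1))
```

One known limitation of this range-based measure is that it depends on which levels happened to be tested, which is part of why alternative importance measures are of interest.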

I have not yet written a code vignette, but the code is reasonably well-commented and there are various executable walkthroughs presented inside “if false” blocks in the code. Note that there may be both large and small bugs!

To use: (1) save the file as a “.R” file. (2) source it in its entirety (warning: functions will go into global namespace). (3) read the code and try the examples.
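If the global-namespace warning in step (2) is a concern, one option is to source the file into its own environment. This is a sketch under assumptions: the file name rcbc_demo.R and the function hello_cbc() are hypothetical stand-ins written to a temporary directory so the example runs on its own; replace script_path with wherever you saved the real script.

```r
# Stand-in for the downloaded script: a tiny file defining one function.
# "rcbc_demo.R" and hello_cbc() are hypothetical, used only to keep this runnable.
script_path <- file.path(tempdir(), "rcbc_demo.R")
writeLines("hello_cbc <- function() 'loaded'", script_path)

# Source into a dedicated environment instead of the global namespace
rcbc <- new.env()
source(script_path, local = rcbc)

print(ls(rcbc))            # functions defined by the script
print(rcbc$hello_cbc())    # call a function through the environment

# Confirm nothing was added to the global environment
print(exists("hello_cbc", envir = globalenv(), inherits = FALSE))
```

The tradeoff is that every call needs the rcbc$ prefix, so examples copied from the script's walkthroughs would have to be adjusted accordingly.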

Rcbc R scripts

Assessing persona prevalence empirically

I just obtained permission to post our latest paper on Personas. We argued previously that the personas method should not be considered to be scientific, and that a complete persona almost certainly describes few people or no one at all. In the new paper, we present a complete formal model, and evaluate the prevalence of “persona-like descriptions” with both analytical methods and empirical data. Full paper on persona prevalence.

There are two key implications here: (1) if you want to claim that a persona describes real people, you need strong multivariate evidence. (2) Without such evidence, we provide a formula you can use that will give a better estimate than simply assuming a prevalence. We show how this formula has a better-than-chance agreement with 60,000 randomly generated persona-like descriptions in real data with up to 10,000 respondents.

None of this says that personas are not inspiring or useful. It just says that they cannot be assumed to have verifiable information content, unless that is demonstrated empirically. As for alternatives to answer key design and business questions using empirical data, check out our paper on quantitative methods for product definition.


One of my papers from two years ago is still causing discussion: “The Personas’ New Clothes: Methodological and Practical Arguments against a Popular Method” by me and Russ Milham. Email from researchers I didn’t know led me to look up citations, and the article appears to be commonly cited when people present criticism of the personas method (per a Google search). The paper itself is here.

There are a few misunderstandings of our position out there. Our basic argument is simple. Persona authors often make two claims: (1) personas present real information about users; and (2) using personas leads to better products. In a nutshell, we argue that neither claim has been supported by empirical evidence; rather, the claims for personas’ utility are based on anecdotes, generally from their own authors or other interested parties (such as consultants selling them).

This does not mean that personas are bad, but they cannot be taken at face value. As researchers, we suggest that persona authors should either provide better evidence (and we suggest how) or make weaker claims.

Some persona users don’t make claims about their personas’ usefulness or correspondence to reality; they simply say that personas might be helpful for inspiration for some people or teams. We take no issue with that, as long as they don’t forget those caveats and reify the persona. Unfortunately it is probably very difficult for people to read a persona and not think that it describes a user group.

We’ve recently published empirical work on (quasi-)persona prevalence using several large datasets, demonstrating that once a description has more than a few attributes it describes few if any actual people. I’ll put that paper up as soon as I get reprint permission. (If you have access to HFES archives, it is “Quantitative Evaluation of Personas as Information”, Christopher N. Chapman, Edwin Love, Russell P. Milham, Paul ElRif, James L. Alford, from HFES conference 2008, New York.)
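A small simulation can illustrate why prevalence falls so quickly as attributes are added. This is not the analysis or the data from the paper: it assumes 10,000 simulated respondents, independent attributes, and a 50% chance of matching each one, all chosen purely for illustration. Real attributes are correlated, which is exactly why empirical checking is needed.

```r
set.seed(1)
n_respondents <- 10000
n_attributes <- 10
p_match <- 0.5   # assumed chance that a respondent matches any single attribute

# TRUE where a respondent matches a given attribute (independent by assumption)
matches <- matrix(runif(n_respondents * n_attributes) < p_match,
                  nrow = n_respondents)

# Share of respondents who match ALL of the first k attributes, for k = 1..10
prevalence <- sapply(seq_len(n_attributes), function(k) {
  mean(rowSums(matches[, seq_len(k), drop = FALSE]) == k)
})

# Under independence the expected share is simply p_match^k
expected <- p_match ^ seq_len(n_attributes)
print(data.frame(k = seq_len(n_attributes),
                 observed = prevalence,
                 expected = expected))
```

Even with a generous 50% match rate per attribute, the share matching every attribute roughly halves with each one added, so a description with ten attributes covers only a tiny fraction of the sample.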

What should one do instead of personas? I advocate stronger empirical methods that have more demonstrable validity.

New papers on user research

Just uploaded 2 new papers on user research. First is work on a multi-factorial product interest scale, designed to be easily administered in survey format and applicable to consumer products. See the abstract on my “papers” page, or get the file directly: wip337-chapman.pdf

Second is an overview of quantitative methods that are helpful in early evaluation of product needs and strategy. The abstract is on my “papers” page, or the complete file is chapman-love-alford-quantitative-early-phase-ur-reprint.pdf

I’ll be uploading more papers soon.