Tips on getting started with analyzing qualitative data

Analyzing qualitative data for the first time can be confusing for novice researchers. Mystification about how to begin analyzing data can be heightened when reading published presentations of research in which descriptions of what authors actually did with their data in order to produce “findings” are both limited and opaque. One reason descriptions of the analytic process are opaque is that researchers have been thinking with data in an iterative and recursive process that goes back and forth between data, theory, and other research. Describing this process in detail does not fit the expectation that journal articles provide logically organized and concise descriptions of research processes. In any event, there are plenty of ways to get started with the analysis of qualitative data if this is a new task. In this blogpost, I’ll talk a little about preliminary coding, since that is widely used among qualitative researchers.

Data analysis is theoretically informed, and any approach to analysis of data will depend on the theoretical approach used for a study. Although “coding” as a practice has received a good deal of critique of late (e.g., St. Pierre, 2011), it is one way, though not the only way, into the data set to gain an initial sense of what is going on. First, what is coding? Scholars have been defining what they mean by “coding” for decades. Here are several definitions put forward by scholars who have written about analyzing qualitative data:

“Codes are tags or labels for assigning units of meaning to the descriptive or inferential information compiled during a study. Codes usually are attached to “chunks” of varying size – words, phrases, sentences, or whole paragraphs, connected or unconnected to a specific setting” (Miles & Huberman, 1994, p. 56).

The process of coding, for Corbin and Strauss, involves “deriving and developing concepts from data” (Corbin & Strauss, 2008, p. 65).

“Coding means naming segments of data with a label that simultaneously categorizes, summarizes, and accounts for each piece of data. Coding is the first step in moving beyond concrete statements in the data to making analytic interpretations” (Charmaz, 2006, p. 43).

“The essence of coding is the process of sorting your data into various categories that organize it and render it meaningful from the vantage point of one or more frameworks or sets of ideas” (Lofland, Snow, Anderson, & Lofland, 2006, p. 200).

Put simply, “coding” is a process by which researchers synthesize the “meanings” demonstrated in data sources (whether these are transcripts of interviews, naturally occurring data, field notes, documents, or visual data) through the use of some sort of “label.” Labels may be derived from the data itself when participants’ own words are used (i.e., “in vivo”), may be applied by the researchers to sum up what is observed, or may be derived deductively from the broader research goals and literature informing the study.

Develop a data inventory

Marie Kondo helps us tidy our houses; data inventories help us take stock of what our data set encompasses. Data inventories can help us with organizing project materials. For those who like working with paper, this will mean that the data set will be printed (transcripts of interviews, field notes of observations, documents, etc.) and organized in a way that makes data easily accessible. This is how I used to organize my data sets when I first started doing qualitative research. Now, I’m more likely to organize digital versions of project documents in folders on my computer. I typically use a password-protected spreadsheet to keep track of the different data sources I have, along with the dates when data were generated and/or collected, and any transformation processes the data have undergone (e.g., transcription). In this file, I might include the names of participants along with pseudonyms used.

When I write up a data inventory, it typically has the following elements:

  1. Researcher/s
  2. Study description
     • Research Purpose
     • Research Questions
     • Definition of Terms (if applicable)
  3. Study design and methods
     • IRB procedures (Is this a pilot study with an approved IRB? A course-approved IRB? Data from another researcher’s IRB?)
     • Participants (How many? How were they recruited? What criteria were used for sampling?)
     • Duration of study (When was the study conducted, and how long did it last?)
     • Data Description (How much data do you have?)
        • transcripts of interviews/video (duration of interviews)
        • documents & archival material (List of documents; How many?)
        • field notes (condensed, expanded; how many pages?)
        • audio/visual materials
     • Research context: Describe the context of your study and settings as appropriate
  4. Appendices as applicable:
     • Chart with summary of data (no. of pages, date of collection, participants, etc.)
     • Data sample (e.g., 1 full transcript; sample of archival data)
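
For those who keep the inventory digitally, the summary chart of data could be sketched in a few lines of Python. This is only one possible layout, not a standard: the column names, the `DataSource` class, and the two sample rows are all made up for illustration. The sketch leaves out participants’ real names and does nothing about password protection, which would still need to be handled separately.

```python
import csv
import io
from collections import Counter
from dataclasses import asdict, dataclass, fields


@dataclass
class DataSource:
    """One row of a hypothetical data inventory chart."""
    source_id: str        # short identifier, e.g. "INT-01"
    source_type: str      # interview transcript, field notes, document, audio/visual
    pseudonym: str        # pseudonym used for the participant, if any
    date_collected: str   # date the data were generated or collected (ISO format)
    size: str             # duration or number of pages
    transformations: str  # what has been done to the data, e.g. "transcribed"


def inventory_to_csv(sources):
    """Render the inventory as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DataSource)])
    writer.writeheader()
    for source in sources:
        writer.writerow(asdict(source))
    return buffer.getvalue()


def count_by_type(sources):
    """Count how many data sources of each type the data set contains."""
    return dict(Counter(source.source_type for source in sources))


# Made-up rows, for illustration only.
inventory = [
    DataSource("INT-01", "interview transcript", "Ana", "2020-02-03", "52 min", "transcribed"),
    DataSource("FN-01", "field notes", "", "2020-02-10", "6 pages", "expanded from condensed notes"),
]

print(inventory_to_csv(inventory))
print(count_by_type(inventory))
```

A plain spreadsheet does the same job; the point is simply that each data source gets one row with the same columns.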


Start reading and re-reading

The only way to start data analysis is to begin reading and reviewing the data collected. Researchers who transcribe audio or video recordings of interviews or naturally occurring interaction are in a good position, since they have already listened carefully to their audio files and should have developed a good sense of what is found in the data source. When data sets are very large, it may help to do some initial “indexing” of audio or video materials. What this means is that an “index” or short description (rather than a transcription) is created of what is included in an audio or video recording, along with time stamps. This is useful for locating specific events or moments in a data source, either to review them during the analytic process or to complete further transcription of selected interactions within the larger data set.
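
An index of this kind can be kept in a notebook or spreadsheet. For readers who prefer a script, here is a minimal Python sketch under my own assumptions: the `IndexEntry` fields, the "HH:MM:SS" time stamp format, and the sample entries are hypothetical, not a prescribed scheme.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """A short description of one stretch of a recording (not a transcription)."""
    start: str        # time stamp "HH:MM:SS" where the segment begins
    end: str          # time stamp "HH:MM:SS" where the segment ends
    description: str  # brief note on what happens in the segment


def to_seconds(stamp):
    """Convert an "HH:MM:SS" time stamp to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_segments(index, keyword):
    """Return the entries whose description mentions the keyword."""
    keyword = keyword.lower()
    return [entry for entry in index if keyword in entry.description.lower()]


def segment_at(index, stamp):
    """Return the entry that covers a given time stamp, or None."""
    moment = to_seconds(stamp)
    for entry in index:
        if to_seconds(entry.start) <= moment < to_seconds(entry.end):
            return entry
    return None


# Made-up index for one recording, for illustration only.
index = [
    IndexEntry("00:00:00", "00:04:30", "Greetings; consent form reviewed"),
    IndexEntry("00:04:30", "00:18:10", "Participant describes a typical school day"),
    IndexEntry("00:18:10", "00:31:45", "Discussion of homework and family routines"),
]

for entry in find_segments(index, "homework"):
    print(entry.start, entry.end, entry.description)
```

Searching the descriptions by keyword is what makes it quick to return to a moment later, whether to review it or to transcribe it in full.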

Preliminary coding

Coding as a process has been described for decades in numerous texts (Coffey & Atkinson, 1996; Huberman & Miles, 1994; Miles, Huberman, & Saldaña, 2014). Several challenges await novice researchers doing initial coding. First, there is “no one right way” to do preliminary coding. Second, there are all sorts of approaches to coding the same set of data, and some researchers have developed coding schemas that might be applied by others. For example, Bogdan & Biklen’s (2003, pp. 162-168) schema includes the following coding categories:

  1. Setting/Context
  2. Definition of the situation
  3. Perspectives of the subjects
  4. Participants’ ways of thinking about people & objects
  5. Process
  6. Activities
  7. Events
  8. Strategies
  9. Relationships and social structure
  10. Narrative
  11. Methods

Third, coding is used by researchers who describe their process as “thematic analysis” (Braun & Clarke, 2006) along with those who espouse “grounded theory” approaches (Charmaz, 2014; Corbin & Strauss, 2015; Glaser & Strauss, 1967). Although the processes have broad similarities, grounded theorists have detailed specific approaches to developing “grounded theory” that are not typically used in thematic approaches to analysis.

Develop a code dictionary

When starting out, it can help to track what is going on by developing a “code dictionary” in which each of the labels or codes used is defined. That means writing out the parameters by which any particular code is applied to data, i.e., inclusion and exclusion criteria. Here, it will also help to include an excerpt of data to illustrate how the code has been applied. I find the following format helpful for keeping track of initial coding:

Code | Code definition | Illustrative excerpt
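
The same three columns, together with the inclusion and exclusion criteria, could also be kept as a small data structure. The Python sketch below is one hypothetical way to do that: the `CodeEntry` and `CodeDictionary` names, the `origin` field, and the sample code and excerpt are invented for illustration and do not come from any particular software package.

```python
from dataclasses import dataclass


@dataclass
class CodeEntry:
    """One entry in a code dictionary."""
    code: str           # the label applied to data
    definition: str     # what the code means
    include_when: str   # inclusion criteria
    exclude_when: str   # exclusion criteria
    excerpt: str        # illustrative excerpt from the data
    origin: str = "researcher"  # "in vivo", "researcher", or "deductive"


class CodeDictionary:
    """Keeps code definitions in one place and refuses duplicate labels."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        key = entry.code.strip().lower()
        if key in self._entries:
            raise ValueError(f"code already defined: {entry.code}")
        self._entries[key] = entry

    def lookup(self, code):
        return self._entries[code.strip().lower()]

    def as_rows(self):
        """Return (code, definition, excerpt) rows, matching the three-column format."""
        return [(e.code, e.definition, e.excerpt) for e in self._entries.values()]


# Made-up entry, for illustration only.
dictionary = CodeDictionary()
dictionary.add(CodeEntry(
    code="juggling",
    definition="Participant describes managing competing demands on time",
    include_when="Talk about balancing work, study, or family obligations",
    exclude_when="General complaints about being busy with no competing demands named",
    excerpt="I'm always juggling the kids and my classes.",
    origin="in vivo",
))

for row in dictionary.as_rows():
    print(" | ".join(row))
```

Refusing a second definition for the same label is a design choice in this sketch; it forces a decision about whether a new use of a label fits the existing definition or needs its own code.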

Once initial codes have been applied, it’s possible to reorganize them into larger groupings. Since this is early in the analytic process, this might be seen as more of a “trying out” phase. Consider whether there are labels (or categories) that might be used to describe groups of preliminary codes. It is at this point that it is really useful to start writing “memos.”
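
Trying out groupings is usually done with sticky notes, index cards, or a spreadsheet, but it can be sketched in Python too. In this hypothetical example, each preliminary code is tentatively assigned to a category (or to none yet); the code and category names are invented for illustration.

```python
from collections import defaultdict


def group_codes(assignments):
    """Group preliminary codes under their tentative category labels.

    `assignments` maps each code to a category name, or to None when
    the code has not been placed anywhere yet.
    """
    groups = defaultdict(list)
    for code, category in assignments.items():
        groups[category or "uncategorized"].append(code)
    return {category: sorted(codes) for category, codes in groups.items()}


# Made-up preliminary codes and tentative categories, for illustration only.
tentative = {
    "juggling": "managing time",
    "late nights": "managing time",
    "asking for help": "sources of support",
    "study group": "sources of support",
    "feeling lost": None,
}

for category, codes in group_codes(tentative).items():
    print(f"{category}: {', '.join(codes)}")
```

Because the assignments live in one place, a code can be moved to a different category, or a category renamed, and the groupings regenerated as thinking about the data changes.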

Write memos

I’ve written elsewhere about memo writing – this is a process where researchers start to write about the data, codes, and categories. Through memo writing, one can ask questions of the data, consider what is important in the data, and perhaps consider how the various codes might relate to one another. Here, one might also include an excerpt from the data set to write about. The key here is to write down one’s initial thoughts and meaning-making.

What I’ve described here are the initial processes that might be used to explore a data set. This is by no means the conclusion of the data analysis process. But where to next? I think the answer to that question depends on the particular project and on what the researcher wants to accomplish. It will necessarily involve going back to the literature used in developing the study, as well as literature that engages with the ontological, epistemological, and theoretical perspectives that one takes in any particular study.

Here are a few other tips for analyzing data:

Managing fear and anxiety in inductive analysis of qualitative data

11 “tricks” to think with when analyzing data

Kathy Roulston


Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77-101. doi:10.1191/1478088706qp063oa

Charmaz, K. (2006). Constructing grounded theory: A practical guide through qualitative analysis. Thousand Oaks, CA: Sage.

Charmaz, K. (2014). Constructing grounded theory (2nd ed.). Los Angeles: Sage.

Coffey, A., & Atkinson, P. (1996). Making sense of qualitative data: Complementary research strategies. Thousand Oaks, CA: Sage.

Corbin, J., & Strauss, A. (2008). Basics of qualitative research (3rd ed.). Los Angeles: Sage.

Corbin, J., & Strauss, A. (2015). Basics of qualitative research: Techniques and procedures for developing grounded theory (4th ed.). Los Angeles: Sage.

Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine de Gruyter.

Huberman, A. M., & Miles, M. B. (1994). Data management and analysis methods. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 428-444). Thousand Oaks, CA: Sage.

Lofland, J., Snow, D., Anderson, L., & Lofland, L. H. (2006). Analyzing social settings: A guide to qualitative observation and analysis (4th ed.). Belmont, CA: Thomson, Wadsworth.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.

Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative data analysis: A methods sourcebook (3rd ed.). Los Angeles: Sage.

St. Pierre, E. (2011). Post qualitative research: The critique and the coming after. In N. K. Denzin & Y. S. Lincoln (Eds.), The SAGE handbook of qualitative research (4th ed., pp. 611-625). Los Angeles: Sage.

