The Methods: Evaluation Design and Data Gathering & Analytical Approaches
In Part II, building off of the program evaluation foundations, this webinar will focus on the methods of a program evaluation: design, data gathering strategies, and analytical approaches. We also encourage you to watch Part I: the Foundations and Part III: The Finale.
Welcome, everyone. Thank you so much for joining us for the second portion of our three-part webinar series on a formula for writing a grant proposal evaluation. We are thrilled that you decided to join us today and hopefully, you’ll gather a lot of great information from this experience. Before we get started, I just want to go over a couple of housekeeping items:
It’s important that you have an opportunity to ask questions, so make sure that you use the question function on your computer to do that. We have with us our intern, Alyce Hopes, who will help us moderate the Q & A portions that are interspersed throughout the webinar. You should also note that in the previous webinar we got a lot of really great questions, and we were not able to harvest all of those questions before closing out the system. We will make sure to grab those this time so that if we’re not able to answer your question during our session we can follow up, so make sure to continue to use this function.
Let me introduce myself: I am Lana Rucks, the principal consultant of The Rucks Group. The Rucks Group is a research and evaluation firm that gathers, analyzes, and interprets data to enable our clients to measure the impact of their work. We were formed in 2008, and over the past 12 years we have worked with a variety of clients, primarily within higher education, on grants funded by federal agencies such as the National Science Foundation, the Department of Education, and the Department of Labor.
As I said last time, the overall purpose of this webinar series is really reflected in four objectives. First, I want to organize the real diversity of evaluation terms around a framework. Then, I want to take that framework and leverage it to provide a formula for writing evaluation plans. Hopefully, those two together will reduce the angst that’s so common around evaluation. And then, of course, I want to make sure that we answer questions.
So, let me provide a high-level recap of the first part of the webinar series, highlighting the concepts that will be relevant for today’s webinar:
Last time I introduced a framework for how to write an evaluation plan that’s organized around six elements. The first two I call “The Foundation”, which we covered before and which involves the theory of change and the evaluation questions. The next two are “The Methods”, or evaluation design and data gathering and analysis, which we’ll cover today. The last two are “The Finale”, which we’ll cover in two weeks and which involves the use of findings and the operational approach.
In introducing this framework, I argued that small, moderate, and large-scale evaluations don’t vary in terms of the presence or absence of those core elements; rather, they vary in terms of the level of detail that’s provided around those core elements. For a small-scale evaluation, you’ll have less detail, maybe a sentence on an element, and for a large-scale evaluation you may have a lot more detail, a paragraph or more on an element. The analogy that I provided was an accordion: for a small-scale evaluation you push in on the accordion, with less detail, whereas for a large-scale evaluation you pull out on it, with more detail.
All that led to the first step of the formula, in which I suggested summarizing the theory of change in a sentence. I also said that for moderate and large grants you should try to include a logic model at the proposal stage. In developing that sentence, we talked about thinking about the theory of change as an “If we do this, then we get that” sentence. Pulling that out, the logic model would have that same type of information but in much greater detail. So, the “if” part would include the inputs, activities, and outputs, while the “then” part would relate to the outcomes.
The next portion of the formula that I talked about was articulating the evaluation questions. Again, what I argued was that it’s not about the presence or absence of the evaluation questions; it’s really about the number of evaluation questions that you’ll have, which depends on the size of the grant. I provided a couple of examples of that, and I also provided some thought prompts around what we call formative evaluation, which maps onto the “if” part of the logic model, and some thought prompts for summative evaluation, which maps onto the “then” part of the logic model and the theory of change.
For the meat of today’s discussion, I first want to give this disclaimer: if there’s a part that is a little bit difficult or where confusion can arise, it’s in this place. I often say that evaluation is similar to research, and this is particularly the case when thinking about evaluation design. So, if this piece feels dense, it’s because it is. Again, I will invite you to go back to the recording and review it at your own pace, and of course reach out if there are any questions.
The third step of the formula is really about interpreting the findings that you’re going to obtain. You want to choose the evaluation design that is appropriate for the context and the level of resources that are available. There are a lot of different types of evaluation designs, but to try to bound this conversation I’m going to focus on five that are probably most common within the evaluation space: experimental, quasi-experimental, non-equivalent comparison group, pre/post single group, and case study. In thinking about the evaluation designs, I also want to tease apart evaluation designs from the data gathering process, even though those two are intimately connected. The design is really about how you are going to put the findings in context and be able to interpret what you obtain. So, to understand the difference in these designs, I’m going to take us back to our intro statistics or intro research methods class, in which I’m sure you heard this mantra: “correlation does not imply causation”. What this mantra really means is that just because there’s a statistical correlation between two variables does not mean that one is causing the other. This mantra is illuminated when considering what we call spurious correlations, in which it’s clear that one variable is not causing the other.
Let’s look at an example of this. There is a very nice, very pretty correlation between the divorce rate in Maine and the per capita consumption of margarine, but the divorce rate in Maine is not causing the consumption of margarine, nor is the consumption of margarine causing the divorce rate. This idea leads to the concept of internal validity. Internal validity is the extent to which we can conclude that A is causing B. What is generally the problem in evaluation and research designs is that there’s an undetected third variable that is actually responsible for the apparent relationship. Evaluation designs differ in the extent to which they address internal validity, that is, the extent to which we can say with confidence that A is causing B.
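The third-variable problem can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variables `z`, `a`, and `b`, the sample size, and the noise levels are hypothetical and do not come from any project discussed in the webinar. Here `a` and `b` never influence each other; both are driven by `z`. They still correlate strongly until `z` is statistically controlled for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical third variable z drives both a and b; a and b never interact.
z = [rng.gauss(0, 1) for _ in range(n)]
a = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
b = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

raw_r = pearson(a, b)
# Correlation between a and b once the influence of z is removed from each.
partial_r = pearson(residuals(a, z), residuals(b, z))

print(f"correlation of a and b:            {raw_r:.2f}")
print(f"correlation after controlling z:   {partial_r:.2f}")
```

The first number should be large and the second close to zero, which is the situation an evaluation design with strong internal validity is meant to guard against.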
So, in trying to tie all these elements together, let’s talk about each of those designs and ask, “Does that design actually address internal validity?” Let’s consider the resources that will be needed to implement that design, such as the expertise and the time, which all fundamentally relate to the money and the cost of implementation. Then let’s also talk about how participants are going to be selected to be involved in the initiative and, as a consequence, in the evaluation design; what sample size would be needed; and whether or not that particular design is appropriate for the scale of the evaluation.
Let’s first look at experimental and quasi-experimental designs. Both of these design approaches address internal validity; however, both of them are high in terms of implementation resources. There is a lot of expertise that’s needed to implement the design and to make sure that it’s implemented with fidelity, and that can be really challenging. Another challenge, particularly of an experimental design, is how the participants are selected. In an experimental design, participants are randomly assigned, so some participants may receive an intervention while others may not. For educational professionals, that type of random assignment is sometimes problematic, so very often, if a design that addresses internal validity is appropriate, a quasi-experimental design is chosen instead of an experimental one. As with an experimental design, a quasi-experimental design requires a large sample size, and neither of these is really that appropriate for a small or moderate-scale initiative; they are more suited to a large-scale one.
That brings us to the next option: the non-equivalent comparison group. A non-equivalent comparison group does not address internal validity to the extent that experimental or quasi-experimental designs do, but it requires fewer resources and you don’t have to do random assignment. It’s a really good option for a moderate to small-scale project or initiative because of the sample size that is associated with it.
I should note that fundamentally a non-equivalent comparison group differs from a quasi-experimental design. With a quasi-experimental design, when creating the two groups you’re much more able to address that third-variable issue, either through how you’re designing the initiative or through additional statistical analysis that controls for that third variable. That’s not the case when you’re talking about a non-equivalent comparison group, but it’s still a really good option.
Now, a pre/post single group is a good option when you may not necessarily have two groups, when there’s not another group that you can compare against. In this type of situation, it addresses internal validity even less than a non-equivalent comparison group, but the implementation resources are lower: you’re not doing randomization and the sample size is smaller, so it’s appropriate for a moderate or small initiative.
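To make the contrast between a pre/post single group and a non-equivalent comparison group concrete, here is a minimal sketch. All scores are invented for illustration, and subtracting the comparison group’s gain from the program group’s gain is just one simple way to put a finding in context; it is an assumption of this sketch, not a method prescribed in the webinar.

```python
from statistics import mean

# Hypothetical pre and post scores (same participants, same order).
program_pre = [60, 65, 70, 55]
program_post = [70, 75, 78, 66]
comparison_pre = [62, 64, 68, 58]
comparison_post = [66, 67, 73, 62]


def mean_gain(pre, post):
    """Average change from pre to post across participants."""
    return mean(after - before for before, after in zip(pre, post))


# Pre/post single group: all we can report is the program group's own gain.
program_gain = mean_gain(program_pre, program_post)

# Non-equivalent comparison group: the gain of a group that did not take part
# hints at how much change might have happened anyway.
comparison_gain = mean_gain(comparison_pre, comparison_post)
gain_beyond_comparison = program_gain - comparison_gain

print(f"program group gain:      {program_gain:.2f}")
print(f"comparison group gain:   {comparison_gain:.2f}")
print(f"gain beyond comparison:  {gain_beyond_comparison:.2f}")
```

With only the first number, any change over time gets attributed to the program. The comparison group narrows that interpretation, although the two groups may still differ in ways that were not measured.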
A case study can be another option. Again, it doesn’t really address internal validity, but implementation resources are low, participants are identified rather than randomly assigned, the sample size is small, and it can be used for a small initiative as well.
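The five designs just described can be collected into a small lookup structure. This is only a sketch: the field names and the short labels are paraphrases chosen for illustration, and `designs_for_scale` is a hypothetical helper, not part of any evaluation toolkit.

```python
# Paraphrased summary of the five designs as described above.
DESIGNS = [
    {"design": "experimental", "internal_validity": "addressed",
     "resources": "high", "random_assignment": True, "scales": {"large"}},
    {"design": "quasi-experimental", "internal_validity": "addressed",
     "resources": "high", "random_assignment": False, "scales": {"large"}},
    {"design": "non-equivalent comparison group",
     "internal_validity": "partly addressed", "resources": "lower",
     "random_assignment": False, "scales": {"moderate", "small"}},
    {"design": "pre/post single group",
     "internal_validity": "addressed even less", "resources": "lower",
     "random_assignment": False, "scales": {"moderate", "small"}},
    {"design": "case study", "internal_validity": "not really addressed",
     "resources": "low", "random_assignment": False, "scales": {"small"}},
]


def designs_for_scale(scale):
    """Names of the designs described as a fit for the given project scale."""
    return [d["design"] for d in DESIGNS if scale in d["scales"]]


for scale in ("small", "moderate", "large"):
    print(scale, "->", designs_for_scale(scale))
```

Filtering by scale is only a starting point. Context, available expertise, and access to a comparison group still drive the final choice.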
I do want to make a distinction between two terms that we often hear: a systematic evaluation and a rigorous evaluation. All evaluations should be systematic. One of the definitions that I gravitate to in thinking about evaluation revolves around using social science research methods to measure the effectiveness of social interventions. Using that definition, you can see how, just by using those methods, you’re going to be systematic in completing the evaluation. With rigorous evaluation, even though the term is used in slightly different ways, rigorous really means the extent to which you’re addressing internal validity. Rigorous evaluation generally refers to an experimental or quasi-experimental design. While all evaluations should be systematic, not all evaluations will necessarily be rigorous. I just want to make sure I made that distinction.
So, let’s go back to an example:
We introduced this example in the first part of the webinar series. This is a Department of Education initiative: a project seeking funding under the Undergraduate International Studies and Foreign Language program. In this design approach, we talked about possibly using a comparison group (the non-equivalent comparison group) if it was feasible. We’re also saying that we will probably gather data through a pre/post design approach. I should note that in this example we talked about a mixed-methods design, and you’ll often hear that term in relation to design approaches. It’s very possible that in one part of the evaluation you may use one design approach and in another part of the evaluation you use a different design approach, depending on access to a comparison group. I should also say that sometimes “mixed methods” is used in terms of how the data are actually gathered and what type of data you’re gathering, whether quantitative or qualitative. So, keep in mind that you can mix some of the design approaches, even though I’m presenting them in a very clean, pristine way.
Another example is the Department of Education grant that we introduced last time, in which the intention of the project was to remove some non-academic barriers in order to increase graduation rates and course completion within the STEM arena. In this evaluation plan, we acknowledged that an experimental or quasi-experimental design would be ideal, but we also said that, more feasibly, a non-equivalent comparison group would be incorporated into the evaluation design.
Then, in this last example, which was a large-scale initiative, we actually did say we would have a rigorous quasi-experimental design to be able to put the findings in context. I should note that there is a lot more detail that has to be included in your writing for a quasi-experimental or experimental design, so this is again a place where this can all get very complicated and you may need to seek out some external expertise to provide support.
The take-home message here is to use a design approach that’s going to help you interpret the findings and that’s appropriate for the context, the resources, and the level of rigor.
Let’s go into the data gathering and analysis piece. Data gathering can involve quantitative data, which is data based on numbers obtained through some sort of measurement, or qualitative data, which is based on descriptions and other characteristics. I thought one item that may be helpful is to map common data-gathering strategies onto the type of data that you’re interested in obtaining. Qualitative data can be obtained through project-level documents, interviews, and focus groups, whereas quantitative data can be obtained through national data (like the Clearinghouse data or the Bureau of Labor Statistics), organizational or institutional data, or some type of specialized tool. Surveys and questionnaires cut across both lines: they yield quantitative data if you have question items that you can analyze and create descriptive statistics on, and qualitative data because you may have open-ended items in which people provide written responses.
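As a small illustration of how one survey can yield both kinds of data, the sketch below computes descriptive statistics for a closed-ended item and keeps the open-ended responses aside for qualitative review. The item, the 1-to-5 scale, and every response are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical ratings on a 1-5 agreement scale for one closed-ended item.
ratings = [4, 5, 3, 4, 5, 2, 4]

# Hypothetical open-ended responses from the same survey.
comments = [
    "The advising sessions helped me stay on track.",
    "Scheduling was difficult around my work hours.",
]

# Quantitative side: descriptive statistics on the closed-ended item.
summary = {
    "n": len(ratings),
    "mean": round(mean(ratings), 2),
    "median": median(ratings),
    "counts": dict(sorted(Counter(ratings).items())),
}
print(summary)

# Qualitative side: the text is read and coded by a person, not averaged.
for number, text in enumerate(comments, start=1):
    print(f"comment {number}: {text}")
```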
When thinking about the data gathering process there are a couple of things to keep in mind – consider what you are gathering, how you’re gathering it, and when you’re gathering it, then describe how you’re going to analyze it. Let’s go through a couple of examples for this.
In the Department of Education foreign language project, we gave some descriptions of what we were going to collect. In this situation, it was document reviews, institutional data, the partnership rubric, and survey dissemination. We also described how we were going to do the analysis: the descriptive statistics, and how we were going to analyze the qualitative data as well. If you’re talking about a moderate or large-scale evaluation, what may be really useful, in addition to having that type of language, is to include a data matrix. In the data matrix, I typically have these four elements (sometimes they vary a little bit depending on the funding source): the objectives, the evaluation questions, the data sources, and then some form of analysis and interpretation.
Let’s give an example by first stepping back to the Department of Education improvement grant. This is reiterating what that theory of change was (removing the non-academic barriers to increase performance and graduation rates in STEM). These were the evaluation questions that we articulated, and this is how it all got linked together. While you don’t see the objectives here, what’s clear about the data matrix is that the evaluation questions are linked back to the project objectives, and that the type of data we said we would collect, the timing, and how we would analyze and interpret that information are all articulated. The first question is about the implementation of the project: we were going to review documents, have discussions with the project team, and then do a content analysis. I should say again that the narrative included some additional details, but what’s great about the data matrix is that you can present a lot of information in a really small amount of space.
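A data matrix of the kind described here can be sketched as a list of rows that share the same columns. The column names follow the elements named in the talk, and the first row reflects the implementation example just given (document review, discussions with the project team, content analysis). The objective labels, the question wording, the timing values, and the entire second row are hypothetical placeholders.

```python
COLUMNS = ["objective", "evaluation_question", "data_sources", "timing", "analysis"]

data_matrix = [
    {
        "objective": "Objective 1 (placeholder)",
        "evaluation_question": "Is the project being implemented as planned?",
        "data_sources": ["document review", "discussions with project team"],
        "timing": "ongoing (placeholder)",
        "analysis": "content analysis",
    },
    {
        # Entirely hypothetical row, shown only to illustrate the layout.
        "objective": "Objective 2 (placeholder)",
        "evaluation_question": "Do course completion rates change?",
        "data_sources": ["institutional data"],
        "timing": "end of each term (placeholder)",
        "analysis": "descriptive statistics",
    },
]

# Print the matrix one row at a time, one labeled field per line.
for row in data_matrix:
    for column in COLUMNS:
        value = row[column]
        if isinstance(value, list):
            value = "; ".join(value)
        print(f"{column:>20}: {value}")
    print()
```

Keeping every row to the same columns is what lets a reader trace each evaluation question back to an objective and forward to its data sources and analysis.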
So, the take-home message for this part is to summarize the data gathering and analytical approach and, for moderate and large-scale projects, to also include a data matrix.