Saturday 30 November 2013

Our search for evaluations

As explained in our October 2013 post “What evaluations do we review?” we have looked for evaluation reports in English that meet certain criteria. Our search started on 23 September 2013. We used a three-pronged strategy: (1) web-based search (“web search”), (2) communication with contacts in the fields of VAWG and evaluation, and snowballing via these contacts (“snowballing”) and (3) specialised web-based networks (“DFID and helpdesk”). In addition to evaluation reports, we identified meta-evaluations and specialised publications on “best practices” to end VAWG, as well as literature on evaluation quality and effectiveness.

For the web search, we used combinations of the terms “evaluation”, “review”, “assessment”, “best practice” and 38 terms closely related to violence against women and girls and work to end VAWG, such as “violence against women”, “gender-based violence”, “forced sexual initiation”, “forced marriage”, “human trafficking”, “masculinities”. The web search yielded many duplicates and triplicates, i.e. it reached a high degree of saturation which could be expected in view of the overlaps between these terms.
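As a rough illustration of how such term combinations can be generated, here is a minimal Python sketch. It lists only the six VAWG-related terms quoted above, not the full set of 38. The function name `build_queries` and the quoted-phrase format are assumptions made for this example, not a record of the exact queries we ran.

```python
from itertools import product

# The four evaluation-related terms named in the post.
EVALUATION_TERMS = ["evaluation", "review", "assessment", "best practice"]

# Only the six VAWG-related terms quoted in the post; the full list has 38.
VAWG_TERMS = [
    "violence against women",
    "gender-based violence",
    "forced sexual initiation",
    "forced marriage",
    "human trafficking",
    "masculinities",
]


def build_queries(evaluation_terms, vawg_terms):
    """Return one quoted search string per (evaluation term, VAWG term) pair."""
    return [f'"{e}" "{v}"' for e, v in product(evaluation_terms, vawg_terms)]


queries = build_queries(EVALUATION_TERMS, VAWG_TERMS)
print(len(queries))  # 4 x 6 = 24 combinations
print(queries[0])    # "evaluation" "violence against women"
```

With all 38 VAWG-related terms, the same pairing would give 4 x 38 = 152 queries. Many of these queries point to the same documents, which is consistent with the duplicates we saw.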

However, focussing on the web search would have yielded incomplete results. As shown in the figure below, 84 per cent of the evaluation reports that we identified came from a single source, i.e. either from web-search, direct communication with our networks or snowballing.

This confirms that it was a good idea to combine three search methods. We have been particularly impressed by the effects of our snowballing. It was initiated through two channels – (i) direct contact with evaluation and VAWG specialists known to DFID and the Review team, and (ii) publication of our call for evaluation reports on the following list servers and social web sites: Platform for Evidence-Based Learning list server (PELICAN), professional groups on LinkedIn (AEA, Evaluations and Development), Michaela’s Facebook and Twitter accounts, and our blogs and, after registration of the new domain,

Sending out e-mails yielded an excellent response. Examining the messages received within three weeks of our call, we counted some 175 persons who had received the call by e-mail (as direct addressees or in copy). The actual number is probably higher, as we cannot assume that we were copied into all e-mail correspondence. The interest raised by social web posts was considerable: Michaela’s blog registered a peak of 270 page views on the day our call for evaluation reports was posted on the above-mentioned platforms (as compared to about 40-120 per day in “normal” times).

Soon we will describe what kinds of evaluations we have found. Our draft Scoping Report is with the Review Reference Group; once we have the final version, we will provide a link to it on this blog.

Wednesday 6 November 2013

Soon to come: QCA conditions

We are on the home stretch of two (now that I think of it, three) processes:

  • Finalising the scoping - i.e. the gathering and sorting of evaluation reports in a systematic manner and preparing a scoping report explaining how we have done it. The good news is, there are more evaluations - especially published ones - than we had expected to find! The bad news is, we cannot conduct qualitative comparative analysis (QCA - see the posts below for a quick introduction) on all of them. Not because QCA would not allow for it - on the contrary, maximum openness is a key feature and the beauty of QCA - but because going through all reports would take much more time and resources than we can afford. We must sample! Watch this space.
  • Defining the conditions that, as we (and the literature we have reviewed) suspect, contribute to making an evaluation of a VAWG-related intervention useful (or not), i.e. that increase or diminish positive evaluation effects. Our tentative conditions have been reviewed by the Review Reference Group, whose members have come up with useful questions and comments. We are refining the conditions now; soon they will be tested 'in earnest' in a first round of coding.
  • Recruiting and instructing coders - a highly qualified team of five (we'll ask them whether they would like to be presented on this blog; if they do, you'll read more about them) has been brought together and awaits our detailed definitions and directions for coding.
We will share more on these points (including our set of conditions) on this blog near the end of this month, when this busy phase of scoping, sorting, sampling, refining our model and instructing the coders will be over. An intensive and exhilarating process!

Thursday 24 October 2013

Hopes and Misgivings

Rick Davies has shared his hopes and reservations about our Review on his Monitoring and Evaluation NEWS blog. We share Rick's hopes, and we realise our approach is ambitious. That is one reason why we are excited about it! We'll respond in more detail near the end of our scoping phase.
Right now we are super-busy coping with an avalanche of evaluation reports. Our combination of web search and snowballing has yielded an overwhelming response. We had been worried that we might unearth too few evaluation reports. It turns out that there are many, especially from recent years, and many of them are even published.

Huge thanks to DFID, the Reference Group and everyone who has contributed to the avalanche by responding to or forwarding our request for evaluations!

Thursday 17 October 2013

Why qualitative comparative analysis (QCA)?

To generate realistic, practice-oriented findings and recommendations, the Review needs to differentiate between a wide range of evaluation approaches, methods and contexts. The number of existing evaluations in the field of VAWG is too small for statistical analysis to yield accurate conclusions. Yet it would not do justice to the variety of evaluation settings if we selected only a few evaluations for detailed analysis, as a conventional comparative case study would do. Qualitative comparative analysis (QCA) enables us to make full use of evidence from a wide spectrum of evaluations without jeopardising the applicability and generalisability of our findings. QCA has been designed for “medium-N” situations, i.e. situations where there are more than a handful of cases, but too few for meaningful statistical analysis.

QCA rests on the assumption that several cause-and-effect chains coexist. It matches sets of characteristics (in our case, the characteristics of evaluations) with specific outcomes (for instance, improved results of advocacy efforts). This method helps reveal which interactions between different kinds of methodology, resources and other conditions are necessary to achieve high-quality evaluations under specific sets of circumstantial factors.
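To make this matching of characteristics and outcomes more concrete, here is a minimal Python sketch of one early step in a crisp-set QCA: grouping cases by their configuration of conditions into a truth table. The condition names, the five cases and their codings are invented for illustration; they are not our data and not our final set of conditions.

```python
from collections import defaultdict

# Hypothetical conditions (1 = present, 0 = absent); not the Review's actual set.
CONDITIONS = ("participatory_design", "adequate_budget", "mixed_methods")

# Invented example cases: (case id, condition values, outcome).
CASES = [
    ("eval_A", (1, 1, 0), 1),
    ("eval_B", (1, 1, 0), 1),
    ("eval_C", (0, 1, 1), 0),
    ("eval_D", (1, 0, 1), 1),
    ("eval_E", (0, 1, 1), 1),
]


def truth_table(cases):
    """Group cases by configuration and compute consistency.

    Consistency here is the share of cases in a configuration
    that show the outcome (1.0 = all of them, 0.0 = none).
    """
    rows = defaultdict(list)
    for _case_id, configuration, outcome in cases:
        rows[configuration].append(outcome)
    return {
        configuration: {
            "n_cases": len(outcomes),
            "consistency": sum(outcomes) / len(outcomes),
        }
        for configuration, outcomes in rows.items()
    }


table = truth_table(CASES)
for configuration, row in sorted(table.items(), reverse=True):
    labels = dict(zip(CONDITIONS, configuration))
    print(labels, row)
```

In this toy example, two different configurations show the outcome in all their cases, which illustrates the idea of several coexisting causal paths. The third configuration is contradictory (one case with the outcome, one without) and would call for a closer look at the cases or the conditions. A full QCA would go on to logical minimisation of the table, which this sketch does not attempt.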

QCA is transparent and replicable: It makes it possible and necessary to explain the iterative process of categorizing and coding evaluation reports that will be included in the analysis. We will go back and forth between conceptual work (categorisations of evaluation practice) and the evidence (evaluation reports and users’ narratives on evaluation processes and outcomes). We will thereby refine the definitions of dimensions of evaluation practice, and indicators that can be used to categorise evaluations. New factors will be taken into account when they prove necessary; old differentiations between evaluation settings will be given up if they prove superfluous.

Statistical methods or “conventional” comparative case studies may include similarly iterative processes, but their movement between theoretical levels and the evidence tends to be unsystematic and implicit. This “black box” situation may lead to the omission of important explanatory factors, and makes it difficult to replicate the findings.

What is the review about?

The UK Department for International Development (DFID) has commissioned a review of evaluation approaches and methods for violence against women and girls (VAWG)-related interventions (‘the Review’). Its purpose is to generate a robust understanding of the strengths, weaknesses and appropriateness of evaluation approaches and methods in interventions addressing VAWG, particularly in international development and humanitarian contexts. Review findings and recommendations are expected to support practitioners’ efforts to (i) commission or implement evaluations that yield robust findings, and (ii) assess the relevance and generalisability of the evidence gathered in evaluations. Furthermore, the Review will contribute to the growing body of literature on applied research on VAWG in development and humanitarian contexts, and to the current debate on broadening impact evaluation designs.

Interventions tackling violence against women and girls (VAWG) have characteristics that make them difficult to evaluate. VAWG takes many forms and potentially affects all stages and spaces of women’s and girls’ lives. Programmes tackling VAWG tend to combine different types of activities – such as a mix of services for VAWG survivors, public sensitisation campaigns and policy advocacy – to address multiple causes. Some of the changes pursued, for example, reduced social acceptance of VAWG, take many years and are complicated to measure. Social stigma and the risk of re-traumatising survivors make it problematic to gather data from beneficiaries. Seemingly simple indicators – for example, the numbers of clients at counselling centres, or of court cases on VAWG – lend themselves to contradictory interpretations: An increase in reported VAWG cases suggests a welcome attitude change in places where under-reporting has been a problem, while in a different context it could indicate an undesired increase in VAWG incidence. Existing reviews on ‘what works’ (e.g. Bott et al 2005, Heise 2011) have noted issues with evaluation quality, but there is no clear consensus as to what a good evaluation should look like in this field. A number of VAWG-related evaluations exhibit gaps in the validity, reliability and generalisability of their findings. 

Our review team is convinced that there is no single evaluation method likely to produce the best possible results for all VAWG-related interventions. That is why we strive to examine a broad spectrum of evaluation approaches and designs. We will use qualitative comparative analysis (QCA) to identify which combination of factors is needed to produce a good evaluation and in what context. We trust that this approach will reveal a variety of effective ways to assess interventions tackling VAWG in development and humanitarian work – as well as the pitfalls that come with different evaluation designs. Our findings will be distilled into concrete recommendations, illustrated with exemplary evaluations. Process tracing, the second pillar of our methodology, will allow us to precisely identify best practices for successful evaluations. 

What evaluations do we review?

Our review focuses on evaluations that meet the following criteria:

  • Evaluative character: “The systematic and objective assessment of an on-going or completed project, programme or policy, its design, implementation, and results in relation to specified evaluation criteria.” (OECD/DAC) We define “systematic and objective assessment” fairly loosely, as an organised assessment that includes efforts to reduce bias.
  • Evaluation context: International development and humanitarian interventions.
  • Types of interventions: Interventions explicitly tackling any form of VAWG, as the main or secondary purpose.
  • Language of the evaluation report: English.
  • Completion date: Evaluations completed in 2008 or later.
  • Publication status: Published and unpublished evaluations.

So far, as of 16 October 2013, our mix of web-search and snowballing via e-mail contacts and the social web has yielded more than 100 evaluation reports that meet our criteria. Our scoping phase ends in late October. If you happen to have an evaluation report that meets the criteria above, please forward it to the review team (click on the link to get to our address), possibly including e-mail addresses of the evaluator and the evaluation commissioners.