Swarm Survey - The collective knowledge engine

Swarm Survey turns collective research into connected data on the Internet. The platform will enable quick and easy deployment of new collective research tasks. It will offer researchers ways to analyze and edit information. And it will publish research to an open data network for the world to access and build on together.

Written by Matt McAlister

Who are the users or target customers of your project, and what have you learned from them so far? Please give specific examples.
The target customers of Swarm Survey are publishers, news organizations in particular. They already experiment with crowd-based research and collective tasks for news projects today, but the tools they use for this purpose are insufficient.

Deploying interactive research is hard to do cheaply and quickly. The Guardian either plans months ahead or uses SWAT teams for big stories. Both options are costly, mostly in development effort and the time it takes to release something meaningful.

In addition, many of the data journalism efforts happening now at places like ProPublica, The New York Times, The Atlantic and FiveThirtyEight, among others, rely on data scientists and uniquely talented developers. These are skilled jobs.

We believe crowd-based research projects should be easier for a wider range of professionals to deploy. All news orgs, including the leaders in the field, would benefit from technology platforms that improve the data collection, the analysis, and the output of crowd research.

Recent examples:

  • The Guardian Australia partnered externally with a team on the Detention Logs
  • Satellite imagery searches can be used to cover a large area very quickly, but these tools are not readily available to news organizations

What assumptions are you making in what you propose, and how will you test them?
There are many assumptions about Swarm Survey that will have to be assessed as part of the development process.  From input methods to data analysis and editing to output usability, the entire stack is new and will present many challenges.

We’re also making assumptions about a core premise of the platform - the benefits and usability of open and shareable data.  We’re hopeful that the market will embrace this as it becomes real and tangible, but publishers have a poor track record for sharing content.

The project will operate in 2 distinct phases - prototype and beta. We will iterate very quickly during the prototype phase and maintain clear success criteria. We will conduct a thorough review at the end of that phase prior to progressing to the beta.

Crucially, we will rely heavily on the teams at The Guardian to trial aspects of the platform with real news stories and real end-users.  That will include using The Guardian’s UX lab where we will watch The Guardian’s end-users using the product in context.

How will you get your project in front of the necessary people or organizations?
The team will work at The Guardian’s offices in London, and we will sit with the relevant teams that will use Swarm Survey.  This will help us design and optimize the service for a key customer that we believe to be representative of the wider market.  

After the prototype is complete we will invite applications for a beta period where we will work closely with a small number of customers of the platform in a very hands-on way.  

We will see the platform in use by a range of different kinds of customers and how they work with their end-users to create successful crowd-based investigations.

What are the obstacles to implementing your idea, and how will you address them?
The technology stack proposed may be too complicated.  We will have to stay focused on the core needs of the customer and always be willing to throw away our work in order to get the solution right for their needs.

Working with customers’ internal platforms may be time-consuming.  We will be very careful about maintaining loosely coupled integrations with things like user registration and content management.

Customers may not use the platform in meaningful ways.  If a news story is uninteresting, then users won’t engage, and, as a result, Swarm Survey may appear unhelpful.  Depending on progress, we may deploy some training resources to work closely with news orgs to get the most out of the service.

We may run out of money before completing a solid product. While unforeseen issues may arise, we will monitor progress against a clear set of goals that map to our budget.  The prototype will be completed with significant budget remaining to adjust for the next phase.

We will also establish a board of advisors with experience in the field who can help guide our decision-making.

How much do you think your project will cost, and what are the major expenses?
The initial phases - prototype and beta - will require an investment of $250,000 ($125k for each phase).  

Most of the expenses will be staff-related.  We will complete the prototype with a team of 4 (CTO, Developer, Designer, Editor).  We will require a small budget for hosting, software tools and minor expenses.  
In addition, The Guardian will contribute to the cost of the project through shared resources, including business management and operations, office space, use of the UX lab, and editorial and development collaboration.

In the future, we may seek additional funding from other sources and partners in order to develop the product further and turn Swarm Survey into a sustainable and hopefully successful independent platform.

How will you acquire users and build a community around this product?
In Phase I (prototype) we will apply the tools in context with The Guardian’s journalism, the news desk in particular. This will establish need and benefit for our target customers. We may engage another partner, but we only want 1 to 3 prototype partners.

As the product is refined in response to The Guardian’s use of it, we will begin Phase II (beta).  We will open applications for access and reach out directly to Knight Foundation partners and others.  We expect 5 to 10 beta partners. 

Additional funding will be required for subsequent phases, but we have a plan for growth if and when that happens. *

After the collaboration model is established in Phase II (beta) we will then begin Phase III (v1).  The goal of v1 will be to create an active network of news publishers from around the world.  We will use co-promotion tactics with our partners to establish the customer base and create a robust partner network.  We will also deploy a traditional sales and marketing team for outreach and to build the commercial business.
If we are successful with news organizations then we will assess how to serve other publishers interested in open data and also organizations that want to use the platform commercially.

* We will seek additional funding if and when the product demonstrates real potential, the data provides real value in the world and customers want more from us.  That may or may not include venture capital partners, foundation support or other funding sources.


The insights we gain collectively from pooling the information we hold individually have the power to change our understanding of the world and what’s happening around us.

From crowdmapping disasters to searching satellite imagery to tracking political campaign messaging to crowdsourcing the price of milk, there are many ways to use the power of active communities as a data task force to get a clearer picture of an issue that affects us all.

Publishers, communities and groups of active citizens all need to create clearer pictures of what’s happening in the world. The problem is that each crowdsourced research project is typically a standalone data set, with a data collection and publishing system that gets built and rebuilt for each new project.

This long and diverse list of collective research projects from 2012 is a small sample of what is in development every day:




Swarm Survey aims to platformize the crowd research process.  

It will provide easy-to-use, self-serve interfaces for collecting and connecting data, robust data editing and analysis tools, and output systems that make data accessible and useful.

In addition to the high-quality crowd research toolset, Swarm Survey will be an open data network.

Research projects will be open and re-usable by other research projects on this and other platforms. The political campaign messaging from one project may be useful when joined with voting intentions in another study, for example. Or perhaps the crowdmapped price of milk from one research project may be a useful datapoint in a research project about urban supermarket development.
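
As a minimal sketch of what joining two open research projects could look like, assume each project publishes records tagged with a shared region identifier. The field names (`region`, `price_pence`, `supermarkets_opened`) and every value below are hypothetical illustrations, not real Swarm Survey data or a confirmed schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records from a crowdmapped milk-price project (illustrative values only).
milk_prices = [
    {"region": "N1", "price_pence": 95},
    {"region": "N1", "price_pence": 105},
    {"region": "E8", "price_pence": 89},
]

# Hypothetical records from a separate urban supermarket development project.
supermarket_openings = [
    {"region": "N1", "supermarkets_opened": 3},
    {"region": "E8", "supermarkets_opened": 1},
    {"region": "SW2", "supermarkets_opened": 2},
]


def average_by_region(records, value_field):
    """Group records by region and average the chosen numeric field."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["region"]].append(record[value_field])
    return {region: mean(values) for region, values in grouped.items()}


def join_projects(prices, openings):
    """Inner-join the two projects on the shared region identifier."""
    avg_price = average_by_region(prices, "price_pence")
    joined = []
    for row in openings:
        region = row["region"]
        if region in avg_price:  # skip regions with no price reports
            joined.append({
                "region": region,
                "avg_price_pence": avg_price[region],
                "supermarkets_opened": row["supermarkets_opened"],
            })
    return joined


combined = join_projects(milk_prices, supermarket_openings)
for row in combined:
    print(row)
```

The join only works because both projects share a common key, which is one reason an open data network would benefit from consistent identifiers across research projects.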


The technology platform will run on a robust Elasticsearch environment with a distributed database. The service layer will be built for performance and growth from the outset. The architecture will be based on services used in production at The Guardian.
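
To make the search layer a little more concrete, here is a hedged sketch of the kind of request body a service might send to Elasticsearch to fetch recent contributions for one research project. It only builds the JSON body using standard query DSL clauses (`bool`, `term`, `range`). The field names `project_id` and `submitted_at`, the function name, and the example values are assumptions made for illustration, not part of the proposed design.

```python
import json


def build_contribution_query(project_id, since_iso_date, size=50):
    """Build a search body for recent contributions to a single project.

    The field names used here are hypothetical placeholders.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Filters narrow the result set without affecting relevance scoring.
                "filter": [
                    {"term": {"project_id": project_id}},
                    {"range": {"submitted_at": {"gte": since_iso_date}}},
                ]
            }
        },
        # Newest contributions first.
        "sort": [{"submitted_at": {"order": "desc"}}],
    }


body = build_contribution_query("milk-prices", "2014-01-01")
print(json.dumps(body, indent=2))
```

Keeping the query construction in a small pure function like this would let the service layer be tested without a running cluster.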

The front-end will be developed in a modular way so that it can be extended and integrated with other tools as users find new use cases for the platform. The first task, should we be awarded funding, will be to assess the front-end environment options and to identify a solution that will support modular development most effectively.

Business Model

The principles of freemium software services will be applied in order to create a useful platform for both small organizations with few resources and large organizations with more complex requirements.  

For example, not all research is suitable for public access, and many customers of this platform will value private data over publicly shared data. Fees will be introduced for services that restrict use and access or special services that incur costs to operate.

Similarly, integrations with third-party tools such as email services, CRM and single sign-on platforms will be available for appropriate fees.

Commercial features of the platform will be investigated after the initial proof-of-concept phase of the project is complete and clearly demonstrates value and committed customers.



Initial deployments will be trialled with publishing partners including The Guardian.  Use cases will be identified and tested in partnership with key editorial desks in order to optimize all the requirements for both large and small research projects.

After achieving success in testing, we will target publishing partners with similar use cases in the US and Europe.  Then we will expand marketing efforts out to global aid and development agencies, social health and academic institutions, and, finally, the larger commercial organizations and more grassroots local communities across the US and globally.

Any organization that publishes information will want to use Swarm Survey.


Our vision is to build a collective knowledge engine.  

Swarm Survey will make this possible by enabling topical and issue-based, data-as-storytelling research at both large and small scales.  We will create quick, easy, useful, and valuable methods for collecting and connecting data, managing and analysing information, publishing and visualizing research.

The result will be a new kind of open data network, a public place where expert local knowledge fuels collective insights about the world. 

In ONE sentence, tell us about your project to strengthen the Internet for free expression and innovation.

Swarm Survey will unlock and connect crowdsourced research to improve collective human knowledge.

Who will benefit from what you propose? What have you observed that makes you think that?

Publishers, communities, and active citizens will benefit from the ease with which they can trigger new collective research projects that they struggle with today. Academics, journalists, analysts and many other organizations will benefit from a valuable pool of human insight in which to study topical interests. A shared view of the data being collected by participants will improve the way everyone understands the world.

What progress have you made so far?

At the Guardian, we currently employ many elements of the platform we have in mind on a weekly and sometimes daily basis. Our knowledge and experience will be applied to creating a new system that addresses the many problems we have encountered over the years.

What would be a successful outcome for your idea or project?

At minimum, we would feel Swarm Survey was successful if publishers were able to conduct effective crowd research projects and publish their results in meaningful ways for the public.

But we believe Swarm Survey can activate collective knowledge more broadly through this approach. It should be possible to activate collective knowledge at a micro level as neighbors survey their local area together, at a wider interest level where large communities improve their understanding of the issues they care about, and at a global level where we’re able to make smarter choices about the world around us based on a view of the world as a whole.

Who is on your team, and what are their relevant experiences or skills?

**Graham Tackley** is Director of Architecture at Guardian News and Media. He ran the Web Platform Team and led the implementation of the Guardian's unique Open Platform Content API. More recently, Graham has been focused on large scale reader engagement, news innovation and new development techniques at the Guardian.

**Tom Armitage** is a freelance technologist, designer and writer living and working in London. He makes systems, tools, toys, and art out of hardware, software, and the network. Tom has worked on everything from a large-scale website to aggregate and visualise UK schools data to giant, multi-part games that span a Parisian art gallery; from bridges that talk on Twitter and cities that speak over SMS to laser-cut sculptures of actors’ movement.

**Seán Clarke** joined the Guardian website in 1999. Now Head of interactives, he has worked on projects including the US embassy cables document publication and the MPs' expenses crowdsourcing investigation.

**Matt McAlister** develops new businesses at Guardian Media Group. He has been involved in various aspects of the digital publishing ecosystem since 1994 - leading digital arms of print businesses, building platform services at large media companies and creating new digital businesses. Matt is the founder of UGC platform http://n0tice.com and a co-founder of the collaborative journalism platform http://Contributoria.com.


London, England, United Kingdom



Swarm Survey will apply the principles of other similar open platforms including open and modularized inputs, commercial add-ons, open and publicly accessible outputs, etc.


Join the conversation:

Emi:
Hi Matt,
What effect, if any, do you anticipate this will have on modern polling techniques? Do you seek to replace or augment them? Thanks!

Matt:
To be honest, I really don't know what effect it will have. We can assume that an open and democratic system for collecting and sharing public insight in this way is going to enable different questions and therefore different answers. Hopefully, this platform will help surface data in the world that has been too hard to get and use easily in the past.

Emi:
Thanks for the additional insight, Matt -- appreciate it. Best of luck!