The Challenge

How can we strengthen the Internet for free expression and innovation?

Entry

Global Internet Monitoring Project

The internet's potential as a medium for innovation and self-expression is hampered by increasingly invasive surveillance and censorship practices, which stifle freedom of expression and the empowerment of marginalized voices worldwide. Incidental, anecdotal reporting on these practices cannot give policy makers, researchers and the general public a comprehensive view of their scope and reach. Our goal is to provide a monitoring platform that delivers structured, up-to-date information reflecting the reality of internet censorship and surveillance, using open software, an ethical data governance framework, and peer-reviewed methodologies.
1. Who are the users or target customers of your project, and what have you learned from them so far?

Our audience is very broad, but can be split into three categories: 1) researchers and data analysts, 2) programmers and developers, and 3) journalists, policy makers and the general public. The boundaries that delineate these categories roughly correspond to the level of granularity of the datasets to be consumed. 

Researchers, analysts: raw data.

The raw, unfiltered data is most likely to be useful for individuals and groups working in research. They may have very specific requirements, such as the need to analyze a particular network event observed in a set of countries within given time intervals. The raw data should also allow for original experiments pertinent to each researcher's field. For example, social scientists might attempt to correlate data from the Global Internet Monitoring Project with sentiment analysis results from prior research. Chokepoint Project and the Tor Project will release the raw data in bulk format as it becomes available. We will use this raw data to feed the Application Programming Interfaces (APIs) described below.

Programmers, developers: structured data, APIs.

Programmers, developers and other groups building tools that require structured data make up the second target user group. APIs are abstraction layers meant to remove most of the processing required to generate meaningful data for a given use case. This data will be predictable in its format, periodically updated, clearly organized and documented according to publicly available specifications. Typical use cases for this level of data granularity would be a resource endpoint providing data about blocked website domains in a country, or a list of backends that appear to manipulate headers sent from our probes. Chokepoint and Tor will use the raw data gathered to create simple APIs for potential data consumers.
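
As an illustration of the "blocked website domains in a country" use case, the sketch below shows one way such a resource could be serialized. It is not the project's API: the `BlockedDomain` record, its field names, the `blocked_domains_response` helper and the placeholder country code `XX` are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class BlockedDomain:
    # All field names are hypothetical, chosen only for illustration.
    domain: str         # the domain that was tested
    test_name: str      # which probe test reported the anomaly
    last_measured: str  # ISO 8601 date of the most recent measurement


def blocked_domains_response(country_code: str, records: List[BlockedDomain]) -> str:
    """Serialize records into a stable JSON layout (sorted keys, sorted domains)."""
    payload = {
        "country": country_code.upper(),
        "count": len(records),
        "blocked_domains": [
            asdict(r) for r in sorted(records, key=lambda r: r.domain)
        ],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


# "XX" is a placeholder country code, "example_test" a made-up test name.
example = blocked_domains_response(
    "xx", [BlockedDomain("example.org", "example_test", "2014-03-13")]
)
print(example)
```

Sorting both the keys and the records keeps the output identical between runs for the same input, which is one simple way to make a feed "predictable in its format" for consumers that diff or cache it.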

General public, journalists, policy makers: data applications.

By creating compelling visualizations, condensed reports and rich diagrams from API data, one can communicate a strong message. These tools are popular among journalists looking for effective ways to contextualize facts. Policy makers can make more informed decisions if they are presented with solid data in an accessible form. The general public should also have access to these tools and be able to contribute by easily creating their own. Chokepoint and Tor will lead by example by creating sample applications that demonstrate the use of our structured data.
An important lesson we have learned from our target groups is the importance of involving them at each stage of the data publication process, providing useful tools along the way. It is also quite common to see scientists, journalists or researchers publish papers and articles without releasing the underlying raw data. In this respect, we hope to set an example by exposing our methodology and raw data to public scrutiny, in the hope of providing what could be considered a "best practice" in the field of internet monitoring data analysis.
 
2. What assumptions are you making in what you propose, and how will you test them?
The core approach of this project is NOT to assume, but to test. That said, we make the following fundamental assumptions:

A)        The events to be tested for with the probes do in fact take place in certain countries. (based on previous reports, projects, etc.)

B)        The events to be tested will change over time.

C)        These events are detrimental to free speech, net neutrality, equal access to information and the like. 

D)        Gathering this test data in an ongoing fashion will provide hard facts about the technical state of affairs.

E)        Public dissemination of this data both in its raw format and in digested form will raise awareness and positively impact a large number of people.

F)        Understanding both the means and the content of interference will improve its mitigation.

More technically, the following can be said:

A)        Different network tests rest on different assumptions. For this reason, all test results need to be cross-referenced with results from a network vantage point that does not perform network filtering.

B)        There is a wide range of tests. Should the underlying assumption of one prove false, others are likely to compensate, so that collectively they still generate actionable information.

C)        Disseminating probes will require supporting those in the best position to run these probes.

D)        Analysis of generated raw data will not happen by itself.

E)        Digested results have to be presented at a level of understanding appropriate to the target audience.
Test-specific assumptions of the various ooni-probe tests can be found here: https://github.com/TheTorProject/ooni-spec/tree/master/test-specs
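
The cross-referencing described in point A can be sketched as a comparison between a probe measurement and a control measurement taken from an unfiltered vantage point. This is a simplified illustration and not the ooni-probe logic: the `Measurement` record, the `compare_with_control` helper and the 30% body-length tolerance are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    # Hypothetical, simplified record of a single HTTP fetch.
    url: str
    status_code: Optional[int]  # None if the request failed outright
    body_length: int


def compare_with_control(probe: Measurement, control: Measurement,
                         length_tolerance: float = 0.3) -> str:
    """Label a probe result relative to the control result.

    Returns "inconclusive", "anomaly" or "consistent". An anomaly is only
    a difference worth inspecting; it says nothing about its cause.
    """
    if control.status_code is None:
        # The control itself failed, so there is nothing to compare against.
        return "inconclusive"
    if probe.status_code != control.status_code:
        # Covers a failed probe request (None) and differing status codes.
        return "anomaly"
    longest = max(probe.body_length, control.body_length)
    if longest == 0:
        return "consistent"
    relative_diff = abs(probe.body_length - control.body_length) / longest
    return "anomaly" if relative_diff > length_tolerance else "consistent"


control = Measurement("http://example.org/", 200, 1050)
probe = Measurement("http://example.org/", 200, 1000)
print(compare_with_control(probe, control))
```

The labels deliberately stop at "anomaly" rather than "censored", in line with the stance taken elsewhere in this proposal that automated analyses should not attempt to determine causality.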
3. How will you get your project in front of the necessary people or organizations?
By: 

A)        Leveraging our existing network of people and organizations. This network covers a wide area of interest and expertise and corresponds to the intended audiences, including media, human rights defenders, free speech advocates, industry, academia and policy makers.

B)         Coordinating a media campaign to promote the tool, both through existing publicity channels such as the Tor blog and by soliciting participation from the aforementioned network of existing contacts.
 
4. What are the obstacles to implementing your idea, and how will you address them?
A)        Getting people from the countries we are interested in to run the tool. 
The approach to address this issue is to perform outreach and provide significant support to "on-the-ground" partner organizations. This should be made somewhat easier by "productizing" the probes on Raspberry Pi devices, thereby reducing the technical skill and time needed to run a probe.

B)        Having relevant inputs to the probes, such as country specific domain lists.  
Some very good work addressing this issue has recently been made available by Citizen Lab:  https://github.com/citizenlab. 
In addition to this, we will distribute a survey to our on-the-ground partners to further expand the available input sets. Unfortunately, any input set will remain relevant for a limited time only. Addressing this longevity issue is not a focus at this time, but the problem is known.

C)        Automated analyses are limited in that it is very difficult to determine causality.
There will be no attempts at determining causality and any statement of fact as a result of such automated analyses will be treated with the utmost suspicion.

D)        Visualizations are inherently biased; their narrative power risks presenting a false reality.
As with the analytic data underpinning these visualizations, statements of fact should be treated with the greatest suspicion. To mitigate this issue, we will be very conservative with regard to the narratives that might be read into any visual representation of the data analytics.

E)        Data processing, infrastructure, security and methodologies. 
Once the probes have generated reports and sent them back for analysis (raw data), most expected challenges will relate to the handling of this data. Known obstacles such as data transport, data security, data publishing methods, data anonymization and data processing fall within the areas of expertise of both the Chokepoint and Tor projects' team members. Our experience in building reliable, secure systems that handle data at scale will be leveraged to design pipelines to coordinate data traffic. We have extensive experience in delivering successful production systems, from conceptualizing high-level interactions between different moving parts to implementing the code that makes them run smoothly.
 
5. How much do you think your project will cost, and what are the major expenses?
Based on a breakdown of activities and expenses, the project is estimated to cost $402856.
This breaks down as follows:

Analysis & development                    252300
Project Management & Project Support       34250
System Administration                      26500
Resources                                  86320

The resources break down as follows:

100 Raspberry Pi devices                   55200
Legal support                              10000
Partner support (10x2000)                  20000
Travel & stay (10x1500)                    15000
Server HW or service equivalent (4x3500)   14000
Incidentals (50x100)                        5000
Bandwidth (24x700)                         16800

A more detailed budget, with a corresponding budget rationale, is available and represents an “all features” effort.
 
6. What other people or projects are working in this space, and what have you learned from them?
This space has become joyfully crowded and any listing would not do it justice. That having been said: 
There are various projects that focus more on network neutrality in general. Examples are NeuBot, developed by the Nexa Center; Glasnost, by the Max Planck Institute; and Project BISMark, from Georgia Tech.
We are currently in contact with, and collaborating on some projects with, the Nexa Center and Georgia Tech. Their experience in this field has proven useful in understanding what good deployment strategies are. For example, Project BISMark's idea of giving out home routers to people interested in contributing results (and their success with such a strategy) is the basis of our plan to ship Raspberry Pi devices to potential ooni-probe users.
More specifically aimed at internet censorship measurement are ONI and Herdict, by the Berkman Center. We have learned from these projects that it is important to provide the raw data of the measurement results so that other people can base analyses on this work. The ONI project aimed at defining standards for government content removal has also taught us that it is very important to have a well-specified standard data format.
The Oxford Internet Institute has done and is still doing good work on external probing of DNS poisoning. Working with them has helped us understand the differences between, and respective limitations of, manual and automated analyses, as well as the benefits of ongoing bulk data for further research.
Measurement Lab provides a fantastic infrastructural platform to run tests against and to publish raw data. It is clear that as uptake increases, infrastructural requirements and problems in processing analytics results grow exponentially.
GreatFire has successfully focused on China and provides daily updates both about full domain blockages and Weibo content censorship. It provides a very good example that the technical means of censorship are increasingly sophisticated, further strengthening our conviction in the importance of ongoing, eventually global, and publicly available measurements. 
In general, there is a lot of excellent work being done out there by many great people and organizations, many of whom we count ourselves lucky to have amongst our friends. Each has a slightly different target audience, which helps limit the problem set to be solved.
 

With the rapid growth of censorship and surveillance practices that directly or indirectly violate civil and human rights, it has become vitally important to augment our incidental and anecdotal understanding of these practices with ongoing, evidence-based reporting on what is actually happening on our networks. Achieving this requires a globally distributed network of standardized network measurement nodes, as well as powerful analysis and visualization tools.

We, the Tor Project and Chokepoint Project, have over the past two years amassed extensive technical and domain-specific expertise on the detection, analysis and reporting of surveillance and censorship events. The Tor Project has been developing open standards, software and a methodology for conducting measurements. Chokepoint Project has been working on near real-time processing, analysis, visualization and contextualization of this type of data.

For this proposal, we aim to extend, improve and integrate the existing software systems and analysis tools, with the goal of enabling more comprehensive, evidence-based, and up-to-date reporting on censorship and surveillance events. Our proposal works towards this goal with a three-pronged approach:


1. Expand and improve Tor's ooni-probe software suite, which provides the basic infrastructure to support a globally distributed measurement network.
  • Support for running ooniprobe on Raspberry Pi devices.
  • Running tests periodically, making ooniprobe a system daemon.
  • Support for remotely provisioning probes with tests and inputs to run based on their geographical location and ASN.

2. Integrate and enhance Chokepoint's data analysis and visualization tools, to incorporate and report on data from the ooniprobe software suite.
  • Automated collection of ooniprobe YAML reports.
  • Automated processing of ooniprobe YAML reports.
  • Automated analysis of ooniprobe YAML reports.
  • Support for automated generation of analytics visualizations and analytic data downloads.

3. Reach out to Tor's and Chokepoint's extensive list of contacts to plan the deployment of ooniprobes "on the ground", in a selected set of 10 to 20 countries.
  • Survey creation and distribution to determine country specific internet use
  • User feedback features
  • Training material
  • Plan for software distribution  
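
The remote provisioning item under point 1 above (choosing tests and inputs from a probe's geographical location and ASN) could be sketched as a most-specific-match lookup. This is only an illustration under stated assumptions: the `DECKS` table, the test names in it and the `select_deck` helper are hypothetical and do not describe the actual ooni-probe provisioning mechanism.

```python
from typing import Dict, List, Optional, Tuple

DeckKey = Tuple[Optional[str], Optional[str]]

# Hypothetical provisioning table keyed by (country code, ASN).
# "XX" is a placeholder country code, AS64500 is an ASN reserved for
# documentation, and None acts as a wildcard. Test names are made up.
DECKS: Dict[DeckKey, List[str]] = {
    ("XX", "AS64500"): ["dns_check", "http_check", "header_check"],
    ("XX", None): ["dns_check", "http_check"],
    (None, None): ["http_check"],
}


def select_deck(country: str, asn: str,
                decks: Dict[DeckKey, List[str]]) -> List[str]:
    """Return the most specific deck available for a probe.

    Lookup order: exact (country, ASN), then country-wide, then default.
    """
    for key in ((country, asn), (country, None), (None, None)):
        if key in decks:
            return decks[key]
    return []  # no default deck configured


# A probe in country XX on an ASN without its own entry gets the
# country-wide deck.
print(select_deck("XX", "AS64501", DECKS))
```

Falling back from network to country to a global default means a newly deployed probe always has something to run, while partners can still supply more specific inputs for a single network.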

Since no two countries are alike, and internet use is equally diverse, any measurement needs to be contextualized within a regional socio-political framework. Surveys will be distributed to on-the-ground partner organizations to construct a measurement methodology that yields culturally relevant results.
In ONE sentence, tell us about your project to strengthen the Internet for free expression and innovation.
We believe that open and continuous knowledge of the innards of internet censorship reveals the cost it imposes on freedom of expression and global innovation.
Who will benefit from what you propose? What have you observed that makes you think that?
We believe that access to up-to-date, properly contextualized, empirically verifiable information on surveillance and censorship benefits policy makers, researchers and the general public. Currently, this information, if it is available at all, is extremely fragmented, out of date, and/or unverifiable. While the past years have seen some laudable efforts on the part of influential actors to share more information more broadly, they do not generally meet the requirements of broad (geographical) scope, timeliness, and verifiability. Since it is imperative that decisions influencing internet freedom are formulated based on facts rather than anecdotal reports, policy makers will benefit from the ability to focus on actual, rather than suspected (or merely publicized), issues. Furthermore, researchers, in particular those who explore the socio-political ramifications of the internet within the context of freedom of expression and the right to privacy, will benefit from open access to a large repository of continuously updated information. Finally, the general public will benefit, by gaining a deeper understanding and increased awareness of the prevalence of internet censorship and surveillance in their local communities and worldwide. Having spoken extensively to both policy makers and researchers over the past two years, and noting the impact of high profile intelligence revelations on public discourse worldwide, we have been strengthened in our conviction that access to timely, verifiable information, presented in an understandable fashion, is paramount to preserve the internet's capability for innovation and self-expression in a globally connected world.
What progress have you made so far?
The Tor Project has developed a tool for collecting the measurements (https://gitweb.torproject.org/ooni-probe.git, https://gitweb.torproject.org/ooni-backend.git), published a peer-reviewed paper on the methodology used (https://www.usenix.org/conference/foci12/workshop-program/presentation/filast%C3%B2), written specifications of the data format and the tests (https://github.com/TheTorProject/ooni-spec), and collected some results from a set of countries (https://ooni.torproject.org/reports/0.1/). Chokepoint Project has developed and is running a platform for the collection, processing, analysis and contextual presentation of data from multiple sources in near real-time. Some live results can be seen here: https://beta.chokepointproject.net/country/CN?show=2014-03-13 . The code is not yet publicly available; it consists of collection, processing and analytics code as well as a distributable graphic presentation front-end. More about the Chokepoint Project's approach can be found here: https://chokepointproject.net/about-2/
What would be a successful outcome for your idea or project?
Improved mitigation of censorship and interference, by providing faster actionable information that policy makers, tool makers, publishers and journalists can use to counteract impediments to free speech and innovation. An improved, continuously up-to-date overview of what is censored where, how and by whom.
Who is on your team, and what are their relevant experiences or skills?
Arturo Filastò is a developer at GlobaLeaks and The Tor Project. He studied Mathematics and is currently a student of Computer Science at Università di Roma “La Sapienza”. He is a well-known security researcher and regularly gives lectures at international conferences. He has trained activists in the use of security and censorship circumvention technologies. He is also the lead developer of OONI (Open Observatory of Network Interference), a project aimed at detecting and monitoring censorship in the world.

Pascal Haakmat is an analyst at Chokepoint Project. He studied Artificial Intelligence at the University of Amsterdam and is currently studying Law at the University of Amsterdam. He has several decades of experience as a programmer in both free/open source and proprietary environments. Prior to working at Chokepoint, Pascal was co-founder and CTO of the digital agency Lightmaker Amsterdam.

Ruben Bloemgarten is an architect at Chokepoint Project. He has over 18 years of experience in information technology, the past 15 years as a systems engineer in the telecom industry and as an independent systems architect.

Laurier Rochon is a developer at Chokepoint Project. He studied the socio-political impacts of Free Libre Open Source Software in the Networked Media Program of Rotterdam's Piet Zwart Institute. He has worked on both FLOSS and proprietary projects for the last 10 years.
Location
Rome, Italy; Amsterdam, The Netherlands; Montreal, Quebec, Canada
