Open Annotations for the Web

Together, Hypothesis and the W3C propose to deliver the software and standards necessary for sharable, distributed annotations on all Web content, enabling citizens, journalists and publishers to engage with information in a way that has not yet been possible. We will produce a reference implementation that web users can carry with them anywhere in their browsers, and that publishers and journalists can add to web sites.
In the process of revising this entry our team collaboratively annotated it here at this URL (screenshot above).  Download the Chrome extension to see our discussions.

The concept of a pervasive conversation layer over the Web is as old as the idea of the Web.  With the emergence of the Open Annotation data model and the large community of developers and platforms that are adopting it, we are finally in a place to realize this vision. Our aim is to make this functionality a core part of the Web. Hypothesis and the W3C propose to build on recent progress and deliver a fully functional implementation of annotation that works across websites and enables a threaded conversation model with rich front-end and back-end capabilities.  This implementation will be packaged as a browser-specific extension and as code that can be embedded natively into pages; it will also provide an annotation server that stores and delivers annotations to clients.  The open-source and distributed nature of the architecture means that anyone can download and run their own installation for use by their own users or as a service across other websites.

The impact of Web annotation on journalism can be especially powerful. Journalists would be able to do faster and better research online, to manage their references and quotes more efficiently, and collaborate with editors and others more effectively, decreasing the time and effort to write accurate stories. Publishers would be able to bring more communities of readers to their site and keep them there longer. Readers would be able to contribute and discover corrections, facts, and opinions, and have more focused conversations with other readers.

Web Annotations are a new layer of connection on top of the Web, with many applications. We are working to prototype and standardize these features for widespread interoperability for the general case. We are seeking funding to help concentrate some of our efforts specifically on the case for journalism.

Who are the users or target customers of your project, and what have you learned from them so far? Please give specific examples.

We believe that web annotations have broad applicability for all citizens. This proposal is focused specifically on the use case for news and journalism: it would fund dedicated feature development and partnerships with journalists and news organizations to accelerate the availability of web annotation in this important category.

We recognize there are many roles in the ecosystem of journalism, and each has specific needs when it comes to annotation.  Last year, at our journalism summit in San Francisco and our workshop at the Poynter Institute in December, we learned that journalists need better ways to conduct their own research in such areas as: annotating and organizing source material, saving links back to original context, enabling searches through this material, and facilitating private discussions with other collaborators in those locations.
Publishers and editors need better ways to interact with authors, and to facilitate and moderate reader engagement.  Readers need more powerful tools to reference and link into useful information, have fine-grained conversations with their peers, and support higher quality collaboration with journalists (e.g. spelling and grammar corrections, fact checking, requests for clarification, supplying references to related material or other sources).

Of course, annotations have major use cases in other areas, such as review and discussion of scholarly articles, crowdsourced critique of legal or government documents, detailed analysis of images, video or other archival material, social commentary, and collaborative, Socratic learning on classroom materials (which can reside anywhere online). We expect the synergy between these additional use cases and the work of journalists to be beneficial for the development of widely used standards.

What assumptions are you making in what you propose, and how will you test them?

We assume that there is a need for annotation that the web does not presently satisfy.  

The vast majority of the web is dark to interaction of any kind.  When it does exist, usually it is in the form of the comment widget which restricts conversations to general discussions of the whole article, making targeted discussion of specific assertions or facts more difficult. As a consequence, traditional commenting leads to sprawling, lower quality discussions which are difficult to manage and awkward to navigate. Further, users and communities cannot easily be followed between sites-- or when they can, those interactions are only open within specific platforms. Recent examples of individual sites moving to or experimenting with annotations include Medium, Quartz, RapGenius and Discourse.

The idea of a generalized conversation layer over the web has captured the imaginations of many for decades-- we are testing this assumption ourselves in more focused ways.

Specifically in journalism, our dedicated workshops have been extremely helpful in highlighting the need to ensure that annotation creates efficiencies within the news workflow-- not extra work.  They’ve also suggested examples of what those efficiencies might be, including the ability to highlight source documents; enabling links to annotated resource documents; permitting the inclusion of supporting or ancillary material that does not fit story size guidelines; and helping place perspectives in context. If we can focus on enabling a wider range of functionality useful to journalists and the practice of journalism, rather than just replacing the comment widget with a fancier equivalent, then there is much greater opportunity for success.

We assume that generalized annotation of the open web is possible.

There are many technical challenges in facilitating the annotation of the open web. The list includes things like re-attaching annotations to documents that are constantly shifting in structure and content; developing anchoring strategies for a diverse range of content types; and creating software that is effective, but also lightweight and performant in average browsers. Importantly, we also need a decentralized architecture that allows users to publish annotations to their preferred communities regardless of the underlying site.  Overcoming these challenges is core to succeeding where others have failed.  
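To make the re-attachment challenge concrete, here is a minimal sketch in Python of one possible anchoring strategy: store the quoted text together with a little surrounding context, then search the changed document for the occurrence whose context matches best. This is an illustration only, not the anchoring method of our application; the names `TextQuote`, `make_selector` and `reanchor` are hypothetical, and a real implementation would also need fuzzy matching for edits inside the quoted text itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextQuote:
    """Hypothetical selector: the quoted text plus surrounding context."""
    exact: str
    prefix: str = ""
    suffix: str = ""


def make_selector(text: str, start: int, end: int, context: int = 20) -> TextQuote:
    """Capture the selected span and up to `context` characters on each side."""
    return TextQuote(
        exact=text[start:end],
        prefix=text[max(0, start - context):start],
        suffix=text[end:end + context],
    )


def _common_suffix_len(a: str, b: str) -> int:
    """Number of characters that the ends of a and b share."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def _common_prefix_len(a: str, b: str) -> int:
    """Number of characters that the starts of a and b share."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def reanchor(text: str, sel: TextQuote) -> Optional[Tuple[int, int]]:
    """Find the occurrence of sel.exact whose context best matches.

    Returns (start, end) offsets into `text`, or None when the quoted
    text no longer appears anywhere in the document.
    """
    best: Optional[Tuple[int, int]] = None
    best_score = -1
    pos = text.find(sel.exact)
    while pos != -1:
        end = pos + len(sel.exact)
        # Score = how much of the stored context still surrounds this match.
        score = (_common_suffix_len(text[:pos], sel.prefix)
                 + _common_prefix_len(text[end:], sel.suffix))
        if score > best_score:
            best, best_score = (pos, end), score
        pos = text.find(sel.exact, pos + 1)
    return best


if __name__ == "__main__":
    original = "The cat sat on the mat. The dog sat on the rug."
    start = original.index("dog sat") + 4
    selector = make_selector(original, start, start + 3, context=8)

    # The page changes: text is inserted and an earlier sentence is reworded.
    edited = "Note: " + original.replace("The cat", "A small cat")
    print(reanchor(edited, selector))
```

Because the selector carries its own context, the annotation stays attached to the second "sat" even though its character offset has moved and an identical word appears earlier in the page.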

We are well advanced in these goals.  At Hypothesis, we are currently testing the alpha version of our annotation application, which works across HTML, PDFs (using Mozilla PDFjs), EPUBs, and images.  Key technical challenges remain-- some web pages require dynamic monitoring for changes; some applications (PDFjs, notably) do not render the page until the reader scrolls within range. We believe we have effective approaches for these and other challenges, which may be suitable for standardization and direct implementation in browsers, but further development and wider testing are still needed.

We assume that we can interest critical parties in annotation as a web standard.

Embedding annotation on web pages and into web applications, and creating browser extensions that bring it to the rest of the web, are useful in the short term to demonstrate proof of concept and to develop a critical early user base-- but in the long term we believe annotation needs to be web-native, and specifically browser-native, just as Marc Andreessen envisioned for Mosaic over 20 years ago.

We think the important early seeds of a coalition exist that can bring this forward as a web standard.  This April, the W3C convened a workshop bringing together interested parties, including contributors to the Open Annotation Community Group, to draft the elements of a charter for a formal Working Group on Web Annotations.  The W3C is actively recruiting a range of key stakeholders within the web and digital publishing communities to ensure that a broad set of the most common use cases and needs inform the standards process.  

Web standards are complicated technical and social processes that can take substantial time to bear fruit.  It’s clear to us that web annotation will continue to move forward on its own through open toolkits like Annotator, regardless of the interest of browser vendors and other major platforms. However, by identifying the primary subcomponents of annotation that overlap substantially with in-browser editing (such as text selection, copying, and finding text), we can expedite their inclusion into web browsers and make user interactions noticeably smoother.

We assume that we can bring quality conversation to the web.

Online communities vary in quality.  An assumption is that annotation can be conceived and implemented in a way that results in high quality discourse online.  We believe that is true.  

Quality depends on implementation-- an identity model for participants, good overall design, attention given to community moderation, and other factors.  We think user communities should be able to design their own standards and implement them within a range of open annotation frameworks.  Even within a single annotation platform, different communities might differ on moderation approaches.  Successfully moderated communities will thrive; less desirable ones will falter.

We will test this assumption by working with different existing communities and news organizations to make sure the necessary moderation strategies are in place.  Much of this work will revolve around a core set of group support features that will be completed this summer.

We assume that we can get adoption.

We are seeking broad adoption of annotation as a new means of interacting online.  Further, we are working towards an interoperable future-- where identity, annotation storage and user interface are independent and based either on web standards or widely used open source software libraries.  

Over the last several years we have partnered with organizations and individuals that bring deep domain expertise within distinct areas.  In addition to journalism, we have explored the application of annotation to law with the Berkman Center at Harvard; with the OpenGov Foundation we held a workshop in D.C. on bringing annotation to civic government.  In May, together with the American Geophysical Union, arXiv and eLife, we’re hosting a workshop on how annotation can enhance peer review within scholarly communication, and provide opportunities for making it more efficient and timely.  In June, we are working with the Monterey Institute for Teaching and Education to hold a Hewlett-funded workshop on how annotation can enhance education by adding value to open educational resources.

In all these efforts our objective is to first understand the needs of specific groups, and also build relationships that can help us test early systems.

In an early proof of the utility of annotation, edX has now integrated our AnnotatorJS library into its platform. In a test on one poetry class this spring at Harvard, students produced 50,000 annotations in just seven weeks, improving engagement and working with instructors and each other to better understand the reading material.

AnnotatorJS also powers a number of other NewsChallenge submissions, including Peer Library and Harvard Law's H2O platform, and is a front-end client for the Annotopia platform.

How will you get your project in front of the necessary people or organizations?

Our efforts to partner with respected institutions, and to bring community together through our annual developers conference, I Annotate, have proven extremely successful at generating enthusiasm.  

In journalism, we imagine that a combination of similar partnerships with appropriate news organizations and journalists can serve a similar objective.  We would use the NewsChallenge funds to deepen this engagement, and to develop plans for high profile demonstrations of annotation around major news events.  In particular, we think trending issues that revolve around central documents could prove an effective demonstration of how collaborative annotation by knowledgeable individuals can deepen the value and experience of journalism.

What are the obstacles to implementing your idea, and how will you address them?

To integrate annotation into journalism most effectively, we will need to identify news organizations that are willing and able to implement annotation in either news production or in user engagement. We will reach out to online newspapers and reporting organizations through their reporters, editors, and when they exist, internal development labs. These engagements will benefit from educating interested parties on how annotation works on the web, including the range of its capabilities and options for different user experiences. We plan to pursue this by generating instructional videos, and holding webinars for potential adopters.

One of the outcomes of this process of engagement with news organizations, particularly with editors and journalists, is the identification of the optimal elements of journalist workflow and reader interaction that we should target for annotation. Although we have learned many potential ways that annotation can improve existing practices or suggest alternative ones, we know that many remain to be discovered. Additionally, it will only be clear through our engagements with news organizations which of these are susceptible to a straightforward introduction, or pose the greatest opportunity for adding value to the news cycle and its customers.  

One longer-term obstacle is to get core annotation functions implemented natively across major browsers, so that users don’t have to install special software or browser extensions in order to add or view annotations. Together with the W3C and Working Group members, we’ll work with vendors to determine the basic building blocks necessary and develop libraries that those vendors can incorporate over time.  Ultimately, we need to focus on adoption.  Only when people are already widely using annotation as a new way of engaging the web will the case be compelling for why these capabilities should be made native.

How much do you think your project will cost, and what are the major expenses?

We are seeking $320,000 over two years.

This funding would pay for one senior developer for 2 years, at $120,000 per year (fully burdened with health benefits), with time split between Hypothesis and the W3C.  This developer would work with others on our existing teams, and be responsible for:
  • Working with journalists and publishers to collect use cases and requirements to meet the specific needs of journalism, both for creating annotations and for the workflow of using these annotations, and importantly for making sure the moderation mechanisms are in place to ensure high quality contributions.
  • Developing prototypes, proofs of concept, and code contributions for open-source implementations of these use cases.
  • Coordinating feature development and interoperability between different browsers, reading systems, services, and other parts of the ecosystem.
  • Identifying and assisting with integrations to existing tools and platforms in use by journalists (DocumentCloud, CMSs, etc.).
  • Assisting with the standardization process for features and browser components.
  • Educating communities and promoting the idea of Web annotations among users and potential service providers.

These funds will also provide for at least two substantial workshops, produced in coordination with partners, to bring journalists together to explore annotation product and platform requirements for news and other media use cases ($60,000).

Finally, the funds will cover the travel necessary for team members to meet with potential partners and to attend relevant conferences and events in support of the effort.  ($20,000).
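
As a quick check on the arithmetic, the line items above can be summed in a few lines of Python. The figures come directly from this section; the dictionary keys are just labels chosen for this sketch.

```python
# Budget line items as stated in this section (US dollars).
budget = {
    "senior developer (2 years at $120,000/year)": 2 * 120_000,
    "workshops": 60_000,
    "travel": 20_000,
}

total = sum(budget.values())

for item, amount in budget.items():
    # Show each item with its share of the total request.
    print(f"{item}: ${amount:,} ({amount / total:.2%})")
print(f"total: ${total:,}")
```

The items add up to the $320,000 requested, with the developer position accounting for three quarters of the total.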

What have you learned from other projects focused on journalists that informs your approach to working with these users?

The most closely related effort is DocumentCloud, a Knight News Challenge funded effort. DocumentCloud provides a repository for source documents, enabling journalists to embed them in published stories and annotate them. We have had preliminary discussions about integrating the open annotation data model into the DocumentCloud open source effort, which this proposal would help fund.  This would bring the benefits of an open data model, which also implies an open storage and identity model, to that important resource for journalists.  It would also make it easy to have DocumentCloud support a more sophisticated real-time, threaded conversation style.

DocumentCloud has proven the utility of shared, open-source platforms for journalism.  We hope to extend its essential paradigm: the web itself can be the annotated repository of source material.
In ONE sentence, tell us about your project to strengthen the Internet for free expression and innovation.
We are delivering both an open-source reference implementation and an open standard for annotations as the new free discussion layer over the Web.
Who will benefit from what you propose? What have you observed that makes you think that?
Open annotation is a fundamental new Web technology that brings diverse benefits to all Internet citizens, allowing them to both consume information and contribute discussion and critique interchangeably without the need for pre-existing implementations. Specifically, the domains of journalism, science, open government, law and education are particularly fertile ground for this new capability, as each is a field that benefits from collaboration, precise engagement and close connection to source material. The response from a broad set of collaborators, including the strong interest from the W3C, the attendees at our two annual "I Annotate" conferences, and our workshops in partnership with the Berkman Center at Harvard, the Poynter Institute and the OpenGov Foundation, has shown us that there is a worldwide community of engaged beneficiaries who are eager to move forward.
What progress have you made so far?
We have built a working web application that implements annotation across HTML, EPUBs, PDFs and images, as well as an open platform for others to build web applications and services on top of. We have also reached initial agreement about the underlying data structure for standardization in a W3C Community Group, and we have fostered a strong community through numerous events and conversations with a wide range of different stakeholders.
What would be a successful outcome for your idea or project?
A successful outcome would be multiple interoperable annotation services, deployed within websites and also available more broadly via browser extensions and natively in browsers, with special consideration for journalism use cases. Specifically, we would measure success by a rising number of user accounts registered and annotations created using these tools. In societal terms, the refinement of open annotation standards and their incorporation into easy-to-use browser implementations will enable new forms of engagement by citizens and organizations.
Who is on your team, and what are their relevant experiences or skills?
Doug Schepers (W3C) has a decade of experience in working directly with web developers and browser vendors to standardize features of web browsers, and 15 years of web application development. Ivan Herman (W3C) has 15 years of experience in standardization, close connections to publishers and academics, and deep knowledge and skill with semantic and data technologies. Peter Brantley (Hypothesis) has extensive experience with digital publishing environments, and has managed technical groups at large digital libraries. He has helped to foster open standards in distributed information access and description. Randall Leeds (Hypothesis) is a computer scientist with an extensive background in Web architectures, databases and building scalable, open-source applications. Dan Whaley (Hypothesis) has 20 years of experience in building large-scale web applications and the organizations around them. His first company, GetThere, was the pioneer in web-based travel reservations and is the backbone of Sabre’s web-based travel functionality today.
Location
Hypothesis is headquartered in San Francisco. W3C has hosts in Boston, Massachusetts, USA; Sophia-Antipolis, France; Fujisawa, Japan; and Beijing, China.


