Innovating at THE Port Hackathon – Tackling the challenge of unstructured data to improve response

29.Oct.2018
By JIPS
Related Topics: Innovation

There is no such thing as too much information, right? Wrong! Sometimes a large amount of information creates a lot of noise, which affects humanitarians' ability to identify the key characteristics of a crisis situation. A system that can easily extract, process and meaningfully interpret information from all available data is therefore a priority for the sector, and can contribute to faster, better-informed decision-making in crisis situations.

This is where the DEEP project comes into play (more on this below). At the recent Port Hackathon, hosted at CERN's IdeaSquare, our colleague Wilhelmina Welsch, JIPS' Head of Information Management and Innovation, joined a team of 11 hackers to take on the challenge of further enhancing the platform's capacities with artificial intelligence and machine learning, with financial support from the IFRC. The event brought together more than 60 professionals from around the world to "build working prototypes and tangible solutions for real-life humanitarian problems".

 

In the event video, Wilhelmina Welsch presents the DEEP platform and the work of the DEEPER team from minute 19.

 

The DEEP project & the Hackathon challenge

For more informed decisions and a rapid response to humanitarian crises, we need answers to three key questions: who is affected, where is it happening, and how bad is it? This is where the DEEP platform becomes relevant. Initiated in the aftermath of the 2015 Nepal earthquake, the platform supports users by:

  • allowing the structured review of a large number of documents,
  • identifying key information and
  • extracting relevant data for prompt analysis.

One way it does so is by developing machine learning and natural language processing algorithms that support analysts. The project is a concrete response to the Grand Bargain, and more specifically to its focus on "a common approach for joint inter-sector analysis: frameworks, tools, and operational guidance". DEEP governance members include ACAPS, IDMC, OCHA, UNHCR, IFRC, OHCHR, UNICEF and JIPS.

During four weeks of preparation and the 60-hour final hackathon, the team worked together to understand how the DEEP platform can help address four specific challenges that analysts face in the midst of a humanitarian crisis:

  • Within the overwhelming amount of information, what is new about a specific piece of information?
  • How can we get an overview of what entities, such as people and places, are mentioned in documents?
  • At a glance and without having to read through the full text, how can we understand what a specific document is about?
  • How bad is the situation? What do the documents tell me in terms of numbers?

Benefiting from the diverse backgrounds and expertise of its members, the DEEPER team produced four concrete outputs during the hackathon:

 

1. Creation of a similarity score for new documents

Analysts working with a large number of documents, ranging from PDFs to web articles, need to know whether a new piece of information contains new insights or closely resembles existing documents. The team set up a mechanism by which new documents can be scored for their uniqueness against all other data sources available in an analyst's library.
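The post does not say how the uniqueness score is computed. One common approach, shown here purely as a sketch, is to represent each document as a TF-IDF vector and score a new document as one minus its highest cosine similarity to any document already in the library. The function names (`uniqueness_score` and its helpers), the simple tokenizer and the sample texts are hypothetical and are not taken from the DEEPER code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: terms found in every document keep a weight of 1.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def uniqueness_score(new_doc, library):
    """1 minus the highest similarity between new_doc and any library document."""
    if not library:
        return 1.0
    # IDF is computed over the library plus the new document.
    vectors = tfidf_vectors(library + [new_doc])
    new_vec = vectors[-1]
    best = max(cosine(new_vec, v) for v in vectors[:-1])
    # Clamp so floating-point rounding cannot push the score below zero.
    return max(0.0, 1.0 - best)


# Hypothetical analyst library.
library = [
    "Flooding has displaced thousands of families in the northern district.",
    "Cholera cases are rising in camps near the capital.",
]
print(round(uniqueness_score("Landslide blocks mountain road", library), 3))
print(round(uniqueness_score("Flooding in the capital displaced families", library), 3))
```

A document with no words in common with the library scores 1.0, an exact duplicate scores close to 0, and partial overlaps fall in between. Cosine similarity is used because it compares word distributions regardless of document length.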

2. Improved extraction of entities in documents

Identifying which geographic locations are mentioned in a document can be time-consuming, as it requires manually reading through many pages or scanning for words that may resemble place names. The DEEPER team developed a technique to automatically extract and visualize the names of places, people, events, organisations and several other types of entities from reviewed documents.
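The DEEPER team's extraction technique is not described in detail here. As a much simpler stand-in, the sketch below looks words up in a gazetteer (a list of known names with their entity type) and counts mentions per type, which is enough to give an overview of the places and organisations a document mentions. The gazetteer contents, the sample report and `extract_entities` are hypothetical; the sketch only matches single-word names, and a production system would typically use a trained named-entity recognition model instead.

```python
import re
from collections import Counter

# Hypothetical gazetteer: in practice this would come from a place-name
# and organisation dataset for the context of operation.
GAZETTEER = {
    "kathmandu": "place",
    "gorkha": "place",
    "sindhupalchok": "place",
    "ifrc": "organisation",
    "unicef": "organisation",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Count gazetteer entries mentioned in text, grouped by entity type.

    Only single-word names are matched; matching is case-insensitive.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    found = {}
    for token in tokens:
        entity_type = gazetteer.get(token)
        if entity_type is not None:
            found.setdefault(entity_type, Counter())[token] += 1
    return found


report = (
    "UNICEF and IFRC teams reached Gorkha; further assessments in Gorkha "
    "and Sindhupalchok are planned from Kathmandu."
)
for entity_type, counts in extract_entities(report).items():
    print(entity_type, counts.most_common())
```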

3. Enhanced automatic tagging functions

Analysts need to know what a document is about: is it relevant to food, health or logistics needs, or to another topic they need to be aware of? By applying machine learning methods, the team improved the overall precision of tag suggestions to up to 71%.
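The post does not specify how precision was measured. Assuming micro-averaged precision over suggested tags (the share of all suggested tags that an analyst confirmed), the calculation can be sketched as follows; `tagging_precision` and the example tags are hypothetical.

```python
def tagging_precision(suggested, reviewed):
    """Micro-averaged precision: share of all suggested tags that were confirmed.

    suggested and reviewed are parallel lists of tag sets, one pair per document.
    """
    true_positives = 0
    total_suggested = 0
    for s, r in zip(suggested, reviewed):
        true_positives += len(s & r)  # suggestions the analyst kept
        total_suggested += len(s)
    return true_positives / total_suggested if total_suggested else 0.0


# Hypothetical model suggestions and analyst-reviewed tags for three documents.
suggested = [{"food", "health"}, {"logistics"}, {"health", "shelter", "wash"}]
reviewed = [{"food"}, {"logistics"}, {"health", "wash", "protection"}]
print(f"{tagging_precision(suggested, reviewed):.0%}")
```

In this made-up example, 4 of the 6 suggested tags are confirmed, a precision of about 67%. Precision says nothing about tags the model missed (here, "protection"); that is measured by recall.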

4. New models to extract tabular data from PDFs

To improve analysis in humanitarian crises, we need to be able to extract numerical data from the tables and graphs in documents, which most commonly come in PDF format. However, while the human brain can grasp the essence of a table at a glance, this is much more difficult for a computer. The team worked on enhancing the DEEP platform's ability to capture and extract tabular data from PDF documents, which could help resolve this very real challenge.
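Part of the difficulty is that a PDF stores positioned text rather than rows and columns. As a naive illustration only, the sketch below assumes the text has already been extracted from the PDF with its layout preserved (that step is not shown) and that the snippet starts at the table's header row. It splits each line on runs of two or more spaces and converts numeric cells. `parse_text_table`, `to_number` and the figures in the sample are hypothetical, and this is not presented as the team's approach.

```python
import re


def to_number(cell):
    """Convert '12,500' to 12500 (or '3.5' to 3.5); leave other cells as strings."""
    cleaned = cell.replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        try:
            return float(cleaned)
        except ValueError:
            return cell


def parse_text_table(text):
    """Split layout-preserved text into rows on runs of 2+ spaces.

    The first non-empty line is treated as the header. Lines whose cell count
    differs from the header's are skipped, a crude way to drop notes or prose.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    rows = [re.split(r"\s{2,}", line) for line in lines]
    header = rows[0]
    table = []
    for row in rows[1:]:
        if len(row) != len(header):
            continue
        table.append(dict(zip(header, (to_number(cell) for cell in row))))
    return table


# Hypothetical text as it might come out of a layout-preserving PDF-to-text step.
sample = """District        Households    Individuals
Gorkha          12,500        58,200
Sindhupalchok   9,800         41,050
Source: hypothetical figures for illustration"""

for record in parse_text_table(sample):
    print(record)
```

Heuristics like this break easily on merged cells, wrapped lines or single-space column gaps, which is why dedicated table-detection models are worth the effort.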

 

The solutions developed by the team are all open source and available for use by other humanitarian actors and platforms, not only for powering DEEP. All code developed during the challenge is available on GitHub, as is the code for the core DEEP platform.

Credit: The PORT – Hackathon at CERN

 

Innovate to survive

The ability to innovate is one of the most important areas in which the humanitarian sector needs to invest in order to sustain and improve the crucial work it does.

Beyond our participation in DEEP, at JIPS we have been exploring new practices and identifying new tools and approaches to improve our profiling work. As part of our 2018-2020 strategy, we have engaged in a variety of interdisciplinary exchanges with academia and field practitioners, including the University of Oslo Architecture Department, the Bauhaus University of Design Weimar, CERN and MIT. We have been exploring innovation in key priority areas such as community engagement in data collection and analysis, anonymization of household survey data, and combining the information needs of urban planners and other built-environment professionals within the profiling process.

Shortly after THE Port Hackathon, our colleague Wilhelmina flew to the US to take part in the Big Data for Sustainable Development workshop, organised by the UNSSC and the Data Pop Alliance at the MIT Media Lab, a leading innovation hub that studies "human dynamics" through big data. She exchanged with practitioners on the applied use of big data to tackle sustainable development challenges, on the future politics and ethics of big data, and on the growing importance of understanding the trade-off between protecting individual-level data (for example under the GDPR) and using it to generate insights that help serve communities faster and better. Stay connected to learn more about how we will explore linkages between big data and profiling in the future.
