The NGI Policy-in-Practice Fund – announcing the grantees

We are very excited to announce the four projects receiving funding from the Next Generation Internet Policy-in-Practice Fund.

Policymakers and public institutions have more levers at their disposal to spur innovation in the internet space than is often thought, and can play a powerful role in shaping new markets for ethical tools. We particularly believe that local experimentation and ecosystem building are vital if we want to make alternative models for the internet tangible and help them gain traction. But finding the funding and space to undertake this type of trial is not always easy – especially if outcomes are uncertain. Through the NGI Policy-in-Practice fund, our aim has been not only to give organisations the means to undertake a number of these trials, but also to make the case for local trials more generally.

Over the past summer and autumn, we went through a highly competitive applications process, ultimately selecting four ambitious initiatives that embody the vision behind the NGI Policy-in-Practice fund. Each of the projects will receive funding of up to €25,000 to test out their idea on a local level and generate important insights that could help us build a more trustworthy, inclusive and democratic future internet.

In conjunction with this announcement, we have released an interview with each of our grantees, explaining their projects and the important issues they are seeking to address in more detail. You can also find a short summary of each project below. Make sure you register for our newsletter to stay up to date on the progress of each of our grantees, and our other work on the future of the internet.

Interoperability to challenge Big Tech power 

This project is run by a partnership of three organisations: Commons Network and Open Future, based in Amsterdam, Berlin and Warsaw.

This project explores whether the principle of interoperability, the idea that services should be able to work together, and data portability, which would allow users to carry their data with them to new services, can help decentralise power in the digital economy. Currently, as users, we are often locked into a small number of large platforms. Smaller alternative solutions, particularly those that want to maximise public good rather than optimise for profit, find it hard to compete in this winner-takes-all economy. Can we use interoperability strategically and harness the clout of trusted institutions such as public broadcasters and civil society, to create an ecosystem of fully interoperable and responsible innovation in Europe and beyond?

Through a series of co-creation workshops, the project will explore how this idea could work in practice, and the role trusted public institutions can play in bringing it to fruition. 

Bridging the Digital Divide through Circular Public Procurement

This project will be run by eReuse, based in Barcelona, with support from the City of Barcelona, the Technical University of Barcelona (UPC) and the global Association for Progressive Communications.

During the pandemic, when homeschooling and remote working became the norm overnight, bridging the digital divide has become more important than ever. This project is investigating how we can make it easier for public bodies and also the private sector to donate old digital devices, such as laptops and smartphones, to low-income families currently unable to access the internet.

By extending the lifetime of a device in this way, we are also reducing the environmental footprint of our internet use. Laptops and phones now often end up being recycled, or, worse, binned, long before their actual “useful lifespan” is over, putting further strain on the system. Donating devices could be a simple but effective mechanism for keeping them in circulation for longer.

The project sets out to do two things: first, it wants to try out this mechanism on a local level and measure its impact through tracking the refurbished devices over time. Second, it wants to make it easier to replicate this model in other places, by creating legal templates that can be inserted in public and private procurement procedures, making it easier for device purchasers to participate in this kind of scheme. The partnership also seeks to solidify the network of refurbishers and recyclers across Europe. The lessons learned from this project can serve as an incredibly useful example for other cities, regions and countries to follow. 

Bringing Human Values to Design Practice

This project will be run by the BBC with support from Designswarm, LSE and the University of Sussex.

Many of the digital services we use today, from our favourite news outlet to social media networks, rely on maximising “engagement” as a profit model. A successful service or piece of content is one that generates many clicks, drives further traffic, or generates new paying users. But what if we optimised for human well-being and values instead? 

This project, led by the BBC, seeks to try out a more human-centric approach to measuring audience engagement by putting human values at its core. It will do so by putting into practice longer-standing research work on mapping the kinds of values and needs their users care about the most, and developing new design frameworks that would make it easier to actually track these kinds of alternative metrics in a transparent way.

The project will run a number of design workshops and share its findings through a dedicated website and other outlets to involve the wider community. The learnings and design methodology that will emerge from this work will not just be trialled within the contexts of the project partners, but will also be easily replicable by others interested in taking a more value-led approach. 

Responsible data sharing for emergencies: citizens in control

This project will be run by the Dutch National Police, in partnership with the Dutch Emergency Services Control, the Amsterdam Safety Region and the City of Amsterdam.

In a data economy that is growing ever more complex, giving meaningful consent about what happens to our personal data remains one of the biggest unsolved puzzles. But new online identity models have been shown to be a potentially very promising solution, empowering users to share with third parties only the information they want to share, and to do so on their own terms. One way that would allow such a new approach to identity and data sharing to scale would be to bring in government and other trusted institutions to build their own services using these principles. That is exactly what this project seeks to do.

The project has already laid out all the building blocks of its Digital Trust Infrastructure but wants to take it one step further by actually putting this new framework into practice. The project brings together a consortium of Dutch institutional partners to experiment with one first use case, namely the sharing of vital personal data with emergency services in the case of, for example, a fire. The project will not just generate learnings about this specific trial, but will also contribute to the further fine-tuning of the design of the wider Digital Trust Infrastructure, scope further use cases (of which there are many!), and bring on board more interested parties.


Policy in Practice Fund: An internet optimised for human values

We’re introducing each of our four Policy-in-Practice Fund projects with an introductory blog post. Below, Lianne Kerlin from the BBC answers a few of our burning questions to give us some insight into the project and what it will achieve. We’re really excited to be working with four groups of incredible innovators and you’ll be hearing a lot more about the projects as they progress. 

At the BBC, we believe that embedding human values into the heart of design practice is fundamental to building a more inclusive, democratic, resilient, trustworthy and sustainable future internet. We are pleased to have received a grant from the NGI Forward’s Policy-in-Practice fund to integrate our innovative work on human values with existing design frameworks so that it can be used by a wider range of practitioners.

What are human values and how do they relate to technology?

The Human Values Framework is based on the needs of users in today’s technology-driven world. It is the result of a research project that examined the link between people’s values, behaviours and needs through a series of workshops, interviews and surveys.

Our work found fourteen indicators of well-being that express fundamental needs. We have constructed a design framework that puts these needs at the centre of innovation and decision making so that products and services can support people in their lives. Values are judgements about what people deem to be necessary, but also represent underlying needs and motivations that drive and shape everyday behaviour, and include elements such as achieving goals, being inspired, pursuing pleasure, and being safe and well.

Some of the human values identified

So what’s the problem with our current approach to tech? 

As well as offering guidance to designers, the framework addresses a fundamental issue with our current approach to measuring the effectiveness of products and services: it is largely concerned with attention metrics such as the number of users or the number of minutes consumed. As a result, any deeper questions of the impact on audience well-being or happiness are not just left unanswered – they are unaskable.

This approach has serious implications across the online sphere. It means companies compete solely for consumer attention, creating pressure within organisations to increase consumption and adopt attention-grabbing designs that can lead to addictive user behaviour.

How do you see this approach changing in the future, if we get things right?

The framework offers an alternative perspective, one that asks designers to consider the impact of their product on the end-user. In re-framing success, decision-makers can move away from an end goal of consumption into thinking about their intended impact on the people behind the numbers. Using the framework they can consider how to help people live more fulfilled lives, rather than simply gaining their attention.

The framework also recognises the limitations of the current measurement approach and reframes success as the fulfilment of audience values. It is about considering what is fundamentally good for people, and designing and measuring how products and services can enable people to explore, to grow or to understand themselves. We believe that having an alternative way to describe success will result in healthier and more sustainable practices.

What do you hope to learn from this project, and how might those learnings be used by others?

Our goal with this project is to take the insights we have developed into design practice and integrate them into existing approaches, specifically the well-known ‘double diamond’ process model, first outlined by the UK Design Council in 2005, and current work connecting user-centered design and agile development. We hope to make measuring the impact on quality of life and wellness part of the normal design cycle for every organisation.

This collaboration is an exciting opportunity to explore how the human values framework can integrate within existing frameworks and practices in all types of industries. We will work with industry experts to learn about as many current decision-making processes as possible, in order to understand where the human values framework can best align. Our goal is to produce a set of tools, processes and best practice guidelines for embedding the human values framework into existing frameworks.

How can people get involved and find out more?

We will be posting regular updates on our website at which will launch early in 2021 – it currently points to our main page on the BBC R&D website where you can find out about the human values framework. You can also listen to our podcast series.


Policy in Practice Fund: Data sharing in emergencies

We’re introducing each of our four Policy-in-Practice Fund projects with an introductory blog post. Below, Manon den Dunnen from the Dutch National Police answers a few of our burning questions to give us some insight into the project and what it hopes to achieve. We’re really excited to be working with four groups of incredible innovators and you’ll be hearing a lot more about the projects as they progress. 

Your project is exploring how we share information with public bodies. What is the problem with the way we do it now?

Europe has made significant progress on protecting our privacy, including through the General Data Protection Regulation (GDPR). However, there remain several problems with the collection and use of personal data. Information about us is often collected without our consent or with disguised consent, causing citizens and other data owners to lose control over their personal data and how it is used. This contributes to profiling (discrimination, exclusion) and cybercrime. At the same time, it is laborious and complex for citizens and public institutions to obtain personal data in a transparent and responsible way.

That is why a collective of (semi-) public institutions has been working towards an independent, public trust infrastructure that allows the conditional sharing of data. The Digital Trust Infrastructure or DTI aims to safeguard our public values by unravelling the process of data sharing into small steps and empowering data subjects to take control of each step at all times.

Rather than focusing on information sharing with public bodies, our project will explore public bodies taking responsibility for creating processes that help to safeguard constitutional values in a digitised society.

What kinds of risks are there for handling personal information? 

Preventing these problems requires organisations to work according to several principles, most of which can be found in the GDPR. Let’s take cybercrime as a potential threat. The risk of your data being exposed to cybercrime increases as your data is stored in more places. Organisations must therefore practise data minimisation, but there are other approaches available, such as structures that allow citizens to give conditional access to information about them, so that organisations do not have to store the data themselves. This ‘consent arrangement’ is exactly what this project will set out to test.

This idea will be new for many people, so expert support and protection should be provided when setting up a consent arrangement for data sharing. But the potential impact is huge. Take for example someone with an oxygen cylinder at home for medical use. This is not something a citizen would be expected to declare as part of their daily life. But when there is a fire, both that citizen and the fire brigade would like that information to be shared.

That piece of data about the citizen’s use of an oxygen cylinder is the only information needed by the fire brigade. But current systems often share far more information automatically, including the person’s identity. Citizens should be able to give the fire brigade conditional and temporary access to only the information that an oxygen cylinder is present, and only in emergencies such as a fire.
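To make the oxygen cylinder example more concrete, here is a minimal sketch in Python of what a conditional, temporary grant could look like. It is an illustration only and not the project's actual design: the `ConsentGrant` class, the `read_attribute` function, the field names and the example household are all invented for this purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ConsentGrant:
    """One narrowly scoped permission given by a resident (hypothetical model)."""
    attribute: str         # the single fact being shared
    recipient: str         # the only party allowed to read it
    condition: str         # the type of incident in which access is allowed
    valid_until: datetime  # access lapses after this moment


def read_attribute(
    grant: ConsentGrant,
    household_data: Mapping[str, Any],
    requester: str,
    incident_type: str,
    now: datetime,
) -> Optional[Any]:
    """Return only the granted attribute, and only if every condition holds."""
    allowed = (
        requester == grant.recipient
        and incident_type == grant.condition
        and now <= grant.valid_until
    )
    if not allowed:
        return None
    # Nothing else in the record (such as the resident's name) is returned.
    return household_data.get(grant.attribute)


# Invented example data, for illustration only.
now = datetime(2021, 1, 15, 12, 0, tzinfo=timezone.utc)
household = {"resident_name": "A. Resident", "oxygen_cylinder_present": True}
grant = ConsentGrant(
    attribute="oxygen_cylinder_present",
    recipient="fire_brigade",
    condition="fire",
    valid_until=now + timedelta(hours=1),
)

print(read_attribute(grant, household, "fire_brigade", "fire", now))
```

In this sketch the fire brigade learns that a cylinder is present during a fire, while any other requester, any other type of incident, or a request made after the grant expires gets nothing back. A real arrangement would also need the identification, authentication and logging steps that this sketch leaves out.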

How can public bodies do this safely and with the trust of citizens? What kind of role do you think public bodies can play in increasing trust in data-sharing in the digital economy more broadly?

A data-driven approach to social and safety issues can truly improve quality of life and facilitate safety and innovation. But we must set an example in the way we approach this. At each moment, we must carefully consider which specific piece of information is needed by a specific person, in what specific context and at what moment in time, for what specific purpose and for how long.

By investing in this early on and involving citizens in a democratic and inclusive way, we can not only increase trust but also use the results to require partners to do the same and create the new normal.

You are working on a specific case study with the Dutch Police and other partners. Can you tell us a bit more about that experiment, and how you think this model could be used in other contexts too?

During the next few months, we will create a first scalable demonstrator of the Digital Trust Infrastructure. It will be based on the use-case of sharing home-related data in the context of an incident such as a fire. The generic building blocks that we create will be fundamental to all forms of data sharing, like identification, authentication, attribution, permissions, logging etc. They will also be open-source and usable in all contexts. We have a big list of things to work on!

But that is only part of the story. Complementary to the infrastructure, an important part of the project focuses on drawing up a consent arrangement. This will allow residents to conditionally share information about their specific circumstances and characteristics with specific emergency services in a safe, trusted way. To put people in control of every small step and ensure the consent arrangement will be based on equality, we will organise the necessary expertise to understand every aspect. The consequences of our actions have to be transparent, taking into account groups with various abilities, ages and backgrounds.

We will also explore how, and to what extent, the conditions and safeguards can be implemented in a consent arrangement and embedded in the underlying trust infrastructure in a democratic and resilient way. We will also look at how a sustainable and trustworthy governance structure can be set up for such a consent arrangement. We will share all our findings on these areas.

How can people get involved and find out more?

We are currently collecting experiences from other projects on how to engage residents in an inclusive, democratic and empowered (conscious) way. All the knowledge that we are building up in this area will be shared on the website of our partner Amsterdam University of Applied Sciences (HvA). Naturally, we would value hearing about the experiences of others in this area, so please do get in touch.


Workshop report: Follow us OFF Facebook – decent alternatives for interacting with citizens

The NGI Policy Summit hosted a series of policy-in-practice workshops, and below is a report of the session held by

Despite the incessant outcry over social media giants’ disrespect of privacy and unaccountable influence on society, any public sector organisation wanting to reach citizens feels forced to be present on their enormous platforms. But through its presence, an organisation legitimises these platforms’ practices, treats them like public utilities, subjects its content to their opaque filters and ranking, and compels citizens to be on them too — thus further strengthening their dominance. How could we avoid the dilemma of either reaching or respecting citizens?

Redecentralize organised a workshop to address this question. The workshop explored the alternative of decentralised social media, in particular Mastodon, which lets users choose whichever providers and apps they prefer because these can all interoperate via standardised protocols like ActivityPub; the result is a diverse, vendor-neutral, open network (dubbed the Fediverse), analogous to e-mail and the world wide web.

Leading by example in this field is the state ministry of Baden-Württemberg, possibly the first government with an official Mastodon presence. Their head of online communications Jana Höffner told the audience about their motivation and experience. Subsequently, the topic was put in a broader perspective by Marcel Kolaja, Member and Vice-President of the European Parliament (and also on Mastodon). He explained how legislation could require the dominant ‘gatekeeper’ platforms to be interoperable too and emphasised the role of political institutions in ensuring that citizens are not forced to agree to particular terms of service in order to participate in public discussion.


Workshop report: (Dis)connected future – an immersive simulation

The NGI Policy Summit hosted a series of policy-in-practice workshops, and below is a report of the session held by Nesta Italia and Impactscool, written by Giacomo Mariotti and Cristina Pozzi.

The NGI Policy Summit was a great opportunity for policymakers, innovators and researchers to come together to start laying out a European vision for the future internet and elaborate the policy interventions and technical solutions that can help get us there.

As part of the Summit, Nesta Italia and Impactscool hosted a futures workshop exploring the key design choices for the future internet. It was a participative and thought-provoking session. Here we take a look at how it went.

Our aims

The discussion about the internet of the future is very complex and touches on many challenges that our societies are facing today. These include data sovereignty, safety, privacy, sustainability and fairness, to name just a few, as well as the implications of new technologies such as AI and blockchain and the concerns around them, such as ethics and accessibility.

In order to define and build the next generation internet, we need to make a series of design choices guided by the European values we want our internet to radiate. However, moving from principles to implementation is really hard. In fact, we face the added complexity coming from the interaction between all these areas and the trade-offs that design choices force us to make.

Our workshop’s goal was to bring to life some of the difficult decisions and trade-offs we need to consider when we design the internet of the future, in order to help us reflect on the implications and interaction of the choices we make today.

How we did it

The workshop was an immersive simulation about the future in which we asked the participants to make some key choices about the design of the future internet and then took a deep dive into possible future scenarios emerging from these choices.

The idea is that it is impossible to know exactly what the future holds, but we can explore different models and be open to many different possibilities, which can help us navigate the future and make more responsible and robust choices today.

In practice, we presented the participants with the following 4 challenges in the form of binary dilemmas and asked them to vote for their preferred choice with a poll:

  1. Data privacy: protection of personal data vs data sharing for the greater good
  2. Algorithms: efficiency vs ethics
  3. Systems: centralisation vs decentralisation
  4. Information: content moderation vs absolute freedom

For each of the 16 combinations of binary choices we prepared a short description of a possible future scenario, which considered the interactions between the four design areas and aimed at encouraging reflection and discussion.

Based on the majority votes we then presented the corresponding future scenario and discussed it with the participants, highlighting the interactions between the choices and exploring how things might have panned out had we chosen a different path.
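The mechanics of the exercise can be sketched in a few lines of Python: four binary dilemmas give 2 × 2 × 2 × 2 = 16 possible combinations, and the majority vote on each dilemma selects one of them. The dilemma labels come from the list above; the function names and the `example_votes` are invented for illustration and do not reflect the actual poll results.

```python
from collections import Counter
from itertools import product

# The four binary dilemmas presented to participants.
DILEMMAS = {
    "Data privacy": ("protection of personal data", "data sharing for the greater good"),
    "Algorithms": ("efficiency", "ethics"),
    "Systems": ("centralisation", "decentralisation"),
    "Information": ("content moderation", "absolute freedom"),
}


def all_combinations():
    """Every possible set of choices: one option per dilemma."""
    return list(product(*DILEMMAS.values()))


def majority_choice(votes):
    """Pick the most-voted option for each dilemma, in DILEMMAS order.

    A tie is resolved in favour of the option that was voted for first.
    """
    return tuple(Counter(votes[area]).most_common(1)[0][0] for area in DILEMMAS)


# Invented votes, for illustration only.
example_votes = {
    "Data privacy": [
        "protection of personal data",
        "protection of personal data",
        "data sharing for the greater good",
    ],
    "Algorithms": ["ethics", "ethics", "efficiency"],
    "Systems": ["decentralisation", "centralisation", "decentralisation"],
    "Information": ["absolute freedom", "absolute freedom", "content moderation"],
}

combos = all_combinations()
winner = majority_choice(example_votes)
print(len(combos))       # 16
print(winner in combos)  # True
```

Each of the 16 tuples would correspond to one prepared scenario description, so the winning tuple can serve as a lookup key for the scenario presented to the group.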

What emerged

Data privacy: protection of personal data vs data sharing for the greater good
Information: content moderation vs absolute freedom

The table above summarises the choices made by the participants during the workshop, which led to the following scenario.

Individual-centric Internet

Decentralized and distributed points of access to the internet make it easier for individuals to manage their data and the information they are willing to share online. 

Everything that is shared is protected and can be used only following strict ethical principles. People can communicate without relying on big companies that collect data for profit. Information is totally free and everyone can share anything online with no filters.

Not so one-sided

Interesting perspectives emerged when we asked for contrarian opinions on the more one-sided questions, which demonstrated how middle-ground and context-aware solutions are required in most cases when dealing with topics as complex as those analysed.

We discussed how certain non-privacy-sensitive data can genuinely benefit society, with minimal concern for the individual, if they are shared in anonymised form. Two examples that emerged from the discussion were transport management and research.

On the (de)centralisation debate, we discussed how decentralisation could result in a diffusion of responsibility and a lack of accountability: “If everyone’s responsible, nobody is responsible”. We noted that this risk could be mitigated through tools like Public-Private-People collaboration and data cooperatives, combined with clear institutional responsibility.

NGI Policy Summit: Former Estonian President Toomas Hendrik Ilves interview

As president of Estonia from 2006 to 2016, Toomas Hendrik Ilves pushed for digital transformation, ultimately leading Forbes to label him “the architect of the most digitally savvy country on earth”. Every day, e-Estonia allows citizens to interact with the state via the internet. Here, Ilves discusses why other governments might be slower with such developments, and ponders how things can improve further in the future.

Toomas Hendrik Ilves is one of the speakers at our upcoming NGI Policy Summit, which will take place online on September 28 and 29, 2020. Sign up here, if you would like to join us.

This interview originally appeared as part of the NGI Forward’s Finding CTRL collection.

Estonia had a rapid ascent to becoming a leading digital country. How did you push for this as a diplomat in the 90s?

Estonia became independent in ’91, and everyone was trying to figure out what we should do – we were in terrible shape economically and completely in disaster. Different people had different ideas. My thinking was basically that no matter what, we would always be behind.

In ’93, Mosaic came out, which I immediately got. You had to buy it at the time. I looked at this, and it just struck me that, ‘Wow, this is something where we could start out on a level playing field, no worse off than anyone else’.

For that, we had to get a population that really is interested in this stuff, so I came up with this idea – which later carried the name of Tiger’s Leap – which was to computerise all the schools, get computers in all the schools and connect them up. It met with huge opposition, but the government finally agreed to it. By 1998, all Estonian schools were online.

How did things progress from there, and what was the early public reaction like?

We had a lot of support from NGOs. People thought it was a cool idea, and the banks also thought it was a good idea, because they really supported the idea of digitization. By the end of the 90s, it became clear that this was something that Estonia was ahead of the curve on.

But, in fact, in order to do something, you really needed to have a much more robust system. That was when a bunch of smart people came up with the idea of a strong digital identity in the form of a chip card, and also developed the architecture for connecting everything up, because we were still too poor to have one big data centre to handle everything. That led to what we call X-Road, which connects everything to everybody, but always through an authentication of your identity, which is what gives the system its very strong security.

It was a long process. I would be lying to say that it was extremely popular in the beginning, but over time, many people got used to it.

I should add that Tiger’s Leap was not always popular. The teachers union had a weekly newspaper, and for about a year, no issue would seem to appear without some op ed attacking me.

Estonia’s e-Residency programme allows non-Estonians access to Estonian services via an e-resident smart card. Do you think citizenship should be less defined by geographical boundaries?

Certain things are clearly tied to your nation, anything that involves political rights, or say, social services – if you’re a taxpayer or a citizen, you get those.

But on the other hand, there are many things associated with your geographical location that in fact have very little to do with citizenship. In the old days, you would bank with your local bank, you didn’t have provisions for opening an account from elsewhere because the world was not globalised. And it was the same thing with establishing companies.

So if you think about those things you can’t do, well, why not? We don’t call it citizenship, you don’t get any citizen rights, but why couldn’t you open a bank account in my country if you want to? If we know who you are, and you get a visual identity, you can open a company.

Most recently, we’ve been getting all kinds of interest from people in the UK. Because if you’re a big company in the UK, it’s not a problem to make yourself also resident in Belgium, Germany, France. If you’re a small company, it’s pretty hard. I mean, they’re not going to set up a brick and mortar office. Those are the kind of people who’ve been very interested in setting up or establishing themselves as businesses within the European Union, which, in the case of Estonia, they can do without physically being there.

What do you think Europe and the rest of the world can learn from Estonia?

There are services that are far better when they’re digital which right now are almost exclusively nationally-based. We have digital prescriptions – wonderful things where you just write an email to your doctor and the doctor will put the prescription into the system and you can go to any pharmacy and pick it up.

This would be something that would be popular that would work across the EU. Everywhere I go, I get sick. My doctor, he puts in a prescription. If I’m in Valencia, Spain, he puts it into the system, which then also operates in Spain.

The next step would be for medical records. Extend the same system: you identify yourself, authorise the doctors to look at your records, and they would already be translated. I would like to see these kinds of services being extended across Europe. Right now, the only cross-border service of this type that works is between Estonia and Finland. It doesn’t even work between Estonia and Latvia, our southern neighbour. So I think it’ll be a while, but it’s a political decision. Technologically, it could work within months. The Finns have adopted our X-Road architecture especially easily. It’s completely compatible; we just give it away, it’s non-proprietary open source software.

The technical part is actually very easy, the analogue part of things is very difficult, because they have all these political decisions.

What would your positive vision for the future of the internet look like?

Right now I’m in the middle of Silicon Valley, in Palo Alto, and within a ten mile radius of where I sit are the headquarters of Tesla, Apple, Google, Facebook, Palantir – not to mention all kinds of other companies – producing all kinds of wonderful things, really wonderful things that not only my parents or my grandparents could never even dream of, but even I couldn’t dream of 25 years ago. But at the same time, when I look at the level of services for ordinary people – citizens – then the US is immensely behind countries like Estonia.

The fundamental problem of the internet is summed up in a 1993 New Yorker cartoon, where there’s a picture of two dogs at a computer, and one dog says to the other, “On the internet no-one knows you’re a dog”. This is the fundamental problem of identity that needs to be addressed. It has been addressed by my country.

Unless you have services for people that are on the internet, the internet’s full potential will be lost and not used.

What do you think prevents other nations pursuing this idea of digital identity?

It requires political will. The old model and the one that continues to be used, even in government services in places like the United States, is basically “email address plus password”. Unfortunately, that one-factor identification system is not based on anything very serious.

Governments have to understand that they need to deal with issues such as identity. Unless you do that, you will be open to all these hacks, all of these various problems. I think I read somewhere that in the Democratic National Committee servers, that in 2015 and 2016, they had 126 people who had access to the servers. Of those 126 people, 124 used two-factor authentication. Two didn’t. Guess how the Russians got in.

What we’re running up against today is that people who are lawmakers and politicians don’t understand how technology works, and then people have very new technology that we don’t quite understand the ramifications and implications of. What we really need is for people who are making policy to understand far better, and the people who are doing technology maybe should think more about the implications of what they do, and perhaps read up a little bit on ethics.

On balance, do you personally feel the web and the internet has had a positive or negative influence on society?

By and large, positive, though we are beginning to see the negative effects of social media.

Clearly, the web is what has enabled my country to make huge leaps in all kinds of areas, not least of which is transparency, low levels of corruption, so forth.

I would say we entered the digital era in about 2007, when we saw the combination of the ubiquity of portable devices and the smartphones, combined with social media. This led to a wholly different view of the threat of information exchange. And that is when things, I’d say, started getting kind of out of hand.

I think the invention of the web by Tim Berners-Lee in 1989 is probably the most transformative thing to happen since 1452, when Gutenberg invented movable type. Movable type enabled mass book production, followed by mass literacy. That was all good.

But you can also say that the Thirty Years’ War, which was the bloodiest conflict, in terms of proportion of people killed, that Europe has ever had, also came from this huge development of mass literacy. Because it allowed for the popularisation of ideology. Since then, we’ve seen all other kinds of cases; each technology brings with it secondary and tertiary effects.

We don’t quite know yet what the effects are for democracy, but we can sort of hazard a guess. We’re going to have to look at how democracy would survive in this era, in the digital era where we love having a smartphone and reading Facebook.


Making sense of the COVID-19 information maze with text-mining

The COVID-19 pandemic has brought with it an ‘infodemic’, flooding society with myriads of conflicting ideas and opinions. To help cut through the noise, we applied some of our data tools to map recent developments and understand how technology is being used and discussed during the crisis.

Register to attend our webinar to discuss this research, Wednesday 3rd June 2020 at 5 PM CEST

We want these insights to be as useful as possible and are keen to adapt and analyse the data in different ways to answer your burning questions. We invite you to join us in a webinar to discuss our methods and results, and exchange ideas about the most pressing tech challenges.

You can also view our full analysis at

Trending terms in news articles from our analysis.

As part of NGI Forward’s work to create data-driven insights on social and regulatory challenges related to emerging technologies, we have developed various data-science tools to analyse trends in the evolution of Internet technology. In our previous studies, we focused on such areas as the content crisis in social media, regulating tech giants or cybersecurity.

Now we have opened our toolbox and mapped recent developments in the fight against COVID-19, to bring some clarity to how the crisis is evolving. We concentrated on four major areas:

  • Online tech news
  • Open-source projects on GitHub
  • Discussions on Reddit
  • Scientific papers

Mixed feelings on COVID tech in the news

First, we examined trends in 11 respected online news sources, such as the Guardian, Reuters or Politico. Based on the changes in the frequency of terms, we identified trending keywords related to COVID-19 and the world of technology. This enabled us to focus on key issues such as contact-tracing, unemployment or misinformation in the following sections of the analysis.
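
As a rough illustration of spotting trending terms from changes in frequency, the sketch below compares each term's share of all tokens in an earlier and a later batch of articles. The function names, the smoothing constant and the toy headlines are assumptions made for this example, not the exact procedure used in the analysis.

```python
from collections import Counter

def relative_freq(docs):
    """Share of all tokens taken up by each term across a list of documents."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

def trending_terms(earlier_docs, later_docs, top_n=3, smoothing=1e-3):
    """Rank terms by the ratio of later to earlier relative frequency."""
    before = relative_freq(earlier_docs)
    after = relative_freq(later_docs)
    # Smoothing (an assumed constant) avoids division by zero for new terms.
    growth = {
        term: (freq + smoothing) / (before.get(term, 0.0) + smoothing)
        for term, freq in after.items()
    }
    return sorted(growth, key=growth.get, reverse=True)[:top_n]

# Invented toy headlines, for illustration only.
earlier = ["tech firms report earnings", "new phone launch delayed"]
later = ["contact tracing app launch", "contact tracing raises privacy questions"]
print(trending_terms(earlier, later, top_n=2))
```

Terms that barely appeared in the earlier batch but are frequent in the later one rise to the top, while terms that were already common (such as "launch" here) do not.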

Next, we analysed terms that are frequently used together, or co-occurring (e.g. “contact-tracing” and “central server”), to see how technology was associated with different aspects of the crisis. We also measured the sentiment of the paragraphs containing these word pairs to understand whether coverage of COVID technology issues is positive, negative or neutral. As an example, we identified the key actors, initiatives and challenges related to contact-tracing, focusing on EU-wide projects such as PEPP-PT.
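
The co-occurrence and sentiment steps can be sketched roughly as follows. This is a minimal example under stated assumptions: the tiny sentiment lexicon, the function names and the three paragraphs are all invented for illustration, and a real analysis would rely on a proper sentiment model and tokeniser.

```python
from collections import defaultdict

# Toy sentiment lexicon (an assumption made for this sketch).
LEXICON = {"protects": 1, "effective": 1, "secure": 1,
           "risk": -1, "concerns": -1, "abuse": -1}

def paragraph_sentiment(tokens):
    """Mean lexicon score of the tokens found in the lexicon (0.0 if none)."""
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def cooccurrence_sentiment(paragraphs, target):
    """Average sentiment of the paragraphs in which each term co-occurs with target."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text in paragraphs:
        tokens = text.lower().split()
        if target not in tokens:
            continue
        score = paragraph_sentiment(tokens)
        # Count each co-occurring term once per paragraph, skipping sentiment words.
        for term in set(tokens) - {target} - set(LEXICON):
            totals[term] += score
            counts[term] += 1
    return {term: totals[term] / counts[term] for term in counts}

# Invented paragraphs, for illustration only.
paragraphs = [
    "dp-3t protects privacy in contact-tracing",
    "contact-tracing with a central server raises concerns",
    "mission creep is a risk for contact-tracing",
]
scores = cooccurrence_sentiment(paragraphs, "contact-tracing")
for term in sorted(scores, key=scores.get, reverse=True):
    print(term, scores[term])
```

Ranking the co-occurring terms by their average paragraph sentiment gives the kind of ordered list that a sentiment table summarises.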

The table below shows terms co-occurring with ‘contact-tracing’, ranked based on sentiment scores. DP-3T and TraceTogether are more associated with positive sentiments, while discussion of privacy and mission creep shows that there are concerns about the implementation of these systems.

Mapping the COVID tech ecosystem

Alongside this specific analysis, we have also mapped articles based on their vocabulary and topic. You can explore the main areas of technology news with characteristic words in these interactive visualisations.

The map below shows the clusters of news articles covering specific technologies and tech companies.

Throughout the crisis, numerous programmers have devoted their time to developing open-source tools to support the fight against COVID-19. We collected COVID-19-focused projects from GitHub, the software platform where much of this development is taking place, to examine trends in their location, aim and technology. You can find an overview of the top 50 most influential repositories on our analysis page. Perhaps you will be inspired to get involved!

The map below shows the number of GitHub projects related to COVID-19 in the week commencing 20th April 2020.

Tracking changes in social media

Looking next to social media, we examined activity on Reddit to uncover relevant changes. By analysing the text of posts and comments, we discovered a surge in discussions related to the job market, mental health and remote work. Our analysis also provides insight into the changing perception of lockdown measures and growing lockdown fatigue.

The graph below shows a sharp increase in Reddit discussions about unemployment in the latter half of March 2020.

Social science counts the consequences

Finally, we also examined trends in scientific journal articles related to COVID-19. Analysing articles from the social sciences gives us a broader picture than news articles, and we found increasing discussion of the immediate consequences of the pandemic and lockdown. The trending words range from health-related terms (pneumonia, infectious, epidemiology) to those more common in the social sciences, such as economic recession, policy or GDP.

The word cloud below shows some of the most common terms in social science articles relating to COVID-19.


Greening the internet, remotely

Team NGI Forward has got off to a cracking start in the new world of online conferences, with our debut session at IAM Weekend. Run by a collective of artists, technologists and activists, IAM brought together hundreds of attendees from across the globe to discuss The Weirdness of Interdependencies. We had a great time joining in and stirring up discussion about how we can make the internet more sustainable and climate-friendly. Here’s how it went.

Our aims

We’re experiencing an unprecedented and likely unsustainable proliferation of connected devices, data and traffic. Do we risk sleepwalking into a future of internet curfews and Netflix-shaming? Or can good policy and ‘smart everything’ save the day? We wanted to create an immersive workshop about these strange futures and difficult choices.

How we did it

Let’s start with some notes on the format of our workshop, since everyone is looking for ideas on running online events these days. We used video conferencing and a shared slide deck so we could update it in real time with attendee contributions. For the interactive section, we used breakout rooms to split into two groups of around 15. It worked well, and as you’ll see below, we collected a ton of ideas from our willing participants.

The internet’s environmental impact

We started with a discussion of what the internet is, and where it begins and ends. The boundaries are blurry, and we have embedded the internet in so many aspects of human life that its impact is widespread and in many areas difficult to define. Powering this global network requires vast quantities of electricity, estimated at between 5% and 9% of the world’s total generation capacity. Despite current efforts to move to renewable sources, most of the electricity powering the internet comes from burning coal and gas, which means that our internet use is creating 2% of global greenhouse gas emissions, roughly equal to the entire global airline industry.

With the advent of IoT and 5G, the sustainability of our digital lives will become a pressing issue in the years to come. Yet, we are doing relatively little to build social awareness, spur positive cultural change or capture the market opportunities associated with innovating for a more sustainable, resilient and ethically-sourced internet.

In the rudimentary map below, you can see the different groups of people and activities that contribute to the internet’s environmental impact. From users streaming video and the data centres that serve them, to the devices they use and the mining required to produce them, everything has an impact.

And here is a set of facts about each of these areas that hits home:

Two possible ways forward

Any policy change can have unintended consequences, so we decided to present two potential approaches to reducing the internet’s climate impact. We wanted to explore what the world would look like if politicians, businesses and citizens had to make more conscious decisions about connectivity, our data and the devices we use. We wanted to challenge participants to consider tough trade-offs, predict winners and losers, and think through unexpected consequences that could range from YouTube rationing and internet class systems to urban mining booms and net-zero stock bubbles.

To get us started, Markus sketched out two equally challenging ways forward. We decided we wanted to avoid creating a dichotomy between decisive climate action and calamitous inaction. Instead, we created two imaginary scenarios that involved work on a broad scale, just with a different approach. In reality, the plan would likely include elements of both, but taking an extreme makes a futures exercise more interesting!

At this point, we split into two groups and used a futures wheel to plot out the potential consequences of each scenario. Here’s the template we used.

Scenario one: Fast Forward

In this imagined future, we’re going to accelerate our technological efforts towards solving climate change. There’d be an explosion of funding for research and development. We’d create apps, services and devices to help us monitor, analyse and act on the information we gather about the climate. We’d see all of the following actions:

💙Ramp up investment in green innovation and smart cities to mitigate climate change
💙Drive forward the digitisation of public services and make infrastructure ‘smarter’ to improve energy efficiency
💙Promote more decentralised data infrastructure
💙Encourage the proliferation of connected devices and data to better inform decision-making and policy

The good

This group predicted that new jobs, income streams and circular economy systems would emerge alongside greater access to the arts and the addition of new perspectives to global society. Greater efficiency in logistics and a greater variety of services available could lead to reduced digital lock-in for consumers, and digital tools could give us greater control over our lives.

The scenario would enable climate movements to rapidly scale, connecting people to local grassroots campaigns and improving coordination. We could see loneliness drop as we become more connected, with technology assisting ageing populations with healthcare and companionship. 

We’d invest more in technology to reduce food waste, which could help to resolve food insecurity. The price of healthcare could drop due to automation, and we could even see the beginning of a Universal Basic Income.

The bad

On the negative side, our participants feared that this approach could exacerbate some already familiar problems. Greater reliance on technology could further challenge our right to privacy and worsen the digital divide. Remote areas and older people could also be left behind, which might make loneliness worse for marginalised communities.

We’ve seen in the past how a small minority of climate deniers can derail global efforts. Fake news could spread through the population like wildfire, making it challenging to verify the reliability of data. In this scenario, more people could gain access to the internet, and more conflicting views online could make it difficult to come to democratic decisions about what to do. 

Policymakers and the justice system could also be too slow to keep up with the pace of innovation we’d see in a world focused on accelerating research and development. A delayed reaction here could result in failing to protect workers who lose their jobs to automation. That could include farmers, challenged by an increase in mechanised processing and even genetically modified foods.

Financial investment could be directed primarily towards large incumbent technology companies, crowding out small businesses in the online marketplace. Companies would also need to focus on long term investment in adapting, rather than short term gain.

Accelerating our adoption of technology could cause huge piles of electronic waste, with the toxic processing of rare earth metals and pollution rising interminably. 

It’s a complicated picture!

Scenario two: Press Pause

In this future, we’ve decided to hit the brakes. We’d slow down emissions by reducing our use of technology. Our focus could move back to spending time in nature. There might be campaigns to shame people that stream content ‘excessively’. We’d likely see the following:

💚Increase taxes on – and remove subsidies for – fossil energy and ‘dirty’ technology
💚Redesign public services to be more energy-efficient and less interdependent or reliant on the internet
💚Consolidate data centres, regulate energy use and traffic
💚Introduce a right-to-repair, discourage further proliferation of devices and encourage data minimisation

The good

This group felt that pressing pause would make things fairer by increasing tax on companies rather than individuals, forcing change at the core of the internet. Achieving this pause would require global collaboration on an unprecedented scale, presenting both a challenge and an opportunity to solidify global ties and collaborate.

With a more conscious approach to connectivity or even reduced internet access, we’d choose more carefully what to post online. The taxes collected in this scenario could fund all manner of social interventions, and we’d likely see a reduction in overall pollution. People could become more connected to their local communities, and we could bring marginalised groups into discussions on an equal level, especially those whom technology currently excludes because of their age, background or lack of infrastructure.

The bad

However, the consolidation of data centres could also lead to more centralised control over the internet, and it’s not clear whether this would be good for the environment in the long term. We could also see the emergence of a two-tier internet, where traffic for either vital services or wealthier groups is prioritised. 

Moreover, a culture of click-shaming could develop, forcing people to reduce or even hide their internet use. Is this how we want our approach to the internet to change?

Trade-offs along the way

Together we identified a set of trade-offs when considering greening the internet.

Green tech vs ethical supply chains: Green innovation is still heavily dependent on problematic battery technology and critical raw materials, such as rare earth minerals. These are often sourced through unsustainable means and under poor working conditions.

Saving the climate vs economic stability: Our economy and innovation ecosystems are heavily dependent on high levels of consumption. If we stop buying new devices every year, jobs may be lost and not necessarily replaced. R&D investment will have to come from new sources. 

Reducing consumption vs fair access: The global north and specific demographics benefit disproportionately from internet access. But there are still 3 billion people without a connection. Providing the same opportunities and fair access to everyone would have a tremendous impact on the environment and more environmentally friendly devices would likely be prohibitively expensive for many.

Public interest vs consumer benefit: Consumers are voters. Policies that hurt them tend to be unpopular. If we encourage companies to sell phones without chargers, will they just make greater profits to the detriment of lower-income families? Are we willing to stick with old phones for longer or pay a premium for repairable devices?

We don’t have the tech or the equality

The groups also questioned how long it would take to develop the technologies required. We decided that we will need serious investment in technology and corresponding public policy in either scenario, but that neither was particularly appealing overall. Large companies will likely be better prepared to adapt, but this depends on their sector. The geographical and political context of companies and users is another barrier to enacting these changes.

Both groups reflected next on who would benefit from these scenarios, and the answers were similar. Some participants expressed pessimism that either scenario would do much to shake up the power imbalance in the digital economy. We could easily empower those currently in marginalised groups to connect and benefit in either scenario. However, connecting them would require coordinated, concerted effort from those in power. Without this effort, participants felt that current inequalities would be exacerbated, continuing the exclusion experienced by people with lower incomes, the socially isolated, people with medical conditions and older people.

Thank you, IAM!

We had a fantastic time discussing these issues with our lovely attendees, and we’ll be contacting them to stay in touch. The rest of IAM Weekend was both insightful and great fun; we’d highly recommend it.

If you’re interested in these topics, join our newsletter and get in touch.


Predicting future trends from past novelty: Trend estimation on social media

Significant developments in internet technologies are being made across a wide range of fields – breakthroughs that will undoubtedly have a profound impact on society and be highly disruptive in nature. Given the importance of these technological developments, insights into emerging technology trends can be of great value for governments and policy-makers seeking to respond to the complex dilemmas that emerge around new technologies.

Today, trends can emerge from many different platforms. For instance, the growth of social media enables citizens to generate and share information in unprecedented ways, and thus sociocultural trends from social media have become an important part of knowledge discovery. Social media can provide policy makers with insights about which internet-related technologies and social issues are frequently discussed, and which trends are emerging in the discussions. Such insights can help identify new technologies and issues and create proper policy or regulatory responses.


A new approach to social media trend estimation

Accurate trend estimation on social media is, however, a matter of debate in the research community. Standard approaches often suffer from methodological issues because they focus solely on spiky behavior and thus equate trend detection with the detection of natural catastrophes and epidemics.

To remedy these issues, our team at DATALAB at Aarhus University in Denmark developed a new approach to trend estimation that combines domain knowledge of social media with advances in information theory and dynamical systems. In particular, trend reservoirs, i.e. signals that display trend potential, are identified by the relationship between their novel and resonant behavior, and by their minimal persistence.

The model estimates Novelty as a reliable difference from the past – how much does the content diverge from the past – and Resonance as the degree to which future information conforms to the Novelty – to what degree does the novel content ‘stick’. Using calculations of Novelty and Resonance, trends are then characterized by a strong Novelty-Resonance association and long-range memory in the information stream. Results show that these two ‘signatures’ capture different properties of trend reservoirs, information stickiness and multi-scale correlations respectively, and they both have discriminatory power, i.e. they can actually detect trend reservoirs.
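
To make Novelty and Resonance more concrete, here is a minimal sketch. The text above does not spell out the exact formulas, so this example assumes one common formulation: each time step is summarised as a probability distribution over topics, Novelty is the average relative entropy (KL divergence) from the preceding window, and Resonance is Novelty minus the same quantity measured against the following window. The function names, the window size and the toy stream are assumptions, and the long-range memory signature is not covered here.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def novelty_resonance(dists, window):
    """Return (novelty, resonance) for each step with a full window on both sides."""
    pairs = []
    for t in range(window, len(dists) - window):
        # Novelty: how much step t diverges from the preceding steps.
        novelty = sum(kl(dists[t], dists[t - d]) for d in range(1, window + 1)) / window
        # Transience: how much step t diverges from the following steps.
        transience = sum(kl(dists[t], dists[t + d]) for d in range(1, window + 1)) / window
        # Resonance is high when novel content persists afterwards.
        pairs.append((novelty, novelty - transience))
    return pairs

def slope(pairs):
    """Least-squares slope of Resonance on Novelty (the association strength)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / var_x

# Toy stream: a new topic appears at step 2 and then 'sticks'.
stream = [
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
]
pairs = novelty_resonance(stream, window=1)
print(pairs)
print(slope(pairs))
```

In this toy stream the step where the new topic appears is both novel and resonant, so the slope of Resonance on Novelty is positive, which is the kind of association the first signature looks for.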

Case study: AI discussions on Reddit

To illustrate how the model can be used for trend estimation, we applied it to the social media site Reddit to discover innovative discussions related to artificial intelligence (AI).

Reddit hosts discussions about text posts and web links across hundreds of topic-based communities called “subreddits” that often target specialized expert audiences on these topics. Topically defined discussions are thus an important part of the appeal of Reddit, unlike the information dissemination focus of Twitter, and the specialized audiences make it a promising source for topical discussions on, for example, internet technology.

The most trending subreddits are discovered using the model on a sample of subreddits with the highest overlap between their descriptions and a seed list of AI-related terms. The top 10 most relevant subreddits in terms of content matching can be found in Table 1 in two categories: Leaders and Prospects. The Leaders are subreddits with a substantial number of posts (more than 2560), while the Prospects are subreddits that can be small but rank the highest on the content matching. 

We can then use the trend estimation model to classify subreddits as either maximally trending or as merely trending or not trending. In Table 1, the red subreddits are classified as maximally trending while blue are not. In the Leaders category, subreddits will have to be trending based on both signatures, i.e. having a strong Novelty-Resonance association and displaying long-range memory, to qualify as maximally trending. Prospects can qualify as maximally trending with only one of the two signatures.
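
The decision rule described here can be written down in a few lines. The sketch assumes that the two signatures have already been evaluated for each subreddit and are available as yes/no flags; the class, field and subreddit names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

@dataclass
class Subreddit:
    name: str
    category: str                   # "leader" or "prospect"
    strong_novelty_resonance: bool  # signature 1
    long_range_memory: bool         # signature 2

def is_maximally_trending(sub):
    """Leaders need both signatures; Prospects need only one of the two."""
    signatures = int(sub.strong_novelty_resonance) + int(sub.long_range_memory)
    required = 2 if sub.category == "leader" else 1
    return signatures >= required

# Hypothetical subreddits, for illustration only.
examples = [
    Subreddit("r/example_leader", "leader", True, False),
    Subreddit("r/example_prospect", "prospect", True, False),
]
for sub in examples:
    print(sub.name, is_maximally_trending(sub))
```

The same evidence therefore leads to different outcomes depending on the category: one signature is enough for a Prospect but not for a Leader.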

Leaders are relevant because of the many posts of potentially trending content, while Prospects can be used to discover singular new trends. A recommender engine can be trained with this classifier to identify trending subreddits within any given subject. Such classifications can be extremely useful for decision support in terms of which subreddits to follow for a continuum of information on trends in e.g. AI.

After the classification, the content of the most trending subreddits can be explored. To do this, we train a neural embedding model to query the highest-ranking words and their associated words, providing insights into, for example, the contexts in which the technologies are discussed.

This method of content exploration produces concept graphs, such as the one above, of the words used in a subreddit and their associations. Figure 1 is a concept graph from the trending subreddit r/MachineLearning – we can call the graph TOOL-DIVERSIFICATION. As data science and machine learning are complicated fields, many classes of tools are necessary to develop state-of-the-art deep learning models, and the concept graph shows three clusters of interest:

  • The upper right corner shows important tools related to “Hardware and Cloud” technologies (NVIDIA, GPU, TPU, AWS, server, Ubuntu, Gluon), which are all characteristics of GPU accelerated high performance computing; 
  • The cluster in the center left of the graph is dominated by the most important deep learning “Software Libraries” in Python (Tensorflow, PyTorch, Keras, Theano) and related languages (JavaScript, Java, Caffe, MATLAB);
  • In the lower part, the graph displays “Classes of Problems” (supervised, unsupervised, reinforcement learning). 

Two further observations can be made. Firstly, “Tutorial” is highly interconnected with all clusters, supporting the view that tutorials have become one of the primary means of assimilating the diverse tools in the machine learning community. Secondly, software libraries, packages, and frameworks take a central role in the graph. They all signify bundles of preexisting code that minimize the amount of programming and hardware understanding required by machine learning enthusiasts.

These observations indicate that the subreddit does not consist of solely professional machine learning developers, but rather constitutes a community of machine learning enthusiasts with a do-it-yourself approach to machine learning. 

This is just an example of the content that can be extracted after identifying the most trending subreddits within the topic of investigation with the model for trend estimation. This approach to the estimation of trend reservoirs generalizes to other data sources, such as Twitter, and other data types, such as images. 


Mapping the tech world with text-mining: Part 1.


As part of the NGI Forward project, DELab UW is supporting the European Commission’s Next Generation Internet initiative with identifying emerging technologies and social issues related to the Internet. Our team has been experimenting with various natural language processing methods to discover trends and hidden patterns in different types of online media. You may find our tools and presentations at

This series of blog posts presents the results of our latest analysis of technology news. We have two main goals:

  1. to discover the most important topics in news discussing emerging technologies and social issues,
  2. to map the relationship between these topics.

Our text mining exercises are based on a technology news data set that consists of 213 000 tech media articles. The data was collected over a period of 40 months (between 2016-01-01 and 2019-04-30), and includes the plain text of articles. As the figure shows, the publishers are based in the US, the UK, Belgium and Australia. More information on the data set is available at a Zenodo repository.

In this first installment, we focus on a widely used text-mining method: Latent Dirichlet Allocation (LDA). LDA gained its popularity due to its ease of use, flexibility and interpretable results. First, we briefly explain the basics of the algorithm for all non-technical readers. In the second part of the post, we show what LDA can achieve with a sufficiently large data set.


Text data is high-dimensional. In its most basic form that is still comprehensible to computers, it is often represented as a bag-of-words (BOW) matrix, where each row is a document and each column counts how often a given word occurs in that document. These matrices can be transformed with linear algebra methods to discover the hidden (latent and lower-dimensional) structure in them.
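
For readers who want to see what such a matrix looks like, here is a minimal sketch that builds one from three invented sentences. Splitting on whitespace is a simplifying assumption; real pipelines also remove stop words and normalise word forms.

```python
from collections import Counter

def bag_of_words(documents):
    """Return the vocabulary and a document-by-term count matrix."""
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenised for tok in doc})
    matrix = []
    for doc in tokenised:
        counts = Counter(doc)
        # One row per document, one column per vocabulary term.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Invented example sentences, for illustration only.
docs = ["facebook bans political ads", "twitter bans ads", "robots learn from data"]
vocab, bow = bag_of_words(docs)
print(vocab)
for row in bow:
    print(row)
```

Even with three short documents most entries are zero, which is why real BOW matrices with tens of thousands of columns are described as high-dimensional and sparse.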

Topic modeling assumes that documents, such as news articles, contain various distinguishable topics. As an example, a news article covering the Cambridge Analytica scandal may contain the following topics: social media, politics and tech regulations, with the following relations: 60% social media, 30% politics and 10% tech regulations. The other assumption is that topics contain characteristic vocabularies, e.g. the social media topic is described by the words Facebook, Twitter etc.

LDA was proposed by Blei et al. (2003) and is based on Bayesian statistics. The method’s name reflects its key foundations. Latent comes from the assumption that documents contain latent topics that we do not know a priori. Allocation shows that we allocate words to topics, and topics to documents. Dirichlet refers to the Dirichlet distribution, a probability distribution over sets of proportions that sum to one, which is commonly used as a prior for multinomial outcomes. As an example, a Dirichlet distribution can describe the proportions of different species observed on a safari (Downey, 2013). In LDA, it describes the distribution of topics in documents, and the distribution of words in topics.

The basic mechanism behind topic modeling methods is simple: assuming that documents can be described by a limited number of topics, we try to recreate our texts from a combination of topics that consist of characteristic words. More precisely, we aim at recreating our BOW word-document matrix with the combination of two matrices: the matrix containing the Dirichlet distribution of topics in documents (topic-document matrix), and the matrix containing the words in topics (word-topic matrix). The construction of the final matrices is achieved by a process called Gibbs sampling. The idea behind Gibbs sampling is to introduce changes into the two matrices word-by-word: change the topic allocation of a selected word in a document, and evaluate if this change improves the decomposition of our document. Repeating the steps of the Gibbs sampling in all documents provides the final matrices that provide the best description of the sample.
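
The word-by-word procedure can be sketched as a small collapsed Gibbs sampler. This is an illustrative toy under stated assumptions, not the implementation used in the study: the function name, the hyperparameter values (alpha, beta), the number of iterations and the four invented documents are all chosen for the example. Note that, strictly speaking, the sampler does not test whether a change is an improvement; it re-draws each word's topic at random, with better-fitting topics being more likely.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    assignments = []
    # Start by giving every word occurrence a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                # Take this word's current assignment out of the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Weight each topic by its fit to the document and to the word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Normalise the counts into the topic-document and word-topic matrices.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return vocab, theta, phi

# Invented toy corpus, for illustration only.
docs = [
    "facebook twitter privacy facebook".split(),
    "twitter privacy facebook election".split(),
    "robot ai robot data".split(),
    "ai data robot learning".split(),
]
vocab, theta, phi = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of theta gives the estimated share of each topic in one document, and each row of phi gives the distribution of words within one topic; on a corpus this small the result mainly shows the mechanics rather than meaningful topics.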

For more details on topic modelling, we recommend these two excellent posts. For the full technical description of this study, head to our full report.


The most important parameter of topic modelling is the number of topics. The main objective is to obtain a satisfactory level of topic separation, i.e. topics that neither lump all issues together nor are overly fragmented. To achieve this, we experimented with different LDA hyperparameter settings. With 20 topics, the topics were balanced and separable.
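One rough way to put a number on topic separation is to measure how much the top terms of different topics overlap. The sketch below is a hypothetical diagnostic, not the procedure used in this study: it computes the average Jaccard overlap between the top-word lists of every pair of topics, so values near 0 suggest well separated topics and values near 1 suggest topics lumped together. The word lists are invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of words that two lists have in common (0 = disjoint, 1 = identical)."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

def mean_top_word_overlap(topics):
    """Average Jaccard overlap over all pairs of topic top-word lists."""
    pairs = list(combinations(topics, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical top words for two candidate models.
lumped = [["facebook", "privacy", "ai"], ["facebook", "privacy", "robot"]]
separated = [["facebook", "privacy", "twitter"], ["robot", "ai", "car"]]

print(mean_top_word_overlap(lumped))
print(mean_top_word_overlap(separated))
```

Such a score only captures one side of the trade-off: it rewards separation but does not penalise over-fragmentation, so it would need to be read alongside a manual inspection of the topics.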

Therefore, we identified 20 major topics, which are presented in the visualisation below. Each circle represents a topic (the size reflects the topic’s prevalence in the documents), with distances determined by the similarity of vocabularies: topics sharing the same words are closer to each other. In the right panel, the bars represent the individual terms that are characteristic for the currently selected topic on the left. A pair of overlapping bars represents both the corpus-wide frequency of a given term and its topic-specific frequency. We managed to reach gradually decreasing topic sizes: the largest topic has a share of 19%, the 5th 8%, and the 10th 5%.
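The topic shares quoted above are fractions of tokens allocated to each topic. As a minimal sketch with invented data (the topic labels and counts are hypothetical and do not reproduce the study's figures), the share of each topic can be computed from a flat list of per-token topic allocations:

```python
from collections import Counter

def topic_shares(token_topics):
    """Fraction of all tokens allocated to each topic, largest topic first."""
    counts = Counter(token_topics)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.most_common()}

# Hypothetical allocations: one topic label per token in the corpus.
token_topics = ["ai"] * 4 + ["social"] * 3 + ["business"] * 2 + ["hardware"] * 1
shares = topic_shares(token_topics)
print(shares)
```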

After studying these most relevant terms, we labeled each topic with the closest umbrella term. Upon closer examination, we reduced the number of topics to 18 (topics 5 & 16 became the joint category Space tech, while topics 10 & 19 were merged to form a topic on Online streaming). In the following sections we provide brief descriptions of the identified topics.

AI & robots

AI & robots constitutes the largest topic containing around 19% of all tokens and is characterized by machine learning jargon (e.g. train data) as well as popular ML applications (robots, autonomous cars).

Social media crisis

The Social media topic is similarly prevalent and covers contentious aspects of modern social media platforms (facebook, twitter), such as the right to privacy, content moderation, user bans or election meddling with the use of microtargeting (inter alia: privacy, ban, election, content, remove).

Business news

A large share of tech articles cover business news, especially on major platforms (uber, amazon), services such as cloud computing (aws) or emerging technologies such as IoT or blockchain. The topic words also suggest a strong focus on the financial results of tech companies (revenue, billion, sale, growth).


Topic 4 covers articles about the $522B smartphone market. Two major manufacturers, Samsung and Apple, are at the top of the keyword list with an equal number of appearances. Articles focus on the features, parameters and additional services provided by devices (camera, display, alexa etc.).


Excitement about space exploration is common in the tech press. Topic 5 contains reports about NASA, future Mars and Moon missions, as well as companies working on space technologies, such as SpaceX.


Topic 6 revolves around the Cambridge Analytica privacy scandal and gathers all mentions of this keyword in the corpus. The involvement of Cambridge Analytica in the Leave campaign during the Brexit referendum is a major focus, as suggested by the high position of keywords such as eu and uk. Unsurprisingly, GDPR is also often mentioned in the articles dealing with the aftermath of the CA controversy.


Topic 7 pertains to cybersecurity issues. It explores malware and system vulnerabilities affecting both traditional computer systems and novel decentralized technologies based on blockchain.


The much-anticipated fifth-generation wireless network has huge potential to transform all areas with an ICT component. Topic 8 deals with global competition over delivering 5G tech to the market (huawei, ericsson). It also captures the debate about 5G’s impact on net neutrality. 5G’s main quality is to enable signal ‘segmentation’, which raises the question of whether net neutrality laws can treat it like previous generations of mobile communications.

Cross platforms

The focus of Topic 9 is on operating systems, both mobile (ios, android) and desktop (windows, macos), as well as dedicated services such as browsers (chrome, mozilla) and app stores (appstore).


Topic 10 revolves around the most important media platforms: streaming and social media. The global video streaming market was valued at around USD 37B in 2018; music streaming adds another USD 9B to this number and accounts for nearly half of music industry revenue. In particular, this topic focuses on major streaming platforms (youtube, netflix, spotify), social media (facebook, instagram, snapchat), the rising popularity of podcasts and business strategies of streaming services (subscriptions, ads).


During its 40-year history, Microsoft has made more than 200 acquisitions. Some of them were considered successful (e.g. LinkedIn, Skype), while others were less so… (Nokia). Topic 11 collects articles describing Microsoft’s completed, planned and failed acquisitions in recent years (github, skype, dropbox, slack).

Autonomous vehicles

Autonomous transportation is a vital point of public debate. Policy makers should consider whether to apply subsidies or taxes to equalize the public and private costs and benefits of this technology. AV technology offers the possibility of significant benefits to social welfare: saving lives; reducing crashes, congestion, fuel consumption, and pollution; increasing mobility for the disabled; and ultimately improving land use (RAND, 2016). Topic 12 addresses shortcomings of the technology (batteries) as well as positive externalities such as lower emissions (epa, emissions).


LDA modelling has identified Tesla and other Elon Musk projects as a separate topic. Besides Tesla’s development of electric and autonomous vehicles, the topic also includes words related to other mobility solutions (e.g. Lime).

CPU and other hardware

Topics 14 & 15 focus on hardware. Topic 14 covers the CPU innovation race between Intel and AMD, as well as the Broadcom-Qualcomm acquisition saga, blocked by Donald Trump due to national security concerns. Topic 15 includes news regarding various standards (usb-c), storage devices (ssd) etc.


Topic 17 concentrates on startup ecosystems and crowdsource financing. Articles discuss major startup competitions such as Startup Battlefield or Startup Alley, and crowdfunding services such as Patreon.


We observe a surge in the adoption of wearables, such as fitness trackers, smart watches or augmented and virtual reality headsets. This trend raises important policy questions. On the one hand, wearables offer tremendous potential when it comes to monitoring health. On the other hand, this potential might be overshadowed by concerns about user privacy and access to personal data. Articles in topic 18 discuss news from the world of wearable devices, such as new devices and novel features (fitbit, heartrate).


Topic 20 deals with the gaming industry. It covers, inter alia, popular games (pokemon), gaming platforms (nintendo), various game consoles (switch) and game expos (e3).


We have provided a bird’s-eye view of the technology world using topic modelling. Topic modelling serves as an appropriate basis for exploring broad topics, such as the social media crisis, AI or business news. At this stage, we were able to identify the major umbrella topics that ignite public debate.

In the next post, we will introduce another machine learning method: t-SNE. With the help of this algorithm, we will create a two-dimensional map of the news, where articles covering the same topic will be neighbours. We will also show how t-SNE can be combined with LDA.