Technological developments play an important role in shaping virtually all aspects of life, from work and leisure to democracy and social cohesion. As technologies like AI grow in importance and impact, even small issues can create huge challenges: from algorithmic biases to the collection of personal data, negative externalities can easily reach a critical scale. In such a fast-changing environment it can be incredibly difficult to stay up to date and informed. Governments and policymakers in particular find it difficult to manage the lag between technological developments and policy or regulatory responses. New approaches and data-driven techniques can help.
To identify important internet-related social challenges and emerging technologies, our team has developed various text-mining tools (that you can implement and explore yourselves here). In this blog post, we provide a brief overview of the potential uses of this methodology and examples of the work we’ve done so far as part of the Next Generation Internet (NGI) initiative (more examples can be seen in the Datalab section of the website).
Our text-mining analysis is based on a novel dataset of technology news articles. Using web-scraping tools, we have collected more than 213,000 articles. The sources include 14 major English-language technology websites from the US, EU and Australia. The same approach could be applied to different input text data; we have also been experimenting with academic papers and social media sources.
We combined three methods: term-frequency analysis, co-occurrence analysis and sentiment analysis.
First, we highlight emerging topics based on the analysis of term frequencies: we identify the terms whose average frequency increases over time. Next, we use co-occurrence analysis to explore the connections between terms, e.g. between emerging social issues and technologies. This analysis shows which emerging terms were most often mentioned together in the same article, and so surfaces the most relevant pairs of expressions. In the case of technology news, this is a good way to reveal how a technology is being applied, or how it connects to regulatory issues. Finally, to track public perception of different issues and identify relevant positive and negative views, we perform sentiment analysis.
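The first two steps can be sketched in a few lines of Python. This is a minimal illustration and not the code behind the NGI analysis: the toy articles, the tracked term list, the simple substring matching and the helper names (`yearly_share`, `trend_slope`, `cooccurrences`) are all assumptions made for the example. A term counts as "emerging" here when the share of articles mentioning it has a positive least-squares slope over time.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy corpus of (year, article text); invented for illustration only.
ARTICLES = [
    (2016, "voice assistant sales grow as smart speaker demand rises"),
    (2017, "facial recognition trial raises privacy concerns"),
    (2018, "facial recognition and racial bias debated by regulators"),
    (2018, "facial recognition used by border guards raises privacy concerns"),
]

# Hypothetical list of terms to track.
TERMS = ["facial recognition", "privacy concerns", "racial bias", "voice assistant"]


def terms_in(text, terms=TERMS):
    """Return the set of tracked terms that appear in one article."""
    lowered = text.lower()
    return {term for term in terms if term in lowered}


def yearly_share(articles, terms=TERMS):
    """For each term, the share of each year's articles that mention it."""
    totals = Counter()
    hits = defaultdict(Counter)
    for year, text in articles:
        totals[year] += 1
        for term in terms_in(text, terms):
            hits[term][year] += 1
    return {
        term: {year: hits[term][year] / totals[year] for year in sorted(totals)}
        for term in terms
    }


def trend_slope(series):
    """Least-squares slope of share against year; positive means rising."""
    years = list(series)
    values = [series[year] for year in years]
    mean_x = sum(years) / len(years)
    mean_y = sum(values) / len(values)
    denom = sum((x - mean_x) ** 2 for x in years)
    if denom == 0:
        return 0.0
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) / denom


def cooccurrences(articles, terms=TERMS):
    """Count how often each pair of tracked terms shares an article."""
    counts = Counter()
    for _, text in articles:
        for pair in combinations(sorted(terms_in(text, terms)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    shares = yearly_share(ARTICLES)
    for term in TERMS:
        print(term, round(trend_slope(shares[term]), 2))
    print(cooccurrences(ARTICLES).most_common(3))
```

Normalising by the number of articles per year keeps a term from looking "emerging" simply because more articles were collected in later years. A real pipeline would also need proper tokenisation and phrase detection rather than substring matching.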
Figures 1 and 2 summarise the top trending technology-related terms. These figures provide a high-level map of the most important technology news from the previous years.
Case study: AI and ML
Figure 3 presents the terms related to expressions such as “facial recognition” or “AI ethics”. The co-occurrences provide rich details on the social issues related to AI and ML algorithms: while AI can lead to new innovations and help with various recent challenges (e.g. tackling fake news), these algorithms often work in a non-transparent way (“black box”), may be prone to biases, or may be implemented for questionable purposes (“killer robots”).
Algorithms have been at the centre of recent controversies, such as the Cambridge Analytica scandal, or the implementation of facial recognition at the Berlin Südkreuz station. Another example is Project Maven, a cooperation between the Pentagon and Google to implement AI algorithms for the identification of people in drone footage. The involvement of Google in the military use of AI stirred intensive debate, including protests by Google employees. The backlash within the company led to Google’s withdrawal from further cooperation with the Department of Defense, e.g. in the JEDI project (Joint Enterprise Defense Infrastructure). The ethical use of AI has thus become a key point of public debate. The co-occurrences also enable us to identify crucial institutions (Pentagon, Google’s Advanced Technology External Advisory Council), persons (academics Jonathan Zittrain and Joanna Bryson) and companies (Byton, a Chinese electric car producer, and AI start-ups such as Doxel and Clarifai).
Facial recognition is used as a case study for the sentiment analysis (Figure 4). The analysis shows the monthly average sentiment of articles on facial recognition on a scale of -1 to 1. At the beginning of the explored time period, articles on facial recognition were rather positive (compound score: 0.22, with “voice assistant” and “AI technology” carrying the most positive connotations). A significant decline followed at the end of 2017, possibly due to an increase in reporting on questionable uses of the technology.
On the one hand, this technology can be seen as a convenient tool for tagging photos on social media or authorising mobile payments in a secure way. On the other hand, we observe growing privacy concerns around facial recognition applications in the marketing industry and law enforcement.
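The monthly averaging behind a chart like Figure 4 can be sketched as follows. This is a minimal illustration under stated assumptions: each article is taken to already have a sentiment score between -1 and 1 (the post does not say which sentiment tool produced its compound scores), and the dates, scores and the helper name `monthly_average` are invented for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (publication date, sentiment score) pairs; scores lie in [-1, 1].
# These values are made up and are not taken from the NGI dataset.
SCORED_ARTICLES = [
    (date(2017, 1, 5), 0.30),
    (date(2017, 1, 20), 0.10),
    (date(2017, 12, 3), -0.20),
    (date(2017, 12, 18), -0.40),
]


def monthly_average(scored_articles):
    """Group scores by (year, month) and return the mean score per month."""
    buckets = defaultdict(list)
    for day, score in scored_articles:
        buckets[(day.year, day.month)].append(score)
    # Sort the months so the result can be plotted as a time series.
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


if __name__ == "__main__":
    for (year, month), average in monthly_average(SCORED_ARTICLES).items():
        print(f"{year}-{month:02d}: {average:+.2f}")
```

With only a handful of articles in a given month, the average can swing sharply on a single story, so it is worth checking the article count per month before reading much into a dip or a peak.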
Table 1. Co-occurrences with most positive and negative sentiments
|Most positive|Most negative|
|voice assistant|border guards|
|ai technology|autonomous weapons|
|ai research|project maven|
|edge computing|big brother|
|ai startup|racial bias|