Take a journey through the timeline using the red bar at the bottom. Click on the articles or clips for more details and to navigate through the years.
Trace the words through time
This interactive visualisation of 30 years of coverage of HIV by The Nation Media Group is a chance for journalists to explore trends in the media’s treatment of HIV as a political, economic, social and scientific challenge to Kenyans. Through the visualisation, you can answer important questions, such as when specific high risk groups were identified, when scientific breakthroughs took place and when treatment became widespread, as well as explore broader themes such as the role of stigma and education in the HIV conversation. The findings from this exploration can be compared to the research trends explored on this site to find out whether more mentions of high risk groups correspond to a decrease in infection rates for these groups, or whether scientific discoveries changed investments in HIV treatment. This visualisation is meant to help journalists find answers to questions about how HIV has been covered in the past and what its impact may have been, and to inform how best to cover HIV and other major health issues in the future.
A qualitative analysis of articles published by The Nation in the last three decades reveals that the terms high risk groups, stigma, church, women and cure appear frequently. But the context in which the words are used and the intensity vary over time.
High risk groups
In the early 1980s, HIV was seen largely as a disease of gay men, but by the mid-1980s prostitutes, bisexual men, haemophiliacs and people who inject drugs were included in this group. As the decade came to an end, it was clear that HIV was also a disease of heterosexuals.
In the 1990s and 2000s media stories reflected the fact that HIV in Africa had acquired a feminine face, with high prevalence among women of reproductive age. During this time, scientists focused on the phenomenon of discordant couples. Now the HIV epidemic in Kenya is mixed, with characteristics of both a generalised epidemic among the mainstream population and a concentrated epidemic among specific most-at-risk populations and geographies.
Stigma and discrimination
In the first decade of HIV reporting there was a lot of misinformation, fear, stigma and discrimination surrounding Aids globally. In the media, the fear was reflected in the words associated with the condition, such as mysterious, fatal, baffling and crippling. As HIV took its toll in the 1990s, fear mounted in various sectors of society and calls to quarantine HIV positive people grew. Attitudes also shifted, from rejection and stigma to overt discrimination. In reaction, organisations, activists and celebrities started to campaign for the rights of HIV positive people and to highlight the harmful effects of stigma and discrimination. They lobbied for the creation of laws on Aids and of employment policies. As more people got HIV, a section of the Church broke its silence about the virus and more HIV positive people announced their status publicly, but others, including government leaders, were in denial about HIV and Aids. Discrimination and stigma, including self-stigma, persisted.
In the 2010s articles in The Nation focused on the increased action on the policy front and on how stigma and discrimination hinder access to prevention, treatment and care services. There were also stories on how HIV is no longer a death sentence.
Cure
In the articles on HIV that The Nation published in the 1980s, the term cure was used in the context of explaining that the condition had no cure or vaccine and was fatal. There were also reports of people seeking treatment from herbalists and others, including scientists, who claimed to have found the cure. Towards the end of the 1990s stories also focused on why an HIV cure or vaccine remained elusive. But the story turned more hopeful in the late 1990s, when access to antiretroviral drugs started to be scaled up. By the 2000s a lot of attention was being given to the side effects of antiretroviral drugs. The stories also focused on the latest scientific developments in relation to treatment and the role of HIV testing in preventing the spread of the virus. Scientific breakthroughs in the 2010s, near-universal access to prevention of mother-to-child transmission of HIV and improved access to more effective antiretroviral drugs made governments, scientists and activists begin to talk about an HIV-free generation, but the pilgrimage to ‘quack’ doctors and herbalists still continues.
Church
The newspaper articles reveal that the Church was largely absent from the HIV debate in the 1980s until towards the end of the decade, when the institution started to argue that HIV is a moral issue and openly shunned HIV positive people.
The Church maintained the same stance on HIV and Aids in the 1990s. Church leaders declared that they would not marry couples who had not gone for HIV tests, and they came out strongly in opposition to sex education and condom use. As HIV continued to take a toll, many churches had joined the HIV campaign by the end of the decade. Over the years, however, the Catholic Church has maintained its vocal opposition to condom use.
Women
By the end of the 1980s it became apparent that women are at a higher risk of HIV in Kenya and Africa than men because of biological and socioeconomic reasons.
In the 1990s many stories focused on reducing transmission of HIV from mother to child.
The people and places visualisation addresses the same Nation newspaper articles used in the Tableau visualisation series, but with a different approach to semantic analysis. Instead of counting and graphing the colloquial tags, this visualisation pulls the proper nouns: in this case, the people, locations, and organisations mentioned in The Nation's coverage of the HIV pandemic from 1981-2013. As visualised here, proper nouns are grouped by year (light grey circles) and by article (dark grey sub-circles), and are categorised and coloured according to the people, place and institution groupings extracted from each article by Chambua.*
The design for the visualisation is modelled after a nested circle-pack structure in D3, and conceptually mimics a petri dish, since the article topic modelling was meant to echo a macro-to-micro exploration of a large body of articles on a bio-medical topic of study. To explore the interactive visualisation found here, click on the circles to generate the vocabulary associated with the articles selected. Articles are organised by publication date, which displays on click-selection in the key to the right of the circle-pack structure, along with all of the associated vocabulary terms for that article.
From this visualisation, you can see the general increase in article volume (more articles populate years where many articles were published), the explosion of articles in 2004, and the paucity of articles and early coverage for the as-yet nameless epidemic in the 1980s. To be clear, this illustrates proper noun mentions (so proper names, places identifiable as cities or locations on a map, and organisations with a recognised authority or presence in the media). It should not be confused with a representation of HIV research depth; it only illustrates media coverage on HIV.
Still, media coverage remains interesting from both socio-cultural and geo-political perspectives. Whether The Nation mentioned foreign or domestic personalities, certain city-level locations or foreign research bodies is significant to an understanding of how the conversation progressed in its 30 years of media treatment.
* Chambua is an Ushahidi open-source project to semantically tag text. It can be found here: https://github.com/ushahidi/Chambua
Code Repositories Here:
https://github.com/auremoser/hiv-30_cluster/
https://github.com/auremoser/hiv-30_zoom/
Explore how the prevalence of HIV and the uptake of HIV testing changed between 2007 and 2012 by clicking on the tabs below.
2014 marks 30 years since the discovery of HIV, the virus that causes Aids, and the first documented case of HIV in Kenya. This occasion inspired Internews in Kenya to create 30 years of HIV, an interactive digital project that explores the media coverage of the HIV epidemic in Kenya over time and visualises key milestones along the way. The project is intended to help journalists and society reflect on their own evolving understanding of the political, social, economic and human impact of HIV, and on whether the information provided over three decades has done justice to the complexity of one of the greatest challenges of our time.
The project does this through a 3D timeline, interactive visualisations of 30 years of coverage of HIV by the Nation newspaper, photographic essays of people living with HIV and multimedia pieces that share the experiences of experts who are and have been on the frontline of the epidemic.
A review of data reveals that the country’s HIV epidemic is not as generalised as previously thought. Hyper-epidemics persist in parts of the country and this means that to succeed the HIV response must give special attention to populations at higher risk of HIV. The insights from the data analysis are presented through interactive visualisations and powerful infographics.
You can further explore the trends in the media coverage of HIV by sampling the newspaper articles that were analysed and visualised for this project (give link to document cloud).
To answer your copyright question: yes, you may download and republish the infographics, data and other resources, but remember: attribution makes journalism credible.
Nation coverage on HIV analysis visualisation with Tableau and Chambua
Internews in Kenya's retrospective on HIV in the Kenyan media considered both the news content and the language associated with that coverage. While our approach to this project studied the general tag trends, as seen in the Words Through Time section, we also considered the proper-noun taxonomy associated with 30 years of media coverage on HIV, as illustrated in the People and Places section.
The Nation newspaper articles were sourced from the media house’s archives, by querying articles tagged HIV and Aids, and from the Internews in Kenya library. The process, which took several weeks, resulted in 9,419 articles tagged HIV or Aids.
The PDF articles were then converted into Word or text documents to enable analysis in Overview Project, either using OneNote or by copying the PDF text and pasting it into Word. In the end, about 30 per cent of the articles were not analysed because they were illegible following conversion into text.
Tableau Public Visualisation: Words through time
In order to derive meaning from the text of each of the Nation articles, we began with an online software, the Overview Project, which is an open-source tool originally designed to help journalists find stories in large numbers of documents. Overview Project goes through all the words in a document and identifies trends: words that appear often, words that appear frequently with other words, topics and subtopics. Overview enabled us to analyse the text of all the articles by year and begin to identify trends, both topics and then specific words under these topics, and track these changes over time. It displays findings in a tree structure like this with the size of the box corresponding to the number of articles that contain those words:
Once we ran all the articles for all the years through Overview Project, we had a good idea of the important words and topics in each year. There were some surprises, such as the frequency of women and education, which we had not expected. From this initial analysis we made our own list of words to search for. These fell into three categories.
1. Words: This included all words or groups of words (such as “commercial sex workers”) that we wanted to count in each year.
2. Terms: This allowed us to group words that are basically synonyms. For example, we grouped “commercial sex workers,” “prostitutes” and “female sex workers” into the same term since they are different words to describe the same thing.
3. Categories: This is where subjective analysis came into play as we started to make sense of the terms as they fell into certain topics. For example we included the terms: men who have sex with men, people who inject drugs, sex workers, truck drivers and women into the category high risk groups.
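The three levels above can be thought of as two lookup tables, one from words to terms and one from terms to categories. The following is a minimal sketch of that idea, not the project's actual spreadsheet: the dictionary names and the `classify` helper are hypothetical, and only the handful of words and terms named in the text are included (the full list had 92 tagged words).

```python
# Hypothetical excerpt of the taxonomy described above; names are illustrative.
WORD_TO_TERM = {
    "commercial sex workers": "sex workers",
    "prostitutes": "sex workers",
    "female sex workers": "sex workers",
}

TERM_TO_CATEGORY = {
    "sex workers": "high risk groups",
    "men who have sex with men": "high risk groups",
    "people who inject drugs": "high risk groups",
    "truck drivers": "high risk groups",
    "women": "high risk groups",
}


def classify(word):
    """Return (term, category) for a tagged word, or (None, None) if unknown."""
    term = WORD_TO_TERM.get(word.lower())
    if term is None:
        return None, None
    return term, TERM_TO_CATEGORY.get(term)
```

Keeping the two mappings separate mirrors the distinction in the list: grouping synonyms into terms is fairly mechanical, while assigning terms to categories is the subjective step.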
Once we had a list of words, terms and categories, we added a layer of 92 tags in Overview Project. This means we typed in the specific word, and Overview Project counted the number of mentions in the year and tagged it with a different colour. The result of tagging appears like this:
On the right hand side you can see a display of how many times each word tag appears in each article. If you click on one of the articles it will show you where it appears in the text. This makes it easier to quickly review the context the word was used in:
Once we had searched for all the tags in all the years, we exported this list by year. This gave us a total count of how many times each of the 92 words was mentioned each year.
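For readers who want to reproduce a comparable per-year count without Overview Project, the tally can be approximated in a few lines. This is a stand-in sketch under stated assumptions, not how Overview Project works internally: the function names are hypothetical, articles are assumed to be available as `(year, text)` pairs, and matching is case-insensitive on whole words or phrases.

```python
import re
from collections import Counter


def count_mentions(text, words):
    """Count case-insensitive whole-phrase matches of each word in text."""
    counts = Counter()
    for word in words:
        pattern = r"\b" + re.escape(word) + r"\b"
        counts[word] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def counts_by_year(articles, words):
    """Aggregate mention counts over an iterable of (year, text) pairs."""
    totals = {}
    for year, text in articles:
        totals.setdefault(year, Counter()).update(count_mentions(text, words))
    return totals
```

Because matching is on whole words, singular and plural forms (for example "prostitute" and "prostitutes") are counted separately unless both are listed.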
Next, we opened these counts in Excel and added a few new columns. We added a column to classify each word by term and each term by category. We also added a column for the year.
In order to show the relative importance of each word, term and category, we did a few calculations:
1. We calculated the percentage of each word under each term. For example, if prostitute were mentioned two times out of a total four words tagged under sex worker, the frequency of prostitute under sex worker would be 50 per cent.
2. We calculated the percentage of each term under each category. For example, if the term sex worker were mentioned four times in high risk groups, which had 16 total mentions for that year, then 25 per cent of content under high risk groups related to sex workers.
3. Finally, to give us a broad overview, we calculated how many words were mentioned under five broad categories: High-risk groups, Prevention, Research, Stigma and Treatment. To do this, we took the total number of mentions of each word under each category and divided it by the total number of words. For example, if there were 50 words categorised under high-risk groups and a total of 500 words, then 10 per cent of the content fell under the category high-risk groups.
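The three calculations above are all the same operation, a part divided by its parent total, applied at different levels. The sketch below works through the examples given in the list; the helper names `share` and `shares_within` are hypothetical, and the project itself did this step in Excel.

```python
def share(part, whole):
    """Percentage of whole made up by part; 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def shares_within(counts, parent_of):
    """For each item in counts, return its percentage of its parent's total.

    counts maps an item (a word or a term) to its number of mentions;
    parent_of maps the same item to its parent (a term or a category).
    """
    totals = {}
    for item, n in counts.items():
        parent = parent_of[item]
        totals[parent] = totals.get(parent, 0) + n
    return {item: share(n, totals[parent_of[item]]) for item, n in counts.items()}
```

With two mentions of "prostitute" out of four words tagged under sex worker, `shares_within` gives 50.0 for that word; `share(4, 16)` gives 25.0 and `share(50, 500)` gives 10.0, matching the worked examples.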
Our next step was to visualise our findings. We broke down the visualisation into three parts.
1. The timeline gives a broad overview of how the discussion around HIV has evolved by tracking the popularity of categories over time. Therefore, the timeline displays the percentage popularity of each category by year.
2. The treemap displays more specific data: it shows the term frequency under the category you selected on the timeline. The bigger the square, the more frequently that term appeared under that category. Note, the treemap does not allow you to compare terms in different categories. You can only display the terms under one category at a time.
3. The most granular data is the word cloud. The word cloud shows the frequency of each word grouped under each term. The size of the word corresponds to the frequency of the word under the term.
A time slider allows you to visualise how term and word frequency change over time. The colours display the relationships between the categories, terms and words from each part of the visualization.
Access the data behind the visualisation here (access to documents in Overview Project to be inserted).
Chambua sentiment analysis visualisation: People and places
The proper-noun taxonomy analysis with Chambua tracks the HIV conversation according to the persons, locations (international and domestic), and organisations who were associated with the conversation around HIV as it grew over time.
To do this, Internews pulled and processed articles from The Nation, a Kenyan publication with 30 years of reporting history on the development of HIV and Aids. We then extracted the people, places, and organisations mentioned therein, and clustered these in a visualisation.
Methodology
Analysing the articles involved a multi-step process roughly divided into data collection, data processing, and data visualisation workflows.
Data collection
The data journalism team at Internews in Kenya tirelessly scoured the Daily Nation archives. They then exported these files with default names as .docx files and passed them to developers on the team to handle the post-processing.
In tandem, the team developers brainstormed ways to visualise the vocabulary from these articles.
Data processing
Data processing to extract proper-noun entities was designed to use Chambua, an Ushahidi open-source project that curls text files to extract people, place, and organisation terms and output a json object with these entities.
The project works cleanly for one file, but we needed it to work for a batch of files, organised by year. To use Chambua in this way, we converted the files from .docx to .txt, wrapped them in a text object using wrap_json.py, and wrote Python scripts to run Chambua over each article in a directory (named by year) and output a comparable people, place, and organisation object on a per-article basis using send_json.py.
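The batch step described above can be sketched roughly as follows. This is not the content of wrap_json.py or send_json.py, which are not reproduced here; the function names, the `"text"` key and the output layout are assumptions. The call to Chambua is deliberately left as a caller-supplied function, `extract_entities`, because its request format is not documented in this write-up.

```python
import json
from pathlib import Path


def wrap_text(path):
    """Wrap a .txt article in a JSON-serialisable object (key name is an assumption)."""
    return {"text": Path(path).read_text(encoding="utf-8", errors="replace")}


def process_year(year_dir, extract_entities, out_dir):
    """Run extract_entities over every .txt file in year_dir.

    extract_entities is a hypothetical callable standing in for the Chambua
    request; it takes the wrapped text and returns a dict of entity lists.
    One JSON file per article is written to out_dir.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for txt in sorted(Path(year_dir).glob("*.txt")):
        entities = extract_entities(wrap_text(txt))
        record = {"year": Path(year_dir).name, "article": txt.stem, **entities}
        out_path = out_dir / (txt.stem + ".json")
        out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
        written.append(out_path)
    return written
```

Passing the extractor in as an argument keeps the directory walking testable without a running Chambua instance.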
Once the terms were extracted, they needed to be cleaned and reformatted to suit a flare structure for a D3 visualisation. For this, we developed a Node process to run through the Chambua output for the articles in each year's directory and combine it all into one CSV file, and then cleaned the terms in OpenRefine. Refine's macros were useful in eliminating unicode errors, misspellings, near-duplicate spellings and invalid terms. We then output the clean CSV and processed it to map to a JSON file with a D3-appropriate structure.
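A flare structure is a tree of `name` and `children` nodes, which is the nested shape D3's hierarchy layouts typically consume. The project did this conversion with a Node process; the sketch below shows the same reshaping in Python for illustration only. The column names (`year`, `article`, `category`, `term`), the root name and the `size` value of 1 per term are assumptions about the cleaned CSV, not the project's actual schema.

```python
import csv
import io


def rows_to_flare(rows, root_name="HIV coverage"):
    """Nest flat rows (year, article, category, term) into a flare-style tree."""
    root = {"name": root_name, "children": []}
    index = {}  # maps a path tuple to its node, so each node is created once

    def child(parent, path, name):
        node = index.get(path)
        if node is None:
            node = {"name": name, "children": []}
            index[path] = node
            parent["children"].append(node)
        return node

    for row in rows:
        year = child(root, (row["year"],), row["year"])
        article = child(year, (row["year"], row["article"]), row["article"])
        group = child(
            article,
            (row["year"], row["article"], row["category"]),
            row["category"],
        )
        group["children"].append({"name": row["term"], "size": 1})
    return root


def flare_from_csv(csv_text):
    """Parse cleaned CSV text and return the nested flare-style dictionary."""
    return rows_to_flare(csv.DictReader(io.StringIO(csv_text)))
```

The result can be written out with `json.dumps` and loaded by a circle-pack layout, with years as the outer circles, articles inside them, and the people, place and organisation groups innermost.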
The documentation for our cleaning process can be found in the following resources:
- Processing Notes - File Cleaning
- Processing Notes - Chambua
- Additional Notes - General Iterations
- Python Scripts
- Node Scripts
Data Visualisation
Data visualisation of these terms was originally designed as a force-directed node-edge map, where the shape and colour of the visualisation were meant to evoke microscopic photographs of the HIV virus, and the external nodes would represent glycoproteins on the periphery of each virus. From a user-experience perspective, however, this was tough to explore and unintuitive, so we pivoted to representing the people, places, and institutions as amoebic sub-clusters in a larger 'petri dish' per year.
The result was the following small multiples representation, available live here, in which red represents people, orange places and yellow institutions, and the size of each blob is determined by the number of terms within it.
From this, you can see the general increase in article volume, the explosion of articles in 2004, and the paucity of articles and early coverage for the as-yet nameless epidemic in the 1980s. To be clear, this illustrates proper noun mentions (so proper names, places identifiable as cities or locations on a map, and organisations with a recognised authority or presence in the media). It should not be confused with a representation of HIV research depth; it only illustrates media coverage on HIV.
Media coverage remains interesting from socio-cultural and geo-political perspectives. Whether The Nation was mentioning foreign or domestic personalities, certain city-level locations or foreign research bodies is significant to an understanding of how the conversation progressed in its 30 years of media treatment.
Design
Visualising a zoomed-out view of the conversation density was not enough; we wanted to build a tool to discover terms and explore the articles based on these Chambua extractions. From the above cluster visualisation, we built a larger dynamic 'petri dish' to explore all of the entities by date of article publication.
The zoomable visualisation is available here.
How to view
Click on any circle to zoom in or out. The 'petri dish' enables two layers of zoom: at the year level and at the article level. Articles are organised by date, and the largest circles represent the years with the most articles. Each article contains a people, place, and organisation cluster, and when you click an article circle, the key at right populates with the affiliated terms and the publication date. You can scroll beneath each header in the key to reveal terms if there are more than six.
References
For more on this project, please consult the links below:
- Source articles in a .txt format can be found in this github repository: HIV-Sentiment Analysis
- Information on the data cleaning scripts can be found in these python and node folders: Python pre-process | Node batch process
- A cluster visualisation to view all petris separately
- A zoom visualization to view the terms in connection
- To learn more about zoomable circle packs in D3 see also this static circle packing example, for the template
Data analysis
The project scraped HIV-related data from PDF reports and publications, journals, government ministries, dump files and online data portals. The data was analysed using Stata and Excel and visualised using Google Fusion maps, infographics created in Adobe Illustrator, and isotype interactive charts built with D3 and Polymer.
- KAIS 2012 Knowledge and disclosure of HIV
- KAIS 2012 TB STI cervical cancer
- KAIS 2012 Sexual risk behaviour and condom use
- KAIS 2012 response rate data
- KAIS 2012 reproductive health and PMTCT
- KAIS 2012 male circumcision
- KAIS 2012 Knowledge attitudes & beliefs
- KAIS 2012 Incidence between 2007 versus 2012
- KAIS 2012 Household characteristics data
- KAIS 2012 HIV prevalence data
- KAIS 2012 HIV incidence
- KAIS 2012 Couples and cohabiting partners
- KAIS 2012 children & youth
- KAIS 2012 Care and treatment
- KAIS 2012 Blood and injection safety
- KAIS 2012 HIV testing
- Prevention of mother to child transmission of HIV
- The picture of HIV in Kenya
- UNAIDS 2013 Datasets
- HIV risk behaviours
- MARPs Polling BOOTH survey 2009 data
Infographics
- Barrier to accessing HIV care
- High risk sex
- HIV's heavy toll on families
- Let's talk about HIV
- HIV unfolding
Media stories inspired by project
Over 50,000 men having sexual relations with men in Kenya
Reference material
- Aids county profiles
- Associations between Intimate Partner Violence and Health among Men Who Have Sex with Men: A Systematic Review and Meta-Analysis
- Baseline Polling Booth Surveys among Male and Female Sex Workers in Nairobi and Mombasa NASCOP Learning Sites Ministry of Health National Aids and STI Control Program NASCOP
- Geographic mapping of most at risk populations in Kenya June 2012
- HIV Prevention Response and Modes of Transmission Analysis, March 2009.
- Housing and population census, Kenya - 2009
- Internews in Kenya media resource centre
- Kenya Aids Indicator Survey 2012
- UNAIDS Global Report 2013
- Kenya Aids epidemic update 2012
- Kenya Aids Indicator Survey supplement
- Kenya Aids Indicator Survey 2007
- Kenya County factsheets June 2013
- Kenya Demographic and Health Survey 2008-09
- Kenya Economic Report 2013 by Kenya Institute of Public Policy Research and Analysis
- Kenya Health Information System Integrated Disease Surveillance Reports
- Kenya National Aids Strategic Plan 2009/10 ‒ 2012/13
- Kenya National Health Accounts
- Kenya National Bureau of Statistics, 2009 Census
- Male Sex Workers Who Sell Sex to Men Also Engage in Anal Intercourse with Women: Evidence from Mombasa, Kenya
- Monitoring the situation of women and children: Multiple indicator cluster survey, Kenya National Bureau of Statistics, 2011
- Nation Media Group library
- Official Development Assistance (ODA) for Health to Kenya by the World Health Organization
- Online archives of various Kenyan and International media houses
- Populations at Increased Risk for HIV Infection in Kenya: Results From a National Population-Based Household Survey, 2012
- 2010-2011 Integrated Biological and Behavioural Survey among Key Populations in Nairobi and Kisumu, Kenya