Public health surveillance
Almost every facet of online activity creates user-generated data: content that is created directly or indirectly by an online user. Popular examples include posts shared on social media platforms such as Facebook and Twitter, and more private records such as logs of web browsing or search engine activity. Much research has been conducted on methods for automatically transforming this data into meaningful inferences and related applications in domains such as health, finance and politics, or even into enhanced interpretations of human behaviour.
Our group, based at the Department of Computer Science at University College London and funded by the interdisciplinary EPSRC project i-sense, has been focusing on this type of research primarily from a health-oriented perspective. The main driver for us is the development of methods that use online user-generated content to improve the standard of traditional health surveillance, and to provide alternative forms of collective health analysis. Our overall goal is a framework that health agencies can use as a complementary information source, helping them to make better and more timely decisions.
Google Flu Trends (GFT) was a platform for monitoring the prevalence of influenza-like illness by looking at the frequency of statistically related queries on Google’s search engine. However, the model behind GFT was not perfect. In fact, it made severe mistakes, such as over-predicting the rate of flu in what appeared to be, in hindsight, a normal flu season. In collaboration with Google, we have corrected these mistakes by proposing a better-equipped approach (see Figure 1). The new method can remove ambiguity between different search query concepts and can explore nonlinear trends in the data. Focusing on England, we have visualised the flu rate inferences on our flu-monitoring platform (Flu Detector), together with estimates that originate from Twitter content.
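To make the idea of turning query frequencies into a flu rate more concrete, here is a minimal sketch in Python. It is not the Google Flu Trends model, nor our revised method: it fits a single straight line by ordinary least squares, whereas the revised approach also explores nonlinear trends. The function names (fit_line, estimate_rate) and all weekly figures are invented purely for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def estimate_rate(query_freq, slope, intercept):
    """Map a query frequency to an estimated flu rate with the fitted line."""
    return slope * query_freq + intercept


# Hypothetical training data: the weekly share of searches that match a
# flu-related query, and the flu rate reported for the same weeks
# (cases per 100,000 people). These numbers are made up.
weekly_query_freq = [0.010, 0.015, 0.030, 0.045, 0.025]
weekly_flu_rate = [5.0, 7.5, 15.0, 22.5, 12.5]

slope, intercept = fit_line(weekly_query_freq, weekly_flu_rate)

# Estimate the flu rate for a new week from its query frequency alone.
new_week_estimate = estimate_rate(0.020, slope, intercept)
print(round(new_week_estimate, 2))
```

In practice many queries would be combined, and an ambiguous query (one that rises for reasons unrelated to flu) would need to be filtered out or down-weighted, which this one-variable sketch does not attempt.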
Why is digital monitoring of infectious diseases useful?
Firstly, because online data provides a complementary, potentially additive signal to what traditional health surveillance already tells us. The latter is based only on records of visits to medical facilities or use of medical services, whereas the former can reach much broader parts of the population. Secondly, estimates from online data can be made almost instantly, overcoming the various delays that syndromic surveillance suffers from. Finally, such models do not necessarily depend on an established health system and could be a great resource for developing parts of the world.
Beyond monitoring infectious diseases, we have developed, in collaboration with Microsoft Research and Public Health England (PHE), a framework for assessing the impact of a health intervention. Our case study was a pilot flu vaccination programme for children, launched in specific locations in England, and our online data sources consisted of a geo-located mix of tweets and anonymised Bing searches. Although PHE was unable to derive statistically significant estimates for the impact of the vaccination programme due to the small number of samples in their data, our online framework yielded statistically significant impact estimates that were in agreement with the ones coming from PHE (see Figure 2). Both analyses indicated that the vaccination programme succeeded in reducing the flu rates in the affected communities.
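One plausible way to frame such an impact estimate is sketched below in Python. This is an illustration under our own simplifying assumptions, not a description of the published analysis: it learns a linear relationship between flu rate estimates in a control area and a vaccinated area before the programme, projects what the vaccinated area would have looked like without the programme, and reports the relative difference from what was actually observed. The function names and all numbers are hypothetical, and the sketch leaves out the uncertainty estimates that any claim of statistical significance would require.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def relative_impact(control_pre, target_pre, control_post, target_post):
    """Relative change in the target area versus its projected flu rate.

    A negative value means the observed rate was lower than projected.
    """
    # Learn how the target area tracked the control area before the programme.
    slope, intercept = fit_line(control_pre, target_pre)
    # Project the target area's rate during the programme from the control area.
    projected = [slope * c + intercept for c in control_post]
    return (mean(target_post) - mean(projected)) / mean(projected)


# Hypothetical flu rate estimates derived from online data (made-up numbers).
control_pre = [10, 20, 30, 40]   # control area, before the programme
target_pre = [12, 22, 32, 42]    # vaccinated area, before the programme
control_post = [50, 60]          # control area, during the programme
target_post = [41.6, 49.6]       # vaccinated area, during the programme

impact = relative_impact(control_pre, target_pre, control_post, target_post)
print(f"Estimated relative change in flu rate: {impact:.1%}")
```

The design choice here is to let the control area stand in for the unobservable scenario in which no vaccination took place, which only makes sense if the two areas moved together closely before the programme started.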
In our latest work, we are developing methods to better understand detailed health signals from different groups within the online population. For example, people in a particular occupation may be more exposed to certain types of disease, or people from certain socioeconomic backgrounds may react differently to nationwide health interventions. The traditional way of understanding these trends is by conducting surveys. However, this process is time-consuming, potentially costly and considerably biased. Social media content, when mined in the correct way, could be used as a viable alternative. To this end, we have been co-developing supervised learning techniques for automatically inferring demographic attributes of social media users, such as occupation, income and socioeconomic status.
The current core members of our research group (excluding myself) are Prof. Ingemar J. Cox, Dr. Lukasz Olejnik, Bin Zou and Jens K. Geyti. The i-sense project is led by Prof. Rachel A McKendry.
Read other entries in the Research Councils Rio 2016 Olympics themed blogs.