Natural language processing
Natural Language Processing (NLP) is concerned with the exploration of computational techniques to learn, understand and produce human language content. NLP technologies can assist both human-human communication (e.g. machine translation) and human-machine communication (e.g. conversational interfaces and automated personal assistants), and can analyse and learn from the vast amount of textual data available online.
We aim for this area to grow as a proportion of the EPSRC portfolio. Recognising NLP's importance to the development of intelligent interfaces and to data science, this strategy notes the opportunities for increased activity and for maintaining our capability in mainstream statistical NLP within UK academia. As this area is a small part of our portfolio, it must grow if the foci on data science and intelligent interfaces are to be achieved in parallel with maintenance of mainstream statistical NLP capability.
By the end of the current Delivery Plan, we aim to have:
- A research and training portfolio that contributes to development of new intelligent interfaces with NLP at their core. NLP will increasingly serve as an interface for communicating between humans and systems (e.g. in the Internet of Things) and dialogue management will become increasingly important, linking NLP with the related fields of Speech Technologies and human-robot interaction. Researchers should also be encouraged to address challenges in multi-modal interfaces (e.g. by exploring and exploiting the links between language and vision)
- A portfolio of research and training that includes work on enabling extraction of knowledge from large-scale textual data. The opportunity exists for researchers to target interdisciplinary work in this area (e.g. textual analytics enabling analysis of medical records)
- Researchers working towards the goal of computing with meaning, contributing to the broad objective in Artificial Intelligence (AI) of developing computational methods for ascribing semantics to human behaviours (e.g. natural human interaction)
- A supply of people with high-level skills, reflecting increasingly acute demand as NLP technologies are used in an increasing number of applications
Researchers have the opportunity to play an important role in delivering the objectives of EPSRC's Future Intelligent Technologies and Data Enabled Decision Making cross-ICT priorities, and are well-placed to contribute to the other cross-ICT priorities. To maximise impact, they should ensure effective communication with researchers in areas such as AI Technologies, Visualisation and Human-Computer Interaction (HCI).
Responsible Innovation is a significant consideration. Researchers should be encouraged to address issues of trust, identity and privacy with regard to how NLP is used in social contexts and large-scale social networks.
Highlights:
Although this area is only a small proportion of the EPSRC portfolio, the UK has a small number of world-leading NLP research groups and is considered internationally competitive (Evidence source 1). Capacity is currently low, but we wish to support the future success of the research base as demand for capability to create and integrate intelligent interfaces increases (Evidence source 2). The UK has a particular strength in its depth of experience in combining NLP with machine learning (ML) methods. Other strengths are natural language understanding, machine translation and information extraction (Evidence source 3,4). A combination of factors (the application of ML methods to vast amounts of linguistic data, and a significant increase in computing power) has led to recent advances in NLP and related technologies, and this progress is likely to continue. The UK is well-placed to capitalise, given its strengths in NLP, ML and related areas (e.g. Speech Technologies), provided there is increased capacity to do so.
UK strength at the interface of speech and language technologies and ML is evidenced by the significant investment being made by major IT companies (e.g. Amazon, Google and Apple), which have created or expanded UK-based research facilities and are heavily recruiting researchers with NLP expertise (Evidence source 5,6). There have been several recent, high-profile UK start-up acquisitions (e.g. Dark Blue Labs by Google, VocalIQ by Apple and SwiftKey by Microsoft) and substantial growth in commercial interest in using NLP technologies, e.g. in conversational interfaces and automated personal assistants (Evidence source 4-6).
There is a need to ensure a supply of people with high-level skills in NLP. As noted above, major IT companies are heavily recruiting staff with PhD and postdoctoral experience in NLP. However, retention of expertise and key capacity in academia beyond PhD level is a recognised problem that will become even more acute as NLP and related technologies are utilised in an increasing number of applications (Evidence source 4,7).
NLP is a significant research area for data science, as it enables management of unstructured data (e.g. patient records, medical literature), and for Robotics and Autonomous Systems (RAS), where NLP's use for human-robot interaction in integrated systems is increasingly important. In general terms, NLP is important to the health of related disciplines (e.g. HCI, robotics and AI) as both a driver of research and a user/collaborator/magnifier for those disciplines’ research outputs (Evidence source 4).
This area is linked with a number of others (mainly within ICT). Those of most current relevance are: Artificial Intelligence Technologies, Human-Computer Interaction, Human Communication in ICT, Image and Vision Computing, Information Systems, Software Engineering and Speech Technologies.
NLP is expected to contribute significantly to the Connected Nation Outcome and, to a lesser extent and/or over a longer timeframe, to the other Outcomes. Specific Ambitions of particular relevance are:
C1: Enable a competitive, data-driven economy
NLP will contribute to the interfaces that will be part of the smart tools and analytical techniques needed to generate actionable information from large and diverse datasets.
C2: Achieve transformational development and use of the Internet of Things
Communication between a wide range of sensors and devices, and their interaction with people, will lead to the next revolution in products and services. NLP will contribute to the way information can be intelligently assimilated and communicated.
C3: Deliver intelligent technologies and systems
NLP will contribute to the smart tools and intelligent technologies that will take the Connected Nation beyond data flows by turning them into physical action, and will increasingly serve as an interface for communicating between intelligent systems and the people using them.
C4: Ensure a safe and trusted cyber society
NLP researchers have the opportunity to contribute to development of new tools to analyse and interpret data for large-scale systems in order to detect crime and terrorism (e.g. understanding how those who would do us harm use the internet), as well as addressing other security issues, while ensuring citizens’ privacy and trust.
R3: Develop better solutions to acute threats: cyber, defence, financial and health
NLP can help ensure better ability to identify emerging threats or anomalous patterns within existing and future complex data environments.
Evidence sources:
1. EPSRC, Analysis of Research Excellence Framework (REF) 2014 data and EPSRC Knowledge Maps, (2014)
2. CITIA, CITIA Roadmap, (2016)
3. C.D. Manning, (2015), Computational Linguistics and Deep Learning. Computational Linguistics 41(4), 701-707
4. Community and user engagement (individual input and group feedback)
5. J. Hirschberg and C.D. Manning, (2015), Advances in Natural Language Processing. Science 349(6245), 261-266
6. EPSRC, Output from the Speech Technologies exceptions process, (2015)
7. IT Jobs Watch, Tracking the IT Job Market, (2016)
Research area connections
This diagram shows the top 10 connections between Research Areas within the EPSRC research portfolio. The depth of each segment relates to the value of grants, and the width of each segment relates to the number of grants, shared by the two Research Areas it connects.
We aim to grow this area as a proportion of the EPSRC portfolio.