Synthetic data, digital realities: the next frontier for data-driven innovation?

techUK’s summary response to the Financial Conduct Authority’s call for input: Synthetic data to support financial services innovation.

Data access, availability and privacy concerns remain key challenges for organisations seeking to innovate and develop new products and services, for example by using large, sensitive personal datasets to train and deploy more ethical artificial intelligence models. From healthcare to manufacturing to agriculture, almost every industry is set to benefit from greater use of data-driven insights and decision-making, which could help to improve lives and tackle pressing societal challenges such as climate change.

The financial services industry is no exception: data is driving efforts to combat financial fraud, protect consumers, and improve financial inclusion across the UK. Recently, the Financial Conduct Authority (FCA) published a Call for input: Synthetic data to support financial services innovation, with the aim of better understanding the existing market maturity of synthetic data, which the FCA believes could help address existing challenges to data-driven innovation.

In addition to exploring the uses, benefits, and challenges of synthetic data, the FCA call for input raises ethical considerations about its use, along with the role regulators can play in governing and encouraging its widespread adoption.

 

Demystifying synthetic data

According to the Alan Turing Institute, synthetic data refers to data generated using algorithms that preserve an original data set’s statistical features while producing entirely new data points. Synthetic data generators enable users to share data, to work with data in safe environments, to fix structural deficiencies in data, to increase the size of data sets, and to validate machine learning systems by generating adversarial scenarios.
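
As a rough illustration of the definition above, the sketch below fits a simple normal distribution to one numeric column and samples new values from it. The transaction amounts, the function name `synthesise_column`, and the choice of a normal distribution are all assumptions made for illustration; they are not taken from the FCA call for input or the Alan Turing Institute, and real synthetic data generators typically model many columns jointly and add privacy safeguards.

```python
import random
import statistics


def synthesise_column(real_values, n, seed=0):
    """Draw n new values from a normal distribution fitted to real_values.

    Hypothetical helper: it preserves only the mean and standard deviation
    of the original column, not its full shape.
    """
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "real" transaction amounts (illustrative values only).
real_amounts = [12.5, 40.0, 18.2, 75.9, 33.1, 21.4, 56.0, 29.8]
synthetic_amounts = synthesise_column(real_amounts, n=1000)

# Compare the summary statistics of the real and synthetic columns.
print("real mean:", round(statistics.mean(real_amounts), 2))
print("synthetic mean:", round(statistics.mean(synthetic_amounts), 2))
print("real stdev:", round(statistics.stdev(real_amounts), 2))
print("synthetic stdev:", round(statistics.stdev(synthetic_amounts), 2))
```

The synthetic values are new data points rather than copies of the originals, while their mean and spread stay close to those of the real column. A model this simple can also produce implausible values, such as negative amounts, which is one reason practical generators are considerably more sophisticated.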

However, as thinking on synthetic data develops, Government and regulators must recognise the considerable work that still needs to be done to remove existing barriers to unlocking the full value of non-synthetic or ‘real’ data, i.e. data generated by actual events. If these barriers, such as the data skills shortage, are not addressed, they risk persisting in the context of synthetic data.

Below, techUK has set out several recommendations which should be taken into consideration as Government, regulators, and bodies such as the Digital Regulation Cooperation Forum (DRCF) develop their thinking in this area. If the FCA considers new initiatives or market interventions related to synthetic data, it should ensure engagement and discussion with a diverse range of stakeholders, including industry.

 

techUK’s recommendations on synthetic data

  1. Step up efforts to address existing barriers to data access and sharing, such as a lack of data standards, poor data quality, and privacy and security concerns.

    Some of the current challenges for accessing and sharing data, such as the lack of commercial incentive to share data, will not be resolved through the generation of synthetic data and will remain a challenge for businesses seeking to innovate. Government must tackle this head-on by making progress in laying primary legislation for Smart Data Schemes and considering how these frameworks can be used to facilitate wider cross-sectoral data sharing.

    techUK also recognises that synthetic data offers a promising way to address the tension between innovation and privacy, which is another key barrier to data sharing. Other mechanisms such as Privacy Enhancing Technologies (PETs) and a thriving data intermediary ecosystem will also play a key role in managing this tension, and we encourage the Government and regulators to drive efforts on all these fronts.

     
  2. Outline a clear plan for the continued opening up of Government and public sector data sets, with the aim of moving towards near real-time reporting of data.

    While synthetic data will be significant in providing businesses with more access to datasets, effort must still be put into understanding existing types of real datasets, such as those held by Government and public services, and how they can best be leveraged. This will be key not only for businesses to innovate, but also for them to generate their own synthetic datasets modelled on public sector data.

    The Government could also consider generating and opening data as synthetic datasets to mitigate concerns or risks around privacy and ethics. For example, the Medicines and Healthcare products Regulatory Agency (MHRA) used primary care data to produce entirely artificial data that did not contain any original data from “real” patients, to reduce risks to patient privacy. These synthetic datasets have helped medical researchers to develop cutting-edge medical technologies, such as medical devices to fight COVID-19 and cardiovascular disease.

     
  3. Ensure that synthetic data is deployed for the public good and that its use is underpinned by careful privacy considerations.

    Given how significantly synthetic data could benefit the public, regulators should continue to facilitate collaboration around the creation of synthetic datasets. For example, the FCA should work with the Home Office to look specifically at how synthetic data could be used within the Economic Crime Plan and the new Online Fraud Strategy, which are currently being developed.

    techUK also welcomes further exploration of the role regulators could play in hosting and providing access to synthetic data for a fee, which should be based on fair relationships with industry. Regulators should also consider providing some synthetic data for free or at discounted rates if it is used for purposes that are of particular benefit to the public.

     
  4. Leverage the Digital Regulation Cooperation Forum (DRCF) to ensure joined-up and consistent approaches to synthetic data.

    As the FCA explores the use of synthetic data in the financial services industry, it is vital that key findings and outputs from the consultation are shared with the DRCF to ensure a collaborative approach and a culture of knowledge sharing between regulators on this nascent topic. This should include engagement with the Competition and Markets Authority (CMA), which will play a key role in assessing the impact of synthetic data on competition.

    With businesses navigating complex and oftentimes overlapping regulatory regimes, the DRCF plays a vital role in ensuring consistent approaches, shared visions, and cooperation between regulators on topics of mutual interest. Since the uptake of synthetic data will cut across multiple sectors, a joined-up and consistent approach between regulators will be essential.

    This will be especially true if regulators adopt the role of coordinating, generating and/or hosting synthetic data, which will require further consultation with industry to determine what types of synthetic datasets should be prioritised for generating and sharing, and how access to this data will be facilitated.
     
  5. Narrow the data skills gap and combat skills shortages by investing in training, upskilling, and reskilling of the UK’s workforce

    If businesses and organisations are to begin generating synthetic data through the deployment of algorithms, digital and data skills will be vital. Synthetic data is generated programmatically and depends on highly skilled computer scientists with expertise in machine learning and deep learning models. There is already a considerable data skills gap, in part due to businesses, organisations, and Government lacking the technical skills to manage data, as well as the skills to think creatively about data.

    One way the Government can address digital skill shortages to boost growth is by expanding the coverage of the Help to Grow: Digital Scheme, supporting SMEs to invest in digital reskilling through a Digital Skills Tax Credit, and continuing to reform the Apprenticeship Levy. The tech industry, working with Government and key stakeholders, has a responsibility to tip the scales so that motivations for learning outweigh any barriers faced.

 


Dani Dhiman


Policy Manager, Artificial Intelligence and Digital Regulation, techUK

Dani is Policy Manager for Artificial Intelligence & Digital Regulation at techUK, and previously worked on files related to data and privacy. She formerly worked in Vodafone Group's Public Policy & Public Affairs team supporting the organisation’s response to the EU Recovery & Resilience facility, covering the allocation of funds and connectivity policy reforms. Dani has also previously worked as a researcher for Digital Catapult, looking at the AR/VR and creative industry.

Dani has a BA in Human, Social & Political Sciences from the University of Cambridge, focussing on Political Philosophy, the History of Political Thought and Gender studies.

Email:
[email protected]
LinkedIn:
https://www.linkedin.com/in/danidhiman
