The Altxerri cave in Aia, Spain, contains cave paintings estimated to be roughly 39,000 years old. Among the oldest known in existence, these drawings depict bison, reindeer, aurochs, antelopes and other animals and figures.
They are what Xabi Uribe-Etxebarria calls one of the first forms of “data storage.”
We’ve obviously come a long way since cave drawings. Data collection has accelerated over millennia, and in just the last decade the collection and storage of data have grown at a pace never before seen, as have attacks on that data.
As such, “our privacy is at risk,” said Uribe-Etxebarria. “So, we must take action.”
Uribe-Etxebarria’s company, Sherpa, is doing so through federated learning, a machine learning (ML) technique that trains algorithms across multiple decentralized servers holding local data, without that data being shared, intentionally or otherwise.
The company today announced the launch of its “privacy-preserving” artificial intelligence (AI) model-training platform.
Uribe-Etxebarria, founder and CEO, said that the company considers data privacy “a fundamental ethical value,” and that its platform “can be a key milestone in how data is used in a private and secure way for AI.”
Privacy holding back advancement
Standard ML techniques require centralizing training data on one machine or in a data center. By contrast, federated learning, a term coined and introduced by Google in 2016, lets multiple users collaboratively train a deep learning model without sending their raw data to a central location.
Each user downloads the model from a data center in the cloud, trains it on their private data, then summarizes and encrypts the model’s new configuration. That update is sent back to the cloud, where it is decrypted, averaged with other users’ updates and integrated into the centralized model.
“Iteration after iteration, the collaborative training continues until the model is fully trained,” explained IBM researchers.
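The download, train locally, send back and average loop described above can be sketched in a few lines of Python. This is a minimal illustration of federated averaging under simplifying assumptions, not Sherpa’s, Google’s or IBM’s implementation: the model is a hypothetical two-parameter linear regression, the three clients and their data are synthetic, the function names are invented for illustration, and the encryption and decryption steps are omitted entirely. Only model weights, never the clients’ raw data, reach the averaging step.

```python
import random


def local_train(weights, data, lr=0.05, epochs=20):
    """Hypothetical client step: gradient descent on one client's private data.

    Model: y = w0 + w1 * x (a deliberately tiny linear model).
    Only the updated weights are returned; `data` never leaves this function.
    """
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Mean-squared-error gradients over the client's local examples.
        g0 = sum((w0 + w1 * x) - y for x, y in data) * 2 / n
        g1 = sum(((w0 + w1 * x) - y) * x for x, y in data) * 2 / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def federated_average(updates, sizes):
    """Server step: average client weights, weighted by each client's example count."""
    total = sum(sizes)
    return [sum(u[i] * s for u, s in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]


def federated_training(clients, rounds=50):
    """Repeat rounds of local training plus averaging, starting from zero weights."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Each client starts from a copy of the current global model.
        updates = [local_train(list(global_weights), data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


rng = random.Random(0)


def make_client(n):
    """Hypothetical private dataset: n noisy samples of y = 3 + 2x."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 + 2 * x + rng.gauss(0, 0.1)) for x in xs]


clients = [make_client(n) for n in (30, 50, 20)]
print(federated_training(clients))  # weights should land close to [3, 2]
```

Weighting each update by the client’s number of examples keeps clients with larger datasets from being drowned out by smaller ones. A production system would typically add encryption or secure aggregation on top, so that the server never sees any individual client’s update in the clear.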
However, useful and accurate predictions require a wealth of training data, and many organizations, especially those in regulated industries, are hesitant to share the sensitive data that could improve AI and ML models.
Sharing data without exposing it
This is the problem Sherpa seeks to address. According to Uribe-Etxebarria, its platform enables AI model training without sharing private data. This, he said, can help improve the accuracy of models and algorithm predictions and ensure regulatory compliance, and it can also help reduce carbon footprints.
Uribe-Etxebarria pointed out that one of the major problems with AI is the significant amount of energy it consumes, due to the large amount of computation needed to build and train accurate models. Research has indicated that federated learning can reduce energy consumption in model training by up to 70%.
Sherpa claims that its platform reduces communication between nodes by up to 99%. Its underlying technologies include homomorphic encryption, secure multiparty computation, differential privacy, blind learning and zero-knowledge proofs.
The company, whose team includes Carsten Bönnemann of the National Institutes of Health (part of the US Department of Health and Human Services) and Tom Gruber, co-founder and former CTO of Siri, has signed agreements with the NIH, KPMG and Telefónica. Uribe-Etxebarria said NIH is already using the platform to help improve algorithms for disease diagnosis and treatment.
Use cases aplenty for federated learning
IBM researchers said that aggregating customer financial records could allow banks to generate more accurate customer credit scores or detect fraud. Pooling car insurance claims could help improve road and driver safety; pulling together satellite images could lead to better predictions around climate and sea level rise.
And “local data from billions of internet-connected devices could tell us things we haven’t yet thought to ask,” the researchers wrote.
Uribe-Etxebarria underlined the importance of federated learning in scientific research: AI can help detect patterns or biomarkers that the human eye cannot see. Algorithms can safely learn from confidential data, such as X-rays, medical records, blood and glucose tests, electrocardiograms and blood pressure monitoring, and eventually make predictions.
“I’m excited about the potential of data science and machine learning to make better decisions, save lives and create new economic opportunities,” said Thomas Kalil, former director of science and technology policy at the White House, and now Sherpa’s senior advisor for innovation.
He noted, however, that “we’re not going to be able to realize the potential of ML unless we can also protect people’s privacy and prevent the type of data breaches that are allowing criminals to access billions of data records.”
Uribe-Etxebarria agreed, saying, “this is only the beginning of a long journey, and we still have a lot of work ahead of us.”