Beneath today's never-ending headlines about data breaches lies a simple fact: companies are collecting, curating and hoarding ever larger and richer archives of intimate information about their customers. Log files and transaction records that once would have been deleted the moment they were no longer needed are now retained indefinitely. The default stance at many companies is to archive absolutely everything about their customers on the off chance that it becomes valuable someday in the distant future. Unfortunately, this data-hoarding mentality is creating a new era of cyber risk for businesses.
The “big data” era has ushered in a newfound recognition of the value of data. Certain types of data have always been viewed as valuable, routinely bought and sold by data brokers or used to attract advertisers, but the plethora of new analytic tools and techniques has shown companies that even the most mundane data can have enormous value. Companies that once wholeheartedly embraced mandatory deletion schedules and rigorous risk reviews before any data stream could be archived have today pivoted toward a keep-everything mindset.
Log files, transaction records and other highly sensitive records were once kept by businesses only for as long as they were needed for regulatory compliance. Even then they were typically minimized as much as possible, stripped of any datapoint not absolutely necessary. Today those datasets are viewed as incalculably valuable, recording the intimate behavioral and interest profiles of users with microscopic precision.
Data archiving was once the domain of specialized IT divisions: data lived in centralized, heavily audited and secured warehouses, carefully overseen by corporate legal and security teams. Today's corporate data management frequently resembles a Wild West free-for-all.
Each product group is often left to create its own myriad local archives, while data extracts from centralized warehouses are routinely shared company-wide.
Log files that were once carefully controlled, regulated in content and continuously audited can now be created by developers, store passwords in cleartext and simply be forgotten, all without centralized security teams ever realizing what is happening. Most worryingly, such oversights aren't confined to small startups or legacy companies; even Web behemoths have run afoul of them.
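To illustrate the kind of safeguard these forgotten logs bypass, consider a minimal log-redaction filter. This is a hypothetical sketch, not any particular company's practice; the field names it scrubs (password, token, api_key and so on) are illustrative assumptions, not an exhaustive list:

```python
import logging
import re

# Credential-like "key=value" or "key: value" pairs that should never
# reach a log file. The key names here are illustrative, not exhaustive.
_SECRET_PATTERN = re.compile(
    r"(?i)\b(password|passwd|token|api_key|secret)\s*[=:]\s*\S+"
)

def redact_secrets(message: str) -> str:
    """Replace credential-like key/value pairs with a redaction marker."""
    return _SECRET_PATTERN.sub(lambda m: m.group(1) + "=[REDACTED]", message)

class RedactingFilter(logging.Filter):
    """Logging filter that scrubs secrets from every record before output."""
    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first, then redact the final string.
        record.msg = redact_secrets(record.getMessage())
        record.args = None
        return True
```

Attaching such a filter to a logger (`logger.addFilter(RedactingFilter())`) is a small step, but it is exactly the sort of centralized control that disappears when each developer spins up ad hoc log files on their own.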
The need for annotation and curation means immensely sensitive customer data is increasingly shared with a growing army of third-party contractors for review, creating an ever-expanding number of exposure risks.
Indeed, the recent controversies around contractor review of smart speaker and digital assistant audio recordings came about not because company employees divulged the practice, but rather because concerned contractors at these third-party companies reached out to media outlets to publicize practices they disagreed with.
Archiving all of this customer data comes with great risks, making companies attractive cybersecurity targets and vastly increasing their reputational, legal and monetary costs when breaches do occur.
A company that stores only regulatory-mandated data, such as transaction records, in centrally managed, hardened repositories minimized to contain only legally required information can limit its reputational, legal and monetary exposure if a breach does occur.
Companies in which absolutely every datapoint is collected and archived in myriad log files, repositories, warehouses, desktops and random servers all across the company face a much greater risk of breaches, much greater legal risk from those breaches and much greater uncertainty in even ascertaining what was potentially taken in any given breach.
Add to this mix the IT sprawl of the myriad contractors that companies rely upon to perform everything from data management and analytics to human annotation, and the modern enterprise has surprisingly little visibility into the lifecycle of a typical customer datapoint.
Companies that lack their own in-house analytics or deep learning staff can face even greater risks if they outsource those tasks in ways that require contractors to maintain local archives of vast swaths of private customer data.
Deep learning projects present novel risks for companies that must outsource their development. Specialized hardware requirements mean few companies have the necessary data center capacity in-house, and because models must be trained on actual real-world data, companies cannot simply substitute synthetic training data. In short, they often must place their most intimate customer data into the hands of strangers beyond their view.
Eager to benefit from AI and analytics advances, companies are also frequently prioritizing ease of development over rigorous security, allowing their deep learning and data analytics teams greater freedom in accessing and copying data into external services and platforms to ease their workflows, without fully considering the ramifications.
In fact, in many companies it is not uncommon for security teams to discover that well-meaning developers, or even entire teams, have spun up an external cloud account using a purchase card and shipped immensely sensitive customer data out the door without even the most basic security safeguards in place, or have relied upon misconfigured world-readable storage directories to ease data sharing without understanding the exposure such configurations create.
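The world-readable storage misconfiguration described above is mechanically detectable. The sketch below assumes the access-control-list structure returned by S3-compatible storage APIs; the example ACL, including the "analytics-team" owner, is hypothetical:

```python
# Group URIs that expose an S3-style bucket or object to the public.
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def public_grants(acl: dict) -> list:
    """Return the grants in an ACL that expose data beyond the account."""
    exposed = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            exposed.append(grant)
    return exposed

# Hypothetical example: an ACL granting READ to all anonymous users --
# the "world-readable directory" misconfiguration described above.
example_acl = {
    "Owner": {"DisplayName": "analytics-team"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "abc"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
    ],
}
```

Run periodically against every bucket an organization owns, a check like this surfaces exactly the ad hoc, purchase-card cloud accounts that central security teams never knew existed.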
In the end, the hoarding mentality of today’s data-driven enterprise presents ever-greater cyber risks that even the biggest companies do not always fully appreciate.
By Kalev Leetaru