Data Retention Policy considerations and use cases
What is a Data Retention Policy and why should one be used?
In a nutshell, a Data Retention Policy is a set of document storage rules that an organization adheres to, defining how long it is supposed to retain data for regulatory purposes. How these rules are defined, what data they cover and any exceptions to the rules are where things start to get a bit complicated. Legislative and regulatory compliance are the primary drivers behind organizations creating and implementing Data Retention Policies, but many organizations also retain data to enhance the “corporate memory.” Being able to look back at previous client communications and project-specific documentation created by the organization’s employees lets them leverage the knowledge contained therein to build their future initiatives and plans, as well as support their client base with a rich history of past experience.
Having a Policy defined benefits the organization in that it provides a “hard” cutoff for most data as well; any data that isn’t on any sort of Legal Hold that goes over that Policy’s duration is safe to delete, which can reduce liability for some types of organizations. While we at Tangent are of the mind that having more data is beneficial since information often equates to knowledge and understanding of the past, this isn’t always the experience for all organizations, and some are just happy to meet compliance requirements and then be done with that data as soon as feasible.
Without a declared Retention Policy, organizations that are subject to document retention are in a very vulnerable legal position in the event of Freedom of Information Act (FOIA) requests (also known as Public Records Requests or PRR) or lawsuits, as eDiscovery (the discovery of electronic documents, such as emails) is almost always part of an opening salvo to garner more information to then dig through. Not having an official policy means people can request data going back to the beginning of time, adding tons of potential data to any request, which can take time to produce and be potentially damaging to an organization. Policies that are too short can also result in separate legal penalties being assessed against an organization for not meeting its legally required retention periods.
A Data Retention Policy for an organization is wider in scope than just the emails, social media posts and direct messages that DataCove archives; paper records, text messages and any other communication systems an organization uses both for its employees and for its customers (or the public, for our many government clients) are also items that fall under such a policy and should also be retained accordingly.
Is there guidance available for which Data Retention Policy my organization should be using?
In every situation, these are questions best answered by the legal counsel of the organization and it is strongly recommended to work with them to craft a formal retention policy for the organization.
Tangent can help offer some resources and considerations for crafting a Data Retention Policy, but the full decision, research and weight of this is something that must be decided by the organization itself. Those resources are below, including a state-by-state listing of official regulations to help provide some solid baselines.
How long are organizations required to retain records?
These vary depending on the type of organization (for regulated entities), types of data being held and the location of that organization. Those data types are discussed below in detail.
With that said, there are a few very common retention policies Tangent sees used by organizations all over the country. While Tangent does not validate that these policies are accurate for every organization that uses them or that they are safe to use within their industry or location, we do see enough commonality among the organizations that use them to give an idea of what the general consensus is.
Three and Five year policies, used commonly by K-12 school districts and community colleges when student emails are being archived. Seven year policies are commonly used for faculty and staff, with students receiving the shorter policies due to their rotation rates.
Seven year policies, used by nearly all commercial organizations due to their wide breadth of coverage and as a default for essentially any non-medical organization, including universities both public and private. These are by far the most common policy type since they cover the vast majority of considerations that most organizations will deal with.
Ten year policies, often used by medical, charity and non-profit organizations of any kind, including mental health facilities, clinics, hospital systems and more.
Forever policies, for organizations who haven’t yet defined a retention policy, have frequent data lookup requests or long-lasting data, like that used by utility companies, cities, counties, state and federal government agencies. Due to Tangent’s preponderance of government clientele, we see these policies as the second most common type in play.
Are there legislative considerations for my organization’s industry? How about for financial service companies, banks, publicly traded corporations, medical organizations, educational institutions or government institutions?
Certain industries are subject to regulation, and certain regulations come with specific data retention requirements.
For example, in the financial services industry, the Securities and Exchange Commission (SEC) Rule 17a-4 requires broker-dealers to retain and index electronic correspondences, including email, with immediate access for a period of two years and with non-immediate access for at least six years. Firms that fail to comply with SEC Rule 17a-4 are subject to investigation and penalization by the Financial Industry Regulatory Authority (FINRA).
Other commonly encountered legislative requirements are below:
Sarbanes-Oxley (SOX) Act: Passed into U.S. federal law in 2002, SOX created financial record keeping and reporting requirements for publicly traded corporations to protect investors from fraudulent activity. Those requirements include a five year retention period for customer invoices, a seven year retention period for tax returns and receivable or payable ledgers and an indefinite retention period for payroll records and bank statements.
Gramm-Leach-Bliley Act (GLBA): GLBA, which became law in 1999, requires financial institutions to be transparent with consumers about their information-sharing practices and to make an additional effort to secure consumer data. Although GLBA does not stipulate a specific retention period, the general rule of thumb is to retain all financial records for a period of seven years.
Health Insurance Portability and Accountability Act (HIPAA): Although HIPAA, the regulation designed to protect patients’ private data against fraud and theft, does not set specific retention periods for medical records, it does specify how long healthcare organizations must retain HIPAA-related documents. According to 45 CFR § 164.316, healthcare organizations (known as “Covered Entities”) are required to retain HIPAA compliance documentation for a minimum of six years from when it was created or, in the case of a policy, from when it was last in effect.
Family Educational Rights and Privacy Act (FERPA): FERPA is a data security regulation that applies specifically to educational institutions and agencies. FERPA does not specify retention periods. However, it does require schools to produce and present a student’s educational records to their parent or legal guardian upon request, which means academic institutions would do well to retain these records for at least a few years after a student has graduated or is no longer enrolled.
Freedom of Information Act (FOIA): Similar to FERPA and GLBA, FOIA — which gives members of the public the right to request records from federal agencies — does not have any hard-and-fast records retention requirements. With that said, FOIA does require federal agencies to establish records management programs and “identify records that should be preserved.” As a result, any federal agency’s record management program should include records retention schedules for different paper and electronic documents.
What types of records are being held onto currently?
Certain records, such as tax documents (and supporting documents), employment records, sales receipts, expense reports and insurance policies, take precedence over others when it comes to records retention. It may not be necessary to retain a company-wide email announcing the date of the annual holiday party, but it is definitely important to retain any emails pertaining to your legal, financial or human resources departments.
In some cases, these high priority records come with their own set of retention requirements. For example, the Internal Revenue Service requires organizations to retain employment tax records for a minimum of four years; the Occupational Safety and Health Administration requires businesses to retain records on workplace injuries for five years; and the Equal Employment Opportunity Commission requires employers to retain all personnel or employment records for one year.
Business specific information, ranging from student intake data to client correspondence and contracts and all the way up to engineering blueprints for construction or products should all be given due consideration for their value over time to help decide on how long they should be held.
How practical is it to retain disparate data types?
A question we frequently receive at Tangent, especially after an organization has perused its state government’s guidance on retention of specific data types, is how this is to be managed. Most documentation points to different retention schedules for different types of data, and the onus of classifying data is all too often placed upon those creating that data. As any IT Administrator will tell you, relying on end users to remember to check a box or set a flag for a longer retention policy whenever they send emails containing certain types of data is a pretty unlikely scenario, let alone doing it consistently every time over who knows how many years. And let’s not even get into employee turnover situations and training new users to do this work!
What most organizations wind up doing is taking a ‘blanket’ approach to email retention, as in they’ll take the longest applicable policy that doesn’t already have any other sort of retention guideline for it, and then cover all emails with that policy.
For example, a city might use a seven year policy for all of its employee emails globally, since that covers all of its finance data, human resources information, public relations and emergency communications along with lesser retention-value items like general day to day employee communication that would otherwise only need to be retained for three years. This seven years also covers most of the practical needs of the Building and Zoning division (how often do requests come in for architectural data more than seven years old, after all?), but the Building division also retains its blueprints and permit authorizations in a separate database that holds data indefinitely, since knowing where the gas mainlines are in a city is pretty important even fifty years in the future.
In this scenario, a blanket policy covers almost everything for the city safely, and the only items they’d need to hold longer are in a separate database that is already being retained indefinitely.
Not all organizations will have their longer-lived data held separately like this, or the IT Administrators may not have access to those other storage locations to verify (let alone asking the question of backups of that data and backup validity, a la Schrödinger’s Backup), so these blanket policies really do wind up being nearly universal in real life, and more often than not, wind up trending longer to cover as many bases as possible. It’s not uncommon to see systems with over twelve years of data on them even from organizations who don’t necessarily need to hold data that long, simply because it’s both simpler and safer for all involved.
With that said, DataCove does support tiered retention policies with a great deal of granularity, and some of the more complex options that are not blanket policies will be discussed below. Benefits and risks of those policies, along with best practices for implementation are also covered in detail.
State Retention Schedules and Recordkeeping Laws
Tangent has taken the liberty of putting together a comprehensive list of the document retention guidelines by state to help provide a solid foundation in deciding your organization’s retention policy. Please note that while all links were live and active at the time of writing, these website links are managed by their respective States and may change over time. If a link no longer works, searching the State’s website for wording similar to the title of the guide will take you to their current webpage for the guide.
Alabama
State Records Disposition Authorities
Alaska
Records & Information Management Services
Arizona
Arkansas
Department of Finance and Administration Records Retention Schedule
California
Records Management and Appraisal Program
Colorado
Connecticut
Records Retention Schedule (Municipalities)
Records Retention Schedule (State Agencies)
Delaware
Records Retention Schedules (Agency Specific)
Records Retention Schedules (General)
Florida
Georgia
Records Management Requirements
Records Retention Schedules (Municipalities)
Records Retention Schedules (State Agencies)
Hawaii
Idaho
Illinois
Indiana
Iowa
Records Management Guide & Schedule
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Records Retention Schedules (Municipalities)
Missouri
Department of Records Management
Records Retention Schedules (Municipalities)
Montana
Nebraska
Nevada
New Hampshire
Division of Archives and Records Management
Records Retention Schedules (Municipalities)
New Jersey
New Mexico
New York
North Carolina
Records Retention Schedules (Municipalities)
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
Records Retention Schedules (Municipalities)
Records Retention Schedules (State Agencies)
West Virginia
Records Retention Schedules (County)
Wisconsin
Statewide General Records Schedules
Wyoming
How does DataCove’s Data Retention Policy feature work?
DataCove’s Retention Policy operates in either a “simple” global configuration that works for most organizations or as a “complex” configuration that ties in multiple other features and layers to make advanced Retention Policies that are extraordinarily flexible and granular.
In either configuration, the core functionality is the same: DataCove is given a cutoff period in terms of a ‘rolling’ date, and emails older than that date are deleted from the system and from the backup location. Any extensions to this policy are observed and those emails are left untouched until they exceed their extensions (if ever).
As an example, let’s say we’re working with a DataCove that has emails from March 2005 to November 2022.
A seven year global retention policy is applied, which deletes emails older than seven years from the date that the policy runs.
The policy is set to run automatically with a frequency of monthly on the last Saturday of the month.
On November 26, the last Saturday of November 2022, the DataCove will initiate a process called the DataAging Purge.
This process deletes all emails from the system older than November 26, 2015, and then reaches out to the backup location to likewise delete all messages older than November 26, 2015 from there as well.
If there are any extensions or exceptions to this policy, like a Legal Hold or certain email addresses or groups of email addresses that are meant to be held for a longer period of time (Forever, in this example), those emails will not be removed.
Once the process finishes, the DataCove now only contains emails from November 2015 to November 2022, as well as any emails held for special extension.
On December 31, the last Saturday in December 2022, the DataAging Purge will run again and delete all emails older than December 31, 2015, again leaving any emails with extensions alone.
This process continues on indefinitely into the future, keeping the system only holding emails that meet the organization’s declared Retention Policy and those that are under extension or Legal Hold for investigative purposes.
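The date arithmetic in the example above can be reproduced with a short Python sketch. This is only an illustration of the rolling cutoff, not DataCove's own code: the function names are hypothetical, and the February 29 fallback is an assumption made so the example runs for any date.

```python
import calendar
from datetime import date


def last_saturday(year: int, month: int) -> date:
    """Return the date of the last Saturday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # weekday(): Monday is 0, Saturday is 5
    offset = (d.weekday() - calendar.SATURDAY) % 7
    return d.replace(day=last_day - offset)


def purge_cutoff(run_date: date, retention_years: int) -> date:
    """Emails sent before this date fall outside the global policy."""
    target_year = run_date.year - retention_years
    try:
        return run_date.replace(year=target_year)
    except ValueError:
        # Assumption: a February 29 run date falls back to February 28.
        return run_date.replace(year=target_year, day=28)


nov_run = last_saturday(2022, 11)
dec_run = last_saturday(2022, 12)
nov_cutoff = purge_cutoff(nov_run, 7)
dec_cutoff = purge_cutoff(dec_run, 7)

print(nov_run, nov_cutoff)  # 2022-11-26 2015-11-26
print(dec_run, dec_cutoff)  # 2022-12-31 2015-12-31
```

Each monthly run moves the cutoff forward by roughly one month, which is what makes the policy "rolling."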
The individual phases of the DataAging process, colloquially known as the Purge, are described below. These phases are fairly complex, but automated and always operate in a specific order:
DeIndexing, wherein emails to be deleted are first removed from the DataCove’s search indices so they no longer show up in search results.
Attachment Repointing, for emails with attachments that exist on multiple emails. If the original source email with the original source attachment is going to be deleted, the pointer for that attachment will now be redirected towards the next oldest email containing that attachment. This keeps the deduplication of attachments intact and ensures storage space consumption remains low.
Reforging of Indices, a reindexing process that applies to emails that may have existed within a certain period-of-time based index that was now partially emptied by the DeIndexing process. These emails will be reread and then placed into a new Index in order to maintain the most efficient size of an index, keeping searches expedient in the future. This process runs contemporaneously with the DeIndexing process so that emails are made immediately searchable again, avoiding any search gaps.
Local System Deletion, when the emails and their attachments are fully removed from the system and their space reclaimed for reuse by new emails.
Remote Backup Deletion, which is the process of DataCove reaching out to the Remote Backup location and deleting the same emails and indices from the system’s backups in order to avoid “backup liability” by still possessing those no-longer-needed emails in a backup location, even if no longer on the live system. This process often gets postponed to the next regular Remote Backup run depending on scheduling of that process, resulting in a longer-than-usual run the day after a Purge executes.
Note: “Backup Liability” is the eDiscovery concept that backups of data are still discoverable in an investigation, and if they exist, the data from them must be restored and made available for the requester. Many organizations keep backups of data longer than the production system would maintain it, including on tape, cloud or offsite locations for building or regional redundancy; that data is discoverable by virtue of its existence. DataCove mitigates this with its own backup by purging the backup of the now-destroyed data shortly after the system’s Purge to prevent unnecessary Backup Liability from occurring.
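To make the Attachment Repointing phase more concrete, here is a minimal Python sketch of the idea under a hypothetical data model. The dictionaries, identifiers and function name are assumptions for illustration only and do not reflect how DataCove stores attachments internally.

```python
from datetime import date


def repoint_attachments(owners, references, purged_ids, sent_dates):
    """
    owners:     attachment id -> email id whose record holds the stored copy
    references: attachment id -> set of email ids that include the attachment
    purged_ids: set of email ids about to be deleted
    sent_dates: email id -> sent date
    Returns (new_owners, orphaned): updated pointers for attachments that
    survive, and attachments no surviving email refers to any longer.
    """
    new_owners = {}
    orphaned = []
    for att, owner in owners.items():
        survivors = [e for e in references[att] if e not in purged_ids]
        if not survivors:
            # No remaining email uses this attachment; it can be deleted.
            orphaned.append(att)
        elif owner in purged_ids:
            # Redirect the pointer to the next oldest surviving email.
            new_owners[att] = min(survivors, key=lambda e: sent_dates[e])
        else:
            new_owners[att] = owner
    return new_owners, orphaned


sent_dates = {"e1": date(2014, 3, 1), "e2": date(2016, 5, 9), "e3": date(2018, 1, 4)}
owners = {"a1": "e1", "a2": "e1"}
references = {"a1": {"e1", "e2", "e3"}, "a2": {"e1"}}

new_owners, orphaned = repoint_attachments(owners, references, {"e1"}, sent_dates)
print(new_owners)  # {'a1': 'e2'}
print(orphaned)    # ['a2']
```

In this sketch the shared attachment is stored once and survives the purge by being repointed, while the attachment that only the purged email used is released.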
Configuring the Data Retention Policy and Extension Options
To set up a Data Retention Policy on DataCove, log into the web interface and navigate to Maintenance in the top header bar, then select Data Retention Policy on the left hand side menu.
The default setting for this page is a Forever Policy, meaning the DataCove will not delete any emails at any time until a policy shorter than Forever is configured.
Selecting the Custom radio button will expand the options available.
DataCove’s Retention Policy options fall into two categories: Simple and Complex.
Simple policies are comprised of a single, all-encompassing Retention Policy that does not receive any deviation, exception or extension unless an investigation or lawsuit changes that policy and forces extensions or Legal Holds to be implemented. This is useful for sites that do not have different groups of users who need different retention schedules, or where a blanket policy will be used to cover all users at the longest possible duration necessary for any sorts of documents retained.
Complex policies consist of the same all-encompassing Retention Policy, but then possess further layers that allow for different domains, groups, departments, individual email addresses or individual emails to have longer policies than that of the global policy. These granular policies allow for tighter scheduling and limited liabilities by only retaining emails for as long as different users or groups may need to hold them, and trimming off any emails that don’t meet the criteria for longer retention much earlier.
The common function between these two types is the “Retain email contents on system hard drive” global policy, and this must be configured in order to use either type of policy.
This policy forms the foundation of either Retention Policy type, and selecting from the dropdown boxes of years and months allows the setting of a baseline policy that will remove emails older than the defined duration, barring any emails that fall under one of the extension options. This should be set to the organization’s defined Retention Policy for emails and chat messages, and if implemented, Social Media posts.
For a Simple Retention Policy, this is the only parameter that needs to be configured for email retention duration. If no extensions are needed, skip over the Complex Retention Policies section and move on to Purge Frequency Scheduling.
Note: It is important to keep in mind that the Retention Policy functions on the basis of Email Sent Date, as in it takes effect on emails based on when they were Sent originally, not when DataCove received them. As an example, under a seven year policy, an email that came in on present day November 21, 2022 would not be flagged for deletion until November 21, 2029. However, historical emails that were uploaded to the DataCove in October 2022 from some Enduser .PST files that date back from 2022 to 2012 would have numerous emails being flagged for deletion much sooner, potentially immediately, and those emails would be purged on the next scheduled DataAging run.
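The effect of keying the policy on Sent Date can be checked with a small Python sketch. The function name, the upload day and the sample sent dates are illustrative assumptions, as is the February 29 fallback.

```python
from datetime import date


def purge_eligible_on(sent: date, retention_years: int) -> date:
    """Date on which an email sent on `sent` ages out of the policy."""
    target_year = sent.year + retention_years
    try:
        return sent.replace(year=target_year)
    except ValueError:
        # Assumption: a February 29 sent date falls back to February 28.
        return sent.replace(year=target_year, day=28)


upload_date = date(2022, 10, 15)  # hypothetical day the .PST import ran

new_mail = purge_eligible_on(date(2022, 11, 21), 7)
old_import = purge_eligible_on(date(2012, 6, 1), 7)

print(new_mail)                   # 2029-11-21
print(old_import)                 # 2019-06-01
print(old_import <= upload_date)  # True
```

The imported 2012 email was already past a seven year policy on the day it was uploaded, so it would be removed on the next scheduled DataAging run.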
Complex Retention Policies
When configuring a Complex Retention Policy, the “Retain email contents on system hard drive” setting must always be the shortest policy on the system, as extensions to the policy will logically be longer than the global policy that will apply to all other emails, hence the term “extension.”
Extensions to a Retention Policy mean that emails that fall within their spheres of influence will be subject to a longer policy than that of other emails, and these extensions can be stacked, layered and automated to create very refined policies to reach the desired data retention goals.
Department Extensions allow for independent retention policies for different departments within your organization. Human Resources and Finance/Accounting are very common departments who receive a longer retention policy than the average employee due to the types of data they work with, but many organizations also place their C-Suite or other heavies in longer term retention policies to ensure that valuable data is preserved. Multiple departments can be configured for different groups, each having their own policy.
Departments can be created manually in DataCove via upload of email addresses, but these can become inaccurate over time as employee turnover occurs or if they are otherwise not kept up to date.
Departments can be automatically created and updated via LDAP synchronization and import of existing Active Directory groups; this is highly recommended to automate the onboarding and offboarding of employees over time and is usually already in play at most organizations.
While uncommon, employees who move between departments should have their group affiliation left configured for both departments if the retention policies are longer in the original department, as work products that user created while in Department A would be subject to deletion sooner than desired after the user moves to Department B. Alternatively, configuring an individual Address extension for that user (described further below) is a good option.
Users who are entirely removed from a group, such as an employee no longer with the organization who is being outright deleted from the Directory system, will automatically fall under the global retention policy of the system. To that end, it’s best to create longer term “retention groups” for each department that has a policy extension. For example, an employee in Human Resources, which has a seven year retention policy, leaves the company. Their Directory entity should then be placed into a newly created “Seven Year Retention” group which has been synchronized to DataCove so that their emails won’t be deleted prematurely, but instead still held for seven years. Deleting the user outright would render retention policies that apply to them based on their membership invalid, but the usual disabling of access and changing of password will perform the same without potentially losing any data earlier than desired.
Domain Extensions are useful for organizations who are merging with another organization, or for school districts/universities that provide student email at a different domain or subdomain. These policies can be used to retain their emails for a different length of time than the main organization’s data, effectively allowing for multiple organization’s emails to be independently retained to their specific needs.
Simply entering “@domainname.com” in the extension field is sufficient for applying the policy to the entire domain. DataCove will automatically add the “%” (percent) wildcard in front of the domain to ensure that any email address from that domain is covered by this policy.
Address Extensions are arguably the most useful extension, as they allow for individual email addresses to have their own independent retention schedule.
These are the very first step to take in the event of a Legal Hold, eDiscovery event or other investigation on a user, as preservation of their data in a manner that does not impede the retention schedules of other users and groups is critical. These are often used as “Forever” policies for users under litigation, as litigation can take many years to settle and that data should be retained during that time.
These can also be used for various “troublemakers” at organizations who have had a past history of getting into legal or administrative trouble, or ones that are likely to encounter such issues in the future.
Some organizations also use these for individuals in the C-Suite or other high level employees who are more likely to receive a lawsuit or investigation as a result of their position rather than any wrongdoing.
Tag Extensions are where we start reaching the truest forms of granularity in retention; these are retention schedules that can be applied to individual emails.
Tags are a means of marking an email for easy retrieval in the future, and they can be automated to apply themselves to emails when certain words are used in an email, when emails are sent or received by certain addresses or on nearly any other search criteria that the DataCove can work with. Some of the retention schedules that the states provide recommend using keyword matching for retention, by instructing employees to use an “Archive” or “Retain” keyword in their emails that designates them as important for future retention. In this use case, configuring an automated search and tag in DataCove will place emails matching that keyword into this extended retention policy.
These can be used for Legal Holds as well, by tagging any emails of relevance and assigning a “Forever” policy to them, but it is recommended to use the dedicated Legal Hold feature for that purpose.
Folder Extensions provide an extended retention period for all emails found within an end user’s folders that match a certain name. All emails contained within that folder will be subject to the extension, as well as any subfolders contained within.
Folder-based retention is another variation of instructing users to store emails of value in particular locations within their mailbox for future eDiscovery, and in situations where this method of document retention has been employed already, this particular feature can effectively tie in with the in-built corporate culture and preserve the emails as intended.
Folder Extensions do require the Microsoft EWS Services to be configured and active for this to operate. Binding against either a local Exchange server or Office 365 is supported.
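One way to reason about how layered extensions interact is the Python sketch below. It assumes that when several extensions match the same email, the longest duration applies and a “Forever” extension overrides everything else; that rule, the data model and every name here are illustrative assumptions rather than DataCove's documented behavior. Domain matching uses a simple suffix test as a stand-in for the “%@domainname.com” wildcard.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

FOREVER = None  # hypothetical marker for a "Forever" extension


@dataclass
class Email:
    sender: str
    tags: Set[str] = field(default_factory=set)


# An extension pairs a matching rule with a duration in years (or FOREVER).
Extension = Tuple[Callable[[Email], bool], Optional[int]]


def domain_ext(domain: str, years: Optional[int]) -> Extension:
    suffix = domain.lower()
    return (lambda e: e.sender.lower().endswith(suffix), years)


def address_ext(address: str, years: Optional[int]) -> Extension:
    target = address.lower()
    return (lambda e: e.sender.lower() == target, years)


def tag_ext(tag: str, years: Optional[int]) -> Extension:
    return (lambda e: tag in e.tags, years)


def effective_years(email: Email, global_years: int,
                    extensions: List[Extension]) -> Optional[int]:
    """Longest duration among the global policy and matching extensions."""
    durations = [global_years]
    for matches, years in extensions:
        if matches(email):
            if years is FOREVER:
                return FOREVER
            durations.append(years)
    return max(durations)


extensions = [
    domain_ext("@example.edu", 7),            # staff domain
    address_ext("cfo@example.edu", FOREVER),  # individual under Legal Hold
    tag_ext("Retain", 10),                    # keyword-driven tag
]

student = Email("student@students.example.edu")
staff = Email("staff@example.edu")
tagged = Email("staff@example.edu", {"Retain"})
cfo = Email("cfo@example.edu")

print(effective_years(student, 3, extensions))  # 3
print(effective_years(staff, 3, extensions))    # 7
print(effective_years(tagged, 3, extensions))   # 10
print(effective_years(cfo, 3, extensions))      # None
```

Here the three year global policy is the shortest on the system, matching the requirement that extensions only ever lengthen retention.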
Purge Frequency Scheduling
After a Retention Policy has been configured, the DataCove will need instruction on how frequently these policies should be enforced. The DataAging Purge process is extraordinarily resource intensive and is best run after work hours have ended and over weekends. Depending on how many emails the organization receives that are subject to deletion, the process can last as short as a couple of hours or as long as several days. While the DataCove is entirely usable during this window, the Purge process generally puts about a 20% overhead on system performance while running, which will impact search and export performance while live.
It is recommended to set the Frequency to Monthly on a Friday night after the work day has ended so that the system has the largest window of time available to get through its workload before the start of business on Monday.
Higher frequencies like Weekly and Daily are useful for organizations who are running low on space on the DataCove or who otherwise need to remove data rapidly after it passes its retention schedule. These likewise should be run on a day and time that maximizes the amount of time DataCove will be unused by the organization.
Note: If a DataCove has been in production use for some time and a Retention Policy is being implemented for the first time, it’s highly recommended to stagger the DataAging Purge runs into 6 month or 1 year chunks while working down to the intended policy duration, rather than instituting a 5 year policy on 10 years of data all in one go. In this example context of 10 years of data being live on a DataCove but a 5 year policy having been decided on, setting the Retention Policy to 9 years and then executing it, waiting for the Purge to finish and then setting it to 8 years, executing it again, etc, until the duration reaches the intended 5 year policy is the best method of stepping the policy to the desired state without creating a huge amount of work for the system all at once.
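The stepping approach described in this note can be planned ahead of time with a short Python sketch. The function name and the choice to express settings in months are assumptions for illustration; each value would still be entered by hand in the Retention Policy page, waiting for the previous Purge to finish before applying the next.

```python
def staggered_steps(current_years: int, target_years: int,
                    step_months: int = 12) -> list:
    """Return the retention settings, in months, to apply in order."""
    target = target_years * 12
    steps = []
    setting = current_years * 12 - step_months
    while setting > target:
        steps.append(setting)
        setting -= step_months
    # Always finish exactly on the intended policy duration.
    steps.append(target)
    return steps


yearly = staggered_steps(10, 5)
half_yearly = staggered_steps(10, 5, step_months=6)

print([m // 12 for m in yearly])  # [9, 8, 7, 6, 5]
print(len(half_yearly))           # 10
```

With one year chunks, ten years of data reaches a five year policy in five Purge runs; six month chunks halve the work per run at the cost of ten runs.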
Purge Notification Email sends a notification of completion to an email address (or multiple email addresses, semicolon separated), along with a report of what the policy was set for, how many emails were determined to be over the threshold for the policy and any emails that were excepted from the purge via an extension or Legal Hold that applied to them. This is simply informational and helps administrators keep track of when purges are running and what the new outer limit of the DataCove’s email range is.
Purge Backed Up Email Only defines whether the emails held on DataCove should be subject to a safety mechanism of not only being live on the DataCove’s disk array, but also backed up somewhere off of the system’s disk array, before they can be removed from the system. This particular feature is a legacy element of the days when DataCoves used tape drives for long term backups and off-system archival and has since been superseded by the Remote Backup. Given that the Remote Backup is also affected by the system’s Retention Policy, it is recommended to always set this to “Purge emails whether or not they are backed up” to avoid potentially limiting the DataAging Purge from running in situations where a Remote Backup is not configured or not scheduled to run.
Lastly, checking the Disclaimer box at the bottom of the page is necessary to save any changes. The Disclaimer is a reminder that this function is designed to permanently and irrevocably remove data from the DataCove, which has the potential to cause data spoliation from the compliance and legal standpoints if performed incorrectly or at an inappropriate time (such as immediately after receiving a subpoena or after an undesirable public event).
Once a policy has been configured as desired, select the Save button at the bottom of the page.
At the next interval for the DataAging Purge to initiate, as dictated by the Frequency setting, the Purge will kick off and begin removing emails older than the defined global Retention Policy sans any emails that are subject to extensions. It will repeat on that same frequency for every run thereafter unless unscheduled.
This concludes the setup guide for the DataCove Retention Policy feature.