As a data analyst, you might face all kinds of data requests from different teams every day. Imagine your company’s sales team asks for information about new customers: you write SQL queries to extract data from the database, cleanse the data, save the result as a CSV file, and email it to them a few days later. When the sales team requests the latest data on newly registered users, you repeat the process and send the data again. As time goes by, your laptop accumulates a pile of duplicate files with different names and versions.
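The manual workflow described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `customers` table; here an in-memory SQLite database stands in for the production database, and the emailing step is omitted.

```python
import csv
import sqlite3

# Hypothetical example: an in-memory "customers" table stands in for
# the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, signup_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2024-01-05"), (2, "Grace", "2023-12-30")],
)

# Extract only the new customers (signed up this year).
rows = conn.execute(
    "SELECT id, name, signup_date FROM customers WHERE signup_date >= '2024-01-01'"
).fetchall()

# Save the result as a CSV file to email to the sales team.
with open("new_customers.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "signup_date"])
    writer.writerows(rows)
```

Every new request means re-running a script like this by hand, which is exactly the repetition the strategies below aim to eliminate.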
In fact, data analysts spend most of their time on this kind of repetitive work. For security reasons, not everyone can access the database: people need both permissions and technical skills to retrieve data, so whoever needs data has to rely on analysts and engineers. Furthermore, when data is exported as CSV files, it is static — it will not update automatically. Yet new data is always arriving from your production systems, so analysts and engineers have to keep sending files with the latest data.
Although distributing data is time-consuming, the right set of tools and strategies can save you a lot of time.
Here are four strategies for sharing data more efficiently.
Strategy 1: Centralize data in one place
Often you need to extract data from several different sources and combine them for your stakeholders. Data may come from SQL databases or third-party apps (such as CRM or ERP software). Some apps can export data directly to each other, whereas others only export .csv or .xlsx files. Either way, acquiring data from different sources can be a hassle.
Using a cloud-based data management platform can help you bring these data sources together onto the cloud. These platforms allow you to connect to SQL databases or third-party apps so that you can manage these datasets in one unified format. You can retrieve data easily, and avoid creating duplicate data copies.
Strategy 2: Track the history of data editing
You rarely share the entire database with someone outside your team. The raw data includes all kinds of information, but usually people only need to see certain metrics or columns. That is one of the reasons to segment the database.
To segment the database, you define conditions that filter out unrelated information after aggregating or summarizing your data. It is also worth keeping a record of what you have done with the data: the recipient may have questions about the transformation, or you may need to edit the transformation process later.
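Segmenting boils down to a filter plus an aggregation. A minimal sketch, assuming a hypothetical `orders` table, again using in-memory SQLite:

```python
import sqlite3

# Hypothetical "orders" table used to illustrate segmenting: filter out
# unrelated rows, then aggregate down to the metrics a stakeholder needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("East", 120.0, "paid"), ("East", 80.0, "refunded"),
     ("West", 200.0, "paid"), ("West", 50.0, "paid")],
)

# Keep only paid orders (the filter), then summarize revenue per region
# (the aggregation) — the recipient sees two metrics, not the raw table.
segment = conn.execute(
    """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY region
    ORDER BY region
    """
).fetchall()
# segment -> [('East', 1, 120.0), ('West', 2, 250.0)]
```

The `WHERE` and `GROUP BY` clauses are exactly the conditions you would want to document, since they determine what the recipient does and does not see.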
Some platforms have a built-in tool for checking edit history. Take Acho Studio as an example: it provides a feature called Timeline, which records all editing actions and shows the relationships between tables, so you can see what you have done with them. When you share a project, recipients can see these actions as well — which filters you applied and how you aggregated the data.
Strategy 3: Manage permissions
Sharing data as flat files is certainly convenient. However, it is hard to track how recipients use a file after you send it. They may edit the dataset, changing row values or column names, or pass it along to other people. Since you cannot restrict this behavior, security and compliance issues may arise.
A cloud-based data platform lets you define what your recipients can do with the data. For example:
- Can they edit data, such as applying filters or formulas?
- Can they import new tables or delete existing tables?
- Can they share data with others?
- Can they download or export data to BI tools?
Moreover, it helps to keep a list of who can access the data and what each person can do with it, so you can manage and adjust permissions easily.
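Such an access list can be modeled very simply. This is a minimal sketch, assuming four hypothetical capabilities that mirror the questions above; the user names are made up for illustration:

```python
# Hypothetical capabilities mirroring the permission questions above.
PERMISSIONS = {"edit", "import_delete", "share", "export"}

# The access list: who can see the data, and what each person may do.
access_list = {
    "analyst@example.com": {"edit", "export"},
    "viewer@example.com": set(),  # read-only access
}

def can(user: str, action: str) -> bool:
    """Return True if the user has been granted the given capability."""
    assert action in PERMISSIONS, f"unknown action: {action}"
    return action in access_list.get(user, set())
```

Keeping permissions in one structure like this makes it easy to audit who can do what, and to revoke or grant a capability by editing a single entry.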
Strategy 4: Set up a scheduler to update data automatically
Another problem with flat files is that they cannot update themselves. Every time your database updates, you need to send a new table to your teammates or stakeholders. Your local folders accumulate more and more data files in different versions, taking up space on your computer, cluttering your folders, and becoming harder to manage.
One way to solve this problem is to set up a scheduler that updates the data automatically. The scheduler retrieves data at set intervals, so your tables always reflect the latest data and you no longer accumulate duplicate datasets in your folders.
For a data analyst, segmenting databases is just one of many tasks in your daily routine. It is not especially hard, but it is tedious and fragmented. These strategies and tools can save you a lot of effort and let you focus on the tasks that truly matter.
Hope this article helps. If you need to segment your database more efficiently, sign up for Acho and give it a try. Should you have any questions, chat with us or email us at firstname.lastname@example.org.