Skip to main content

Command Palette

Search for a command to run...

Data Warehouse Introduction

Published
•5 min read
Data Warehouse Introduction

Hello argonauta in recent days I have been studying the creation of the data warehouse and I bring a summary of the most important points that you should know as well as some considerations at the end to take into account when creating our business intelligence projects can be successful.

Business Intelligence

It is a process that is performed in order to obtain, analyze, and present information to users. This process is performed iteratively in order to continuously update the information we obtain.

Elements

  • Graphic information systems.
  • Customer relationship management systems.
  • Knowledge management systems.
  • Knowledge management systems.
  • Decision making systems.
  • Data warehouse.
  • OLAP (Online Analytical Processing).

Data Warehouse

It is a warehouse of large amounts of information that is of utmost importance to the company, through this information that must be timely, reliable and concise, it is possible to perform analysis of the same and allow to obtain results that help to improve decision making within the company.

The data source is updated daily and is available in various formats, including relational databases, excel, csv and many others.

It seeks to answer questions about the current state of a business.

Data Mart

A data warehouse is composed of one or several data mart, which is a view of the data warehouse, whose main function is oriented to a specific aspect of the business, examples of data mart:

  • Sales data mart.
  • Human resources data mart.
  • Organizational performance data mart.
  • Budget data mart.

Facts And Dimensions Table

In order to build a data mart, two important concepts are required:

  • The fact table, is that table that contains all the concrete aspects of the business that you want to measure, for example: total quantity of sales of a product, quantity of items shipped to a region budgets, etc.

    The fact table can only have the following fields:

    • The primary key.
    • Foreign keys.
    • Metric (numeric value).
  • The dimension table is the one that represents the information through which you want to measure the facts. Examples: time, regions, products, models, etc.

    It must include all the information that gives meaning to the numerical value represented by the metric.

Data Warehouse Models

Star Model

The central table is the fact table and at the end of the model represents the dimension tables.

Star model

Snowflake Model

Unlike the star model the snowflake is that each dimension table can be related to another dimension table. And that other dimension table is not going to relate to the fact table.

Star model

It is important to mention that standardization is not necessary since it is not relevant for the design of these models.

ETL

This process is known as Extraction, Transformation and Loading of data.

It is the process that consists of the extraction of data from different sources, the extracted data is transformed to the data required by the business and finally the data is loaded into each of the data marks that make up the data ware house.

OLAP cubes

It is a structure that allows to visualize the information from different perspectives, and according to the perspective of the cube, the information can be interpreted in one way or another.

Star model

According to the perspective or view you have of the cube, it will be analyzed in one way or another.

Set of tools to build a data warehouse

Among the tools that can be used to build a data warehouse, the following can be used:

Integration Services → Tool to build ETL.

Analysis Services → Tool to build the OLAP cube

Report Services → Tool to build the interactive reports.

Decision Support System (DSS).

It is a BI (Business Intelligence) tool which can be used to perform the analysis services of a data warehouse and through which reports can be visualized interactively based on the desired perspectives of the OLAP cube.

Building a Data Mart

To build the data warehouse it is necessary to carry out the following steps:

  • Obtain the questions to be answered for the business.
  • Identify the metric(s) that generate the business questions.
  • Identify the dimension tables.
  • Decide which model is used to design the data mart.
  • From the OLTP database, identify the tables that will be used to populate the fact tables and the dimension tables.
  • Analyze with which fields of the OLTP database the metrics will be obtained.

Considerations

  • It is not convenient that the data warehouse coexists in the same environment as our transaction systems such as an ERP or POS.

    Since it can lower the performance of the server when making queries to large volumes of data or also run the risk of not delivering the information in an agile way.

  • The nature of the data warehouse must be multipurpose, its data must be in a format that supports any and all possible forms of BI analysis.

  • The amount of time in which the data warehouse will be executed must be defined, this can vary according to the requirement of the company since it can be defined between a day, a week, a month or even a year.

    This is due to the fact that the ETL process can take a long time and must continue from the last point where the process was carried out, because if the previous records could certainly be eliminated and the process would not be efficient due to the increase in the volume of records.

  • It is important to define well the questions for the data mart since this will be the main axis for the creation of our model and if we do not have a good model we can face problems such as long response times, inconsistent information, problems to display information among others.