The evolution of data architecture has been driven by the growing importance of data in organizations. From traditional data warehouses to modern data fabric and data mesh approaches, each generation of architecture has addressed specific challenges and opened up new opportunities.

  • The 70s: Hierarchical and network databases

    In the 1970s, computer systems were dominated by centrally managed mainframes. Data was organized using hierarchical or network database models. In the hierarchical model, records form a tree in which every element has exactly one parent, representing one-to-many relationships; in the network model, an element can be linked to many other elements, which also allows many-to-many relationships. The sketch below shows the difference on a small example.
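
    A minimal illustration in Python, with invented record names rather than a real 1970s DBMS API:

    ```python
    # Hierarchical model: every record has exactly one parent (a tree).
    # Here, each course belongs to exactly one department.
    hierarchical = {
        "Sales": {"courses": ["Negotiation", "CRM Basics"]},
        "IT":    {"courses": ["Networking"]},
    }

    # Network model: records may participate in many-to-many links.
    # A course can be linked to several departments at once.
    network_links = [
        ("Sales", "Negotiation"),
        ("Sales", "SQL"),   # SQL is linked to two departments ...
        ("IT", "SQL"),      # ... which a strict hierarchy cannot express.
        ("IT", "Networking"),
    ]

    # Navigating the network model means following links, not walking a tree.
    print([dept for dept, course in network_links if course == "SQL"])  # ['Sales', 'IT']
    ```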

  • The 80s: The client-server model

    In the late 1980s and early 1990s, a new paradigm of data architecture emerged with the advent of the client-server model. It marked a move away from centralized mainframe systems towards distributed systems in which responsibilities were divided between servers (providers of resources or services) and clients (consumers of those services). For databases, this meant that the database management system (DBMS) could run on a server, while users and applications accessed the data from client machines. This approach revolutionized scalability and accessibility and simplified the management of growing data volumes and an increasing number of users.
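
    The division of labour can be sketched with a few lines of standard-library Python: the server process owns the data and answers queries, while the client holds no data and only sends requests over the network. The toy protocol is purely illustrative, not a real DBMS wire format.

    ```python
    import socket
    import threading

    # Toy "DBMS server": owns the data and answers queries over TCP.
    DATA = {"customer:1": "Alice", "customer:2": "Bob"}

    def serve(listener: socket.socket) -> None:
        conn, _ = listener.accept()
        with conn:
            key = conn.recv(1024).decode()            # the client's "query"
            conn.sendall(DATA.get(key, "NOT FOUND").encode())

    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))                   # let the OS pick a free port
    listener.listen(1)
    threading.Thread(target=serve, args=(listener,), daemon=True).start()

    # Toy "client": holds no data, just asks the server.
    client = socket.create_connection(listener.getsockname())
    client.sendall(b"customer:2")
    print(client.recv(1024).decode())                 # Bob
    client.close()
    ```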

  • The 90s: Traditional Data Warehousing

    In the 1990s, the concept of data warehousing fundamentally changed how companies approached the storage and analysis of data. At its core, a data warehouse is a large, centralized repository for data from various sources.

    The architecture uses a three-tier structure: the data source layer, the data warehouse layer, and the front-end client layer. ETL processes (Extract, Transform, Load) were used to pull data from different operational databases, convert it into a consistent format, and then load it into the data warehouse. The data was typically stored in a relational database and organized based on an OLAP cube model (Online Analytical Processing), which allowed for complex analytical and ad-hoc queries.
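
    The ETL pattern itself takes only a few lines to sketch. The example below uses SQLite as a stand-in for both the operational source and the warehouse; table and column names are illustrative.

    ```python
    import sqlite3

    # Stand-in for an operational source system.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, country TEXT)")
    source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                       [(1, 1999, "ch"), (2, 4550, "DE"), (3, 1200, "Ch")])

    # Stand-in for the data warehouse.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE fact_orders (id INTEGER, amount REAL, country TEXT)")

    # Extract: pull raw rows from the source.
    rows = source.execute("SELECT id, amount_cents, country FROM orders").fetchall()

    # Transform: convert to a consistent format (decimal amounts, uppercase country codes).
    cleaned = [(id_, cents / 100.0, country.upper()) for id_, cents, country in rows]

    # Load: write the conformed rows into the warehouse fact table.
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", cleaned)
    print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
    ```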

    [Figure: Data Warehouse architecture]

  • The 2000s: Big Data and Hadoop

    In the 2000s, the proliferation of the internet, social media, and IoT devices led to a drastic increase in data volume, variety, and velocity—giving rise to what is known as "Big Data." Traditional data warehouses could no longer effectively handle these heterogeneous, large volumes of data generated at high speeds.

    The open-source framework Hadoop revolutionized data architecture starting in 2005. It was specifically designed for processing massive amounts of data in computer clusters. The framework introduced the concept of distributed storage and processing, meaning that data was no longer confined to a single storage location but could be stored and processed across multiple nodes.
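
    Hadoop's programming model, MapReduce, splits work into a map phase that runs on each node and a reduce phase that aggregates the shuffled results. The word-count sketch below imitates the three phases in plain Python; real Hadoop jobs are typically written in Java and actually distribute these steps across a cluster.

    ```python
    from collections import defaultdict

    # Each "block" stands in for a chunk of data stored on a different node.
    blocks = ["big data big clusters", "data lakes and data streams"]

    # Map: each node independently emits (key, value) pairs.
    mapped = [(word, 1) for block in blocks for word in block.split()]

    # Shuffle: pairs are grouped by key across all nodes.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)

    # Reduce: the values for each key are aggregated.
    print({word: sum(counts) for word, counts in grouped.items()})
    # {'big': 2, 'data': 3, 'clusters': 1, 'lakes': 1, 'and': 1, 'streams': 1}
    ```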

  • The 2010s: Cloud and Data Lake Architectures

    In the 2010s, the concept of cloud computing emerged as a new paradigm, providing scalable resources as a service over the internet. This development had significant impacts on data architecture and led to the creation of data lakes. Unlike traditional data warehouses, which use an ETL process (Extract, Transform, Load) to ingest data, data lakes employ an Extract-Load-Transform (ELT) process: data extracted from various sources is first loaded, untransformed, into cost-effective BLOB storage and transformed there; only the results are then transferred to a data warehouse, which relies on more expensive block storage.
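
    The difference to ETL is purely the ordering of the steps: raw data is landed first and transformed later, when and where it is needed. A minimal sketch, with a Python list standing in for the BLOB store:

    ```python
    import json

    # Load first: raw records land in cheap object storage exactly as extracted.
    blob_store = []
    for raw in ['{"id": 1, "amount_cents": 1999}', '{"id": 2, "amount_cents": 4550}']:
        blob_store.append(raw)                    # no parsing, no cleaning yet

    # Transform later: structure is imposed only when analysis needs it.
    def to_warehouse_row(raw: str) -> tuple:
        record = json.loads(raw)
        return (record["id"], record["amount_cents"] / 100.0)

    print([to_warehouse_row(raw) for raw in blob_store])  # [(1, 19.99), (2, 45.5)]
    ```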

    The need to process large volumes of data in real time gave rise to the Lambda and Kappa architecture models. The Lambda architecture employs a hybrid approach, utilizing both batch and stream processing to gain accurate and up-to-date insights. All incoming data is captured and stored as an append-only log, creating an immutable historical record. This architecture is divided into three layers: the Batch Layer, the Speed Layer, and the Serving Layer. In the Kappa architecture, all data is ingested and processed as an unbounded stream of events. This architecture consists of three main components: stream ingestion, stream processing, and long-term storage. 
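
    The defining trick of the Lambda architecture is the merge in the Serving Layer: a query combines a view precomputed by the Batch Layer with the increments the Speed Layer has accumulated since the last batch run. A minimal sketch with invented counters:

    ```python
    # Batch Layer: a view precomputed from the complete, immutable history.
    batch_view = {"page_views:/home": 10_000}     # state as of the last batch run

    # Speed Layer: increments from events that arrived after that run.
    speed_view = {"page_views:/home": 42}

    # Serving Layer: a query merges both views for an accurate, fresh answer.
    def query(key: str) -> int:
        return batch_view.get(key, 0) + speed_view.get(key, 0)

    print(query("page_views:/home"))  # 10042

    # In a Kappa architecture the batch view disappears: the same number would
    # be produced by replaying the single event stream through one processor.
    ```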

    [Figure: Data Lake architecture]

  • The 2020s: Data Lakehouse

    Data Lakehouses represent a new generation of data platforms: a Data Lakehouse combines the advantages of Data Lakes and Data Warehouses to store structured, semi-structured, and unstructured data in a unified Data Lake. This eliminates the need for separate data silos and allows data teams to perform analyses and derive insights directly from raw data without the need to move or duplicate data. The Medallion Architecture, also known as the "Multi-Hop" architecture, is used for the logical organization of data in a Lakehouse. Its goal is to gradually and progressively improve the structure and quality of data as it flows through each layer of the architecture (Bronze – Silver – Gold).
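
    A minimal sketch of the multi-hop idea: each layer is derived from the previous one, so quality improves step by step while every stage remains reproducible from the raw Bronze data. The record fields are invented for the example.

    ```python
    # Bronze: raw events exactly as ingested, including duplicates and noise.
    bronze = [
        {"user": " alice ", "amount": "19.99"},
        {"user": " alice ", "amount": "19.99"},   # duplicate
        {"user": "bob",     "amount": "45.50"},
    ]

    # Silver: cleaned, typed, and deduplicated.
    silver = {(r["user"].strip(), float(r["amount"])) for r in bronze}

    # Gold: business-level aggregates, ready for reporting.
    gold = {}
    for user, amount in silver:
        gold[user] = gold.get(user, 0.0) + amount

    print(gold)  # {'alice': 19.99, 'bob': 45.5}
    ```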

    [Figure: Data Lakehouse architecture]

  • The 2020s: Data Fabric

    The Data Fabric represents the fourth generation of data platform architecture. Its goal is to make data available anytime and anywhere. A Data Fabric consists of a network of data platforms such as Data Warehouses, Data Lakes, IoT/Edge devices, and transactional databases that interact with each other and are distributed across the enterprise's computing ecosystem. One node in the fabric can supply raw data to another, which then performs analyses. These analyses can be provided as REST APIs within the fabric, allowing them to be used by transactional systems for decision-making. Data assets can be published in various categories, enabling the creation of an enterprise-wide data marketplace. 
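
    A single fabric node can be sketched with the Python standard library alone: it computes an analysis over its local data and publishes the result as a small REST endpoint that other nodes or transactional systems can call. The endpoint path and payload are invented for the example.

    ```python
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Local data held by this node (e.g., raw data supplied by another node).
    ORDERS = [{"country": "CH", "amount": 19.99}, {"country": "DE", "amount": 45.50}]

    class FabricNode(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/analytics/revenue-by-country":
                self.send_response(404)
                self.end_headers()
                return
            revenue = {}                          # the node's "analysis"
            for order in ORDERS:
                revenue[order["country"]] = revenue.get(order["country"], 0.0) + order["amount"]
            body = json.dumps(revenue).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    # Other systems in the fabric can now call
    # GET http://127.0.0.1:8080/analytics/revenue-by-country (blocks until stopped).
    HTTPServer(("127.0.0.1", 8080), FabricNode).serve_forever()
    ```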

    [Figure: Data Fabric architecture]

  • Future Concept: Data Mesh 

    Data Mesh is an architectural concept for organizing data in large enterprises. Instead of being stored and managed centrally, data in a Data Mesh is decentralized: it remains within the individual domains or business areas, and mechanisms are introduced to enable access and exchange between these domains.

    Data Mesh is typically based on four principles: domain orientation, self-service, data productization, and infrastructure automation. By implementing a Data Mesh, companies can respond more flexibly to changes, as data management is tailored to the specific needs of individual business areas, while simultaneously increasing the scalability and reusability of data.
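
    The "data as a product" principle can be made concrete as a contract that each domain publishes for its data: an owner, a schema, quality guarantees, and a self-service output port. The sketch below is one hypothetical way to express such a contract; all field names and the storage location are invented.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class DataProduct:
        """A domain-owned data product, as it might appear in a mesh catalog."""
        domain: str                   # owning business domain
        name: str
        output_port: str              # self-service access point (URI, table, topic)
        schema: dict                  # the contract consumers can rely on
        quality_checks: list = field(default_factory=list)

    orders = DataProduct(
        domain="sales",
        name="orders",
        output_port="s3://sales/orders/",        # hypothetical location
        schema={"order_id": "int", "amount": "float", "country": "str"},
        quality_checks=["no_null_order_id", "amount_positive"],
    )
    print(f"{orders.domain}/{orders.name} -> {orders.output_port}")
    ```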

    [Figure: Data Mesh architecture]

Comparison of Data Architectures

Data Warehouse remains the most common Data Architecture Model

Although new architectures like Data Lakes and Data Meshes are gaining importance, Data Warehouses remain the most common data architecture variant today. They have established themselves as a proven method for centrally storing and analyzing large volumes of structured data. Companies value their reliability and stability, which they have demonstrated over the years. Additionally, Data Warehouses are closely integrated with Business Intelligence (BI) and analytics tools, enabling seamless analysis of stored data.

Another important aspect is the ability of Data Warehouses to efficiently store and process historical data. This allows companies to identify trends, patterns and changes over time and make informed decisions. The centralized storage and management of data in Data Warehouses also support high data quality and consistency, which is crucial for businesses.

Modern Data Warehouse technologies also offer scalability options that allow companies to expand their infrastructure as needed to keep up with the growth of data volumes. 

Architecture Selection Must Be Based on the Organization's Needs

There is no universal architecture suitable for all use cases and every company. Rather, the choice of the appropriate architecture is determined by a variety of factors. These include both current and future use cases, the diversity of the data landscape, as well as the technologies and platforms used. Each organization has its own requirements and challenges that may necessitate a tailored architecture. Therefore, it is essential to develop an architecture that meets both current and future needs while being flexible enough to adapt to changing requirements.

  • DATA WAREHOUSE

    Technology: DBMS

    Platforms: On-prem or Cloud

    Data sources: Structured 

    Data Integration: Batch 

    Data models: Dimensional, data vault 

    Data quality: Assured

    Data Governance: Centralized

    Importance of metadata: Medium

    Usage: Standard reports, ad-hoc analysis

  • DATA LAKE

    Technology: Object Stores

    Platforms: Cloud

    Data sources: All Data

    Data Integration: Copy

    Data models: Schema-less

    Data quality: Unverified

    Data Governance: Undefined

    Importance of metadata: Low

    Usage: Data Science

  • LAMBDA/KAPPA

    Technology: Streaming

    Platforms: On-prem and/or Cloud

    Data sources: Structured and semi-structured

    Data Integration: Stream and Batch

    Data models: Stream and modeled

    Data quality: Monitoring of streams

    Data governance: Minimally defined

    Importance of metadata: Low to medium

    Usage: AI-driven real-time analytics

  • DATA LAKEHOUSE

    Technology: DBMS and Object Stores

    Platforms: Cloud

    Data sources: Hybrid

    Data integration: Copy and Batch

    Data models: Hybrid

    Data quality: Partially assured

    Data governance: Centralized

    Importance of metadata: Medium 

    Usage: Standard reports, ad hoc analysis, data science

  • DATA FABRIC

    Technology: Data virtualization

    Platforms: On-prem and/or Cloud

    Data sources: Structured

    Data integration: Virtual

    Data models: Dimensional, data vault

    Data quality: Monitoring

    Data governance: Hybrid

    Importance of metadata: High

    Usage: Standard reports, ad hoc analysis

  • DATA MESH

    Technology: Various formats and data catalogs

    Platforms: On-prem and/or Cloud

    Data sources: Hybrid

    Data integration: Copy, Batch, Stream

    Data models: Hybrid

    Data quality: Decentralized

    Data governance: Decentralized

    Importance of metadata: High

    Usage: Standard reports, ad hoc analysis, Data Science, AI-driven real-time analysis

Data architectures overview

[Figure: History of data architectures]

As experienced Modern Intelligence experts for holistic end-to-end Data Intelligence, we support companies in the selection and construction of a customized and future-proof data architecture. Get in touch!

Our data management services