Exploring Different Types of Databases


Just a couple of days ago I was lucky enough to attend the afterparty for the Microsoft Ignite 2025 event with my dear friend. The event, hosted with CockroachDB, was about the limitations of the current stack of technologies holding up AI, and the prospect of architectural changes toward AI-native solutions. Before attending, we wanted to be prepared for the Q&A, so we studied different databases and how they are being used right now. I found that there is a lot more variety in database types than I knew, and it was very interesting, so this post will not be so much about architecture as an overview of the different types of databases and how they are used in different scenarios.

Photo: afterparty with my friend :) (I swear there was a hell of a lot more people…we just sat early.)

Preliminary: What is a database?

Before getting into different types of databases, let’s first define what a database is. In strict terms, a database is defined as

A structured set of data held in a computer, especially one that is accessible in various ways.

As defined, the database is the structured collection of data residing in a computer, while the DBMS is the software you interact with to access and modify that data. Let’s take PostgreSQL as an example. Say you installed PostgreSQL to manage your data, and you create your first database through CREATE DATABASE my_first_db;. Then, PostgreSQL is the DBMS (the software you interact with to create, query, and modify data), while the my_first_db you created (the structured collection of tables, indexes, and data stored on disk) would be your actual database.
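To make the distinction concrete, here’s a minimal sketch using the psycopg2 client library (the client and connection details are my own assumptions for illustration): PostgreSQL is the DBMS you connect to and talk to, and my_first_db is the database it creates and manages on disk.

```python
# Minimal sketch: psycopg2 talks to the DBMS (PostgreSQL), which then creates
# the database. Connection parameters are placeholders.
import psycopg2

# Connect to the DBMS through its default maintenance database.
conn = psycopg2.connect(dbname="postgres", user="postgres", host="localhost")
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction block

with conn.cursor() as cur:
    cur.execute("CREATE DATABASE my_first_db")  # the DBMS creates the database

conn.close()
```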

It is common to mix up the term database with database management system (DBMS), as the two are used quite interchangeably. When people talk about a database, they are generally referring to the DBMS, the database, or both, so just keep in mind that very strictly speaking those two are different and move on (don’t be that a**hole saying “uh strictly speaking that’s not true blah blah blah”). From here on I’ll also be referring to both the DBMS and the database just as a “database.”

The Category of Databases

There are many, MANY different kinds of databases. Even a simple Google search gives you a long list of “top 15 databases and when to use which.” But those posts never satisfied me in terms of understanding the broader categorization of the different types, since they just give a long list of individual databases. Here, I categorize databases along three different axes: by data model (how data is structured inside the DB), by workload pattern (the demand or primary use case of the DB), and by architecture (how the DB itself is structured). Note that these categories are not mutually exclusive; a database can fall into more than one. For example, PostgreSQL is both a relational database and an OLTP database.

Categorization by Data Model

Relational

Relational databases are probably the most common type of database you will run into. Relational DBs organize data into structured tables with rows and columns, where data is linked through relations. They are also commonly referred to as SQL databases, as they use Structured Query Language (SQL) for defining, manipulating, and querying the data.

Properties:

  • Data is stored in a structured, tabular format with fixed data types, i.e., a fixed schema.
  • ACID (Atomicity, Consistency, Isolation, and Durability) properties to ensure reliability and data integrity across transactions.
  • Easier to scale vertically (more CPU, RAM, or SSD).

Use cases:

  • Financial systems like banking.
  • E-commerce applications.
  • CRM (Customer Relationship Management) applications.

Some common examples of relational DBs are PostgreSQL, MySQL, Microsoft SQL Server, SQLite, etc…
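To make the “fixed schema, linked by relations” point concrete, here’s a small sketch using Python’s built-in sqlite3 module (SQLite being one of the relational DBs listed above); the table and column names are made up for illustration.

```python
# Fixed schema plus a foreign-key relation linking orders to customers.
import sqlite3

conn = sqlite3.connect("shop.db")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 42.50)")
conn.commit()

# Relational query: join the two tables through their relation.
for row in conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
):
    print(row)  # ('Alice', 42.5)
conn.close()
```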

Non-Relational

Non-relational databases, also commonly referred to as NoSQL databases, are, as the name suggests, a broad category of databases that are not relational. Probably the most important characteristic of NoSQL databases is that they are schema-less, which enables high flexibility when structuring your data. Below are some of the important types of NoSQL DBs.

Document

Document databases store data as flexible, semi-structured records (referred to as documents). They often use open, human-readable formats like JSON/BSON or XML as their document format.

Properties:

  • Flexible schema.
  • High performance and scalability.
  • Easy to scale horizontally (adding more servers).

Use cases:

  • Social media and blogging
  • IoT applications
  • Real-time data applications

The most common document databases are MongoDB and Amazon DocumentDB.
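Here’s a quick sketch of the document model using the pymongo client (assuming a local MongoDB instance; the collection and documents are made up). Notice that two documents in the same collection don’t have to share a schema.

```python
# Two documents with different shapes living happily in one collection.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
posts = client["blog"]["posts"]

posts.insert_one({"title": "Hello", "tags": ["intro"], "likes": 3})
posts.insert_one({"title": "IoT readings", "device": {"id": "sensor-7", "fw": "1.2"}})

# Query by a nested field; only documents that have it will match.
doc = posts.find_one({"device.id": "sensor-7"})
print(doc["title"])  # IoT readings
```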

Key-Value

Key-value databases are probably the simplest type of database in terms of its data model. As the name suggests, it stores data as a collection of key-value pairs, where the key is a unique identifier and the value can be anything from a simple string to a complex object. Think of it as a giant hash map or dictionary.

Properties:

  • Extremely fast read/write operations due to its simple structure.
  • Highly scalable horizontally.
  • No complex queries; you retrieve data by its key, and that’s pretty much it.

Use cases:

  • Session management and caching.
  • User preferences and settings.
  • Shopping carts in e-commerce.

Common examples are Redis, Amazon DynamoDB, and Memcached.
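A minimal sketch of the key-value model with the redis-py client (assuming Redis running on localhost; the keys and values are placeholders):

```python
# Everything is a lookup by key: no joins, no complex queries.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Cache a session with a 30-minute expiry, then read it back by key.
r.set("session:abc123", "user_id=42", ex=1800)
print(r.get("session:abc123"))  # user_id=42
print(r.ttl("session:abc123"))  # roughly 1800 seconds remaining
```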

Graph

Graph databases store data as nodes (entities) and edges (relationships between entities). Unlike relational databases where relationships are expressed through foreign keys and joins, graph DBs treat relationships as first-class citizens, making traversal between connected data extremely efficient.

Properties:

  • Optimized for querying relationships and traversing connected data.
  • Flexible schema for nodes and edges.
  • Uses specialized query languages like Cypher (Neo4j) or Gremlin.

Use cases:

  • Social networks (friends, followers, connections).
  • Recommendation engines.
  • Fraud detection (finding suspicious patterns in transaction networks).
  • Knowledge graphs.

Neo4j is the most well-known graph database. Amazon Neptune is another popular option if you’re in the AWS ecosystem.
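Here’s a rough sketch using the official neo4j Python driver and Cypher (connection details and node labels are placeholders), creating a FOLLOWS relationship and then traversing it:

```python
# Relationships are created and traversed directly, no join tables needed.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Two people and a FOLLOWS relationship between them.
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:FOLLOWS]->(b)",
        a="Alice", b="Bob",
    )
    # Traverse the relationship: who does Alice follow?
    result = session.run(
        "MATCH (:Person {name: $a})-[:FOLLOWS]->(p) RETURN p.name AS name", a="Alice"
    )
    print([record["name"] for record in result])  # ['Bob']

driver.close()
```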

Vector

Vector databases are the new cool kid on the block, gaining massive popularity with the rise of AI and machine learning applications. They store data as high-dimensional vectors (embeddings) and are optimized for similarity searches—finding vectors that are “close” to each other in vector space.

Properties:

  • Optimized for similarity/semantic search using algorithms like approximate nearest neighbor (ANN).
  • Handles high-dimensional data efficiently.
  • Often integrates with ML pipelines and embedding models.

Use cases:

  • Semantic search (search by meaning, not just keywords).
  • Recommendation systems.
  • RAG (Retrieval-Augmented Generation) for LLM applications.

If you’re building anything with LLMs these days, you’ve probably heard of Pinecone, Milvus, Weaviate, or Chroma. PostgreSQL also supports vector operations through the pgvector extension, which is nice if you want to stick with a familiar relational DB.
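Since pgvector came up, here’s a rough sketch of a similarity search driven from psycopg2 (assuming the extension is available; the tiny 3-dimensional vectors are only for illustration, real embeddings are typically hundreds of dimensions):

```python
# Nearest-neighbor search by L2 distance using pgvector's <-> operator.
import psycopg2

conn = psycopg2.connect(dbname="my_first_db", user="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("CREATE TABLE IF NOT EXISTS items (id serial PRIMARY KEY, embedding vector(3))")
    cur.execute("INSERT INTO items (embedding) VALUES ('[1,1,1]'), ('[2,2,2]'), ('[9,9,9]')")
    # Similarity search: the two vectors closest to the query vector.
    cur.execute("SELECT id FROM items ORDER BY embedding <-> '[1,1,2]' LIMIT 2")
    print(cur.fetchall())
conn.close()
```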

Spatial / Geo-Spatial

Spatial databases (also called geospatial databases) are optimized for storing and querying geographic or location-based data—points, lines, polygons, and other geometric shapes.

Properties:

  • Supports spatial data types (points, polygons, geometries).
  • Specialized spatial indexes for efficient geographic queries.
  • Spatial query operations like “find all points within this polygon” or “find the nearest neighbors to this location.”

Use cases:

  • Mapping and navigation applications.
  • Location-based services (finding nearby restaurants, stores, etc.).
  • Urban planning and GIS (Geographic Information Systems).
  • Logistics and delivery route optimization.

PostGIS (an extension for PostgreSQL) is the most widely used spatial database. MongoDB also has decent geospatial support built-in.
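Here’s a sketch of a typical “find everything nearby” query with PostGIS, again via psycopg2 (assuming the extension is available; the coordinates and the 1 km radius are made up):

```python
# Store points as geography and ask for everything within 1 km of a location.
import psycopg2

conn = psycopg2.connect(dbname="my_first_db", user="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgis")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS places (
            id   serial PRIMARY KEY,
            name text,
            geom geography(Point, 4326)
        )
    """)
    cur.execute(
        "INSERT INTO places (name, geom) "
        "VALUES (%s, ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography)",
        ("Cafe", -122.33, 47.61),
    )
    # All places within 1000 meters of a given point.
    cur.execute(
        "SELECT name FROM places "
        "WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, 1000)",
        (-122.335, 47.608),
    )
    print(cur.fetchall())
conn.close()
```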

Time Series

Time-series databases are optimized for, as the name suggests, handling data points indexed by time. While you could store time-series data in a regular relational database, time-series DBs are specifically built to handle the high write throughput and time-based queries that come with this type of data.

Properties:

  • Optimized for high-volume writes with timestamps.
  • Efficient time-range queries and aggregations (e.g., “average CPU usage over the last hour”).
  • Often includes built-in data retention policies and downsampling.

Use cases:

  • IoT sensor data.
  • Financial market data (stock prices, trades).
  • Event logging and analytics.

InfluxDB and TimescaleDB are the go-to options here. TimescaleDB is actually built on top of PostgreSQL, so if you already know Postgres, the learning curve is minimal.
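As a sketch of the TimescaleDB flavor (it’s Postgres plus an extension, so psycopg2 once more; assumes the timescaledb extension is already installed, and the table and column names are made up): create_hypertable turns an ordinary table into a time-partitioned one, and time_bucket handles the time-range aggregation.

```python
# Hypertable for high-volume timestamped writes, plus a bucketed aggregation.
import psycopg2

conn = psycopg2.connect(dbname="my_first_db", user="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS metrics (
            time      timestamptz NOT NULL,
            device_id text,
            cpu       double precision
        )
    """)
    # Turn the plain table into a time-partitioned hypertable.
    cur.execute("SELECT create_hypertable('metrics', 'time', if_not_exists => TRUE)")
    # "Average CPU usage per device over the last hour", in 5-minute buckets.
    cur.execute("""
        SELECT time_bucket('5 minutes', time) AS bucket, device_id, avg(cpu)
        FROM metrics
        WHERE time > now() - interval '1 hour'
        GROUP BY bucket, device_id
        ORDER BY bucket
    """)
    print(cur.fetchall())
conn.close()
```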

Categorization by Workload Patterns

OLTP

OLTP stands for Online Transactional Processing. OLTP databases are optimized for handling a large number of short, atomic transactions—think inserts, updates, and deletes that need to happen fast and reliably. Your typical web application backend is an OLTP workload.

Properties:

  • Optimized for fast read/write operations on individual rows.
  • Strong ACID compliance to ensure data integrity.
  • Handles many concurrent users/transactions.
  • Typically row-oriented storage.

Use cases:

  • E-commerce order processing.
  • Banking transactions.
  • User authentication and session management.
  • Any application where users are constantly creating and updating records.

Most relational databases like PostgreSQL, MySQL, and SQL Server are designed with OLTP workloads in mind.
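The classic OLTP example is a short, atomic transfer between two accounts. Here’s a sketch with sqlite3 so it runs anywhere; either both updates commit or neither does:

```python
# A short, atomic OLTP-style transaction: move 30 from account 1 to account 2.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    with conn:  # one atomic transaction: commit on success, rollback on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    print("transfer rolled back")

print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 70.0), (2, 80.0)]
```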

OLAP

OLAP stands for Online Analytical Processing. While OLTP is about processing individual transactions quickly, OLAP is about running complex analytical queries across large datasets—aggregations, groupings, and crunching numbers across millions (or billions) of rows.

Properties:

  • Optimized for complex read-heavy queries across large datasets.
  • Often uses columnar storage (stores data by column rather than row) for efficient aggregations.
  • Typically handles fewer concurrent queries but each query touches a lot of data.
  • Often fed by data from OLTP systems through ETL/ELT pipelines.

Use cases:

  • Business intelligence and reporting dashboards.
  • Data warehousing.
  • Historical trend analysis.
  • Ad-hoc analytical queries.

Common OLAP databases include cloud-native data warehouses like Snowflake, Google BigQuery, Amazon Redshift, and ClickHouse.
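For contrast, here’s what an analytical query tends to look like, sketched with DuckDB (an embedded OLAP engine that comes up again later in this post): it scans a lot of rows but returns a small summary.

```python
# OLAP-shaped work: aggregate a million synthetic rows into a handful of groups.
import duckdb

con = duckdb.connect()  # in-memory by default
con.execute("""
    CREATE TABLE sales AS
    SELECT (random() * 4)::INT AS region, random() * 100 AS amount
    FROM range(1000000)
""")
print(con.execute("""
    SELECT region, count(*) AS orders, round(sum(amount), 2) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
""").fetchall())
```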

HTAP

HTAP stands for Hybrid Transactional/Analytical Processing. As the name suggests, it is a single system that supports both fast, real-time operational tasks (like OLTP) and complex large-scale analytical queries (like OLAP) simultaneously. Since we covered the functionalities and properties of both OLTP and OLAP above, HTAP sorta speaks for itself and I won’t go much in depth about it.

The appeal of HTAP is avoiding the traditional architecture where you have separate OLTP and OLAP systems with ETL pipelines syncing data between them. With HTAP, you can run analytics directly on your operational data in real-time.

Examples include CockroachDB and SingleStore. Whether HTAP actually delivers on its promise or is just marketing buzz is still debated, but it’s worth knowing about.

Search

Search databases (or search engines, though that term is overloaded) are optimized for full-text search and complex text-based queries. While you can do basic text search with LIKE or even use full-text search in PostgreSQL, dedicated search databases take it to another level.

Properties:

  • Inverted indexes for fast full-text search.
  • Support for fuzzy matching, relevance scoring, and ranking.
  • Often handles semi-structured data (JSON documents).
  • Features like tokenization, stemming, and synonym handling.

Use cases:

  • Site-wide search functionality (ex: search engines like Google and DuckDuckGo).
  • Product search in e-commerce.
  • Log aggregation and analysis.
  • Auto-complete and suggestion features.

Elasticsearch is the dominant player here, often paired with Kibana for visualization (the “ELK stack” with the addition of Logstash). Apache Solr is another option, though Elasticsearch has largely taken over the market.
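Here’s a rough sketch with the official Elasticsearch Python client (8.x-style API; the index and documents are made up), showing fuzzy matching and relevance scores:

```python
# Full-text search with fuzziness: the typo "keybord" still matches and ranks.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.index(index="products", id="1", document={"name": "Mechanical keyboard", "price": 89})
es.index(index="products", id="2", document={"name": "Wireless mechanical keybord", "price": 45})
es.indices.refresh(index="products")

resp = es.search(
    index="products",
    query={"match": {"name": {"query": "keyboard", "fuzziness": "AUTO"}}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["name"])
```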

Categorization by Architecture

Centralized vs. Distributed

This one’s pretty straightforward. Centralized databases run on a single server or node. All your data lives in one place, which makes consistency easy but creates a single point of failure and limits scalability.

Distributed databases spread data across multiple nodes or servers. This gives you better fault tolerance (one node goes down, others pick up the slack) and horizontal scalability, but introduces complexity around data consistency and network partitions. You’ll often hear about the CAP theorem here—the tradeoff between Consistency, Availability, and Partition tolerance.

Most traditional relational databases like PostgreSQL and MySQL started as centralized systems, though they now support various replication and clustering setups. Databases like Cassandra, CockroachDB, and MongoDB were built from the ground up to be distributed.

On-premise vs. Cloud-native

On-premise means you’re running the database on your own hardware (or at least hardware you manage). You handle installation, configuration, scaling, backups, and everything else. More control, more responsibility.

Cloud-native databases are designed to run in cloud environments and are typically offered as managed services (like SaaS or PaaS offerings). You don’t worry about the underlying infrastructure—just use the database. They often have built-in auto-scaling, automatic backups, and pay-as-you-go pricing.

Examples of cloud-native databases are: Amazon RDS, Amazon Aurora, Google Cloud Spanner, PlanetScale, and Supabase. You can also run traditional databases like PostgreSQL on the cloud yourself, but then you’re back to managing infrastructure. The trend nowadays is clearly moving toward managed cloud databases due to their simplicity, scalability and elasticity, and lower operational overhead.

Disk based vs. In-Memory

Disk-based databases store data primarily on disk (SSD or HDD). This is the traditional approach—data persists even if the server restarts, and you can store way more data than would fit in RAM.

In-memory databases keep all (or most) data in RAM for extremely fast access. The tradeoff is that RAM is more expensive and volatile—if the server crashes, you might lose data unless you have persistence mechanisms in place.

Redis is the classic in-memory database example, though it does have persistence options. Many disk-based databases also use memory heavily for caching, so the line can be blurry. Use in-memory databases when speed is critical and your dataset fits in memory—think caching, session storage, or real-time leaderboards.

Embedded vs. Client-Server

Client-server databases run as a separate process (or multiple processes) that your application connects to over a network or socket. PostgreSQL, MySQL, MongoDB—these all run as servers that your application talks to as a client.

Embedded databases run directly within your application process. No separate server, no network calls—the database is just a library linked into your application. This makes deployment simpler (just ship your app) and eliminates network overhead, but means the database is tied to that single application instance.

SQLite is the poster child for embedded databases—it’s everywhere, from your phone to your browser and even in CLI applications. DuckDB is gaining popularity as an embedded OLAP database.

Embedded databases are great for mobile apps, desktop applications, edge devices, or anywhere running a separate database server would be overkill.
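As a one-glance contrast (hostnames and credentials below are placeholders): an embedded database opens a local file in-process, while a client-server database connects to a separate server process.

```python
# Embedded vs. client-server in two lines.
import sqlite3
import psycopg2

# Embedded: no server, the library reads/writes app.db directly in-process.
embedded = sqlite3.connect("app.db")

# Client-server: the application is a client talking to a Postgres server.
server = psycopg2.connect(host="db.internal", dbname="app", user="app_user", password="secret")
```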

Ending notes

That’s about it! Overall I had a great time learning about the different types of databases and discussing the future architectural requirements for AI-native applications. The Microsoft afterparty event with CockroachDB mainly covered the importance of OLAP and OLTP databases, and the necessity of combining the two through real-time streaming. Here, we covered above and beyond just OLAP and OLTP. Whatever your use case might be, I hope this post helped you learn the strengths and weaknesses of different databases so you know when to use which. Remember, there’s no one-size-fits-all, perfect solution. Every decision comes with a cost and sacrifice; it is ultimately up to you to decide which fits best for your own use case.

Lastly, here’s a quick summary table:

| Category | Type | What It Is | When to Use |
| --- | --- | --- | --- |
| Data Model | Relational | Tables with rows/columns, linked by relations. Fixed schema, ACID compliant. | Financial systems, e-commerce, CRM—anywhere data integrity matters. |
| Data Model | Document | Semi-structured data (JSON/BSON). Flexible schema. | Social media, IoT, real-time apps—when your schema evolves frequently. |
| Data Model | Key-Value | Simple key-value pairs. Like a giant hash map. | Caching, session management, shopping carts—when you just need fast lookups by key. |
| Data Model | Graph | Nodes and edges. Relationships are first-class citizens. | Social networks, recommendations, fraud detection—when relationships are the focus. |
| Data Model | Vector | High-dimensional embeddings. Optimized for similarity search. | Semantic search, RAG for LLMs, image similarity—anything AI/ML related. |
| Data Model | Time-Series | Data indexed by time. High write throughput, time-range queries. | Monitoring, IoT sensors, financial market data, event logging. |
| Data Model | Spatial | Geographic data (points, polygons). Spatial indexes and queries. | Mapping, location-based services, logistics, GIS. |
| Workload | OLTP | Fast, short transactions. Row-oriented. Many concurrent users. | Web app backends, order processing, banking—operational day-to-day work. |
| Workload | OLAP | Complex analytical queries across large datasets. Column-oriented. | BI dashboards, data warehousing, historical analysis, reporting. |
| Workload | HTAP | Hybrid of OLTP and OLAP in one system. | When you need real-time analytics on operational data without ETL pipelines. |
| Workload | Search | Full-text search with relevance scoring, fuzzy matching. | Product search, log analysis, site search, auto-complete. |

Thanks for reading my blog post.