The History of Databases

Databases

The term “database” appeared in the mid-1960s, coinciding with the availability of direct-access storage. It marked a contrast with the tape-based systems of the past: direct access allowed shared, interactive use rather than batch processing.

Navigational databases

General-purpose database systems emerged during the mid-1960s.

CODASYL: records could be located by primary key, by navigating relationships from one record to another, or by scanning all records in sequence.
IBM IMS (Information Management System): ran on the System/360 and was initially developed for the Apollo program. Strictly hierarchical.

CODASYL follows the network model and IMS the hierarchical model; both are classified as navigational databases.
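
The three access paths listed for CODASYL can be sketched in a few lines of Python. This is only an illustration: the Record class, its members field and the sample data are hypothetical and do not reflect the actual CODASYL or IMS interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: int
    name: str
    # Links to related records (hypothetical stand-in for set membership).
    members: list["Record"] = field(default_factory=list)


# A tiny in-memory store: the records themselves plus an index by primary key.
dept = Record(1, "Engineering")
alice = Record(10, "Alice")
bob = Record(11, "Bob")
dept.members.extend([alice, bob])
records = [dept, alice, bob]
by_key = {r.key: r for r in records}


def find_by_key(key: int) -> Record:
    """Access path 1: direct lookup by primary key."""
    return by_key[key]


def navigate(owner: Record) -> list[str]:
    """Access path 2: follow links from an owner record to its members."""
    return [m.name for m in owner.members]


def scan(predicate) -> list[Record]:
    """Access path 3: visit every record in storage order."""
    return [r for r in records if predicate(r)]
```

In this style the program, not the database, decides which path to follow, which is why such systems are called navigational.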

1970’s relational databases

Edgar Codd, at the time working for IBM on storage systems, became frustrated by the limitations of navigational databases such as those based on CODASYL.

Initial idea: tables of fixed-length records, with a different table for each type of entity.

The term “relational” comes from the mathematical notion of a relation, which a table represents. Entities refer to other entities by storing key values rather than physical pointers.

A relational database can express both hierarchical and network models, as well as its native tabular model.

To query it, Codd proposed a set-oriented language, an idea that would later spawn SQL.
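
As a rough illustration of these ideas, the sketch below uses Python's built-in sqlite3 module, chosen only for convenience and not one of the historical systems discussed here. It creates one table per entity type, lets one entity refer to another by key, and asks a set-oriented question. The department and employee tables and the employees_in helper are made up for the example.

```python
import sqlite3

# In-memory database: one table per entity type, linked by key values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER REFERENCES department(id)
    );
    INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employee VALUES (10, 'Alice', 1), (11, 'Bob', 1), (12, 'Carol', 2);
""")


def employees_in(department_name: str) -> list[str]:
    """Describe the wanted set of rows; the engine decides how to find them."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM employee AS e
        JOIN department AS d ON d.id = e.department_id
        WHERE d.name = ?
        ORDER BY e.name
        """,
        (department_name,),
    )
    return [name for (name,) in rows]
```

The query states what is wanted (all employees of a department) rather than how to walk from record to record, which is the main contrast with the navigational approach.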

Based on Codd’s paper, INGRES was created at Berkeley by Eugene Wong and Michael Stonebraker.

Late 1970’s: SQL DBMSs

IBM started working on an implementation of Codd’s paper named System R, first as a single-table prototype and later in the 1970s as a multi-table implementation.

Multi-user versions followed and were tested by customers in 1978 and 1979, by which time the query language SQL had been added.

The success of these experiments led IBM to create a true production version of System R known as SQL/DS and, later, Database 2 (DB2).

Around the same time, Larry Ellison developed the Oracle database, based on IBM’s papers on System R, and released Oracle version 2 in 1979.

Stonebraker applied the lessons learned from INGRES to a new project named Postgres (now known as PostgreSQL), which is now used in many mission-critical applications.

1980’s on the desktop

Spreadsheet software (like Lotus 1-2-3) and database software (like dBASE) appeared for desktop computers. dBASE became a huge success during the 1980s and 1990s.

Document databases

Document databases emerged to meet the needs of applications requiring flexible schemas and fast iteration cycles, as is typical of modern web and mobile development. Instead of rigid tables and rows, data is stored as JSON or BSON documents, whose structure can evolve without an up-front schema migration.

By avoiding join operations and relying on nested data structures, these databases can provide performance benefits for read-heavy workloads. Popular document stores include MongoDB, CouchDB, and Amazon DocumentDB. Their design also makes them well-suited to horizontal scaling, often via sharding.
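
A minimal sketch of the document model, using plain Python dictionaries and the standard json module rather than any particular product's API. The orders collection, its field names and the find helper are hypothetical.

```python
import json

# Each order is one self-contained document: the customer and the line items
# are nested inside it instead of living in separate tables.
orders = [
    {
        "_id": 1,
        "customer": {"name": "Alice", "city": "Lisbon"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    },
    {
        "_id": 2,
        "customer": {"name": "Bob", "city": "Oslo"},
        "items": [{"sku": "A1", "qty": 5}],
        # A field that only newer documents carry; older ones need no change.
        "gift_note": "Happy birthday!",
    },
]


def total_quantity(order: dict) -> int:
    """Read nested data from a single document, with no join."""
    return sum(item["qty"] for item in order["items"])


def find(collection: list[dict], **criteria) -> list[dict]:
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def round_trip(doc: dict) -> dict:
    """Serialize a document to JSON text and parse it back."""
    return json.loads(json.dumps(doc))
```

Because everything needed to display an order lives in one document, a read touches a single record. The trade-off is that data shared across documents, such as a customer's address, may be duplicated.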

Now

Distributed computing

In recent years, the explosive growth of data and the ubiquity of cloud computing have driven demand for massively distributed databases with high availability and fault tolerance. However, the CAP theorem highlights a fundamental tradeoff: in the presence of a network partition, a distributed system can guarantee either consistency or availability, but not both.

As a result, many modern systems favor eventual consistency—a model in which all updates will propagate through the system over time, and all replicas will eventually become consistent, assuming no new updates are made. This approach allows systems to remain available even when parts of the network are unreachable.

Examples of distributed databases embracing this model include:

Cassandra – known for high write throughput and tunable consistency.
Amazon DynamoDB – inspired by the Dynamo system described in Amazon’s influential whitepaper.
Riak – designed for fault tolerance and ease of scaling.
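
Eventual consistency, as described above, can be sketched with a toy model. The Replica class and anti_entropy function below are hypothetical and use a simple last-write-wins rule keyed on a caller-supplied timestamp. Real systems such as those listed use more elaborate mechanisms, and this sketch assumes that no two writes to the same key share a timestamp.

```python
class Replica:
    """One copy of the data; accepts writes even when cut off from its peers."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the value with the highest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other: "Replica"):
        # Pull every entry from a peer, applying the same conflict rule.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


def anti_entropy(replicas):
    """Exchange state between all pairs once the network has healed."""
    for a in replicas:
        for b in replicas:
            if a is not b:
                a.sync_from(b)
```

A short usage example: two replicas accept conflicting writes during a partition, so reads disagree for a while, and they converge after synchronization.

```python
r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.write("cart", "apple", timestamp=1)   # accepted on one side of the partition
r2.write("cart", "banana", timestamp=2)  # accepted on the other side
stale = [r.read("cart") for r in (r1, r2, r3)]      # replicas disagree here
anti_entropy([r1, r2, r3])
converged = [r.read("cart") for r in (r1, r2, r3)]  # all replicas agree now
```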

The future

The landscape of databases continues to evolve. Trends shaping the future include:

NewSQL: efforts to combine the consistency and usability of traditional SQL systems with the scalability of NoSQL. Examples include CockroachDB, Google Spanner, and TiDB.
Multi-model databases: databases like ArangoDB and OrientDB that support documents, graphs, key-value pairs, and relational data, all in one system.
Cloud-native and serverless databases: platforms like Firebase, Fauna, and PlanetScale offer fully managed, globally distributed databases tailored for modern development workflows.
AI and vector databases: the rise of AI and large language models has driven demand for systems that support high-dimensional vector storage and similarity search, such as the databases Pinecone and Weaviate and the library FAISS.
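
To make similarity search concrete, here is a brute-force sketch in plain Python. It is not the API of any product named above, and production systems typically rely on approximate indexes instead of comparing the query against every vector. The three-dimensional vectors and their labels are invented, and the code assumes no vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Return the labels of the k stored vectors most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Toy "embeddings"; real ones usually have hundreds or thousands of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
```

With this data, nearest([1.0, 0.0, 0.0], index, k=2) ranks "cat" first and "dog" second, since their vectors point in nearly the same direction as the query.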

Conclusion

From batch-processed tapes to globally distributed, AI-powered systems, databases have undergone radical transformations—always driven by the needs of applications and the capabilities of the underlying hardware. While no single model fits all use cases, the diversity of database systems today reflects the richness of modern computing and the continuing push for performance, flexibility, and scale.

As data continues to grow in size and importance, the evolution of databases remains one of the most critical stories in the history of computing.