Data Persistence

Table of Contents

  1. Introduction
  2. Project Structure
  3. Core Components
  4. Architecture Overview
  5. Detailed Component Analysis
  6. Dependency Analysis
  7. Performance Considerations
  8. Troubleshooting Guide
  9. Conclusion
  10. Appendix

Introduction

This document is aimed at developers working on data persistence. It systematically describes the Sparrow data persistence system: the repository pattern implementation, multi-datasource support, the transaction management mechanism, and the characteristics and applicable scenarios of the available storage backends (memory, Redis, PostgreSQL, MongoDB, BadgerDB). It also covers repository interface design principles, data access patterns, query optimization strategies, migration and backup/recovery approaches, and performance tuning recommendations.

Project Structure

Sparrow's persistence layer adopts a "generic repository + multi-backend adapter" architecture:

  • The usecase layer defines the generic repository interfaces and query models
  • The entity layer defines the entity contract
  • persistence/repo contains the concrete implementations for the various storage backends
  • bootstrap and config provide container and configuration assembly

Core Components

Repository Interface and Query Model

  • The Repository interface uniformly defines CRUD, batch operations, pagination, conditional queries, random sampling, and other capabilities
  • The query model bundles conditions, sorting, and pagination parameters
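The interface and query model described above can be sketched as follows. The names (`Repository`, `Query`, `Condition`) and field layout are illustrative assumptions, not Sparrow's actual identifiers:

```go
// Hypothetical sketch of a generic repository interface plus query model.
package main

import (
	"context"
	"fmt"
)

// Condition expresses a single field filter.
type Condition struct {
	Field string
	Op    string // "=", ">", "<", "LIKE", ...
	Value any
}

// Query bundles conditions, sorting, and pagination parameters.
type Query struct {
	Conditions []Condition
	OrderBy    string
	Desc       bool
	Limit      int
	Offset     int
}

// Repository defines the unified data-access contract every backend implements.
type Repository[T any] interface {
	Save(ctx context.Context, e T) error
	FindByID(ctx context.Context, id string) (T, error)
	Update(ctx context.Context, e T) error
	Delete(ctx context.Context, id string) error
	Exists(ctx context.Context, id string) (bool, error)
	SaveBatch(ctx context.Context, es []T) error
	FindWithConditions(ctx context.Context, q Query) ([]T, error)
	Count(ctx context.Context) (int64, error)
	Random(ctx context.Context, n int) ([]T, error)
}

func main() {
	q := Query{
		Conditions: []Condition{{Field: "status", Op: "=", Value: "active"}},
		OrderBy:    "created_at", Desc: true, Limit: 20,
	}
	fmt.Printf("%+v\n", q)
}
```

Because every backend implements the same interface, callers can swap storage engines without changing business code.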

Base Repository

  • Provides a default implementation skeleton and error-handling templates that the concrete backend implementations reuse

Entity Contract

  • A unified entity interface requiring getter/setter access to the ID and the create/update timestamp fields

Architecture Overview

Sparrow's persistence architecture follows an "interface segregation + backend adapter" design:

  • A generic repository interface shields callers from differences between backends
  • Each backend is implemented independently, ensuring extensibility and replaceability
  • In SQL/PostgreSQL scenarios, transaction management is explicitly controlled by the implementation layer
  • Configuration and the container are responsible for instance assembly and lifecycle management

Detailed Component Analysis

Repository Interface Design Principles

  • Clear responsibility: a single interface carries the complete data-access capability, avoiding excessive subdivision
  • Unified convention: all implementations follow the same context, error, and return-value specifications
  • Extensibility: new capabilities are added through generics and the query model without breaking existing contracts
  • Consistency: pagination, conditions, and sorting behave the same across all backends

Core Method Classification

  • Single Entity: Save/FindByID/Update/Delete/Exists
  • Batch: SaveBatch/DeleteBatch/FindByIDs
  • List: FindAll/FindWithPagination/Count
  • Conditional Query: FindByField/FindByFieldWithPagination/CountByField/FindWithConditions/CountWithConditions
  • Random Sampling: Random

Query Optimization Strategy

Index and Scan

  • Redis/MongoDB/BadgerDB: the repository implementations create no secondary indexes, so conditional queries degrade to full scans; combine them with pagination and narrow filter ranges
  • PostgreSQL: supports single-field and composite indexes; conditional queries and sorting can use them to improve performance

Pagination and Sorting

  • Uniformly use Limit/Offset or a cursor-based approach; avoid loading the full dataset at once
  • PostgreSQL defaults to sorting by creation time descending; indexes can be added on business fields
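For the full-scan backends, the unified Limit/Offset behavior amounts to slicing the filtered result set. A purely illustrative sketch (the helper name `Paginate` is an assumption):

```go
// Paginate applies Limit/Offset semantics to an in-memory slice, mirroring
// what the full-scan backends do after filtering and sorting.
package main

import "fmt"

// Paginate returns the page [offset, offset+limit) of items, clamped to bounds.
// limit <= 0 means "no limit".
func Paginate[T any](items []T, limit, offset int) []T {
	if offset < 0 || offset >= len(items) {
		return nil
	}
	end := offset + limit
	if limit <= 0 || end > len(items) {
		end = len(items)
	}
	return items[offset:end]
}

func main() {
	ids := []int{1, 2, 3, 4, 5}
	fmt.Println(Paginate(ids, 2, 2)) // page 2 of size 2: [3 4]
}
```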

Batch Operations

  • Redis uses Pipeline
  • PostgreSQL/SQL uses transaction-wrapped batch inserts/updates

Random Sampling

  • PostgreSQL uses RANDOM()
  • MongoDB uses $sample
  • Redis/BadgerDB: random key selection, or a full scan followed by random extraction

Transaction Management

PostgreSQL

  • SaveBatch and batch inserts/updates execute within a transaction, with automatic rollback on failure
  • Soft delete: implemented by setting the deleted_at field

SQL Generic Repository

  • Save/SaveBatch and other batch operations all execute within a transaction
  • Supports soft delete (when a DeletedAt field exists) or hard delete

Redis/Badger/MongoDB

  • No built-in transaction abstraction; consistency must be coordinated in the upper-level business use case
  • Redis Pipeline improves batch throughput (note that pipelining alone does not guarantee atomicity; MULTI/EXEC is needed for that)

Storage Backend Characteristics and Applicable Scenarios

Memory Storage (Memory Cache/Temporary Data)

  • Characteristics
    • Complete CRUD and batch operations
    • Concurrency-safe (guarded by a read-write lock)
    • Supports random sampling, pagination, and conditional queries (full scan)
  • Applicable Scenarios
    • Test environment, temporary data, cache layer (non-persistent)
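The concurrency-safe in-memory backend can be sketched as a map guarded by a `sync.RWMutex`, matching the "read-write lock" bullet above (the type and method names are illustrative):

```go
// Minimal in-memory store: a generic map guarded by a read-write lock.
package main

import (
	"fmt"
	"sync"
)

type MemoryStore[T any] struct {
	mu   sync.RWMutex
	data map[string]T
}

func NewMemoryStore[T any]() *MemoryStore[T] {
	return &MemoryStore[T]{data: make(map[string]T)}
}

// Save takes the write lock so concurrent writers cannot race.
func (s *MemoryStore[T]) Save(id string, v T) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[id] = v
}

// FindByID takes the read lock, allowing concurrent readers.
func (s *MemoryStore[T]) FindByID(id string) (T, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[id]
	return v, ok
}

func (s *MemoryStore[T]) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, id)
}

func (s *MemoryStore[T]) Count() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.data)
}

func main() {
	s := NewMemoryStore[string]()
	s.Save("k1", "v1")
	v, _ := s.FindByID("k1")
	fmt.Println(v, s.Count())
}
```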

Redis Storage

  • Characteristics
    • Key-space prefixes isolate entity types
    • TTL support and Pipeline batching
    • Conditional queries via full scan
  • Applicable Scenarios
    • Cache, session, message queue event bus, event storage (ESDB)

PostgreSQL Storage

  • Characteristics
    • Full ACID transactions, soft delete, indexes, and complex queries
    • Supports batch operations and conditional queries
  • Applicable Scenarios
    • Master data, audit log, strong consistency required data

MongoDB Storage

  • Characteristics
    • Collection naming rules, BSON document storage
    • Supports aggregation and random sampling
  • Applicable Scenarios
    • Log, event sourcing, semi-structured data

BadgerDB Storage

  • Characteristics
    • LSM-tree storage with key-prefix scans
    • No soft delete; conditional queries are full scans
  • Applicable Scenarios
    • Local event storage, edge computing, embedded systems

Generic SQL Database Storage

  • Characteristics
    • Automatically detects a soft delete field and supports soft or hard delete
    • Transactional batch operations
  • Applicable Scenarios
    • Compatibility requirements, legacy databases

Key Implementation Flow

PostgreSQL Save Flow

  • Validate the entity ID → set create/update timestamps → open a transaction → insert or update the row → commit, rolling back automatically on failure

Redis Batch Save Flow

  • Serialize each entity → queue the writes (with TTL, if configured) onto a Pipeline under the entity's key prefix → execute the pipeline in a single round trip

Memory Conditional Query Flow

  • Acquire the read lock → full scan of the entity map, applying conditions → sort → apply Limit/Offset pagination → release the lock

Dependency Analysis

Interface and Implementation

  • usecase/Repo interface is implemented by all concrete repositories
  • BaseRepository provides default skeleton for each implementation

Entity Contract

  • The generic type parameters of all repository implementations must implement the Entity interface

Configuration and Container

  • bootstrap/Database uniformly wraps the Redis, SQL, and Badger clients
  • bootstrap/Container provides dependency injection and a singleton cache

Performance Considerations

Redis

  • Use Pipeline for batch writes and set TTLs appropriately
  • Keep key prefixes short; overly long prefixes hurt KEYS/MGET performance

PostgreSQL

  • Build indexes on high-frequency query fields
  • Use transactional batch operations to reduce round-trip overhead
  • Avoid deep pagination (very large Offset values); consider cursor-based pagination
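The deep-pagination problem and its cursor-based (keyset) alternative can be sketched as two query shapes. The table and column names (`events`, `created_at`) are hypothetical:

```go
// Offset vs. keyset pagination: the two SQL shapes compared.
package main

import "fmt"

// OffsetQuery re-scans and discards all skipped rows, so cost grows with offset.
func OffsetQuery(limit, offset int) string {
	return fmt.Sprintf(
		"SELECT * FROM events ORDER BY created_at DESC LIMIT %d OFFSET %d",
		limit, offset)
}

// KeysetQuery resumes after the last seen sort-key value; with an index on
// created_at the cost stays roughly constant per page.
func KeysetQuery(limit int, lastSeen string) string {
	return fmt.Sprintf(
		"SELECT * FROM events WHERE created_at < '%s' ORDER BY created_at DESC LIMIT %d",
		lastSeen, limit)
}

func main() {
	fmt.Println(OffsetQuery(20, 100000))
	fmt.Println(KeysetQuery(20, "2024-01-01T00:00:00Z"))
}
```

In real code the cursor value would be passed as a bound parameter, not interpolated; the string form here is only to show the query shape.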

MongoDB

  • Use the aggregation pipeline and indexes; avoid full collection scans
  • Use $sample judiciously for random sampling

BadgerDB

  • Pagination is implemented via key-prefix scans; be aware of the cost of full traversals
  • Tune compaction and threshold parameters to balance write amplification against read amplification

General

  • Batch operations should preferentially use transactions or Pipeline
  • Conditional queries should narrow their scope as much as possible and be combined with pagination and sorting

Troubleshooting Guide

Common Error Types

  • RepositoryError: empty entity ID, entity not found, operation failed, etc.
  • SQL/Redis/MongoDB client errors: connection failure, timeout, key not found

Troubleshooting Steps

  • Verify entity ID and entity type
  • Check backend connection configuration and network connectivity
  • Observe transaction status and rollback logs (PostgreSQL)
  • Analyze batch operation Pipeline/transaction boundaries

Recommendations

  • Add retries and circuit breakers on critical paths
  • Record contextual information (traceId, entity ID) for critical operations

Conclusion

Sparrow's persistence system achieves a unified data-access experience and flexible deployment choices through its generic repository interface and multi-backend adapters. In practice, choose the backend that matches the business requirements for consistency, availability, and performance, and combine indexing, pagination, and batch-operation strategies to improve overall performance.

Appendix

Data Migration and Backup Recovery

PostgreSQL

  • Use logical backup tools for periodic snapshots
  • Pay attention to soft delete field and index rebuild during migration

Redis

  • RDB/AOF persistence strategy and AOF rewrite
  • Pay attention to key space prefix and TTL consistency during migration

MongoDB

  • Use replica set and sharding, periodic backup
  • Pay attention to collection naming and indexes during migration

BadgerDB

  • File system level backup and recovery
  • Pay attention to directory permissions and threshold configuration during migration

Operation and Maintenance Recommendations

Configuration Management

  • Manage connection parameters uniformly through config/*
  • Environment isolation: Development/Test/Production use different configurations

Monitoring and Alerting

  • Key indicators: Connection count, QPS, slow query, error rate
  • Backend specific: Redis memory usage, PG WAL, MongoDB Oplog, Badger Compaction

Container and Assembly

  • Use bootstrap/Container to manage dependencies and lifecycle
  • Database uniformly wraps multiple clients, convenient for replacement and testing