Data Persistence
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
- Appendix
Introduction
This document is aimed at developers working on data persistence. It systematically describes the Sparrow data persistence system: the repository pattern implementation, multi-datasource support, the transaction management mechanism, and the characteristics and applicable scenarios of each storage backend (memory, Redis, PostgreSQL, MongoDB, BadgerDB). It also covers repository interface design principles, data access patterns, query optimization strategies, approaches to migration and backup/recovery, and performance tuning recommendations.
Project Structure
Sparrow's persistence layer adopts a "generic repository + multi-backend adapter" architecture:
- The usecase layer defines the generic repository interface and query model
- The entity layer defines the entity contract
- persistence/repo contains the concrete implementations for each storage backend
- bootstrap and config provide container and configuration assembly
Core Components
Repository Interface and Query Model
- The Repository interface uniformly defines CRUD, batch operations, pagination, conditional queries, random sampling, and other capabilities
- The query model carries conditions, sorting, and pagination parameters
Base Repository
- Provides a default implementation skeleton and error-handling template that the concrete backend implementations reuse
Entity Contract
- A unified entity interface that requires getter and setter access to the ID and the create/update timestamp fields
Architecture Overview
Sparrow's persistence architecture follows an "interface segregation + backend adapter" design:
- A generic repository interface shields callers from differences between backends
- Each backend is implemented independently, ensuring extensibility and replaceability
- In SQL/PostgreSQL scenarios, transaction management is explicitly controlled by the implementation layer
- Configuration and the container are responsible for instance assembly and lifecycle management
Detailed Component Analysis
Repository Interface Design Principles
- Clear responsibility: a single interface carries the complete data access capability, avoiding excessive subdivision
- Unified conventions: all implementations follow the same context, error, and return-value specifications
- Extensibility: new capabilities are added through generics and the query model without breaking existing contracts
- Consistency: pagination, conditions, and sorting behave the same across all backends
Core Method Classification
- Single Entity: Save/FindByID/Update/Delete/Exists
- Batch: SaveBatch/DeleteBatch/FindByIDs
- List: FindAll/FindWithPagination/Count
- Conditional Query: FindByField/FindByFieldWithPagination/CountByField/FindWithConditions/CountWithConditions
- Random Sampling: Random
Query Optimization Strategy
Index and Scan
- Redis/MongoDB/Badger: the repository implementations do not use secondary indexes, so conditional queries fall back to full scans; combine them with pagination and narrow filter ranges
- PostgreSQL: supports single-field and composite indexes, so conditional queries and sorting can use indexes to improve performance
Pagination and Sorting
- Uniformly use Limit/Offset or a cursor-based approach; avoid loading the full dataset at once
- PostgreSQL defaults to sorting by creation time in descending order; indexes can be built on business fields
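A minimal sketch of the shared Limit/Offset semantics over an already-sorted result set (the helper name is an assumption; real backends push this down into the query):

```go
package main

import "fmt"

// paginate applies Limit/Offset semantics to an already-sorted slice.
// Every backend is expected to honor the same conventions: an offset
// past the end or a non-positive limit yields an empty page.
func paginate[T any](items []T, offset, limit int) []T {
	if offset >= len(items) || offset < 0 || limit <= 0 {
		return nil
	}
	end := offset + limit
	if end > len(items) {
		end = len(items)
	}
	return items[offset:end]
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	fmt.Println(paginate(ids, 2, 2)) // second page of size 2
}
```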
Batch Operations
- Redis uses Pipeline for batching
- PostgreSQL/SQL wrap batch inserts/updates in a transaction
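As an illustration of the transactional batch path, the following sketch composes one multi-row INSERT with PostgreSQL-style positional placeholders; the function and table names are assumptions for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// batchInsertSQL composes a single multi-row INSERT with PostgreSQL
// positional placeholders ($1, $2, ...), so N rows become one
// statement executed in one round trip instead of N.
func batchInsertSQL(table string, cols []string, rows int) string {
	groups := make([]string, 0, rows)
	arg := 1
	for r := 0; r < rows; r++ {
		ph := make([]string, len(cols))
		for c := range cols {
			ph[c] = fmt.Sprintf("$%d", arg)
			arg++
		}
		groups = append(groups, "("+strings.Join(ph, ", ")+")")
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(cols, ", "), strings.Join(groups, ", "))
}

func main() {
	fmt.Println(batchInsertSQL("users", []string{"id", "name"}, 2))
}
```

In a real implementation the generated statement would run inside a transaction (`db.BeginTx` ... `tx.Commit`, with `tx.Rollback` on error), matching the rollback-on-failure behavior described above.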
Random Sampling
- PostgreSQL uses RANDOM()
- MongoDB uses $sample
- Redis/Badger sample via random key selection, or via a full scan followed by random extraction
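The "full scan then random extraction" fallback can be sketched as a shuffle over the scanned keys (names are illustrative):

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleKeys illustrates the fallback used by backends without native
// sampling support: copy the scanned keys, shuffle, and take the
// first k. The copy keeps the caller's slice untouched.
func sampleKeys(keys []string, k int) []string {
	if k > len(keys) {
		k = len(keys)
	}
	out := make([]string, len(keys))
	copy(out, keys)
	rand.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out[:k]
}

func main() {
	fmt.Println(len(sampleKeys([]string{"a", "b", "c", "d"}, 2)))
}
```

Note the cost: unlike PostgreSQL's RANDOM() or MongoDB's $sample, this approach must first enumerate the candidate keys, so it scales with the size of the key space.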
Transaction Management
PostgreSQL
- SaveBatch and batch inserts/updates execute within a transaction, with automatic rollback on failure
- Soft delete is implemented by marking the deleted_at field
SQL Generic Repository
- Save, SaveBatch, and the other batch operations all execute within a transaction
- Supports soft delete (when a DeletedAt field exists) or hard delete
Redis/Badger/MongoDB
- No built-in transaction abstraction; consistency must be coordinated in the upper-level business use case
- Redis supports Pipeline to improve batch throughput (pipelining batches round trips but is not atomic by itself; MULTI/EXEC is needed for atomicity)
Storage Backend Characteristics and Applicable Scenarios
Memory Storage (Memory Cache/Temporary Data)
- Characteristics
- Complete CRUD and batch operations
- Concurrency-safe (guarded by a read-write lock)
- Supports random sampling, pagination, and conditional queries (via full table scan)
- Applicable Scenarios
- Test environments, temporary data, non-persistent cache layer
Redis Storage
- Characteristics
- Key-space prefixes isolate entity types
- TTL support and Pipeline batching
- Conditional queries via full table scan
- Applicable Scenarios
- Caching, sessions, message-queue event bus, event storage (ESDB)
PostgreSQL Storage
- Characteristics
- Complete ACID transactions, soft delete, indexes, and complex queries
- Supports batch operations and conditional queries
- Applicable Scenarios
- Master data, audit logs, and data requiring strong consistency
MongoDB Storage
- Characteristics
- Collection naming rules and BSON document storage
- Supports aggregation and random sampling
- Applicable Scenarios
- Logs, event sourcing, semi-structured data
BadgerDB Storage
- Characteristics
- LSM-tree storage with key-prefix scans
- No soft delete; conditional queries via full table scan
- Applicable Scenarios
- Local event storage, edge computing, embedded systems
Generic SQL Database Storage
- Characteristics
- Automatically detects a soft delete field and supports soft or hard delete
- Transactional batch operations
- Applicable Scenarios
- Compatibility requirements and legacy databases
Key Implementation Flow
- PostgreSQL Save Flow: open a transaction, insert or update the row (setting the create/update timestamps), then commit, rolling back on failure
- Redis Batch Save Flow: serialize the entities, queue prefixed SET commands (with TTL where configured) on a Pipeline, then execute the pipeline in one round trip
- Memory Conditional Query Flow: acquire the read lock, scan all entries, filter by the query conditions, then apply sorting and pagination
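The memory backend's conditional query flow can be sketched as follows; the `memoryStore` type and its string-valued fields are simplifications for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// memoryStore sketches the memory backend's conditional query flow:
// take a read lock, scan every entry (a full table scan), and keep
// the rows matching all field conditions.
type memoryStore struct {
	mu   sync.RWMutex
	rows map[string]map[string]string // id -> field -> value
}

func (s *memoryStore) FindWithConditions(conds map[string]string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var ids []string
	for id, row := range s.rows {
		match := true
		for field, want := range conds {
			if row[field] != want {
				match = false
				break
			}
		}
		if match {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	s := &memoryStore{rows: map[string]map[string]string{
		"1": {"status": "active"},
		"2": {"status": "disabled"},
	}}
	fmt.Println(s.FindWithConditions(map[string]string{"status": "active"}))
}
```

The RWMutex allows concurrent readers while still serializing writes, which matches the "concurrency-safe (read-write lock)" characteristic listed for the memory backend.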
Dependency Analysis
Interface and Implementation
- The usecase Repo interface is implemented by all concrete repositories
- BaseRepository provides the default skeleton for each implementation
Entity Contract
- The generic type parameters of all repository implementations must satisfy the Entity interface
Configuration and Container
- bootstrap/Database uniformly wraps the Redis, SQL, and Badger clients
- bootstrap/Container provides dependency injection and a singleton cache
Performance Considerations
Redis
- Use Pipeline for batch writes, and set TTLs sensibly
- Keep key prefixes short; overly long prefixes hurt KEYS/MGET performance
PostgreSQL
- Build indexes on frequently queried fields
- Use transactional batch operations to reduce round-trip overhead
- Avoid deep pagination (very large Offset values); consider cursor-based pagination instead
MongoDB
- Use the aggregation pipeline and indexes to avoid full collection scans
- Use $sample judiciously for random sampling
BadgerDB
- Pagination is implemented via key-prefix scans; be aware of the cost of full traversals
- Tune compaction and threshold parameters to balance write amplification against read amplification
General
- Prefer transactions or Pipeline for batch operations
- Narrow the scope of conditional queries as much as possible, and combine them with pagination and sorting
Troubleshooting Guide
Common Error Types
- RepositoryError: empty entity ID, entity not found, operation failed, etc.
- SQL/Redis/MongoDB client errors: connection failure, timeout, key not found
Troubleshooting Steps
- Verify the entity ID and entity type
- Check the backend connection configuration and network connectivity
- Inspect transaction status and rollback logs (PostgreSQL)
- Analyze the Pipeline/transaction boundaries of batch operations
Recommendations
- Add retries and circuit breaking on critical paths
- Record contextual information (trace ID, entity ID) for critical operations
Conclusion
Sparrow's persistence system delivers a unified data access experience and flexible deployment choices through its generic repository interface and multi-backend adapters. In practice, choose the backend that matches the business requirements for consistency, availability, and performance, and apply indexing, pagination, and batching strategies to improve overall performance.
Appendix
Data Migration and Backup Recovery
PostgreSQL
- Use logical backup tools for periodic snapshots
- During migration, pay attention to the soft delete field and to rebuilding indexes
Redis
- Use RDB/AOF persistence strategies and AOF rewrite
- During migration, keep key-space prefixes and TTLs consistent
MongoDB
- Use replica sets and sharding, with periodic backups
- During migration, pay attention to collection naming and indexes
BadgerDB
- Back up and restore at the file-system level
- During migration, pay attention to directory permissions and threshold configuration
Operation and Maintenance Recommendations
Configuration Management
- Manage connection parameters uniformly through config/*
- Environment isolation: use separate configurations for development, test, and production
Monitoring and Alerting
- Key metrics: connection count, QPS, slow queries, error rate
- Backend-specific: Redis memory usage, PostgreSQL WAL, MongoDB oplog, Badger compaction
Container and Assembly
- Use bootstrap/Container to manage dependencies and lifecycles
- Database wraps the various clients uniformly, making replacement and testing easier