Design a Word Dictionary Service.
Requirements
The service exposes endpoints that return the meaning of a given word.
The dictionary is updated weekly through a changelog containing the words and meanings that need to change; a changelog holds at most 1,000 words.
The total size of the dictionary is 1 TB, and it holds 171,476 words.
No traditional DB.
5M requests per minute ≈ 83,333 QPS.
Weekly update of at most 1,000 words. Lookups must remain strongly consistent after each changelog is applied.
Additional Requirements - looking at the product vision as well:
Sub-ms lookups for common words.
99.999% read availability.
Full rollback if a bad changelog is pushed.
Versioned storage for audit & compliance.
Stateless deploys with rolling updates.
Easy to maintain, small operational footprint.
Name | Number of Bytes | Power of 10 |
---|---|---|
Kilobytes (KB) | 1000 | 10^3 |
Megabytes (MB) | 1,000,000 | 10^6 |
Gigabytes (GB) | 1,000,000,000 | 10^9 |
Terabyte (TB) | 1,000,000,000,000 | 10^12 |
Large data size vs. small word count - 1 TB for ~171K words. 10^12 bytes / 1.7×10^5 words ≈ 5.8×10^6 bytes, i.e., roughly 6 MB per word on average (the rougher estimate 10^12 / 10^5 = 10^7 bytes = 10 MB comes from rounding the word count down to 10^5; 1 MB = 10^6 bytes). Multi-MB entries are likely due to detailed meanings, usage examples, synonyms, antonyms, etc.
Low write frequency, high read performance - Needs an architecture that favors read-optimized storage and caching.
Atomic batch update - Updates must be atomic — either all changes apply or none.
High-Level Architecture
API Layer - GET /word/{word} and POST /changelog
Service Layer - Orchestrates lookup, caching, and validation. Applies changelog updates.
Storage Layer - Primary storage for word data. Write-ahead log or versioning for rollback.
Cache Layer - Hot words (most queried) cached in RAM.
Batch Processor - Processes changelog every week.
Storage Options.
Relational DB (e.g., PostgreSQL) - ACID, good for versioning and transactional batch updates; might not scale well for large blob data.
NoSQL (e.g., Cassandra, DynamoDB) - High read throughput, easy scaling; harder to guarantee strict consistency for batch updates.
Object Store (e.g., S3) + Index - Cheap, scales well, good for large text; needs a fast index for lookups.
Hybrid - Primary data in an Object Store (S3) — 1TB dictionary blobs. Word Index in a fast key-value store (e.g., DynamoDB, RocksDB).
Maps word → object store location.
Small enough to fit in memory for blazing fast lookups.
Client → API → Cache → Word Index → Object Store → Response.
First check the in-memory cache (Redis). On a miss, look up the index to get the blob location, fetch the word's meaning blob from the Object Store, return it to the client, and update the cache for next time.
Admin uploads the changelog → stored as a file in a staging area. The Batch Processor validates the changelog, creates a new version of the affected blobs, and updates the index atomically. On success, old blobs can be deleted later; on error, the index pointer rolls back to the previous version.
Use object versioning in the store (e.g., S3 versioning). Word index should store version pointer. Rollback = point index back to previous version. No partial update risk.
Design without a traditional DB.
Use a modern file-based storage with index + object store pattern — inspired by how search engines & CDNs work.
👉 Key Idea:
Store meanings as immutable files (JSON, Parquet, Avro).
Use LSM-tree style index: a compact key-value index built in embedded storage, or even a memory-mapped file.
Use object storage (e.g., Amazon S3) for durability.
Cache hot words in memory or on SSDs close to compute.
High Level.
API layer — Stateless HTTP servers (e.g., Spring Boot / FastAPI).
Embedded KV Store — Use RocksDB, BadgerDB, LMDB.
Stores word → blob pointer or offset.
Local to each node or replicated by consensus.
BlobStore — Stores big JSON/Parquet files for meanings.
In-Memory Cache — Redis or local LRU for super-hot words.
No traditional DB. RocksDB is an embedded, file-backed key-value store.
Component | Role |
---|---|
🚪 API Nodes | Stateless REST servers. |
🗂️ Local Index | RocksDB or LMDB, local-only. |
☁️ Blob Store | S3 (immutable). |
🧊 CDN | Cache blobs at edge. |
⚡ In-memory Cache | Local LRU + optional Redis Cluster. |
📦 Changelog Job | Validates, builds new blobs, new index, swaps pointer. |
🧩 Version Pointer | Tiny file in S3 or etcd. |
🔄 Config Sync | Each node pulls version pointer on boot & refreshes every minute. |
Aspect | Choice |
---|---|
Index | RocksDB on local disk (EBS/SSD) |
Meaning storage | Chunked JSON/Parquet blobs in S3 |
Updates | New versioned blobs → pointer in index updated |
Versioning | Immutable blobs; index switch is atomic |
Rollback | Roll back pointer to older blob version |
Example:
Word: "apple"
Index: apple → blob_uri: s3://dict/words_v2/abc123.json, offset: 205
Server fetches only that slice.
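A minimal sketch of that slice read with the AWS SDK for Java v2, using the bucket, key, and offset from the index entry above; the slice length (4204) is an assumed value:
import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public class BlobSliceReader {
    public static String readSlice(S3Client s3) {
        // Range read: fetch only the bytes for "apple", not the whole 10 MB blob.
        // Offset 205 comes from the index entry; the end byte is illustrative.
        GetObjectRequest req = GetObjectRequest.builder()
                .bucket("dict")
                .key("words_v2/abc123.json")
                .range("bytes=205-4204")
                .build();
        ResponseBytes<GetObjectResponse> bytes = s3.getObjectAsBytes(req);
        return bytes.asUtf8String();
    }
}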
Local Index:
RocksDB or LMDB — embedded KV store, file on disk:
Key: "serendipity"
Value: {
"blob": "shard-0042.parquet",
"offset": 9583
}
offset means byte offset or row ID.
This index is tiny (tens of MB for ~171K entries). It fits fully in memory if needed.
How does RocksDB help here?
RocksDB is just an embedded key-value store.
It lives on local disk.
Reads are local, low-latency.
It handles SSTable compaction, LSM-tree mechanics.
You do NOT need a DB server — the process owns the files.
Why not just a flat file index?
A sorted flat file works too!
But with 170K entries it’s easier to use RocksDB to handle:
Binary search for lookups.
Fast batch rebuild on changelog.
Atomic compaction.
Example:
lookup(word) → RocksDB .get(key) → returns blob + offset.
If hot: cache the resolved meaning.
If cold: read slice from blob in S3 or disk.
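A minimal sketch of that lookup with RocksDB's Java API (the org.rocksdb package), assuming the index was built at /data/index; the cache and blob helpers are hypothetical stand-ins for the local LRU and the ranged blob read shown earlier:
import java.nio.charset.StandardCharsets;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class IndexLookup implements AutoCloseable {
    static { RocksDB.loadLibrary(); }
    private final RocksDB index;

    public IndexLookup(String path) throws RocksDBException {
        // Open the local index once per process, read-only: lookups are then pure local reads.
        this.index = RocksDB.openReadOnly(path);
    }

    public String lookup(String word) throws RocksDBException {
        String hot = cacheGet(word);                                        // 1) hot: local LRU
        if (hot != null) return hot;
        byte[] pointer = index.get(word.getBytes(StandardCharsets.UTF_8)); // 2) {blob, offset}
        if (pointer == null) return null;                                  // word not in dictionary
        String meaning = readSliceFromBlob(new String(pointer, StandardCharsets.UTF_8)); // 3) cold
        cacheSet(word, meaning);                                            // 4) warm the cache
        return meaning;
    }

    @Override public void close() { index.close(); }

    // Hypothetical helpers: in-process LRU and a ranged blob read (see the S3 sketch above).
    private String cacheGet(String k) { return null; }
    private void cacheSet(String k, String v) {}
    private String readSliceFromBlob(String pointerJson) { return pointerJson; }
}
A changelog swap then just opens the new index directory and closes the old handle.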
Read Path - If hit in local cache → sub-ms. ✅ If hit in RocksDB → index lookup takes ~1ms, blob read depends on local vs remote. ✅ Cold blob → fallback to S3 → possible CDN edge hit. ✅ For super-hot blobs → pre-fetch shard locally or use SSD scratch disk.
Realistic Weekly Update:
Upload changelog: JSON or CSV. Example:
[
{
"word": "serendipity",
"meaning": "... new meaning ..."
},
{
"word": "effervescent",
"meaning": "... new word ..."
}
]
Validation Job:
Runs in CI/CD or Airflow.
Checks for syntax errors, duplicate words.
Verifies no accidental deletion of unrelated words.
Batch Processor:
Generates new blob files (Parquet or JSON).
Builds new RocksDB index:
Bulk import all mappings.
Uploads blobs to S3.
Writes new version.json with new version ID.
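The version pointer itself can be a tiny JSON file; the exact fields here are an assumption:
{
  "active_version": "v43",
  "index": "s3://dict/v43/words.db",
  "blobs_prefix": "s3://dict/v43/"
}
Rewriting active_version is the single atomic cutover step; rollback points it back at the previous version.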
Cutover:
Health checks: spin up a shadow read service → hit sample words → verify new version.
If healthy → do atomic pointer swap: update version.json.
Clients read pointer on startup → pick new index.
Old version is untouched → rollback = switch pointer back.
Failure | Mitigation |
---|---|
Blob read slow | Use CDN + regional edge cache |
Index corruption | Keep previous version + RocksDB snapshot |
Bad changelog | Detect in validation, catch in shadow deploy |
Partial deploy | Impossible: pointer switch is atomic |
Node disk full | Monitor IOPS & capacity, scale horizontally |
Hotspot word overload | Cache layer with LRU & Redis Cluster |
Hot word % → plan Redis RAM size.
Cold word % → plan S3 bandwidth.
Peak QPS → plan node count.
Test with Locust / K6.
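As a worked example of that sizing (the hit rates are assumptions to validate with those load tests): at 83,333 QPS peak, a 90% hot-cache hit rate leaves ~8,300 QPS on the index + blob path; if the CDN then absorbs 90% of blob reads, S3 sees only ~800 QPS. With ~1 ms RocksDB lookups, a handful of API nodes covers the cold path, so node count is driven more by cache RAM and network than by index speed.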
Code Structure.
index/build_index.py: Build RocksDB index.
api/app.py: API server, load index, serve /word/{word}.
blobs/: Local blobs for prototype — simulate S3.
version.json: Points to current index version.
v1 - Single node, local disk only; prototype with RocksDB and JSON files.
v2 - Use S3 for the blob store. Use a CDN for edge hits.
v3 - Add Redis layer for super-hot words. Add local SSD scratch disk for blobs.
v4 - Fully stateless API fleet. Self-healing index sync - on deploy, node pulls latest version index.
v5 - Multi-region deploy. Version pointer via global config store.
Read-heavy, immutable design - Immutable blobs → no corruption. Single pointer switch → atomic versioning.
No DB vendor lock-in - Pure file + object store + embedded index → simple infra.
Cheap scale - Only cost is compute & storage. CDN offloads 80–90% traffic.
Extremely simple ops - One pointer swap → all done. Rollback is trivial.
No deadlocks - LSM index is local & read-only. No distributed locks needed.
High availability by design - Any API node can serve any word if it has index + blob access.
Minimal moving parts - Only deploy new blobs and index files.
GET /word/{word} 1️⃣ User hits /word/{word}. 2️⃣ API checks local LRU. 3️⃣ If miss → query RocksDB → get {shard, offset}. 4️⃣ Check local SSD for shard:
If local: open file, seek, read.
Else: fetch from CDN → store on disk → read. 5️⃣ Cache meaning in LRU. 6️⃣ Return JSON.
Latency:
Hot LRU: ~0.5ms
RocksDB + local shard: ~1ms
CDN pull: 10–50ms
S3 fallback (should be rare): 50–100ms
📌 POST /changelog Admin upload only.
1️⃣ Upload changelog.json:
[
  {"word": "serendipity", "meaning": "…"},
  {"word": "fugacious", "meaning": "…"}
]
2️⃣ Validate
Dedupes, syntax, diff old vs new.
Dry-run build → test lookup keys.
3️⃣ Build
New blob files: chunk new & old words.
Bulk-generate new RocksDB index.
4️⃣ Store
Upload blobs to S3 under v43/.
Upload new index words.db to S3.
5️⃣ Smoke Test
Spin up shadow API node → load new index → sample requests → compare.
6️⃣ Swap
Update version.json → active_version: "v43"
7️⃣ Nodes pick up
Pull version.json → if changed, hot-swap index, discard old.
8️⃣ Rollback
Flip version.json back
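A sketch of step 7️⃣'s refresh loop on each node, assuming Spring's @Scheduled (requires @EnableScheduling on the application class); fetchVersionJson and swapIndex are hypothetical helpers for the S3 read and the index hot-swap:
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class VersionPointerSync {
    private volatile String activeVersion = "v42"; // last version this node loaded (illustrative)

    @Scheduled(fixedDelay = 60_000) // poll version.json every minute, per the config-sync design
    public void refresh() {
        String latest = fetchVersionJson();  // hypothetical: GET version.json from S3
        if (!latest.equals(activeVersion)) {
            swapIndex(latest);               // hypothetical: open the new index dir, drop the old
            activeVersion = latest;          // the pointer flip is the only mutable state
        }
    }

    private String fetchVersionJson() { return activeVersion; } // stub
    private void swapIndex(String version) {}                   // stub
}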
File-based index = cheap, fast, resilient.
Object store = infinite capacity, high durability.
CDN = global scale for hot words.
Changelog = versioned, safe, simple.
Ops = push files, swap pointer.
Design with a DB.
With File/No DB | With DB |
---|---|
Immutable files, RocksDB index | Normal relational DB table |
Pointer swap for versioning | Schema versioning, transactional updates |
Read path must slice files | DB handles read, cache holds hot words |
Rollback = pointer flip | Rollback = transaction + version flag |
Local index for speed | DB handles index + query plan |
CDN only for blob files | CDN optional (or used for static docs) |
The Data Model. A relational table, dictionary_word:
Column | Type | Notes |
---|---|---|
word | VARCHAR(128) | PK, indexed |
meaning | TEXT | JSON blob or large text |
version | INT | Versioning if needed |
created_at | TIMESTAMP | For audit |
updated_at | TIMESTAMP | For audit |
word must be UNIQUE.
B-Tree index on word → fast WHERE word = ? lookup.
Optional GIN index if meaning is JSON and you want partial JSON queries later.
GET /word/{word}
1️⃣ Request hits API Gateway → Service → Cache Layer. 2️⃣ If word is in Redis → instant return. 3️⃣ If miss → query DB: SELECT meaning FROM dictionary_word WHERE word = ?. 4️⃣ Populate Redis with word → next request is cache hit.
✅ POST /changelog
1️⃣ Admin uploads JSON or CSV. 2️⃣ Batch validator: check syntax, required fields. 3️⃣ Batch job: run in transaction:
BEGIN;
-- for each word in the changelog:
--   UPSERT (INSERT ... ON CONFLICT DO UPDATE)
COMMIT;
4️⃣ If any word fails → full rollback. 5️⃣ After commit, invalidate cache keys for updated words.
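Concretely, step 3️⃣'s upsert in Postgres could look like this per changelog row (column names from the data model above):
BEGIN;
INSERT INTO dictionary_word (word, meaning)
VALUES ('serendipity', '... new meaning ...')
ON CONFLICT (word)
DO UPDATE SET meaning = EXCLUDED.meaning, updated_at = CURRENT_TIMESTAMP;
-- ... one statement per changelog row, all inside the same transaction ...
COMMIT;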
Consistency.
DB guarantees ACID — no partial updates.
Changelog runs inside transaction.
If you want multi-version: add version column, mark active = true → swap flags at once.
If DB size grows:
Postgres → use table partitioning or hash-shard by first letter (A → Z).
DynamoDB → auto-partitions by hash key (word).
For massive scale: use read replicas to spread read load.
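A sketch of the Postgres partitioning option mentioned above, using declarative hash partitioning (Postgres 11+); this would replace the flat-table DDL:
CREATE TABLE dictionary_word (
    word VARCHAR(255) PRIMARY KEY,
    meaning TEXT NOT NULL
) PARTITION BY HASH (word);

CREATE TABLE dictionary_word_p0 PARTITION OF dictionary_word
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ... repeat for remainders 1..3 ...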
Pareto applies: top 10% of words get 90% of traffic.
Redis: keep top 10K words fully cached.
Expire with TTL (or no TTL, invalidate on changelog push).
Use LRU or LFU eviction.
API layer: stateless → auto-scale pods.
Redis: must run as a cluster for 100K QPS+.
DB:
99.9% reads hit cache.
Remaining reads hit DB.
Use read replicas.
Scale with connection pooling.
Use prepared statements → no query parsing cost.
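For the pooling point, a HikariCP sizing hint in application.properties might look like this (values are illustrative):
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.minimum-idle=10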
If changelog has bugs:
Keep backup table: dictionary_word_backup.
Or use version flags → old rows stay in DB.
Rollback = flag old rows active, new inactive.
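With version flags, rollback is a single transactional flag swap; this sketch assumes the active boolean column described above plus the version column from the data model:
BEGIN;
UPDATE dictionary_word SET active = false WHERE version = 43;  -- hide the bad changelog
UPDATE dictionary_word SET active = true  WHERE version = 42;  -- restore the previous rows
COMMIT;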
Failure Handling.
Failure | Solution |
---|---|
Cache down | Fall back to DB, hits higher latency |
DB write fails | Transaction rollback |
DB read slow | Add replicas, index tuning |
Deadlocks | Careful bulk upserts, proper isolation |
Conflicts | Use INSERT ON CONFLICT with clear primary key rules |
Throughput and Deadlocks. GET is a pure read, no lock contention.
POST is a single batch — single writer per changelog.
DB can do row-level locks, but no deadlock if only one batch job runs at once.
Keep changelog small (< 1K rows) → writes complete in seconds.
Cost Model.
Layer | Cost Impact |
---|---|
DB | More expensive than pure blob store but simpler ops |
Redis | Adds infra, but worth it for hot reads |
API nodes | Scale linearly |
Changelog | Cheap batch job |
Factor | Plan |
---|---|
DB IOPS | Sized for 0.1% read misses → 1K–2K QPS to DB |
Redis RAM | Sized for top ~50K words fully cached |
Write throughput | 1K row upsert per week — trivial |
Read replicas | Add replicas if QPS > single node capacity |
Sample Code.
CREATE TABLE dictionary_word (
word VARCHAR(255) PRIMARY KEY,
meaning TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Spring Boot Dependencies - spring-boot-starter-web, spring-boot-starter-data-jpa, postgresql, spring-boot-starter-data-redis. For example:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
Entity and Repository.
@Entity
@Table(name = "dictionary_word")
public class DictionaryWord {
@Id
private String word;
@Column(columnDefinition = "TEXT")
private String meaning;
@Column
private Timestamp createdAt;
@Column
private Timestamp updatedAt;
// getters and setters
}
@Repository
public interface DictionaryWordRepository extends JpaRepository<DictionaryWord, String> {
}
Redis Configuration.
@Configuration
public class RedisConfig {
@Bean
public RedisConnectionFactory redisConnectionFactory() {
return new LettuceConnectionFactory();
}
@Bean
public RedisTemplate<String, String> redisTemplate() {
RedisTemplate<String, String> template = new RedisTemplate<>();
template.setConnectionFactory(redisConnectionFactory());
return template;
}
}
Service Layer.
@Service
public class DictionaryService {
@Autowired
private DictionaryWordRepository repository;
@Autowired
private RedisTemplate<String, String> redisTemplate;
public String getMeaning(String word) {
String cachedMeaning = redisTemplate.opsForValue().get(word);
if (cachedMeaning != null) {
return cachedMeaning;
}
DictionaryWord wordEntity = repository.findById(word)
.orElseThrow(() -> new RuntimeException("Word not found"));
redisTemplate.opsForValue().set(word, wordEntity.getMeaning());
return wordEntity.getMeaning();
}
@Transactional
public void applyChangelog(List<DictionaryWord> newWords) {
// Atomic batch: all rows commit or none do.
repository.saveAll(newWords);
// Evict cached entries so the next read repopulates from the DB
// (ideally after commit; see the transaction notes in the LLD section).
newWords.forEach(w -> redisTemplate.delete(w.getWord()));
}
}
Controller.
@RestController
@RequestMapping("/api")
public class DictionaryController {
@Autowired
private DictionaryService service;
@GetMapping("/word/{word}")
public ResponseEntity<String> getWord(@PathVariable String word) {
return ResponseEntity.ok(service.getMeaning(word));
}
@PostMapping("/changelog")
public ResponseEntity<String> uploadChangelog(@RequestBody List<DictionaryWord> changelog) {
service.applyChangelog(changelog);
return ResponseEntity.ok("Changelog applied");
}
}
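Assuming the app runs locally on port 8080, a quick smoke test of the lookup endpoint:
curl http://localhost:8080/api/word/serendipity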
Application properties.
spring.datasource.url=jdbc:postgresql://${POSTGRES_HOST:localhost}:5432/dictionarydb
spring.datasource.username=${POSTGRES_USER:postgres}
spring.datasource.password=${POSTGRES_PASSWORD:password}
spring.redis.host=${REDIS_HOST:localhost}
spring.redis.port=6379
Docker Compose.
version: '3.8'
services:
app:
build: .
ports:
- "8080:8080"
environment:
- POSTGRES_HOST=postgres
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=example
- REDIS_HOST=redis
depends_on:
- postgres
- redis
postgres:
image: postgres:15
restart: always
environment:
POSTGRES_DB: dictionarydb
POSTGRES_USER: postgres
POSTGRES_PASSWORD: example
ports:
- "5432:5432"
redis:
image: redis:alpine
ports:
- "6379:6379"
Kubernetes Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
name: dictionary-app
spec:
replicas: 5
selector:
matchLabels:
app: dictionary-app
template:
metadata:
labels:
app: dictionary-app
spec:
containers:
- name: dictionary-container
image: gcr.io/YOUR_PROJECT_ID/dictionary-app:latest
ports:
- containerPort: 8080
env:
- name: POSTGRES_HOST
value: "YOUR_PG_HOST"
- name: POSTGRES_USER
value: "postgres"
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: pg-secret
key: password
- name: REDIS_HOST
value: "YOUR_REDIS_HOST"
---
apiVersion: v1
kind: Service
metadata:
name: dictionary-service
spec:
type: LoadBalancer
selector:
app: dictionary-app
ports:
- port: 80
targetPort: 8080
Use Cloud SQL for managed PostgreSQL.
Use Cloud Memorystore for Redis.
Use GKE (Google Kubernetes Engine) for your pods.
Use Container Registry (gcr.io) for your Docker images.
Use Secrets Manager for DB credentials.
1️⃣ docker build -t gcr.io/YOUR_PROJECT_ID/dictionary-app:latest .
2️⃣ docker push gcr.io/YOUR_PROJECT_ID/dictionary-app:latest
3️⃣ kubectl apply -f k8s-deployment.yaml
4️⃣ Set up the Cloud SQL Proxy if needed.
5️⃣ Redis can be Cloud Memorystore, with redis.host set to its private IP.
5M requests/min → Redis covers hot reads, DB covers cold. ✅ Transactional changelog → strong consistency. ✅ Stateless pods → scale horizontally in GKE. ✅ Kubernetes health checks → rolling deploys. ✅ Atomic upserts → no partial updates. ✅ Rollback? Just reapply the previous changelog.
LLD.
Core Flow | Operations |
---|---|
GET /word/{word} | 1) Validate input 2) Check Redis 3) If miss → DB 4) Populate Redis 5) Return |
POST /changelog | 1) Validate file 2) Parse new words 3) Store in DB transactionally 4) Clear cache keys |
Object | Responsibility |
---|---|
Word | Domain entity: word + meaning. |
DictionaryService | Business logic (lookups, changelog). |
WordRepository | DB operations (CRUD). |
CacheService | Redis abstraction. |
ChangelogProcessor | Parses & validates changelog. |
WordValidator | Ensures valid input. |
Controller | Exposes REST API. |
SOLID Principle | Where Applied | Real Pattern Used | Why It Fits |
---|---|---|---|
S — Single Responsibility | WordRepository → DB only RedisCacheService → Cache only WordValidator → Validation only | Repository, Proxy/Decorator | Each class does one clear job |
O — Open/Closed | New file formats → plug CSVProcessor , JSONProcessor | Factory | Add new processor without changing core logic |
L — Liskov Substitution | IWordRepository, ICacheService | Repository, Strategy | Replace WordRepository with any storage backend |
I — Interface Segregation | ICacheService , IWordRepository , IChangelogProcessor | Repository, Factory | Interfaces expose only needed ops |
D — Dependency Inversion | Controller depends on IDictionaryService DictionaryService depends on interfaces | Repository, Strategy, Decorator | High-level modules use abstractions, not concrete classes |
SOLID shapes the structure.
Patterns implement the shape.
Pattern | Where Used | Why |
---|---|---|
Repository | WordRepository | Abstract DB. |
Factory | ChangelogProcessorFactory | Can create JSONProcessor , CSVProcessor later. |
Strategy | Cache lookup strategy (ReadThroughCache ) | Swappable cache policy. |
Singleton | RedisConnection | One Redis pool. |
Builder | Word for complex JSON payloads. | Future synonyms/examples. |
+---------------------+
|     Controller      |
+---------------------+
| - IDictionaryService|
+---------------------+
| + getWord()         |
| + uploadChangelog() |
+---------------------+
          |
          v
+------------------------+
|   IDictionaryService   |
+------------------------+
| + getMeaning()         |
| + applyChangelog()     |
+------------------------+
          |
          v
+------------------------+
|   DictionaryService    |
+------------------------+
| - WordRepository       |
| - CacheService         |
| - ChangelogProcessor   |
+------------------------+
| + getMeaning()         |
| + applyChangelog()     |
+------------------------+

+---------------+   +---------------------------+
| WordValidator |   | ChangelogProcessorFactory |
+---------------+   +---------------------------+
| + isValid()   |   | + getProcessor()          |
+---------------+   +---------------------------+

+---------------------+
|   IWordRepository   |
+---------------------+
| + findWord()        |
| + saveWord()        |
+---------------------+
          |
          v
+---------------------+
|   WordRepository    |
+---------------------+
| - DB Connection     |
+---------------------+
| + findWord()        |
| + saveWord()        |
+---------------------+

+---------------------+
|    ICacheService    |
+---------------------+
| + get()             |
| + set()             |
| + evict()           |
+---------------------+
          |
          v
+---------------------+
|  RedisCacheService  |
+---------------------+
| - RedisConnection   |
+---------------------+
| + get()             |
| + set()             |
| + evict()           |
+---------------------+

+---------+
|  Word   |
+---------+
| word    |
| meaning |
+---------+
Word — Builder Pattern.
public class Word {
private final String word;
private final String meaning;
private Word(Builder builder) {
this.word = builder.word;
this.meaning = builder.meaning;
}
public String getWord() { return word; }
public String getMeaning() { return meaning; }
public static class Builder {
private String word;
private String meaning;
public Builder word(String word) {
this.word = word;
return this;
}
public Builder meaning(String meaning) {
this.meaning = meaning;
return this;
}
public Word build() {
return new Word(this);
}
}
}
public interface IWordRepository {
Optional<Word> findWord(String word);
void saveWords(List<Word> words);
}
@Repository
public class WordRepository implements IWordRepository {
// Spring Data JPA or JdbcTemplate
public Optional<Word> findWord(String word) {
// SELECT * FROM dictionary WHERE word = ?
return Optional.empty(); // example
}
public void saveWords(List<Word> words) {
// Transactional batch UPSERT
}
}
Proxy/Decorator Pattern for Cache
public interface ICacheService {
String get(String key);
void set(String key, String value);
void evict(String key);
}
@Service
public class RedisCacheService implements ICacheService {
private final RedisTemplate<String, String> redis;
public RedisCacheService(RedisTemplate<String, String> redis) {
this.redis = redis;
}
public String get(String key) {
return redis.opsForValue().get(key);
}
public void set(String key, String value) {
redis.opsForValue().set(key, value);
}
public void evict(String key) {
redis.delete(key);
}
}
Factory Pattern for Changelog Processor.
public interface IChangelogProcessor {
List<Word> parseChangelog(File file);
}
public class JSONProcessor implements IChangelogProcessor {
public List<Word> parseChangelog(File file) {
// Parse JSON changelog
return List.of();
}
}
public class CSVProcessor implements IChangelogProcessor {
public List<Word> parseChangelog(File file) {
// Parse CSV changelog
return List.of();
}
}
public class ChangelogProcessorFactory {
public static IChangelogProcessor getProcessor(String fileType) {
switch(fileType) {
case "json": return new JSONProcessor();
case "csv": return new CSVProcessor();
default: throw new IllegalArgumentException("Unsupported file type");
}
}
}
Strategy Pattern (Optional) for Cache Strategy.
public interface CacheStrategy {
String get(String key);
void put(String key, String value);
}
public class ReadThroughCacheStrategy implements CacheStrategy {
private final ICacheService cache;
private final IWordRepository repository;
public ReadThroughCacheStrategy(ICacheService cache, IWordRepository repository) {
this.cache = cache;
this.repository = repository;
}
public String get(String key) {
String cached = cache.get(key);
if (cached != null) return cached;
Optional<Word> word = repository.findWord(key);
word.ifPresent(w -> cache.set(key, w.getMeaning()));
return word.map(Word::getMeaning).orElse(null);
}
public void put(String key, String value) {
cache.set(key, value);
}
}
Dictionary Service with Dependency Inversion
public interface IDictionaryService {
String getMeaning(String word);
void applyChangelog(File file, String fileType);
}
@Service
public class DictionaryService implements IDictionaryService {
private final IWordRepository repo;
private final ICacheService cache;
public DictionaryService(IWordRepository repo, ICacheService cache) {
this.repo = repo;
this.cache = cache;
}
public String getMeaning(String word) {
String cached = cache.get(word);
if (cached != null) return cached;
Word w = repo.findWord(word)
.orElseThrow(() -> new RuntimeException("Word not found"));
cache.set(word, w.getMeaning());
return w.getMeaning();
}
public void applyChangelog(File file, String fileType) {
IChangelogProcessor processor = ChangelogProcessorFactory.getProcessor(fileType);
List<Word> words = processor.parseChangelog(file);
repo.saveWords(words);
words.forEach(w -> cache.evict(w.getWord()));
}
}
Controller.
The controller depends only on the IDictionaryService interface; the multipart handling here is a sketch for getting the uploaded changelog onto disk for the processor.
@RestController
@RequestMapping("/api")
public class DictionaryController {
    private final IDictionaryService service; // depends on the abstraction, not the implementation
    public DictionaryController(IDictionaryService service) {
        this.service = service;
    }
    @GetMapping("/word/{word}")
    public ResponseEntity<String> getWord(@PathVariable String word) {
        return ResponseEntity.ok(service.getMeaning(word));
    }
    @PostMapping("/changelog")
    public ResponseEntity<String> uploadChangelog(@RequestParam("file") MultipartFile upload,
                                                  @RequestParam("type") String fileType) throws IOException {
        File tmp = File.createTempFile("changelog", "." + fileType);
        upload.transferTo(tmp); // persist the upload so the processor can read it
        service.applyChangelog(tmp, fileType);
        return ResponseEntity.ok("Changelog applied");
    }
}
S: Repo, Cache, Processor each do one job O: Adding XMLProcessor? Zero changes to DictionaryService L: Any IWordRepository works — Postgres, DynamoDB I: Interfaces cleanly split responsibilities D: Controller depends on IDictionaryService only — low coupling
✔️ Repository → DB ✔️ Proxy → Redis Cache ✔️ Factory → Changelog Processor ✔️ Strategy → Pluggable cache logic ✔️ Builder → Word object ✔️ Singleton → Spring handles Redis pool ✔️ Decorator → You can wrap IWordRepository for logging / metrics easily
@Transactional in saveWords ensures atomic changelog.
Rollback on exception.
Cache evicts after commit to avoid stale reads.
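A sketch of deferring that eviction until after commit with Spring's TransactionSynchronizationManager; this must be registered inside the active transaction (e.g., within applyChangelog after repo.saveWords(words)):
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;

// Inside the @Transactional applyChangelog, after repo.saveWords(words):
TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
    @Override
    public void afterCommit() {
        // Runs only once the DB transaction has committed, so readers never
        // repopulate the cache from pre-changelog rows.
        words.forEach(w -> cache.evict(w.getWord()));
    }
});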
A sample test file looks like this:
@ExtendWith(MockitoExtension.class)
class DictionaryServiceTest {
@Mock
private IWordRepository repo;
@Mock
private ICacheService cache;
@InjectMocks
private DictionaryService service;
@Test
void testGetMeaning_CacheHit() {
when(cache.get("apple")).thenReturn("A fruit");
String result = service.getMeaning("apple");
assertEquals("A fruit", result);
verifyNoInteractions(repo);
}
@Test
void testGetMeaning_CacheMiss_DBHit() {
when(cache.get("banana")).thenReturn(null);
when(repo.findWord("banana")).thenReturn(Optional.of(
new Word.Builder().word("banana").meaning("Yellow fruit").build()
));
String result = service.getMeaning("banana");
assertEquals("Yellow fruit", result);
verify(cache).set("banana", "Yellow fruit");
}
@Test
void testGetMeaning_NotFound() {
when(cache.get("kiwi")).thenReturn(null);
when(repo.findWord("kiwi")).thenReturn(Optional.empty());
assertThrows(RuntimeException.class, () -> service.getMeaning("kiwi"));
}
@Test
void testApplyChangelog_ProcessorAndRepo() {
// Mock changelog file and processor
File dummyFile = new File("dummy.json");
IChangelogProcessor mockProcessor = mock(IChangelogProcessor.class);
List<Word> mockWords = List.of(
new Word.Builder().word("pear").meaning("Juicy fruit").build()
);
when(mockProcessor.parseChangelog(dummyFile)).thenReturn(mockWords);
// Replace the factory for this test (static mocking requires the mockito-inline artifact)
try (MockedStatic<ChangelogProcessorFactory> mockedFactory =
mockStatic(ChangelogProcessorFactory.class)) {
mockedFactory.when(() -> ChangelogProcessorFactory.getProcessor("json"))
.thenReturn(mockProcessor);
service.applyChangelog(dummyFile, "json");
verify(repo).saveWords(mockWords);
verify(cache).evict("pear");
}
}
}
Basic Docker Compose to run:
Spring Boot App
Redis
PostgreSQL
version: "3.8"
services:
dictionary-app:
build: .
ports:
- "8080:8080"
environment:
- SPRING_DATASOURCE_URL=jdbc:postgresql://postgres:5432/dictionary
- SPRING_DATASOURCE_USERNAME=postgres
- SPRING_DATASOURCE_PASSWORD=postgres
- SPRING_REDIS_HOST=redis
depends_on:
- postgres
- redis
postgres:
image: postgres:15
restart: always
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: dictionary
ports:
- "5432:5432"
volumes:
- pgdata:/var/lib/postgresql/data
redis:
image: redis:7
restart: always
ports:
- "6379:6379"
volumes:
pgdata:
Here’s a starter GKE-ready deployment.yaml for the Spring Boot app.
apiVersion: apps/v1
kind: Deployment
metadata:
name: dictionary-service
spec:
replicas: 3
selector:
matchLabels:
app: dictionary-service
template:
metadata:
labels:
app: dictionary-service
spec:
containers:
- name: dictionary-app
image: gcr.io/YOUR_PROJECT_ID/dictionary-app:latest
ports:
- containerPort: 8080
env:
- name: SPRING_DATASOURCE_URL
value: jdbc:postgresql://POSTGRES_HOST:5432/dictionary
- name: SPRING_DATASOURCE_USERNAME
value: postgres
- name: SPRING_DATASOURCE_PASSWORD
value: postgres
- name: SPRING_REDIS_HOST
value: REDIS_HOST
---
apiVersion: v1
kind: Service
metadata:
name: dictionary-service
spec:
selector:
app: dictionary-service
ports:
- protocol: TCP
port: 80
targetPort: 8080
type: LoadBalancer
A production-ready Dockerfile for your Spring Boot JAR ✅ A Helm chart starter for GKE ✅ A basic GitHub Actions workflow to build, push to Google Container Registry (GCR), and deploy to GKE automatically.
This Dockerfile builds your Spring Boot JAR and runs it with OpenJDK 17.
# Use a minimal builder image
FROM eclipse-temurin:17-jdk as build
WORKDIR /app
# Copy Maven files and download dependencies first (layer caching)
COPY mvnw .
COPY .mvn .mvn
COPY pom.xml .
RUN ./mvnw dependency:go-offline
# Copy source and build
COPY src src
RUN ./mvnw package -DskipTests
# ===========================
# Use a slim base image
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
Best Practice:
Multi-stage build keeps image small.
No Maven runtime in final image.
Runs with minimal JVM footprint.
Helm Chart Starter
Helm makes it easy to template values for staging/prod.
Here’s the folder structure and basic chart files.
helm/dictionary/Chart.yaml
apiVersion: v2
name: dictionary
description: A Helm chart for the Dictionary Service
type: application
version: 0.1.0
appVersion: "1.0"
helm/dictionary/values.yaml
replicaCount: 3
image:
repository: gcr.io/YOUR_PROJECT_ID/dictionary-app
tag: latest
pullPolicy: IfNotPresent
service:
type: LoadBalancer
port: 80
env:
SPRING_DATASOURCE_URL: jdbc:postgresql://<POSTGRES_HOST>:5432/dictionary
SPRING_DATASOURCE_USERNAME: postgres
SPRING_DATASOURCE_PASSWORD: postgres
SPRING_REDIS_HOST: <REDIS_HOST>
helm/dictionary/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-deployment
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
app: {{ .Release.Name }}
template:
metadata:
labels:
app: {{ .Release.Name }}
spec:
containers:
- name: {{ .Release.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 8080
env:
- name: SPRING_DATASOURCE_URL
value: "{{ .Values.env.SPRING_DATASOURCE_URL }}"
- name: SPRING_DATASOURCE_USERNAME
value: "{{ .Values.env.SPRING_DATASOURCE_USERNAME }}"
- name: SPRING_DATASOURCE_PASSWORD
value: "{{ .Values.env.SPRING_DATASOURCE_PASSWORD }}"
- name: SPRING_REDIS_HOST
value: "{{ .Values.env.SPRING_REDIS_HOST }}"
helm/dictionary/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: {{ .Release.Name }}-service
spec:
type: {{ .Values.service.type }}
selector:
app: {{ .Release.Name }}
ports:
- protocol: TCP
port: {{ .Values.service.port }}
targetPort: 8080
Now deploy: helm install dictionary ./helm/dictionary
GitHub Actions Workflow
Here’s a simple .github/workflows/deploy.yml
name: CI/CD Pipeline
on:
push:
branches: [ main ]
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Set up JDK
uses: actions/setup-java@v3
with:
distribution: 'temurin'
java-version: '17'
- name: Build JAR
run: ./mvnw package -DskipTests
- name: Set up Docker
uses: docker/setup-buildx-action@v3
- name: Authenticate to GCP
uses: google-github-actions/auth@v2
with:
credentials_json: ${{ secrets.GCP_SA_KEY }}
- name: Configure Docker to use gcloud
run: gcloud auth configure-docker
- name: Build & push Docker image
run: |
docker build -t gcr.io/$GCP_PROJECT_ID/dictionary-app:latest .
docker push gcr.io/$GCP_PROJECT_ID/dictionary-app:latest
env:
GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
- name: Set up kubectl
uses: azure/setup-kubectl@v3
- name: Set up Helm
uses: azure/setup-helm@v3
- name: Deploy with Helm
run: |
helm upgrade --install dictionary ./helm/dictionary \
--set image.repository=gcr.io/$GCP_PROJECT_ID/dictionary-app \
--set image.tag=latest
env:
GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
Secrets needed:
GCP_SA_KEY → Service Account JSON Key (base64 encoded).
GCP_PROJECT_ID → Your GCP Project ID.
What You Now Have ✔️ Spring Boot LLD + SOLID + Design Patterns ✔️ Real unit tests ✔️ Docker build & Compose ✔️ Helm chart starter ✔️ GitHub Actions to auto build & deploy.
✅ 1️⃣ Terraform setup to provision GCP infra:
GKE cluster
Cloud SQL instance (PostgreSQL)
Memorystore (Redis)
IAM service account for GitHub Actions
✅ 2️⃣ Kubernetes Secrets for credentials (best practice)
✅ 3️⃣ Prometheus + Grafana add-on for live monitoring.
Below are clean Terraform snippets. Organize them as:
infra/
├── main.tf
├── variables.tf
├── outputs.tf
├── versions.tf
versions.tf
terraform {
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
}
required_version = ">= 1.3"
}
provider "google" {
project = var.project_id
region = var.region
}
main.tf
# GKE Cluster
resource "google_container_cluster" "primary" {
name = var.gke_cluster_name
location = var.region
remove_default_node_pool = true
initial_node_count = 1
}
resource "google_container_node_pool" "primary_nodes" {
name = "primary-node-pool"
location = var.region
cluster = google_container_cluster.primary.name
node_config {
machine_type = "e2-medium"
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform",
]
}
initial_node_count = 3
}
# Cloud SQL (Postgres)
resource "google_sql_database_instance" "postgres" {
name = var.db_instance_name
database_version = "POSTGRES_15"
region = var.region
settings {
tier = "db-f1-micro"
backup_configuration {
enabled = true
}
}
}
resource "google_sql_user" "postgres_user" {
name = "postgres"
instance = google_sql_database_instance.postgres.name
password = "change-me-strong"
}
# Memorystore (Redis)
resource "google_redis_instance" "redis" {
name = var.redis_instance_name
tier = "STANDARD_HA"
memory_size_gb = 1
region = var.region
}
outputs.tf
output "gke_endpoint" {
value = google_container_cluster.primary.endpoint
}
output "db_instance_connection_name" {
value = google_sql_database_instance.postgres.connection_name
}
output "redis_host" {
value = google_redis_instance.redis.host
}
output "redis_port" {
value = google_redis_instance.redis.port
}
How to deploy.
cd infra
terraform init
terraform plan -out tfplan
terraform apply tfplan
Save the output connection strings for your Helm chart’s values.yaml.
Kubernetes Secrets: store DB credentials, Redis credentials, and API keys here — never as plain env vars.
secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: dictionary-secrets
type: Opaque
data:
POSTGRES_USER: cG9zdGdyZXM= # base64(postgres)
POSTGRES_PASSWORD: Y2hhbmdlLW1lLXN0cm9uZw== # base64(change-me-strong)
REDIS_HOST: cmVkaXMuaG9zdA== # base64(your Redis host)
Reference them in Helm -
env:
- name: SPRING_DATASOURCE_USERNAME
valueFrom:
secretKeyRef:
name: dictionary-secrets
key: POSTGRES_USER
Create it with kubectl create secret generic dictionary-secrets --from-literal=... or manage it via Helm.
Prometheus + Grafana. Add to Helm or use the community charts:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus
helm install grafana grafana/grafana
To expose /actuator/prometheus from the Spring Boot app, add spring-boot-starter-actuator plus the Micrometer Prometheus registry, then enable the endpoint:
management:
endpoints:
web:
exposure:
include: prometheus
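The two dependencies that wire this up (standard Spring Boot and Micrometer artifacts):
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>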
Piece | What You Have |
---|---|
LLD | SOLID + patterns |
Code | Service + Repo + Cache + Factory + Strategy |
Tests | JUnit + Mockito |
CI/CD | GitHub Actions → GCR → GKE |
Infra | Terraform → GKE, Cloud SQL, Redis |
Secrets | K8s Secret YAML |
Observability | Prometheus + Grafana |
“This system is 12-Factor, microservices ready, observable, secrets-managed, and fully automatable through GitOps — a real-world design for high QPS, low-latency lookups and easy changelog updates.”