Chapter 25 Flashcards — Compute as a Service

flashcards seg compute infrastructure

What is Compute as a Service (CaaS), and what problem does it solve?
?
CaaS is a managed infrastructure model in which a scheduling and orchestration system allocates compute resources to workloads on demand, abstracting away which physical or virtual machine a workload runs on. It solves the scaling problems of manual machine management: configuration drift, deployment toil, poor resource utilization, and operational overhead that grows linearly with fleet size.

What two properties do containers provide that make CaaS possible?
?
Portability: a container bundles an application and its dependencies into a single artifact that runs consistently on any host with a compatible runtime, eliminating “works on my machine” problems. Isolation: containers use Linux kernel primitives (cgroups for resource limits, namespaces for process/network isolation) to prevent workloads from interfering with each other on shared hardware. Together, these properties make multitenancy practical.

What is multitenancy in the context of CaaS?
?
Running workloads from multiple teams or services on shared physical hardware, managed by a scheduler that allocates resources and enforces isolation. Multitenancy dramatically improves utilization — a cluster running diverse workloads (serving, batch, ML) can fill idle capacity with work from different teams, whereas per-team dedicated fleets leave most hardware idle most of the time.

What is the foundational design principle for software running in a managed compute environment?
?
Assume your process will die at any time and design accordingly. The scheduler can kill any instance to reclaim resources, rebalance the fleet, or recover from machine failures. Software must handle graceful shutdown (respond to SIGTERM), avoid instance-local state that cannot be reconstructed, avoid instance affinity, and support horizontal scaling so the loss of any single instance does not affect correctness.
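
The graceful-shutdown requirement can be sketched as follows. This is a minimal illustration, not a production server: `serve`, `process`, and `install_handlers` are hypothetical names, and the work loop stands in for whatever request loop a real service would use.

```python
import signal
import threading

# Set when the scheduler asks this process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the work loop notices it and exits."""
    shutdown_requested.set()


def serve(work_items, process):
    """Process items until the input is exhausted or shutdown is requested.

    Returns the number of items completed.
    """
    completed = 0
    for item in work_items:
        if shutdown_requested.is_set():
            break  # stop taking new work; the previous item already finished
        process(item)
        completed += 1
    return completed


def install_handlers():
    """Register the handler (signal.signal must run in the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)
```

The handler only sets a flag, so the item being processed when SIGTERM arrives always runs to completion before the loop exits.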

What is the Instance Affinity anti-pattern, and why is it dangerous in managed compute?
?
Instance affinity is designing a service so that a specific client must always reach a specific server instance — typically because session state is stored in that instance’s memory. It is dangerous because the scheduler cannot honor this constraint across rescheduling events: when the instance is killed, that client’s session state is lost. The solution is to externalize session state to a distributed cache (e.g., Redis, Memcached) so any instance can handle any request.
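
A rough sketch of externalized session state, under stated assumptions: `InMemorySharedStore` is a hypothetical stand-in for an external cache such as Redis, and `ServerInstance` and `add_to_cart` are illustrative names only. The point is that no session data lives on the instance itself.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[dict]: ...
    def put(self, session_id: str, data: dict) -> None: ...


class InMemorySharedStore:
    """Stand-in for an external cache; shared by every server instance."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)

    def put(self, session_id: str, data: dict) -> None:
        self._data[session_id] = dict(data)


class ServerInstance:
    """A stateless instance: all session state lives in the shared store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> list:
        session = self.store.get(session_id) or {"cart": []}
        session["cart"] = session["cart"] + [item]
        self.store.put(session_id, session)
        return session["cart"]
```

If instance `a` handles the first request and is then killed, instance `b` can serve the next request for the same session because it reads the cart from the shared store.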

What are the key differences between a serving job and a batch job?
?
Serving jobs run indefinitely, respond to incoming requests, optimize for latency and availability, and are intolerant of preemption (losing an instance affects live users). Batch jobs run to completion, process a defined input set, optimize for throughput and efficiency, and are tolerant of preemption if work is checkpointed. The right design choices — health checks, graceful shutdown, checkpointing, idempotency — differ substantially between the two.

What two design properties must a well-designed batch job have to survive preemption in a managed compute environment?
?
Checkpointability: the job saves its progress periodically so that if preempted, it can restart from the last checkpoint rather than from the beginning — making long-running batch jobs practical on preemptible resources. Idempotency: running the job twice (due to retry after failure) produces the same result as running it once — critical when the job has side effects such as writing to a database, sending notifications, or charging accounts.
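
Both properties can be illustrated together in a small sketch. The file layout, the `stop_after` parameter (used here only to simulate preemption), and the doubling "work" are assumptions made for illustration; a real job would likely checkpoint every N records rather than after each one.

```python
import json
import os


def load_checkpoint(path):
    """Return the index of the next record to process (0 if no checkpoint)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically so a kill mid-write cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_batch(records, sink, checkpoint_path, stop_after=None):
    """Process (record_id, value) pairs, resuming from the last checkpoint.

    Results are written to `sink` by key, so reprocessing a record after a
    retry overwrites the same entry instead of duplicating it (idempotent).
    Returns True when the job finished, False when it was "preempted".
    """
    start = load_checkpoint(checkpoint_path)
    for offset, (record_id, value) in enumerate(records[start:]):
        if stop_after is not None and offset >= stop_after:
            return False  # simulated preemption
        sink[record_id] = value * 2  # overwrite by key, never append
        save_checkpoint(checkpoint_path, start + offset + 1)
    return True
```

Running the job again after a preemption picks up at the saved index, and running it a third time after completion changes nothing.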

Why are health check endpoints mandatory for serving jobs in a CaaS environment?
?
The scheduler needs to know whether a serving instance is ready to receive traffic (readiness check) and whether it is still alive (liveness check). Without health checks, the scheduler has no way to distinguish a healthy instance from one that is deadlocked, out of memory, or waiting for a dependency. An instance that fails its readiness check is removed from the load-balancer pool, preventing live traffic from being routed to broken instances; an instance that fails its liveness check is typically restarted.
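
As a hedged illustration of the two checks, the sketch below models them as plain functions returning HTTP-style status codes. `InstanceState`, its fields, and the 30-second threshold are assumptions chosen for the example; a real service would expose these through its HTTP server.

```python
from dataclasses import dataclass


@dataclass
class InstanceState:
    started: bool = False                 # initialization finished
    dependency_ok: bool = False           # e.g. database connection established
    worker_heartbeat_age_s: float = 0.0   # seconds since the worker loop last ticked


def liveness(state: InstanceState, max_heartbeat_age_s: float = 30.0) -> int:
    """200 if the process is still making progress, else 503."""
    return 200 if state.worker_heartbeat_age_s <= max_heartbeat_age_s else 503


def readiness(state: InstanceState) -> int:
    """200 only when the instance can serve traffic right now, else 503."""
    return 200 if state.started and state.dependency_ok else 503
```

Keeping the checks separate lets an instance that is alive but still waiting on a dependency stay out of the traffic pool without being treated as dead.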

What is the principle for managing state in a managed compute environment?
?
Externalize all state that must survive beyond a single process lifetime. Acceptable external stores include: distributed caches for session state, distributed databases for durable application data, object storage for large artifacts, and centralized config services for configuration. Local state is acceptable only for read-only caches (the authoritative data is elsewhere), in-flight request work, and ephemeral scratch space used within a single request lifecycle.

Why can you not hardcode IP addresses for service dependencies in a managed compute environment?
?
Because the instances serving a dependency are constantly being scheduled, rescheduled, and replaced — their IP addresses change as the scheduler moves them across the fleet. Instead, applications must use service discovery: DNS-based discovery (the scheduler registers services under a stable DNS name), a load balancer virtual IP (VIP) that routes to current healthy instances, or a service mesh (a sidecar proxy that handles routing transparently). All of these provide a stable indirection layer that survives instance churn.
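
The indirection layer can be sketched with a toy registry. `ServiceRegistry` is a hypothetical in-process stand-in for DNS or a VIP, and the addresses are made up; the idea it shows is that clients hold a stable name and resolve it on every call instead of caching an instance address.

```python
class ServiceRegistry:
    """Stand-in for DNS or a VIP: stable name -> current healthy endpoints."""

    def __init__(self):
        self._endpoints = {}
        self._counters = {}

    def register(self, name, endpoints):
        """Called whenever instances are added, moved, or removed."""
        self._endpoints[name] = list(endpoints)

    def resolve(self, name):
        """Return one current endpoint, rotating round-robin across calls."""
        endpoints = self._endpoints.get(name)
        if not endpoints:
            raise LookupError(f"no healthy instances for {name!r}")
        index = self._counters.get(name, 0) % len(endpoints)
        self._counters[name] = index + 1
        return endpoints[index]
```

After the scheduler replaces every instance of a service, callers that resolve the name again reach the new addresses without any client-side change.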

What is declarative configuration in CaaS, and how does it differ from imperative configuration?
?
Declarative configuration describes the desired state (e.g., “10 replicas, 2 CPU, 4 GB RAM, container image v42”). The CaaS system takes responsibility for reaching and maintaining that state. Imperative configuration describes steps to take (“start machine, install library, copy binary, start service”). Declarative is superior because it is self-healing (the system reconciles drifted state automatically), auditable (the spec is version-controllable), and idempotent (applying it twice has the same effect as once).
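
The reconcile-to-desired-state idea can be sketched as a small loop body. This is an illustrative model, not how any particular scheduler is implemented: `Spec`, `reconcile`, `apply_actions`, and the image name `app:v42` are assumptions, and the live state is reduced to a list of image names with one entry per running replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    replicas: int
    image: str


def reconcile(desired: Spec, running: list) -> list:
    """Return the actions that move `running` to the desired state.

    An empty list means the live state already matches the spec.
    """
    actions = []
    # Stop replicas that run the wrong image.
    keep = [img for img in running if img == desired.image]
    for img in running:
        if img != desired.image:
            actions.append(("stop", img))
    # Scale up or down to the desired replica count.
    missing = desired.replicas - len(keep)
    if missing > 0:
        actions.extend([("start", desired.image)] * missing)
    elif missing < 0:
        actions.extend([("stop", desired.image)] * (-missing))
    return actions


def apply_actions(actions: list, running: list) -> list:
    """Apply actions to a copy of the running list and return the new state."""
    state = list(running)
    for verb, image in actions:
        if verb == "start":
            state.append(image)
        else:
            state.remove(image)
    return state
```

Reconciling a second time against the resulting state yields no actions, which is the idempotency property the card describes.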

What is the Configuration Drift anti-pattern in declarative CaaS environments?
?
Configuration drift occurs when the live state of a cluster is modified directly (e.g., via kubectl exec to edit a running container) without updating the declarative configuration. The live state diverges from the spec; the next deployment overwrites the manual change, losing both the change and the reasoning behind it. Best practice: all changes to desired state must go through the configuration file, committed to version control.

What is Borg, and what is its relationship to Kubernetes?
?
Borg is Google’s internal cluster management system, developed in the early 2000s, which schedules workloads onto a fleet of hundreds of thousands of machines organized into multiple clusters (cells). Kubernetes was created by Google engineers who had built and operated Borg, and it directly applies Borg’s design lessons. Borg concepts map roughly onto Kubernetes ones: Borg cell ≈ K8s cluster; Borg task ≈ K8s Pod; Borg job ≈ K8s Deployment/ReplicaSet; Borg alloc ≈ K8s multi-container Pod; Borglet ≈ Kubelet.

What is Borg resource overcommitment, and why does it improve utilization?
?
Borg overcommits resources — it schedules jobs as if the cluster has more capacity than physically exists, because most jobs request more resources than they actually use, and not all jobs hit their reserved peaks simultaneously. This fills hardware that would otherwise sit idle, dramatically improving utilization. The tradeoff: the scheduler must be able to preempt low-priority jobs (batch, development) when high-priority production serving jobs need resources back.
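
The arithmetic behind overcommitment can be made concrete with a small calculation. The function name and the numbers in the usage note are hypothetical and chosen only to make the ratio easy to follow.

```python
def overcommit_summary(capacity_cpus, jobs):
    """Summarize a cluster given jobs as (requested_cpus, typical_used_cpus).

    Returns (total_requested, total_used, overcommit_ratio, utilization),
    where both ratios are relative to physical capacity.
    """
    total_requested = sum(req for req, _ in jobs)
    total_used = sum(used for _, used in jobs)
    return (
        total_requested,
        total_used,
        total_requested / capacity_cpus,
        total_used / capacity_cpus,
    )
```

For example, on a 100-CPU cluster, 15 jobs that each request 10 CPUs but typically use 4 give `overcommit_summary(100, [(10, 4)] * 15)`: 150 CPUs requested against 100 physical (a ratio of 1.5) with 60 CPUs actually in use. Without overcommitment only 10 such jobs would be admitted, leaving typical usage at 40 CPUs.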

What are the benefits of a unified compute platform compared to per-team infrastructure?
?
A unified platform achieves: higher utilization (diverse workloads fill each other’s idle capacity); simpler operations (one platform to monitor, patch, upgrade); consistent tooling (engineers switching teams find familiar infrastructure); and economy of scale in procurement and capacity planning. The cost is that the shared platform imposes constraints — teams with unusual requirements may find it cannot accommodate them. Google’s guidance is that for the vast majority of workloads, centralization savings outweigh those constraints.

What is serverless computing, and where does it sit on the abstraction spectrum?
?
Serverless is the highest-abstraction CaaS model: the engineer specifies a function and an event trigger; the platform handles all resource allocation, scaling, and billing (per invocation or per compute unit consumed). The engineer does not specify CPU, memory, or replica count. It is above container-based CaaS on the abstraction ladder, removing all capacity planning but adding cold-start latency, platform lock-in, and limitations on workload complexity and duration.

What workloads are a good fit for serverless, and which are not?
?
Good fit: event-driven processing, infrequent or unpredictable workloads (scale-to-zero is economically valuable), simple transformation/routing functions, teams with limited operational capacity. Poor fit: latency-sensitive serving (cold-start latency is unacceptable), stable high-throughput workloads (per-invocation pricing is more expensive than reserved capacity), long-running tasks, and workloads requiring persistent local state or complex inter-process communication.
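
The pricing argument reduces to a simple comparison, sketched below. The function and its inputs are hypothetical; the prices are placeholders in arbitrary currency units, not figures from any provider.

```python
def cheaper_option(invocations_per_month, price_per_invocation, reserved_monthly_cost):
    """Compare per-invocation billing against a flat reserved-capacity cost.

    Returns a (name, monthly_cost) tuple for the cheaper option.
    """
    serverless_cost = invocations_per_month * price_per_invocation
    if serverless_cost < reserved_monthly_cost:
        return "serverless", serverless_cost
    return "reserved", reserved_monthly_cost
```

With a placeholder price of 0.5 per invocation and a reserved cost of 1000 per month, the break-even point is 2000 invocations: a workload of 1000 invocations favors serverless, while a steady 10000 invocations favors reserved capacity.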

What is the centralization vs. customization trade-off in choosing compute infrastructure?
?
Centralization (shared platform): lower operational burden, consistent tooling, higher utilization through resource sharing, economy of scale — at the cost of constraints on workload configuration. Customization (own infrastructure): full control over hardware, networking, OS — at the cost of full operational burden, no economy of scale, and siloed operational knowledge. The book’s guidance: strongly prefer centralization for the vast majority of workloads; customization is justified only when requirements are genuinely unusual and the shared platform cannot accommodate them.

How does containerization change the contract between applications and infrastructure?
?
Without containers, an application depends on a specific OS version, library versions, filesystem layout, and user accounts — tightly coupling it to a specific machine configuration. With containers, the application depends only on the container runtime (a thin, stable interface), and the infrastructure depends on nothing about the application’s internals. This enables independent evolution: the cluster operator can upgrade hardware and OS without coordinating with application teams, and application teams can change dependencies and runtimes without coordinating with cluster operators.

What does “architecting for failure” mean concretely in a managed compute context?
?
Specifically: (1) no durable instance-local state — any state that must survive process death is externalized to a distributed store; (2) graceful shutdown on SIGTERM — the process finishes in-flight requests and closes connections cleanly within the scheduler’s termination window; (3) no instance affinity — any instance can handle any request, so rescheduling is transparent; (4) horizontal scalability — the service can run with N replicas, so losing one instance reduces capacity but not correctness.

What is the role of one-off code in a CaaS environment, and how should it be handled?
?
One-off code (admin scripts, data migrations, ad-hoc analysis) still needs to run somewhere. In a CaaS environment, it should run as an interactive job submitted to the cluster using the same container infrastructure as production, giving it the same network access and credentials without requiring direct SSH access to production machines. This maintains a clean audit boundary (everything that runs in production is a scheduled container) and eliminates the security and consistency risks of ad-hoc machine access.

What key operational lesson did Borg validate about hardware failures at scale?
?
At the scale of tens of thousands of machines, hardware failures are not exceptional events — they are the normal operating condition. Every machine in the fleet will fail; the question is when, not whether. Software that is not designed to survive process termination will experience routine outages. Borg’s operational history proved that treating failure as exceptional is a design error at scale; treating it as a routine condition that the software handles gracefully is the only viable approach.

What is the primary advantage of version-controlling declarative CaaS configuration?
?
The configuration file becomes the complete, auditable record of what is deployed: what container image, how many replicas, how much CPU and memory, what health-check endpoints. Version-controlling it provides a full history of every deployment — what was deployed when, by whom, and (via commit messages) why. Combined with the idempotency of declarative configuration, it makes rollbacks straightforward (revert to the previous spec and apply it) and post-incident analysis tractable.

Total Cards: 23
Review Time: ~20 minutes
Priority: MEDIUM
Last Updated: 2026-06-02

Study Notes by Niladri & AI
