[{"data":1,"prerenderedAt":3898},["ShallowReactive",2],{"blog--docker-security-under-the-hood":3},{"title":4,"description":5,"date":6,"tags":7,"body":12},"Docker Security Under the Hood: Runtime Security Model Explained","A deep dive into Docker's security model: Linux capabilities, seccomp, AppArmor, user namespaces, and how defense in depth protects containers in production.","2026-07-02",[8,9,10,11],"Docker","Security","Linux","Containers",{"type":13,"value":14,"toc":3846},"minimark",[15,20,24,27,51,54,65,75,79,82,87,90,93,107,110,113,136,139,144,147,162,165,168,188,191,195,198,246,255,258,261,266,275,278,281,285,288,293,296,301,304,307,311,314,321,324,341,344,351,354,389,392,396,399,404,407,410,413,416,420,423,426,470,473,476,479,482,485,489,492,495,499,502,516,519,522,525,528,602,606,609,623,626,675,682,695,702,714,729,735,738,758,765,771,790,793,807,810,817,823,826,865,870,877,880,886,889,896,902,905,922,928,934,940,943,963,969,973,988,993,1039,1044,1172,1175,1178,1192,1195,1199,1252,1256,1372,1375,1379,1382,1385,1388,1393,1409,1413,1449,1453,1473,1476,1480,1483,1486,1506,1509,1516,1519,1522,1541,1546,1550,1571,1575,1622,1626,1716,1722,1726,1729,1732,1751,1754,1758,1761,1764,1781,1785,1788,1791,1794,1797,1804,1807,1814,1817,1821,1843,1847,1886,1890,1911,1917,1923,1932,1949,1955,1964,1972,1978,1980,1995,1998,2004,2011,2020,2025,2033,2036,2040,2052,2065,2071,2089,2098,2109,2113,2116,2119,2122,2135,2139,2142,2145,2148,2151,2155,2158,2184,2187,2191,2194,2197,2200,2204,2213,2216,2219,2223,2226,2229,2234,2237,2241,2244,2251,2254,2258,2261,2264,2271,2275,2278,2298,2308,2311,2315,2318,2321,2328,2331,2335,2338,2345,2358,2364,2380,2383,2391,2397,2438,2444,2447,2450,2454,2457,2460,2463,2466,2469,2473,2476,2480,2537,2541,2586,2589,2596,2599,2602,2637,2640,2644,2647,2650,2654,2707,2711,2753,2756,2760,2767,2770,2787,2790,2844,2853,2859,2880,2884,2887,2913,2918,2925,2931,2934,2938,2941,2951,2954,2957,2973,2979,2985,2989,2992,2995,2998,3001,3024,3027,3030,3034,3273,3279,3502,
3508,3814,3817,3821,3827,3830,3833,3836,3842],[16,17,19],"h1",{"id":18},"introduction","Introduction",[21,22,23],"p",{},"Container security has improved dramatically over the past decade, but the advice surrounding it often hasn't.",[21,25,26],{},"If you've searched for Docker security best practices, you've probably seen recommendations like:",[28,29,30,34,37,40,43],"ul",{},[31,32,33],"li",{},"Don't run as root.",[31,35,36],{},"Drop unnecessary capabilities.",[31,38,39],{},"Use a read-only filesystem.",[31,41,42],{},"Enable seccomp.",[31,44,45,46,50],{},"Never use ",[47,48,49],"code",{},"--privileged",".",[21,52,53],{},"These are all solid recommendations, but they're often presented as rules to follow rather than concepts to understand.",[21,55,56,57,64],{},"One of my favorite resources for security best practices is the ",[58,59,63],"a",{"href":60,"rel":61},"https:\u002F\u002Fcheatsheetseries.owasp.org\u002Fcheatsheets\u002FDocker_Security_Cheat_Sheet.html",[62],"nofollow","OWASP Docker Security Cheat Sheet",". It contains an excellent collection of recommendations. This post builds on that foundation by explaining the Linux security mechanisms behind each recommendation: what they do, why they exist, and the trade-offs involved. By the end, you'll understand not just which security options to use, but why they matter and where to apply them.",[66,67,68],"blockquote",{},[21,69,70,74],{},[71,72,73],"strong",{},"TL;DR:"," This post covers Docker's runtime security model: Linux capabilities, seccomp, AppArmor, user namespaces, read-only filesystems, resource limits, and daemon security. Each section explains the mechanism, why it matters, and how to configure it in Docker, Compose, and Kubernetes. The focus is on limiting what an attacker can do after gaining code execution in a production container. 
Supply chain, secrets, and network security are not covered here.",[16,76,78],{"id":77},"threat-model","Threat Model",[21,80,81],{},"Before discussing Docker's security features, we need to answer a simple question:",[66,83,84],{},[21,85,86],{},"Who are we defending against?",[21,88,89],{},"Security recommendations only make sense within the context of a threat model. A container running on your development machine has very different security requirements than a public-facing production API.",[21,91,92],{},"For the rest of this article, we'll assume the following scenario:",[28,94,95,98,101,104],{},[31,96,97],{},"A production application is running inside a Docker container.",[31,99,100],{},"The application is accessible to untrusted users over the network.",[31,102,103],{},"Due to an application vulnerability, an attacker successfully gains remote code execution (RCE) inside the container.",[31,105,106],{},"The attacker can now execute arbitrary commands with the same privileges as the application process.",[21,108,109],{},"At this point, Docker has not failed. Preventing vulnerabilities such as SQL injection, command injection, or insecure deserialization is the responsibility of the application, not the container runtime. 
Docker's role begins after the application has been compromised.",[21,111,112],{},"The attacker's objectives may include:",[28,114,115,118,121,124,127,130,133],{},[31,116,117],{},"Reading sensitive data such as API keys, credentials, or mounted secrets.",[31,119,120],{},"Modifying the application or its configuration.",[31,122,123],{},"Establishing persistence so access survives application restarts.",[31,125,126],{},"Escalating privileges within the container.",[31,128,129],{},"Escaping the container and compromising the host.",[31,131,132],{},"Accessing or interfering with other containers.",[31,134,135],{},"Consuming excessive system resources to cause a denial of service.",[21,137,138],{},"The purpose of container hardening is to limit the attacker's capabilities after a successful compromise, making each of these objectives more difficult or, ideally, impossible.",[140,141,143],"h2",{"id":142},"a-real-world-example","A Real-World Example",[21,145,146],{},"This isn't a hypothetical scenario. Critical remote code execution vulnerabilities in containerized applications are discovered every year.",[21,148,149,150,155,156,161],{},"One recent example was the React Server Components vulnerability (",[58,151,154],{"href":152,"rel":153},"https:\u002F\u002Fnvd.nist.gov\u002Fvuln\u002Fdetail\u002FCVE-2025-55182",[62],"CVE-2025-55182","), which also affected downstream Next.js applications (initially tracked as ",[58,157,160],{"href":158,"rel":159},"https:\u002F\u002Fnextjs.org\u002Fblog\u002FCVE-2025-66478",[62],"CVE-2025-66478"," for Next.js). Under certain conditions, an unauthenticated attacker could execute arbitrary code on the server by sending a crafted HTTP request to a vulnerable application.",[21,163,164],{},"Imagine your application is running inside a Docker container. 
The attacker successfully exploits the vulnerability and gains code execution inside the application process.",[21,166,167],{},"Now ask yourself:",[28,169,170,173,176,179,182,185],{},[31,171,172],{},"Can they modify your application?",[31,174,175],{},"Can they steal mounted secrets?",[31,177,178],{},"Can they establish persistence?",[31,180,181],{},"Can they access other containers?",[31,183,184],{},"Can they escape to the host?",[31,186,187],{},"Can they exhaust the host's resources?",[21,189,190],{},"These are the questions Docker's security features are designed to answer. Container hardening isn't about preventing the initial vulnerability. That responsibility belongs to the application and its dependencies. Instead, it's about limiting what an attacker can do after they've gained code execution.",[16,192,194],{"id":193},"the-docker-security-boundary","The Docker Security Boundary",[21,196,197],{},"To understand what Docker can and cannot protect against, you first need to understand where Docker fits in the system stack.",[199,200,204,205],"div",{"className":201},[202,203],"not-prose","my-6","\n  ",[199,206,214,215,214,225,214,230,214,234,214,238,214,242,204],{"className":207},[208,209,210,211,212,213],"mx-auto","max-w-sm","rounded-lg","border","bg-card","text-sm","\n    ",[199,216,224],{"className":217},[218,219,220,221,222,223],"border-b","px-4","py-2.5","text-center","font-medium","text-foreground","Application",[199,226,229],{"className":227},[218,219,220,221,228],"text-muted-foreground","Container",[199,231,233],{"className":232},[218,219,220,221,228],"OCI runtime (runc)",[199,235,237],{"className":236},[218,219,220,221,228],"containerd",[199,239,241],{"className":240},[218,219,220,221,228],"Docker Engine",[199,243,245],{"className":244},[218,219,220,221,228],"Linux Kernel",[21,247,248,249,251,252,254],{},"Docker's responsibility covers everything from the ",[47,250,241],{}," layer up to the ",[47,253,229],{}," layer. 
It orchestrates container lifecycles, sets up namespaces, applies capability sets, configures seccomp profiles, attaches LSMs, and manages cgroups.",[21,256,257],{},"What Docker does not do is provide a security boundary against the Linux kernel. The kernel is shared by all containers on the host. Docker configures kernel security mechanisms; it does not add a new security layer on top of them.",[21,259,260],{},"This distinction is critical:",[66,262,263],{},[21,264,265],{},"Docker is not a security boundary against kernel vulnerabilities.",[21,267,268,269,274],{},"If a vulnerability exists in a kernel subsystem (OverlayFS, eBPF, netfilter, io_uring, or any other), an attacker who can interact with that subsystem may be able to bypass Docker's isolation entirely. These aren't misconfigurations; they're kernel bugs that Docker cannot patch. Many of the most impactful container escapes in recent years have exploited kernel vulnerabilities rather than Docker configuration flaws. For example, ",[58,270,273],{"href":271,"rel":272},"https:\u002F\u002Fnvd.nist.gov\u002Fvuln\u002Fdetail\u002FCVE-2023-0386",[62],"CVE-2023-0386"," was a privilege escalation flaw in the Linux kernel's OverlayFS subsystem: a UID mapping bug in how a setuid file is copied from a nosuid mount into another mount allowed an unprivileged local user to gain root privileges.",[21,276,277],{},"This doesn't mean Docker is insecure. It means that container security is ultimately Linux security. Understanding the kernel mechanisms Docker relies on is essential to understanding both its strengths and its limits.",[21,279,280],{},"Throughout the rest of this article, we'll examine how Docker features such as running as a non-root user, dropping Linux capabilities, using a read-only filesystem, enabling seccomp, and applying AppArmor or SELinux policies work together to reduce the impact of a successful compromise. 
Rather than preventing every attack, these mechanisms are designed to reduce the blast radius when an attacker inevitably gets in.",[16,282,284],{"id":283},"running-containers-as-a-non-root-user","Running Containers as a Non-Root User",[21,286,287],{},"One of the first recommendations you'll encounter in almost every Docker security guide is:",[66,289,290],{},[21,291,292],{},"Don't run your containers as root.",[21,294,295],{},"It's a good recommendation, but it's also one of the most misunderstood. A common question from developers is:",[66,297,298],{},[21,299,300],{},"If containers are already isolated, why does it matter whether my application runs as root?",[21,302,303],{},"The short answer is that root inside a container is not the same as root on the host, but it's still far more privileged than a regular user inside that container.",[21,305,306],{},"Understanding this distinction is key to understanding why running as a non-root user is considered a fundamental container hardening practice.",[140,308,310],{"id":309},"root-inside-a-container-isnt-host-root","Root Inside a Container Isn't Host Root",[21,312,313],{},"On a traditional Linux system, the root user (UID 0) has unrestricted access to almost every part of the operating system. Containers change this model.",[21,315,316,317,320],{},"Processes inside a container still have a user ID, and if that user is ",[47,318,319],{},"root"," (UID 0), they are considered root within the container's user namespace. 
However, Docker applies multiple security mechanisms, including Linux capabilities, namespaces, seccomp, and Linux Security Modules (AppArmor or SELinux) that prevent container root from exercising many of the privileges that host root normally possesses.",[21,322,323],{},"For example, a root process inside a default Docker container cannot:",[28,325,326,329,332,335,338],{},[31,327,328],{},"Load kernel modules.",[31,330,331],{},"Mount arbitrary filesystems.",[31,333,334],{},"Modify kernel parameters.",[31,336,337],{},"Change the system clock.",[31,339,340],{},"Inspect or control arbitrary host processes.",[21,342,343],{},"These operations require privileges that Docker intentionally removes or restricts. So while container root is certainly not equivalent to host root, it is still the most privileged user inside the container.",[140,345,347,348],{"id":346},"why-kubernetes-recommends-runasnonroot","Why Kubernetes Recommends ",[47,349,350],{},"runAsNonRoot",[21,352,353],{},"If you've deployed applications on Kubernetes, you've probably encountered the following security context:",[355,356,361],"pre",{"className":357,"code":358,"language":359,"meta":360,"style":360},"language-yaml shiki shiki-themes catppuccin-mocha catppuccin-mocha catppuccin-mocha","securityContext:\n  runAsNonRoot: true\n","yaml","",[47,362,363,376],{"__ignoreMap":360},[364,365,368,372],"span",{"class":366,"line":367},"line",1,[364,369,371],{"class":370},"si09J","securityContext",[364,373,375],{"class":374},"sG44b",":\n",[364,377,379,382,385],{"class":366,"line":378},2,[364,380,381],{"class":370},"  runAsNonRoot",[364,383,384],{"class":374},":",[364,386,388],{"class":387},"srg_i"," true\n",[21,390,391],{},"This setting instructs the kubelet to verify that the container does not start as UID 0. If the image is configured to run as root, Kubernetes refuses to start the container. This isn't because Kubernetes distrusts Docker's isolation. 
Most applications simply don't require root privileges to serve HTTP requests, process jobs, or interact with databases. Running them as an unprivileged user removes an entire class of post-exploitation techniques with very little operational cost.",[140,393,395],{"id":394},"container-escapes-and-why-they-matter","Container Escapes and Why They Matter",[21,397,398],{},"At this point, you might ask:",[66,400,401],{},[21,402,403],{},"If root inside a container isn't real root, why should I care?",[21,405,406],{},"Because container escapes exist.",[21,408,409],{},"Although rare, vulnerabilities in the Linux kernel or container runtime have occasionally allowed attackers to break out of containers and execute code on the host. If the compromised process is already running with extensive privileges, exploiting such vulnerabilities often becomes significantly easier or more impactful. Running applications as a non-root user doesn't eliminate the possibility of a container escape, but it can reduce the privileges available to an attacker if one does occur.",[21,411,412],{},"Historically, many of the most dangerous container escapes have not bypassed Docker's configuration at all; they've exploited vulnerabilities in the kernel itself. Attackers have leveraged flaws in OverlayFS, abused eBPF for privilege escalation, exploited bugs in netfilter and nftables, and used io_uring flaws to obtain arbitrary read\u002Fwrite primitives. In each case, the attacker didn't need to break Docker; they needed to break Linux.",[21,414,415],{},"This is why every kernel hardening layer matters. Each one removes a potential attack surface that an exploit could target.",[16,417,419],{"id":418},"user-namespaces","User Namespaces",[21,421,422],{},"So far, we've assumed Docker's default configuration, where UID 0 inside the container maps directly to UID 0 on the host. Linux also provides user namespaces, which allow container user IDs to be remapped. 
This is one of the strongest mitigations available against container escape impact, and it deserves to be treated as a first-class security mechanism rather than an optional add-on.",[21,424,425],{},"Here's how the remapping works:",[427,428,429,442],"table",{},[430,431,432],"thead",{},[433,434,435,439],"tr",{},[436,437,438],"th",{},"Inside the container",[436,440,441],{},"On the host",[443,444,445,454,462],"tbody",{},[433,446,447,451],{},[448,449,450],"td",{},"UID 0 (root)",[448,452,453],{},"UID 100000",[433,455,456,459],{},[448,457,458],{},"UID 1",[448,460,461],{},"UID 100001",[433,463,464,467],{},[448,465,466],{},"UID 1000",[448,468,469],{},"UID 101000",[21,471,472],{},"From the container's perspective, the application is still running as root. From the host's perspective, however, that process is just another unprivileged user. This fundamentally changes the impact of a container escape.",[21,474,475],{},"Consider what happens if an attacker finds a kernel vulnerability that lets them execute code outside the container's namespaces. Without user namespaces, the escaped process retains UID 0 on the host, meaning the attacker already has root-level access to the system. With user namespaces enabled, the escaped process is UID 100000, an ordinary unprivileged user. The attacker has escaped the container but gained no elevated privileges on the host.",[21,477,478],{},"While capabilities, seccomp, and LSMs all restrict what a process can do, user namespaces restrict the fundamental identity of the process itself. An attacker who escapes the container namespace still has to find a separate privilege escalation vulnerability on the host. This raises the bar from \"escape the container\" to \"escape the container and then compromise the host.\"",[21,480,481],{},"Despite their effectiveness, user namespaces are not enabled by default in many Docker installations. 
The primary reason is compatibility: bind mounts, file ownership mapping, and certain storage backends behave differently when UIDs are remapped. Some images assume they can write files as root that will be owned by root on the host, which breaks under user namespace remapping. These are solvable problems, but they require configuration and testing that many deployments don't invest in.",[21,483,484],{},"For production environments where security requirements are high, enabling user namespaces should be a priority. The protection they provide against container escape impact is difficult to achieve through any other single mechanism.",[16,486,488],{"id":487},"linux-capabilities","Linux Capabilities",[21,490,491],{},"If running containers as a non-root user is the first step toward reducing privileges, Linux capabilities are the second.",[21,493,494],{},"In fact, even if your application runs as root inside a container, it still doesn't possess the same privileges as root on a traditional Linux system. That's because modern Linux no longer treats root as an all-or-nothing concept. Instead, it breaks privileged operations into a collection of discrete capabilities.",[140,496,498],{"id":497},"why-linux-split-root-into-capabilities","Why Linux Split Root Into Capabilities",[21,500,501],{},"Historically, Unix had only two privilege levels:",[28,503,504,510],{},[31,505,506,509],{},[71,507,508],{},"Root (UID 0)",": unrestricted access to the system.",[31,511,512,515],{},[71,513,514],{},"Everyone else",": restricted access.",[21,517,518],{},"This model was simple, but often too coarse. Consider a web server like Nginx. 
It needs to bind to port 80, but it doesn't need to load kernel modules, modify the system clock, or reboot the machine.",[21,520,521],{},"Under the traditional Unix permission model, there was no way to grant only the privileges required to bind to a privileged port; you had to run the process as root, granting it far more power than necessary.",[21,523,524],{},"Linux capabilities solve this problem by decomposing root privileges into individual permissions. Each capability represents a specific privileged operation, allowing processes to receive only the permissions they actually need.",[21,526,527],{},"For example:",[427,529,530,540],{},[430,531,532],{},[433,533,534,537],{},[436,535,536],{},"Capability",[436,538,539],{},"Allows",[443,541,542,552,562,572,582,592],{},[433,543,544,549],{},[448,545,546],{},[47,547,548],{},"CAP_NET_BIND_SERVICE",[448,550,551],{},"Bind to ports below 1024",[433,553,554,559],{},[448,555,556],{},[47,557,558],{},"CAP_NET_ADMIN",[448,560,561],{},"Configure network interfaces, routing tables, firewall rules, and more",[433,563,564,569],{},[448,565,566],{},[47,567,568],{},"CAP_SYS_PTRACE",[448,570,571],{},"Trace or debug other processes",[433,573,574,579],{},[448,575,576],{},[47,577,578],{},"CAP_SYS_MODULE",[448,580,581],{},"Load and unload kernel modules",[433,583,584,589],{},[448,585,586],{},[47,587,588],{},"CAP_SYS_TIME",[448,590,591],{},"Modify the system clock",[433,593,594,599],{},[448,595,596],{},[47,597,598],{},"CAP_SYS_ADMIN",[448,600,601],{},"Perform a broad range of administrative operations",[140,603,605],{"id":604},"dockers-default-capability-set","Docker's Default Capability Set",[21,607,608],{},"When you start a container, Docker doesn't give it every Linux capability. Instead, it grants a subset intended to cover the majority of common workloads while dropping capabilities considered too dangerous.",[21,610,611,612,614,615,614,618,614,620,622],{},"It's important to understand what this default set actually represents. 
Docker's default capability set is not a carefully optimized security policy. It's a compromise between compatibility and security. Docker's maintainers reviewed the Linux capability list and dropped those that were clearly dangerous (",[47,613,578],{},", ",[47,616,617],{},"CAP_SYS_BOOT",[47,619,588],{},[47,621,598],{},") while keeping everything else that seemed reasonably safe for common workloads. The result is permissive by design, Docker would rather break nothing by default than force users to debug capability issues.",[21,624,625],{},"You can inspect the capabilities available to a process inside a container:",[355,627,631],{"className":628,"code":629,"language":630,"meta":360,"style":360},"language-bash shiki shiki-themes catppuccin-mocha catppuccin-mocha catppuccin-mocha","docker run --rm alpine sh -c \"\napk add --no-cache libcap >\u002Fdev\u002Fnull\ncapsh --print\n\"\n","bash",[47,632,633,658,663,669],{"__ignoreMap":360},[364,634,635,639,643,646,649,652,655],{"class":366,"line":367},[364,636,638],{"class":637},"seEE7","docker",[364,640,642],{"class":641},"swpoh"," run",[364,644,645],{"class":641}," --rm",[364,647,648],{"class":641}," alpine",[364,650,651],{"class":641}," sh",[364,653,654],{"class":641}," -c",[364,656,657],{"class":641}," \"\n",[364,659,660],{"class":366,"line":378},[364,661,662],{"class":641},"apk add --no-cache libcap >\u002Fdev\u002Fnull\n",[364,664,666],{"class":366,"line":665},3,[364,667,668],{"class":641},"capsh --print\n",[364,670,672],{"class":366,"line":671},4,[364,673,674],{"class":641},"\"\n",[21,676,677],{},[678,679],"img",{"alt":680,"src":681},"Default capabilities","\u002Fimages\u002Fblog\u002Fdocker-security-under-the-hood\u002FScreenshot1.png",[21,683,684,685,614,687,614,689,691,692,694],{},"You'll notice that capabilities such as ",[47,686,578],{},[47,688,617],{},[47,690,598],{},", and ",[47,693,588],{}," are absent. 
This is why a root process inside a default Docker container cannot perform many operations that host root can.",[140,696,698,699],{"id":697},"dropping-capabilities-with-cap-drop","Dropping Capabilities with ",[47,700,701],{},"--cap-drop",[21,703,704,705,614,707,614,709,691,711,713],{},"Docker already removes many high-risk capabilities by default. Notice that capabilities such as ",[47,706,598],{},[47,708,578],{},[47,710,558],{},[47,712,568],{}," are absent from the container's capability set.",[21,715,716,717,720,721,724,725,728],{},"However, Docker still grants a number of capabilities that many applications don't actually need. For example, a typical Next.js application doesn't need to create device nodes (",[47,718,719],{},"CAP_MKNOD","), send raw network packets (",[47,722,723],{},"CAP_NET_RAW","), or change file ownership (",[47,726,727],{},"CAP_CHOWN",").",[21,730,731,732,734],{},"This is where the ",[47,733,701],{}," flag becomes useful. Rather than relying solely on Docker's default capability set, you can explicitly remove capabilities that your workload doesn't require.",[21,736,737],{},"To start with an empty capability set:",[355,739,741],{"className":628,"code":740,"language":630,"meta":360,"style":360},"docker run --cap-drop ALL nginx\n",[47,742,743],{"__ignoreMap":360},[364,744,745,747,749,752,755],{"class":366,"line":367},[364,746,638],{"class":637},[364,748,642],{"class":641},[364,750,751],{"class":641}," --cap-drop",[364,753,754],{"class":641}," ALL",[364,756,757],{"class":641}," nginx\n",[21,759,760,761,764],{},"If you inspect the process again with ",[47,762,763],{},"capsh --print",", you'll notice that the effective and bounding capability sets are empty. 
The process may still run as UID 0, but it no longer possesses any Linux capabilities beyond those of a regular unprivileged process.",[21,766,767],{},[678,768],{"alt":769,"src":770},"Dropped capabilities","\u002Fimages\u002Fblog\u002Fdocker-security-under-the-hood\u002FScreenshot2.png",[21,772,773,774,614,776,691,778,781,782,785,786,789],{},"Notice the differences between the two outputs. In the first example, the process is running as UID 0 (root) and is granted Docker's default capability set, including capabilities such as ",[47,775,727],{},[47,777,548],{},[47,779,780],{},"CAP_SETFCAP",". After starting the container with ",[47,783,784],{},"--cap-drop ALL",", both the Current and Bounding capability sets become empty, yet the process is still running as UID 0 (root). This clearly demonstrates one of the key concepts behind Linux capabilities: being root is not the same as being privileged. A process may have a user ID of ",[47,787,788],{},"0",", but without the necessary capabilities, the kernel will deny many operations that would normally be available to a fully privileged root process.",[21,791,792],{},"In practice, most applications require at least one or two capabilities. A common hardening strategy is therefore:",[794,795,796,801,804],"ol",{},[31,797,798,799,50],{},"Drop every capability using ",[47,800,784],{},[31,802,803],{},"Start the application.",[31,805,806],{},"Add back only the capabilities required for it to function correctly.",[21,808,809],{},"This approach, granting a process only the permissions it needs and nothing more, is the Principle of Least Privilege in practice.",[140,811,813,814],{"id":812},"adding-capabilities-with-cap-add","Adding Capabilities with ",[47,815,816],{},"--cap-add",[21,818,819,820,822],{},"Suppose you're running Nginx and want it to listen on port 80. 
Binding to privileged ports requires the ",[47,821,548],{}," capability.",[21,824,825],{},"Instead of granting broad privileges, you can add only that capability:",[355,827,829],{"className":628,"code":828,"language":630,"meta":360,"style":360},"docker run \\\n  --cap-drop ALL \\\n  --cap-add NET_BIND_SERVICE \\\n  nginx\n",[47,830,831,841,850,860],{"__ignoreMap":360},[364,832,833,835,837],{"class":366,"line":367},[364,834,638],{"class":637},[364,836,642],{"class":641},[364,838,840],{"class":839},"seFKw"," \\\n",[364,842,843,846,848],{"class":366,"line":378},[364,844,845],{"class":641},"  --cap-drop",[364,847,754],{"class":641},[364,849,840],{"class":839},[364,851,852,855,858],{"class":366,"line":665},[364,853,854],{"class":641},"  --cap-add",[364,856,857],{"class":641}," NET_BIND_SERVICE",[364,859,840],{"class":839},[364,861,862],{"class":366,"line":671},[364,863,864],{"class":641},"  nginx\n",[21,866,867,868,50],{},"This is significantly safer than running the container with Docker's default capability set, or worse, using ",[47,869,49],{},[140,871,873,874,876],{"id":872},"why-cap_sys_admin-is-considered-the-new-root","Why ",[47,875,598],{}," Is Considered \"The New Root\"",[21,878,879],{},"Among all Linux capabilities, one deserves special attention:",[21,881,882],{},[71,883,884],{},[47,885,598],{},[21,887,888],{},"If you've spent any time reading kernel documentation or security advisories, you've probably encountered the phrase:",[66,890,891],{},[21,892,893,895],{},[47,894,598],{}," is the new root.",[21,897,898,899,901],{},"The nickname is well deserved. 
Unlike capabilities that grant a single, narrowly scoped privilege, ",[47,900,598],{}," covers an enormous collection of unrelated administrative operations.",[21,903,904],{},"Processes with this capability may be allowed to:",[28,906,907,910,913,916,919],{},[31,908,909],{},"Mount and unmount filesystems.",[31,911,912],{},"Perform namespace operations.",[31,914,915],{},"Configure certain kernel interfaces.",[31,917,918],{},"Execute privileged filesystem operations.",[31,920,921],{},"Interact with eBPF and other advanced kernel features (depending on the kernel version).",[21,923,924,925,927],{},"Over the years, many kernel vulnerabilities and container escape techniques have relied on obtaining ",[47,926,598],{},". For this reason, granting it should be treated with extreme caution. If your application doesn't explicitly require it, don't add it.",[140,929,931,933],{"id":930},"cap_net_admin-more-powerful-than-it-sounds",[47,932,558],{},": More Powerful Than It Sounds",[21,935,936,937,939],{},"Another commonly misunderstood capability is ",[47,938,558],{},". Despite its name, it doesn't simply allow a process to administer networking.",[21,941,942],{},"It enables a wide variety of privileged networking operations, including:",[28,944,945,948,951,954,957,960],{},[31,946,947],{},"Creating or modifying network interfaces.",[31,949,950],{},"Configuring routing tables.",[31,952,953],{},"Managing firewall rules.",[31,955,956],{},"Enabling packet forwarding.",[31,958,959],{},"Changing network namespaces.",[31,961,962],{},"Configuring traffic control (tc).",[21,964,965,966,968],{},"These privileges are perfectly reasonable for networking software such as VPN servers, CNI plugins, or software-defined networking components. They are almost never required for a typical web application. 
Granting ",[47,967,558],{}," to an application that only serves HTTP unnecessarily increases the impact of a successful compromise.",[140,970,972],{"id":971},"example-usage","Example Usage",[21,974,975,976,979,980,983,984,987],{},"Suppose we're deploying a Next.js application to production. Most Next.js applications don't expose ports ",[47,977,978],{},"80"," or ",[47,981,982],{},"443"," directly. Instead, they listen on an unprivileged port such as ",[47,985,986],{},"3000"," while a reverse proxy like Nginx, Traefik, or HAProxy handles incoming HTTP and HTTPS traffic. In that scenario, the application doesn't require any Linux capabilities.",[21,989,990],{},[71,991,992],{},"Docker Compose:",[355,994,996],{"className":357,"code":995,"language":359,"meta":360,"style":360},"services:\n  nextjs:\n    image: my-nextjs-app:latest\n    cap_drop:\n      - ALL\n",[47,997,998,1005,1012,1022,1029],{"__ignoreMap":360},[364,999,1000,1003],{"class":366,"line":367},[364,1001,1002],{"class":370},"services",[364,1004,375],{"class":374},[364,1006,1007,1010],{"class":366,"line":378},[364,1008,1009],{"class":370},"  nextjs",[364,1011,375],{"class":374},[364,1013,1014,1017,1019],{"class":366,"line":665},[364,1015,1016],{"class":370},"    image",[364,1018,384],{"class":374},[364,1020,1021],{"class":641}," my-nextjs-app:latest\n",[364,1023,1024,1027],{"class":366,"line":671},[364,1025,1026],{"class":370},"    cap_drop",[364,1028,375],{"class":374},[364,1030,1032,1036],{"class":366,"line":1031},5,[364,1033,1035],{"class":1034},"sKAjW","      -",[364,1037,1038],{"class":641}," ALL\n",[21,1040,1041],{},[71,1042,1043],{},"Kubernetes:",[355,1045,1047],{"className":357,"code":1046,"language":359,"meta":360,"style":360},"apiVersion: apps\u002Fv1\nkind: Deployment\nmetadata:\n  name: nextjs\nspec:\n  template:\n    spec:\n      containers:\n        - name: nextjs\n          image: my-nextjs-app:latest\n          securityContext:\n            capabilities:\n              drop:\n                - 
ALL\n",[47,1048,1049,1059,1069,1076,1086,1093,1101,1109,1117,1130,1140,1148,1156,1164],{"__ignoreMap":360},[364,1050,1051,1054,1056],{"class":366,"line":367},[364,1052,1053],{"class":370},"apiVersion",[364,1055,384],{"class":374},[364,1057,1058],{"class":641}," apps\u002Fv1\n",[364,1060,1061,1064,1066],{"class":366,"line":378},[364,1062,1063],{"class":370},"kind",[364,1065,384],{"class":374},[364,1067,1068],{"class":641}," Deployment\n",[364,1070,1071,1074],{"class":366,"line":665},[364,1072,1073],{"class":370},"metadata",[364,1075,375],{"class":374},[364,1077,1078,1081,1083],{"class":366,"line":671},[364,1079,1080],{"class":370},"  name",[364,1082,384],{"class":374},[364,1084,1085],{"class":641}," nextjs\n",[364,1087,1088,1091],{"class":366,"line":1031},[364,1089,1090],{"class":370},"spec",[364,1092,375],{"class":374},[364,1094,1096,1099],{"class":366,"line":1095},6,[364,1097,1098],{"class":370},"  template",[364,1100,375],{"class":374},[364,1102,1104,1107],{"class":366,"line":1103},7,[364,1105,1106],{"class":370},"    spec",[364,1108,375],{"class":374},[364,1110,1112,1115],{"class":366,"line":1111},8,[364,1113,1114],{"class":370},"      containers",[364,1116,375],{"class":374},[364,1118,1120,1123,1126,1128],{"class":366,"line":1119},9,[364,1121,1122],{"class":1034},"        -",[364,1124,1125],{"class":370}," name",[364,1127,384],{"class":374},[364,1129,1085],{"class":641},[364,1131,1133,1136,1138],{"class":366,"line":1132},10,[364,1134,1135],{"class":370},"          image",[364,1137,384],{"class":374},[364,1139,1021],{"class":641},[364,1141,1143,1146],{"class":366,"line":1142},11,[364,1144,1145],{"class":370},"          securityContext",[364,1147,375],{"class":374},[364,1149,1151,1154],{"class":366,"line":1150},12,[364,1152,1153],{"class":370},"            capabilities",[364,1155,375],{"class":374},[364,1157,1159,1162],{"class":366,"line":1158},13,[364,1160,1161],{"class":370},"              
drop",[364,1163,375],{"class":374},[364,1165,1167,1170],{"class":366,"line":1166},14,[364,1168,1169],{"class":1034},"                -",[364,1171,1038],{"class":641},[21,1173,1174],{},"Now imagine an attacker exploits the Next.js vulnerability discussed earlier and gains remote code execution inside the container.",[21,1176,1177],{},"The exploit itself still succeeds, but the compromised process cannot perform privileged kernel operations such as creating raw sockets, configuring network interfaces, loading kernel modules, mounting filesystems, or changing the system clock because those capabilities were never granted.",[21,1179,1180,1181,979,1183,1185,1186,1189,1190,822],{},"But what if the application listens on port 80? Some applications, such as an Nginx container running directly on the host or a standalone Docker deployment, listen on privileged ports like ",[47,1182,978],{},[47,1184,982],{},". Binding to ports below ",[47,1187,1188],{},"1024"," requires the ",[47,1191,548],{},[21,1193,1194],{},"In those cases, you can grant only that specific capability instead of Docker's entire default capability set.",[21,1196,1197],{},[71,1198,992],{},[355,1200,1202],{"className":357,"code":1201,"language":359,"meta":360,"style":360},"services:\n  nginx:\n    image: nginx:latest\n    cap_drop:\n      - ALL\n    cap_add:\n      - NET_BIND_SERVICE\n",[47,1203,1204,1210,1217,1226,1232,1238,1245],{"__ignoreMap":360},[364,1205,1206,1208],{"class":366,"line":367},[364,1207,1002],{"class":370},[364,1209,375],{"class":374},[364,1211,1212,1215],{"class":366,"line":378},[364,1213,1214],{"class":370},"  nginx",[364,1216,375],{"class":374},[364,1218,1219,1221,1223],{"class":366,"line":665},[364,1220,1016],{"class":370},[364,1222,384],{"class":374},[364,1224,1225],{"class":641}," 
nginx:latest\n",[364,1227,1228,1230],{"class":366,"line":671},[364,1229,1026],{"class":370},[364,1231,375],{"class":374},[364,1233,1234,1236],{"class":366,"line":1031},[364,1235,1035],{"class":1034},[364,1237,1038],{"class":641},[364,1239,1240,1243],{"class":366,"line":1095},[364,1241,1242],{"class":370},"    cap_add",[364,1244,375],{"class":374},[364,1246,1247,1249],{"class":366,"line":1103},[364,1248,1035],{"class":1034},[364,1250,1251],{"class":641}," NET_BIND_SERVICE\n",[21,1253,1254],{},[71,1255,1043],{},[355,1257,1259],{"className":357,"code":1258,"language":359,"meta":360,"style":360},"apiVersion: apps\u002Fv1\nkind: Deployment\nmetadata:\n  name: nginx\nspec:\n  template:\n    spec:\n      containers:\n        - name: nginx\n          image: nginx:latest\n          securityContext:\n            capabilities:\n              drop:\n                - ALL\n              add:\n                - NET_BIND_SERVICE\n",[47,1260,1261,1269,1277,1283,1291,1297,1303,1309,1315,1325,1333,1339,1345,1351,1357,1365],{"__ignoreMap":360},[364,1262,1263,1265,1267],{"class":366,"line":367},[364,1264,1053],{"class":370},[364,1266,384],{"class":374},[364,1268,1058],{"class":641},[364,1270,1271,1273,1275],{"class":366,"line":378},[364,1272,1063],{"class":370},[364,1274,384],{"class":374},[364,1276,1068],{"class":641},[364,1278,1279,1281],{"class":366,"line":665},[364,1280,1073],{"class":370},[364,1282,375],{"class":374},[364,1284,1285,1287,1289],{"class":366,"line":671},[364,1286,1080],{"class":370},[364,1288,384],{"class":374},[364,1290,757],{"class":641},[364,1292,1293,1295],{"class":366,"line":1031},[364,1294,1090],{"class":370},[364,1296,375],{"class":374},[364,1298,1299,1301],{"class":366,"line":1095},[364,1300,1098],{"class":370},[364,1302,375],{"class":374},[364,1304,1305,1307],{"class":366,"line":1103},[364,1306,1106],{"class":370},[364,1308,375],{"class":374},[364,1310,1311,1313],{"class":366,"line":1111},[364,1312,1114],{"class":370},[364,1314,375],{"class":374},[364,1316,1
317,1319,1321,1323],{"class":366,"line":1119},[364,1318,1122],{"class":1034},[364,1320,1125],{"class":370},[364,1322,384],{"class":374},[364,1324,757],{"class":641},[364,1326,1327,1329,1331],{"class":366,"line":1132},[364,1328,1135],{"class":370},[364,1330,384],{"class":374},[364,1332,1225],{"class":641},[364,1334,1335,1337],{"class":366,"line":1142},[364,1336,1145],{"class":370},[364,1338,375],{"class":374},[364,1340,1341,1343],{"class":366,"line":1150},[364,1342,1153],{"class":370},[364,1344,375],{"class":374},[364,1346,1347,1349],{"class":366,"line":1158},[364,1348,1161],{"class":370},[364,1350,375],{"class":374},[364,1352,1353,1355],{"class":366,"line":1166},[364,1354,1169],{"class":1034},[364,1356,1038],{"class":641},[364,1358,1360,1363],{"class":366,"line":1359},15,[364,1361,1362],{"class":370},"              add",[364,1364,375],{"class":374},[364,1366,1368,1370],{"class":366,"line":1367},16,[364,1369,1169],{"class":1034},[364,1371,1251],{"class":641},[21,1373,1374],{},"Whether you're deploying a Next.js application, Nginx, or any other workload, the goal is the same: grant only the capabilities the application actually needs. If it doesn't require privileged kernel operations, don't grant them. If it needs exactly one capability, grant exactly one capability, and nothing more.",[16,1376,1378],{"id":1377},"read-only-filesystems","Read-Only Filesystems",[21,1380,1381],{},"By default, a container's root filesystem is writable. That means any process running inside the container, including one controlled by an attacker, can create, modify, or delete files anywhere the filesystem permissions allow.",[21,1383,1384],{},"For many applications, this level of write access simply isn't necessary. A web server typically reads its application code, serves requests, and writes temporary data such as logs or cache files. 
It rarely needs to modify its own binaries or application source code.",[21,1386,1387],{},"Docker allows you to enforce this assumption by mounting the container's root filesystem as read-only.",[21,1389,1390],{},[71,1391,1392],{},"Docker:",[355,1394,1396],{"className":628,"code":1395,"language":630,"meta":360,"style":360},"docker run --read-only nginx\n",[47,1397,1398],{"__ignoreMap":360},[364,1399,1400,1402,1404,1407],{"class":366,"line":367},[364,1401,638],{"class":637},[364,1403,642],{"class":641},[364,1405,1406],{"class":641}," --read-only",[364,1408,757],{"class":641},[21,1410,1411],{},[71,1412,992],{},[355,1414,1416],{"className":357,"code":1415,"language":359,"meta":360,"style":360},"services:\n  app:\n    image: my-app:latest\n    read_only: true\n",[47,1417,1418,1424,1431,1440],{"__ignoreMap":360},[364,1419,1420,1422],{"class":366,"line":367},[364,1421,1002],{"class":370},[364,1423,375],{"class":374},[364,1425,1426,1429],{"class":366,"line":378},[364,1427,1428],{"class":370},"  app",[364,1430,375],{"class":374},[364,1432,1433,1435,1437],{"class":366,"line":665},[364,1434,1016],{"class":370},[364,1436,384],{"class":374},[364,1438,1439],{"class":641}," my-app:latest\n",[364,1441,1442,1445,1447],{"class":366,"line":671},[364,1443,1444],{"class":370},"    read_only",[364,1446,384],{"class":374},[364,1448,388],{"class":387},[21,1450,1451],{},[71,1452,1043],{},[355,1454,1456],{"className":357,"code":1455,"language":359,"meta":360,"style":360},"securityContext:\n  readOnlyRootFilesystem: true\n",[47,1457,1458,1464],{"__ignoreMap":360},[364,1459,1460,1462],{"class":366,"line":367},[364,1461,371],{"class":370},[364,1463,375],{"class":374},[364,1465,1466,1469,1471],{"class":366,"line":378},[364,1467,1468],{"class":370},"  readOnlyRootFilesystem",[364,1470,384],{"class":374},[364,1472,388],{"class":387},[21,1474,1475],{},"With a read-only root filesystem, attempts to write anywhere on the container's root filesystem fail, even if the process has sufficient file permissions. 
This transforms the container into an immutable runtime environment where application files cannot be altered after startup.",[140,1477,1479],{"id":1478},"why-would-you-want-a-read-only-filesystem","Why Would You Want a Read-Only Filesystem?",[21,1481,1482],{},"Once an attacker gains code execution inside a container, one of the first things they'll try is establishing persistence.",[21,1484,1485],{},"For example, they might attempt to:",[28,1487,1488,1491,1494,1497,1500,1503],{},[31,1489,1490],{},"Replace the application with a modified version.",[31,1492,1493],{},"Install a web shell.",[31,1495,1496],{},"Download additional malware.",[31,1498,1499],{},"Modify startup scripts.",[31,1501,1502],{},"Replace system utilities with trojanized versions.",[31,1504,1505],{},"Leave behind backdoors for future access.",[21,1507,1508],{},"With a writable filesystem, all of these become possible if filesystem permissions allow it. With a read-only root filesystem, these operations immediately fail. The attacker may still execute commands within the compromised process, but they cannot permanently alter the container's image or install persistent malware inside it.",[140,1510,1512,1513],{"id":1511},"temporary-writable-storage-with-tmpfs","Temporary Writable Storage with ",[47,1514,1515],{},"tmpfs",[21,1517,1518],{},"Of course, very few applications are completely read-only. 
Most need somewhere to write temporary files.",[21,1520,1521],{},"Examples include:",[28,1523,1524,1529,1532,1535,1538],{},[31,1525,1526],{},[47,1527,1528],{},"\u002Ftmp",[31,1530,1531],{},"Runtime sockets",[31,1533,1534],{},"PID files",[31,1536,1537],{},"Temporary uploads",[31,1539,1540],{},"Application caches",[21,1542,1543,1544,50],{},"Instead of making the entire filesystem writable, Docker allows these locations to be backed by an in-memory filesystem using ",[47,1545,1515],{},[21,1547,1548],{},[71,1549,1392],{},[355,1551,1553],{"className":628,"code":1552,"language":630,"meta":360,"style":360},"docker run --read-only --tmpfs \u002Ftmp my-nextjs-app:latest\n",[47,1554,1555],{"__ignoreMap":360},[364,1556,1557,1559,1561,1563,1566,1569],{"class":366,"line":367},[364,1558,638],{"class":637},[364,1560,642],{"class":641},[364,1562,1406],{"class":641},[364,1564,1565],{"class":641}," --tmpfs",[364,1567,1568],{"class":641}," \u002Ftmp",[364,1570,1021],{"class":641},[21,1572,1573],{},[71,1574,992],{},[355,1576,1578],{"className":357,"code":1577,"language":359,"meta":360,"style":360},"services:\n  nextjs:\n    image: my-nextjs-app:latest\n    read_only: true\n    tmpfs:\n      - \u002Ftmp\n",[47,1579,1580,1586,1592,1600,1608,1615],{"__ignoreMap":360},[364,1581,1582,1584],{"class":366,"line":367},[364,1583,1002],{"class":370},[364,1585,375],{"class":374},[364,1587,1588,1590],{"class":366,"line":378},[364,1589,1009],{"class":370},[364,1591,375],{"class":374},[364,1593,1594,1596,1598],{"class":366,"line":665},[364,1595,1016],{"class":370},[364,1597,384],{"class":374},[364,1599,1021],{"class":641},[364,1601,1602,1604,1606],{"class":366,"line":671},[364,1603,1444],{"class":370},[364,1605,384],{"class":374},[364,1607,388],{"class":387},[364,1609,1610,1613],{"class":366,"line":1031},[364,1611,1612],{"class":370},"    tmpfs",[364,1614,375],{"class":374},[364,1616,1617,1619],{"class":366,"line":1095},[364,1618,1035],{"class":1034},[364,1620,1621],{"class":641}," 
\u002Ftmp\n",[21,1623,1624],{},[71,1625,1043],{},[355,1627,1629],{"className":357,"code":1628,"language":359,"meta":360,"style":360},"volumes:\n  - name: tmp\n    emptyDir:\n      medium: Memory\n\ncontainers:\n  - name: nextjs\n    volumeMounts:\n      - name: tmp\n        mountPath: \u002Ftmp\n",[47,1630,1631,1638,1650,1657,1667,1673,1680,1690,1697,1707],{"__ignoreMap":360},[364,1632,1633,1636],{"class":366,"line":367},[364,1634,1635],{"class":370},"volumes",[364,1637,375],{"class":374},[364,1639,1640,1643,1645,1647],{"class":366,"line":378},[364,1641,1642],{"class":1034},"  -",[364,1644,1125],{"class":370},[364,1646,384],{"class":374},[364,1648,1649],{"class":641}," tmp\n",[364,1651,1652,1655],{"class":366,"line":665},[364,1653,1654],{"class":370},"    emptyDir",[364,1656,375],{"class":374},[364,1658,1659,1662,1664],{"class":366,"line":671},[364,1660,1661],{"class":370},"      medium",[364,1663,384],{"class":374},[364,1665,1666],{"class":641}," Memory\n",[364,1668,1669],{"class":366,"line":1031},[364,1670,1672],{"emptyLinePlaceholder":1671},true,"\n",[364,1674,1675,1678],{"class":366,"line":1095},[364,1676,1677],{"class":370},"containers",[364,1679,375],{"class":374},[364,1681,1682,1684,1686,1688],{"class":366,"line":1103},[364,1683,1642],{"class":1034},[364,1685,1125],{"class":370},[364,1687,384],{"class":374},[364,1689,1085],{"class":641},[364,1691,1692,1695],{"class":366,"line":1111},[364,1693,1694],{"class":370},"    volumeMounts",[364,1696,375],{"class":374},[364,1698,1699,1701,1703,1705],{"class":366,"line":1119},[364,1700,1035],{"class":1034},[364,1702,1125],{"class":370},[364,1704,384],{"class":374},[364,1706,1649],{"class":641},[364,1708,1709,1712,1714],{"class":366,"line":1132},[364,1710,1711],{"class":370},"        mountPath",[364,1713,384],{"class":374},[364,1715,1621],{"class":641},[21,1717,1718,1719,1721],{},"Unlike the root filesystem, a ",[47,1720,1515],{}," mount exists entirely in memory. 
Anything written there disappears when the container stops or restarts. This gives applications a place for temporary data without allowing permanent modifications to the container.",[140,1723,1725],{"id":1724},"writable-paths-should-be-explicit","Writable Paths Should Be Explicit",[21,1727,1728],{},"One of the biggest advantages of enabling a read-only filesystem is that it forces you to think about where your application actually needs write access. Rather than allowing writes everywhere, you explicitly define the few locations that must remain writable.",[21,1730,1731],{},"For example, an application might legitimately require:",[28,1733,1734,1739,1745],{},[31,1735,1736,1738],{},[47,1737,1528],{}," for temporary files.",[31,1740,1741,1744],{},[47,1742,1743],{},"\u002Fvar\u002Flog"," if logs are written to disk.",[31,1746,1747,1750],{},[47,1748,1749],{},"\u002Fuploads"," for user-uploaded content.",[21,1752,1753],{},"Everything else can remain immutable. This significantly reduces the opportunities for attackers to modify the application or establish persistence.",[140,1755,1757],{"id":1756},"malware-mitigation","Malware Mitigation",[21,1759,1760],{},"It's important to understand what a read-only filesystem does, and what it doesn't. It does not prevent an attacker from exploiting a vulnerability. It does not stop them from executing arbitrary code. 
Instead, it prevents many common post-exploitation techniques.",[21,1762,1763],{},"For example, an attacker can no longer:",[28,1765,1766,1769,1772,1775,1778],{},[31,1767,1768],{},"Replace application binaries.",[31,1770,1771],{},"Download malware into the container's filesystem.",[31,1773,1774],{},"Modify configuration files.",[31,1776,1777],{},"Install cron jobs or startup scripts.",[31,1779,1780],{},"Leave persistent backdoors inside the container image.",[140,1782,1784],{"id":1783},"read-only-containers-and-immutable-infrastructure","Read-Only Containers and Immutable Infrastructure",[21,1786,1787],{},"The idea of a read-only filesystem aligns with a broader infrastructure principle known as immutable infrastructure. In an immutable system, running workloads are never modified in place. If an application needs to be updated, you don't SSH into the container and edit files; you build a new image and deploy a new container.",[21,1789,1790],{},"Likewise, if a container becomes compromised, you don't attempt to clean or repair it. You destroy it and replace it with a fresh instance built from a trusted image.",[21,1792,1793],{},"This approach makes deployments more predictable, simplifies incident response, and eliminates an entire class of configuration drift problems.",[21,1795,1796],{},"A read-only root filesystem naturally reinforces this philosophy by ensuring that the running container remains identical to the image that was originally deployed.",[16,1798,1800,1801],{"id":1799},"preventing-privilege-escalation-with-no-new-privileges","Preventing Privilege Escalation with ",[47,1802,1803],{},"no-new-privileges",[21,1805,1806],{},"So far, we've focused on reducing the privileges that a container starts with. But what if a process tries to gain additional privileges after it has already started?",[21,1808,1809,1810,1813],{},"On a traditional Linux system, there are several mechanisms that allow a process to elevate its privileges during execution. 
The most common are ",[47,1811,1812],{},"setuid"," binaries and file capabilities.",[21,1815,1816],{},"To prevent this class of attacks, the Linux kernel provides a feature called No New Privileges (NNP).",[21,1818,1819],{},[71,1820,1392],{},[355,1822,1824],{"className":628,"code":1823,"language":630,"meta":360,"style":360},"docker run --security-opt no-new-privileges:true my-app:latest\n",[47,1825,1826],{"__ignoreMap":360},[364,1827,1828,1830,1832,1835,1838,1841],{"class":366,"line":367},[364,1829,638],{"class":637},[364,1831,642],{"class":641},[364,1833,1834],{"class":641}," --security-opt",[364,1836,1837],{"class":641}," no-new-privileges:",[364,1839,1840],{"class":387},"true",[364,1842,1439],{"class":641},[21,1844,1845],{},[71,1846,992],{},[355,1848,1850],{"className":357,"code":1849,"language":359,"meta":360,"style":360},"services:\n  app:\n    image: my-app:latest\n    security_opt:\n      - no-new-privileges:true\n",[47,1851,1852,1858,1864,1872,1879],{"__ignoreMap":360},[364,1853,1854,1856],{"class":366,"line":367},[364,1855,1002],{"class":370},[364,1857,375],{"class":374},[364,1859,1860,1862],{"class":366,"line":378},[364,1861,1428],{"class":370},[364,1863,375],{"class":374},[364,1865,1866,1868,1870],{"class":366,"line":665},[364,1867,1016],{"class":370},[364,1869,384],{"class":374},[364,1871,1439],{"class":641},[364,1873,1874,1877],{"class":366,"line":671},[364,1875,1876],{"class":370},"    security_opt",[364,1878,375],{"class":374},[364,1880,1881,1883],{"class":366,"line":1031},[364,1882,1035],{"class":1034},[364,1884,1885],{"class":641}," no-new-privileges:true\n",[21,1887,1888],{},[71,1889,1043],{},[355,1891,1893],{"className":357,"code":1892,"language":359,"meta":360,"style":360},"securityContext:\n  allowPrivilegeEscalation: 
false\n",[47,1894,1895,1901],{"__ignoreMap":360},[364,1896,1897,1899],{"class":366,"line":367},[364,1898,371],{"class":370},[364,1900,375],{"class":374},[364,1902,1903,1906,1908],{"class":366,"line":378},[364,1904,1905],{"class":370},"  allowPrivilegeEscalation",[364,1907,384],{"class":374},[364,1909,1910],{"class":387}," false\n",[21,1912,1913,1914,1916],{},"Once enabled, the kernel guarantees that a process cannot gain privileges that it didn't already possess, regardless of what executable it launches. This makes ",[47,1915,1803],{}," one of the simplest yet most effective hardening options available.",[140,1918,1920,1921],{"id":1919},"understanding-setuid","Understanding ",[47,1922,1812],{},[21,1924,1925,1926,1928,1929,1931],{},"Linux files can have a special permission known as the ",[47,1927,1812],{}," bit. Normally, a program runs with the privileges of the user executing it. A ",[47,1930,1812],{}," program is different: it executes with the privileges of the file owner instead.",[21,1933,1934,1935,1938,1939,1942,1943,1945,1946,1948],{},"For example, the ",[47,1936,1937],{},"passwd"," utility needs to modify ",[47,1940,1941],{},"\u002Fetc\u002Fshadow",", a file writable only by ",[47,1944,319],{},". Rather than requiring every user to become root, Linux marks the binary as ",[47,1947,1812],{},", allowing it to temporarily execute with root privileges.",[21,1950,1951,1952,1954],{},"This mechanism is incredibly useful, but it also creates an opportunity for privilege escalation. If an attacker can execute a vulnerable ",[47,1953,1812],{}," binary, they may be able to obtain privileges they didn't previously have.",[21,1956,1957,1958,1960,1961,1963],{},"With ",[47,1959,1803],{}," enabled, the kernel ignores the ",[47,1962,1812],{}," bit during execution. 
The program still runs, but it does not inherit elevated privileges.",[140,1965,1967,1968,1971],{"id":1966},"file-capabilities-setcap","File Capabilities (",[47,1969,1970],{},"setcap",")",[21,1973,1974,1975,1977],{},"Linux capabilities don't have to be assigned only to running processes. They can also be attached directly to executable files using the ",[47,1976,1970],{}," utility.",[21,1979,527],{},[355,1981,1983],{"className":628,"code":1982,"language":630,"meta":360,"style":360},"setcap cap_net_bind_service=+ep \u002Fusr\u002Flocal\u002Fbin\u002Fmy-server\n",[47,1984,1985],{"__ignoreMap":360},[364,1986,1987,1989,1992],{"class":366,"line":367},[364,1988,1970],{"class":637},[364,1990,1991],{"class":641}," cap_net_bind_service=+ep",[364,1993,1994],{"class":641}," \u002Fusr\u002Flocal\u002Fbin\u002Fmy-server\n",[21,1996,1997],{},"This allows the executable to bind to privileged ports without running as root. Under normal circumstances, executing this binary grants the process the specified capability.",[21,1999,2000,2001,2003],{},"However, when ",[47,2002,1803],{}," is enabled, those additional capabilities are not acquired, preventing privilege escalation through file capabilities as well.",[140,2005,2007,2008],{"id":2006},"the-role-of-execve","The Role of ",[47,2009,2010],{},"execve()",[21,2012,2013,2014,2016,2017,2019],{},"Both ",[47,2015,1812],{}," and file capabilities take effect during the ",[47,2018,2010],{}," system call. Every time a Linux process launches another program, the kernel evaluates whether the newly executed binary should receive additional privileges. 
Normally, this is where privilege elevation occurs.",[21,2021,1957,2022,2024],{},[47,2023,1803],{},", the kernel changes the rules:",[66,2026,2027],{},[21,2028,2029,2030,2032],{},"No process may gain more privileges through ",[47,2031,2010],{}," than it already possessed.",[21,2034,2035],{},"The process can execute another program, but it cannot become more privileged than it was before.",[140,2037,2039],{"id":2038},"example-mitigating-setuid-based-privilege-escalation","Example: Mitigating Setuid-Based Privilege Escalation",[21,2041,2042,2043,2045,2046,2051],{},"A real-world example of why ",[47,2044,1803],{}," exists is the PwnKit vulnerability (",[58,2047,2050],{"href":2048,"rel":2049},"https:\u002F\u002Fnvd.nist.gov\u002Fvuln\u002Fdetail\u002FCVE-2021-4034",[62],"CVE-2021-4034","), which was disclosed in 2022.",[21,2053,2054,2055,2058,2059,2061,2062,2064],{},"The vulnerability affected ",[47,2056,2057],{},"pkexec",", a setuid-root utility installed by default on many Linux distributions. Because ",[47,2060,2057],{}," executes with the privileges of its owner (",[47,2063,319],{},"), a flaw in its implementation allowed an unprivileged local user to obtain a root shell.",[21,2066,2067,2068,2070],{},"Imagine our vulnerable Next.js application has been compromised, giving an attacker command execution inside the container. During enumeration, they discover that a vulnerable ",[47,2069,2057],{}," binary is present and attempt to exploit it.",[21,2072,2073,2074,2076,2077,2079,2080,2082,2083,2085,2086,2088],{},"Without ",[47,2075,1803],{},", the kernel honors the ",[47,2078,1812],{}," bit during the ",[47,2081,2010],{}," call. If the exploit succeeds, the attacker gains a shell running as ",[47,2084,319],{},". 
With ",[47,2087,1803],{}," enabled, the outcome is different.",[21,2090,2091,2092,2094,2095,2097],{},"Although the attacker can still execute ",[47,2093,2057],{},", the kernel refuses to grant the additional privileges associated with the ",[47,2096,1812],{}," bit. The process continues running with the attacker's existing privileges, preventing this particular privilege-escalation path from succeeding.",[21,2099,2100,2101,2103,2104,2106,2107,50],{},"It's important to note that ",[47,2102,1803],{}," is not a universal defense against every local privilege-escalation vulnerability. It specifically prevents new privileges from being acquired through ",[47,2105,1812],{}," executables and file capabilities during ",[47,2108,2010],{},[16,2110,2112],{"id":2111},"seccomp-restricting-system-calls","Seccomp: Restricting System Calls",[21,2114,2115],{},"Even after removing unnecessary capabilities and preventing privilege escalation, a compromised process can still invoke Linux system calls.",[21,2117,2118],{},"Every interaction between userspace and the Linux kernel ultimately happens through a system call, or syscall.",[21,2120,2121],{},"Reading a file. Opening a socket. Creating a process. Allocating memory. Eventually, every one of these operations becomes a syscall.",[21,2123,2124,2125,614,2128,614,2131,2134],{},"Seccomp allows us to control which syscalls a process is allowed to invoke. Modern Linux has hundreds of syscalls, and some, such as ",[47,2126,2127],{},"mount()",[47,2129,2130],{},"bpf()",[47,2132,2133],{},"ptrace()",", are powerful kernel interfaces that most applications never need.",[140,2136,2138],{"id":2137},"dockers-default-seccomp-profile","Docker's Default Seccomp Profile",[21,2140,2141],{},"By default, Docker applies a seccomp profile to every container.",[21,2143,2144],{},"Rather than allowing unrestricted access to the kernel, Docker blocks a number of high-risk syscalls that are rarely required by ordinary applications. 
Examples include operations related to kernel debugging, loading kernel modules, certain namespace operations, and legacy or dangerous kernel interfaces.",[21,2146,2147],{},"It's important to understand what this default profile is, and what it isn't. Docker's default seccomp profile is fairly permissive. Structurally it is a whitelist with a default-deny action, but the list is deliberately broad: only syscalls that are clearly dangerous or almost never needed in containers are blocked, and the majority remain permitted. This is by design: a more restrictive default would break many legitimate workloads.",[21,2149,2150],{},"In high-security environments, the default profile should be seen as a starting point rather than a final configuration. Custom profiles that whitelist only the syscalls your application actually uses provide significantly stronger protection.",[140,2152,2154],{"id":2153},"dangerous-system-calls","Dangerous System Calls",[21,2156,2157],{},"Many historical Linux vulnerabilities have involved privileged or complex system calls. Some examples include:",[28,2159,2160,2165,2170,2175,2181],{},[31,2161,2162,2164],{},[47,2163,2133],{}," for debugging other processes.",[31,2166,2167,2169],{},[47,2168,2127],{}," for manipulating filesystems.",[31,2171,2172,2174],{},[47,2173,2130],{}," for interacting with the eBPF subsystem.",[31,2176,2177,2180],{},[47,2178,2179],{},"userfaultfd()",", which has been involved in multiple privilege escalation vulnerabilities.",[31,2182,2183],{},"Certain namespace-related syscalls.",[21,2185,2186],{},"These interfaces are incredibly powerful. They're also unnecessary for the overwhelming majority of web applications. Blocking them removes an entire class of potential post-exploitation techniques.",[140,2188,2190],{"id":2189},"custom-seccomp-profiles","Custom Seccomp Profiles",[21,2192,2193],{},"Docker's default seccomp profile is intentionally generic. 
It works well for most workloads, but high-security environments often go further by defining custom profiles tailored to a specific application.",[21,2195,2196],{},"For example, a Next.js application has very different syscall requirements than a VPN server or a container runtime.",[21,2198,2199],{},"A custom seccomp profile can whitelist only the syscalls the application actually uses while denying everything else.",[140,2201,2203],{"id":2202},"example-blocking-kernel-level-attacks","Example: Blocking Kernel-Level Attacks",[21,2205,2206,2207,2209,2210,2212],{},"A Python API server is compromised through a vulnerable dependency. The attacker gains code execution and tries to use ",[47,2208,2133],{}," to inject into other processes or ",[47,2211,2130],{}," to interact with the eBPF subsystem.",[21,2214,2215],{},"If those syscalls are blocked by the seccomp profile, the kernel immediately denies the request.",[21,2217,2218],{},"The attacker still has code execution, but they cannot freely access every kernel interface on the system.",[16,2220,2222],{"id":2221},"apparmor-and-selinux","AppArmor and SELinux",[21,2224,2225],{},"Thus far we've covered capabilities (which control privileged operations) and seccomp (which controls system calls). There's a third layer in this stack: Linux Security Modules, or LSMs.",[21,2227,2228],{},"AppArmor and SELinux are the two most widely deployed LSMs. 
They answer a different question than capabilities or seccomp:",[66,2230,2231],{},[21,2232,2233],{},"Even if a process has the right capability and the right syscall, what files, directories, network resources, and other objects may it actually access?",[21,2235,2236],{},"Where capabilities define what a process can do, and seccomp defines which kernel APIs it can call, LSMs define which objects it can touch.",[140,2238,2240],{"id":2239},"what-lsms-do","What LSMs Do",[21,2242,2243],{},"An LSM is a kernel framework that allows security policies to be enforced on every security-sensitive operation. Whenever a process attempts to open a file, bind to a socket, or access a directory, the LSM checks its policy before allowing or denying the operation.",[21,2245,2246,2247,2250],{},"AppArmor uses path-based policies. You write a profile that says things like \"this binary may read ",[47,2248,2249],{},"\u002Fetc\u002Fnginx\u002Fnginx.conf"," but may not write to it,\" or \"this binary may not create network sockets at all.\"",[21,2252,2253],{},"SELinux uses label-based policies. Every process and every object (file, socket, device, etc.) gets a security label, and the policy defines which labeled processes may access which labeled objects. SELinux is more powerful and more complex, which is why it's more common in government and high-security environments than in general-purpose container deployments.",[140,2255,2257],{"id":2256},"how-docker-uses-lsms","How Docker Uses LSMs",[21,2259,2260],{},"When you run a container, Docker can attach an AppArmor profile or SELinux context to the container's processes. This provides an additional layer of access control beyond what capabilities and seccomp provide.",[21,2262,2263],{},"Docker ships with a default AppArmor profile for containers that restricts access to sensitive host paths and system resources. 
It's applied automatically if AppArmor is loaded on the host.",[21,2265,2266,2267,2270],{},"SELinux support in Docker is available but requires the host to be running SELinux (most common on RHEL\u002FCentOS\u002FFedora systems) and the ",[47,2268,2269],{},"selinux-enabled"," flag to be configured in the Docker daemon.",[140,2272,2274],{"id":2273},"how-capabilities-seccomp-and-lsms-work-together","How Capabilities, Seccomp, and LSMs Work Together",[21,2276,2277],{},"Each Linux security mechanism answers a different question. Understanding the distinction helps you think clearly about what each layer contributes:",[28,2279,2280,2286,2292],{},[31,2281,2282,2285],{},[71,2283,2284],{},"Capabilities"," answer: \"What privileged operations can I perform?\"",[31,2287,2288,2291],{},[71,2289,2290],{},"Seccomp"," answers: \"Which kernel APIs can I call?\"",[31,2293,2294,2297],{},[71,2295,2296],{},"LSMs"," answer: \"Even if I can call it, what objects may I access?\"",[21,2299,2300,2301,2303,2304,2307],{},"These layers are complementary. A process might have ",[47,2302,548],{}," (it can bind to privileged ports) and seccomp might allow the ",[47,2305,2306],{},"bind()"," syscall, but an LSM policy can still deny the operation: an AppArmor profile can deny network access, and an SELinux policy can restrict which ports the process may bind to. Each mechanism constrains a different dimension of what the process can do.",[21,2309,2310],{},"None of these layers alone is sufficient. Together, they create a defense-in-depth posture where an attacker must bypass multiple independent restrictions to achieve their objectives.",[16,2312,2314],{"id":2313},"rootless-docker","Rootless Docker",[21,2316,2317],{},"Rootless Docker runs the Docker daemon and containers without root privileges on the host. It builds on user namespaces, the same mechanism discussed earlier, but goes further by also running the Docker daemon itself as an unprivileged user.",[21,2319,2320],{},"The key difference from standard user namespace remapping is scope. 
With user namespaces in standard Docker, only the container processes are remapped, while the Docker daemon still runs as root. With rootless mode, the entire Docker stack (the daemon, containerd, and runc) runs without host root privileges.",[21,2322,2323,2324,2327],{},"Rootless Docker has some limitations. By default, it cannot bind to ports below 1024 (though tools like ",[47,2325,2326],{},"authbind"," or redirectors can work around this), it has limited support for certain storage drivers, and it doesn't work with all network configurations.",[21,2329,2330],{},"For environments where an additional layer of host-level isolation is desired, rootless mode is a valuable option. For most production deployments, the combination of standard user namespaces with the other hardening measures discussed in this article provides substantial protection.",[16,2332,2334],{"id":2333},"docker-daemon-security","Docker Daemon Security",[21,2336,2337],{},"Docker's security model isn't limited to containers. The Docker daemon itself is a critical security boundary.",[140,2339,2341,2342,2344],{"id":2340},"the-docker-group-is-root","The ",[47,2343,638],{}," Group Is Root",[21,2346,2347,2348,2350,2351,2354,2355,2357],{},"On systems where Docker is installed, users can be added to the ",[47,2349,638],{}," group to run Docker commands without ",[47,2352,2353],{},"sudo",". This is convenient, but it comes with a significant security implication: membership in the ",[47,2356,638],{}," group is effectively equivalent to root access on the host.",[21,2359,2360,2361,2363],{},"The reason is straightforward. 
A user in the ",[47,2362,638],{}," group can:",[28,2365,2366,2371,2374,2377],{},[31,2367,2368,2369,50],{},"Start containers with any capability, including ",[47,2370,49],{},[31,2372,2373],{},"Mount any host directory into a container with full read-write access.",[31,2375,2376],{},"Access the Docker API socket directly.",[31,2378,2379],{},"Modify Docker's configuration.",[21,2381,2382],{},"This means any process, containerized or not, that has access to the Docker socket effectively has root access to the host.",[140,2384,2386,2387,2390],{"id":2385},"mounting-varrundockersock-into-a-container","Mounting ",[47,2388,2389],{},"\u002Fvar\u002Frun\u002Fdocker.sock"," Into a Container",[21,2392,2393,2394,2396],{},"A common anti-pattern in Docker deployments is mounting the Docker socket (",[47,2395,2389],{},") into a container. This is often done to allow a container to manage other containers, for example, a CI\u002FCD agent or a monitoring tool.",[355,2398,2400],{"className":357,"code":2399,"language":359,"meta":360,"style":360},"services:\n  container-manager:\n    image: my-manager:latest\n    volumes:\n      - \u002Fvar\u002Frun\u002Fdocker.sock:\u002Fvar\u002Frun\u002Fdocker.sock\n",[47,2401,2402,2408,2415,2424,2431],{"__ignoreMap":360},[364,2403,2404,2406],{"class":366,"line":367},[364,2405,1002],{"class":370},[364,2407,375],{"class":374},[364,2409,2410,2413],{"class":366,"line":378},[364,2411,2412],{"class":370},"  container-manager",[364,2414,375],{"class":374},[364,2416,2417,2419,2421],{"class":366,"line":665},[364,2418,1016],{"class":370},[364,2420,384],{"class":374},[364,2422,2423],{"class":641}," my-manager:latest\n",[364,2425,2426,2429],{"class":366,"line":671},[364,2427,2428],{"class":370},"    volumes",[364,2430,375],{"class":374},[364,2432,2433,2435],{"class":366,"line":1031},[364,2434,1035],{"class":1034},[364,2436,2437],{"class":641}," \u002Fvar\u002Frun\u002Fdocker.sock:\u002Fvar\u002Frun\u002Fdocker.sock\n",[21,2439,2440,2441,2443],{},"Mounting the 
Docker socket into a container gives that container's processes the same privileges as a user in the ",[47,2442,638],{}," group. If an attacker compromises that container, they can start new containers, mount arbitrary host filesystems, and gain full host-level access, all without escaping the container.",[21,2445,2446],{},"If a workload requires Docker access, consider using the Docker API over TLS with client certificates, or use a security-aware proxy that exposes only the specific API operations the workload needs.",[21,2448,2449],{},"Beyond the socket, the Docker daemon itself should be secured. Enabling TLS for the Docker API prevents unauthenticated access, and audit logging helps detect suspicious API calls. Rootless mode and deeper daemon hardening will be covered in a follow-up post.",[16,2451,2453],{"id":2452},"resource-abuse","Resource Abuse",[21,2455,2456],{},"So far, we've focused on preventing an attacker from gaining additional privileges or modifying the system. However, not every attack is about privilege escalation.",[21,2458,2459],{},"Sometimes, an attacker simply wants to make your application unavailable.",[21,2461,2462],{},"Imagine our vulnerable Next.js application has been compromised. 
Instead of attempting to escape the container, the attacker executes an infinite loop, continuously allocates memory, or spawns thousands of child processes.",[21,2464,2465],{},"These attacks don't require elevated privileges, they simply abuse the resources available to the container.",[21,2467,2468],{},"To mitigate this class of attacks, Docker relies on cgroups (Control Groups), allowing administrators to place limits on CPU, memory, and process creation.",[140,2470,2472],{"id":2471},"memory-limits","Memory Limits",[21,2474,2475],{},"Without memory limits, a single compromised container can consume all available RAM on the host, potentially affecting every other workload.",[21,2477,2478],{},[71,2479,992],{},[355,2481,2483],{"className":357,"code":2482,"language":359,"meta":360,"style":360},"services:\n  nextjs:\n    image: my-nextjs-app\n    deploy:\n      resources:\n        limits:\n          memory: 512M\n",[47,2484,2485,2491,2497,2506,2513,2520,2527],{"__ignoreMap":360},[364,2486,2487,2489],{"class":366,"line":367},[364,2488,1002],{"class":370},[364,2490,375],{"class":374},[364,2492,2493,2495],{"class":366,"line":378},[364,2494,1009],{"class":370},[364,2496,375],{"class":374},[364,2498,2499,2501,2503],{"class":366,"line":665},[364,2500,1016],{"class":370},[364,2502,384],{"class":374},[364,2504,2505],{"class":641}," my-nextjs-app\n",[364,2507,2508,2511],{"class":366,"line":671},[364,2509,2510],{"class":370},"    deploy",[364,2512,375],{"class":374},[364,2514,2515,2518],{"class":366,"line":1031},[364,2516,2517],{"class":370},"      resources",[364,2519,375],{"class":374},[364,2521,2522,2525],{"class":366,"line":1095},[364,2523,2524],{"class":370},"        limits",[364,2526,375],{"class":374},[364,2528,2529,2532,2534],{"class":366,"line":1103},[364,2530,2531],{"class":370},"          memory",[364,2533,384],{"class":374},[364,2535,2536],{"class":641}," 
512M\n",[21,2538,2539],{},[71,2540,1043],{},[355,2542,2544],{"className":357,"code":2543,"language":359,"meta":360,"style":360},"resources:\n  requests:\n    memory: \"256Mi\"\n  limits:\n    memory: \"512Mi\"\n",[47,2545,2546,2553,2560,2570,2577],{"__ignoreMap":360},[364,2547,2548,2551],{"class":366,"line":367},[364,2549,2550],{"class":370},"resources",[364,2552,375],{"class":374},[364,2554,2555,2558],{"class":366,"line":378},[364,2556,2557],{"class":370},"  requests",[364,2559,375],{"class":374},[364,2561,2562,2565,2567],{"class":366,"line":665},[364,2563,2564],{"class":370},"    memory",[364,2566,384],{"class":374},[364,2568,2569],{"class":641}," \"256Mi\"\n",[364,2571,2572,2575],{"class":366,"line":671},[364,2573,2574],{"class":370},"  limits",[364,2576,375],{"class":374},[364,2578,2579,2581,2583],{"class":366,"line":1031},[364,2580,2564],{"class":370},[364,2582,384],{"class":374},[364,2584,2585],{"class":641}," \"512Mi\"\n",[21,2587,2588],{},"If the process exceeds its configured limit, the Linux Out-Of-Memory (OOM) killer terminates it instead of allowing it to exhaust host memory.",[140,2590,2592,2593,1971],{"id":2591},"process-limits-pids_limit","Process Limits (",[47,2594,2595],{},"pids_limit",[21,2597,2598],{},"Another common denial-of-service technique is the fork bomb, where a process continuously creates child processes until the operating system can no longer create new ones.",[21,2600,2601],{},"Docker allows us to restrict how many processes a container may create.",[355,2603,2605],{"className":357,"code":2604,"language":359,"meta":360,"style":360},"services:\n  nextjs:\n    image: my-nextjs-app\n    pids_limit: 
100\n",[47,2606,2607,2613,2619,2627],{"__ignoreMap":360},[364,2608,2609,2611],{"class":366,"line":367},[364,2610,1002],{"class":370},[364,2612,375],{"class":374},[364,2614,2615,2617],{"class":366,"line":378},[364,2616,1009],{"class":370},[364,2618,375],{"class":374},[364,2620,2621,2623,2625],{"class":366,"line":665},[364,2622,1016],{"class":370},[364,2624,384],{"class":374},[364,2626,2505],{"class":641},[364,2628,2629,2632,2634],{"class":366,"line":671},[364,2630,2631],{"class":370},"    pids_limit",[364,2633,384],{"class":374},[364,2635,2636],{"class":387}," 100\n",[21,2638,2639],{},"Even if an attacker gains code execution, they cannot create more processes than the configured limit.",[140,2641,2643],{"id":2642},"cpu-limits","CPU Limits",[21,2645,2646],{},"CPU exhaustion is another straightforward way to disrupt a service.",[21,2648,2649],{},"By configuring CPU limits, we ensure that a single compromised container cannot monopolize the host's processors.",[21,2651,2652],{},[71,2653,992],{},[355,2655,2657],{"className":357,"code":2656,"language":359,"meta":360,"style":360},"services:\n  nextjs:\n    image: my-nextjs-app\n    deploy:\n      resources:\n        limits:\n          cpus: 
\"1.0\"\n",[47,2658,2659,2665,2671,2679,2685,2691,2697],{"__ignoreMap":360},[364,2660,2661,2663],{"class":366,"line":367},[364,2662,1002],{"class":370},[364,2664,375],{"class":374},[364,2666,2667,2669],{"class":366,"line":378},[364,2668,1009],{"class":370},[364,2670,375],{"class":374},[364,2672,2673,2675,2677],{"class":366,"line":665},[364,2674,1016],{"class":370},[364,2676,384],{"class":374},[364,2678,2505],{"class":641},[364,2680,2681,2683],{"class":366,"line":671},[364,2682,2510],{"class":370},[364,2684,375],{"class":374},[364,2686,2687,2689],{"class":366,"line":1031},[364,2688,2517],{"class":370},[364,2690,375],{"class":374},[364,2692,2693,2695],{"class":366,"line":1095},[364,2694,2524],{"class":370},[364,2696,375],{"class":374},[364,2698,2699,2702,2704],{"class":366,"line":1103},[364,2700,2701],{"class":370},"          cpus",[364,2703,384],{"class":374},[364,2705,2706],{"class":641}," \"1.0\"\n",[21,2708,2709],{},[71,2710,1043],{},[355,2712,2714],{"className":357,"code":2713,"language":359,"meta":360,"style":360},"resources:\n  requests:\n    cpu: \"500m\"\n  limits:\n    cpu: \"1\"\n",[47,2715,2716,2722,2728,2738,2744],{"__ignoreMap":360},[364,2717,2718,2720],{"class":366,"line":367},[364,2719,2550],{"class":370},[364,2721,375],{"class":374},[364,2723,2724,2726],{"class":366,"line":378},[364,2725,2557],{"class":370},[364,2727,375],{"class":374},[364,2729,2730,2733,2735],{"class":366,"line":665},[364,2731,2732],{"class":370},"    cpu",[364,2734,384],{"class":374},[364,2736,2737],{"class":641}," \"500m\"\n",[364,2739,2740,2742],{"class":366,"line":671},[364,2741,2574],{"class":370},[364,2743,375],{"class":374},[364,2745,2746,2748,2750],{"class":366,"line":1031},[364,2747,2732],{"class":370},[364,2749,384],{"class":374},[364,2751,2752],{"class":641}," \"1\"\n",[21,2754,2755],{},"These limits don't prevent abuse, they contain it.",[16,2757,2759],{"id":2758},"device-access","Device Access",[21,2761,2762,2763,2766],{},"By default, Docker isolates containers from 
the host's hardware. This is important because, on Linux, many hardware resources are exposed as files under ",[47,2764,2765],{},"\u002Fdev",". Granting access to one of these devices often provides direct access to a kernel-managed interface, so device access should be granted deliberately rather than by default.",[21,2768,2769],{},"Common examples include:",[28,2771,2772,2775,2781,2784],{},[31,2773,2774],{},"NVIDIA GPUs for AI and machine learning inference.",[31,2776,2777,2780],{},[47,2778,2779],{},"\u002Fdev\u002Fnet\u002Ftun"," for VPN software such as WireGuard or OpenVPN.",[31,2782,2783],{},"Hardware Security Modules (HSMs) used for cryptographic key management.",[31,2785,2786],{},"USB or serial devices used in industrial and IoT deployments.",[21,2788,2789],{},"For example, a WireGuard container requires access to the TUN device:",[355,2791,2793],{"className":357,"code":2792,"language":359,"meta":360,"style":360},"services:\n  wireguard:\n    image: linuxserver\u002Fwireguard\n    devices:\n      - \u002Fdev\u002Fnet\u002Ftun:\u002Fdev\u002Fnet\u002Ftun\n    cap_add:\n      - NET_ADMIN\n",[47,2794,2795,2801,2808,2817,2824,2831,2837],{"__ignoreMap":360},[364,2796,2797,2799],{"class":366,"line":367},[364,2798,1002],{"class":370},[364,2800,375],{"class":374},[364,2802,2803,2806],{"class":366,"line":378},[364,2804,2805],{"class":370},"  wireguard",[364,2807,375],{"class":374},[364,2809,2810,2812,2814],{"class":366,"line":665},[364,2811,1016],{"class":370},[364,2813,384],{"class":374},[364,2815,2816],{"class":641}," linuxserver\u002Fwireguard\n",[364,2818,2819,2822],{"class":366,"line":671},[364,2820,2821],{"class":370},"    devices",[364,2823,375],{"class":374},[364,2825,2826,2828],{"class":366,"line":1031},[364,2827,1035],{"class":1034},[364,2829,2830],{"class":641}," 
\u002Fdev\u002Fnet\u002Ftun:\u002Fdev\u002Fnet\u002Ftun\n",[364,2832,2833,2835],{"class":366,"line":1095},[364,2834,1242],{"class":370},[364,2836,375],{"class":374},[364,2838,2839,2841],{"class":366,"line":1103},[364,2840,1035],{"class":1034},[364,2842,2843],{"class":641}," NET_ADMIN\n",[21,2845,2846,2847,2849,2850,2852],{},"Rather than exposing the entire ",[47,2848,2765],{}," hierarchy, or running the container as ",[47,2851,49],{},", grant only the specific devices your workload requires.",[21,2854,2855,2856],{},"Device access is a broad topic in its own right, and the exact devices needed vary significantly between workloads. The important takeaway isn't to memorize every possible device, but to follow the same principle we've applied throughout this article: ",[71,2857,2858],{},"expose only what the application actually needs, and nothing more.",[21,2860,2861,2862,614,2865,614,2868,2871,2872,2875,2876,50],{},"Docker supports many additional device-related features that are beyond the scope of this article, including device permissions (",[47,2863,2864],{},"r",[47,2866,2867],{},"w",[47,2869,2870],{},"m","), GPU support, the Container Device Interface (CDI), and device cgroup rules. 
If your workload requires more advanced device configuration, the official Docker documentation provides a comprehensive reference for the ",[47,2873,2874],{},"--device"," flag and related runtime options:\n",[58,2877,2878],{"href":2878,"rel":2879},"https:\u002F\u002Fdocs.docker.com\u002Freference\u002Fcli\u002Fdocker\u002Fcontainer\u002Frun\u002F#device",[62],[16,2881,2883],{"id":2882},"privileged-containers","Privileged Containers",[21,2885,2886],{},"By now we've covered several layers of Docker's security model:",[28,2888,2889,2892,2895,2898,2901,2904,2907,2910],{},[31,2890,2891],{},"Running as a non-root user.",[31,2893,2894],{},"Dropping unnecessary capabilities.",[31,2896,2897],{},"Using a read-only root filesystem.",[31,2899,2900],{},"Preventing privilege escalation.",[31,2902,2903],{},"Restricting system calls.",[31,2905,2906],{},"Limiting resources.",[31,2908,2909],{},"Exposing only required devices.",[31,2911,2912],{},"Configuring LSMs.",[21,2914,2341,2915,2917],{},[47,2916,49],{}," flag effectively bypasses many of these protections.",[140,2919,2921,2922,2924],{"id":2920},"what-privileged-actually-does","What ",[47,2923,49],{}," Actually Does",[21,2926,2927,2928,2930],{},"Running a container with ",[47,2929,49],{}," is much more than \"giving it more permissions.\" Docker grants the container nearly every Linux capability, provides broad access to host devices, relaxes device cgroup restrictions, and disables several of the runtime's default safety mechanisms.",[21,2932,2933],{},"The result is a container that behaves much more like a regular process running directly on the host.",[140,2935,2937],{"id":2936},"why-you-should-avoid-it","Why You Should Avoid It",[21,2939,2940],{},"A common troubleshooting pattern is:",[66,2942,2943,2946],{},[21,2944,2945],{},"The container doesn't have permission.",[21,2947,2948,2949,50],{},"Run it with ",[47,2950,49],{},[21,2952,2953],{},"While this often fixes the immediate problem, it also grants dozens of permissions 
the application may never need.",[21,2955,2956],{},"Instead, identify the specific requirement:",[28,2958,2959,2965,2970],{},[31,2960,2961,2962,2964],{},"Does the application need ",[47,2963,558],{},"?",[31,2966,2967,2968,2964],{},"Does it need access to ",[47,2969,2779],{},[31,2971,2972],{},"Does it need a single Linux capability?",[21,2974,2975,2976,2978],{},"Granting one permission is almost always preferable to granting every permission. As a rule of thumb, ",[47,2977,49],{}," should be reserved for specialized infrastructure software such as low-level container runtimes, debugging tools, or hardware management utilities, not ordinary web applications, APIs, or background workers.",[21,2980,2981,2982,2984],{},"If your production application requires ",[47,2983,49],{},", it's usually worth investigating why before accepting it as the solution.",[16,2986,2988],{"id":2987},"putting-it-all-together","Putting It All Together",[21,2990,2991],{},"Throughout this article, we've explored Docker's security features individually. 
In practice, however, these features aren't meant to be used in isolation; they complement one another.",[21,2993,2994],{},"Let's revisit the threat model from the beginning of this article.",[21,2996,2997],{},"An attacker exploits a vulnerability in our Next.js application and gains remote code execution inside the container.",[21,2999,3000],{},"At this point, every hardening measure we've discussed begins working together:",[28,3002,3003,3006,3009,3012,3015,3018,3021],{},[31,3004,3005],{},"The application runs as a non-root user.",[31,3007,3008],{},"All unnecessary Linux capabilities have been removed.",[31,3010,3011],{},"The root filesystem is read-only.",[31,3013,3014],{},"Temporary files are written only to tmpfs.",[31,3016,3017],{},"Privilege escalation is disabled.",[31,3019,3020],{},"The container is constrained by CPU, memory, and PID limits.",[31,3022,3023],{},"Only the minimum resources required by the application are exposed.",[21,3025,3026],{},"None of these measures prevents the initial exploit. 
Instead, they work together to reduce the attacker's options after a successful compromise.",[21,3028,3029],{},"The following examples show what this might look like for a production-ready Next.js application running behind an Nginx reverse proxy.",[140,3031,3032],{"id":638},[47,3033,8],{},[355,3035,3037],{"className":628,"code":3036,"language":630,"meta":360,"style":360},"docker network create web\n\ndocker run -d \\\n  --name nextjs \\\n  --network web \\\n  --user 1000:1000 \\\n  --read-only \\\n  --tmpfs \u002Ftmp \\\n  --cap-drop ALL \\\n  --security-opt no-new-privileges:true \\\n  --memory 512m \\\n  --cpus 1 \\\n  --pids-limit 100 \\\n  my-nextjs-app:latest\n\ndocker run -d \\\n  --name nginx \\\n  --network web \\\n  -p 80:80 \\\n  --read-only \\\n  --tmpfs \u002Fvar\u002Fcache\u002Fnginx \\\n  --tmpfs \u002Fvar\u002Frun \\\n  --cap-drop ALL \\\n  --cap-add NET_BIND_SERVICE \\\n  --security-opt no-new-privileges:true \\\n  nginx:latest\n",[47,3038,3039,3052,3056,3067,3077,3087,3097,3104,3113,3121,3132,3142,3152,3162,3167,3171,3181,3191,3200,3211,3218,3228,3238,3247,3256,3267],{"__ignoreMap":360},[364,3040,3041,3043,3046,3049],{"class":366,"line":367},[364,3042,638],{"class":637},[364,3044,3045],{"class":641}," network",[364,3047,3048],{"class":641}," create",[364,3050,3051],{"class":641}," web\n",[364,3053,3054],{"class":366,"line":378},[364,3055,1672],{"emptyLinePlaceholder":1671},[364,3057,3058,3060,3062,3065],{"class":366,"line":665},[364,3059,638],{"class":637},[364,3061,642],{"class":641},[364,3063,3064],{"class":641}," -d",[364,3066,840],{"class":839},[364,3068,3069,3072,3075],{"class":366,"line":671},[364,3070,3071],{"class":641},"  --name",[364,3073,3074],{"class":641}," nextjs",[364,3076,840],{"class":839},[364,3078,3079,3082,3085],{"class":366,"line":1031},[364,3080,3081],{"class":641},"  --network",[364,3083,3084],{"class":641}," web",[364,3086,840],{"class":839},[364,3088,3089,3092,3095],{"class":366,"line":1095},[364,3090,3091],{"class":641}," 
 --user",[364,3093,3094],{"class":641}," 1000:1000",[364,3096,840],{"class":839},[364,3098,3099,3102],{"class":366,"line":1103},[364,3100,3101],{"class":641},"  --read-only",[364,3103,840],{"class":839},[364,3105,3106,3109,3111],{"class":366,"line":1111},[364,3107,3108],{"class":641},"  --tmpfs",[364,3110,1568],{"class":641},[364,3112,840],{"class":839},[364,3114,3115,3117,3119],{"class":366,"line":1119},[364,3116,845],{"class":641},[364,3118,754],{"class":641},[364,3120,840],{"class":839},[364,3122,3123,3126,3128,3130],{"class":366,"line":1132},[364,3124,3125],{"class":641},"  --security-opt",[364,3127,1837],{"class":641},[364,3129,1840],{"class":387},[364,3131,840],{"class":839},[364,3133,3134,3137,3140],{"class":366,"line":1142},[364,3135,3136],{"class":641},"  --memory",[364,3138,3139],{"class":641}," 512m",[364,3141,840],{"class":839},[364,3143,3144,3147,3150],{"class":366,"line":1150},[364,3145,3146],{"class":641},"  --cpus",[364,3148,3149],{"class":387}," 1",[364,3151,840],{"class":839},[364,3153,3154,3157,3160],{"class":366,"line":1158},[364,3155,3156],{"class":641},"  --pids-limit",[364,3158,3159],{"class":387}," 100",[364,3161,840],{"class":839},[364,3163,3164],{"class":366,"line":1166},[364,3165,3166],{"class":641},"  my-nextjs-app:latest\n",[364,3168,3169],{"class":366,"line":1359},[364,3170,1672],{"emptyLinePlaceholder":1671},[364,3172,3173,3175,3177,3179],{"class":366,"line":1367},[364,3174,638],{"class":637},[364,3176,642],{"class":641},[364,3178,3064],{"class":641},[364,3180,840],{"class":839},[364,3182,3184,3186,3189],{"class":366,"line":3183},17,[364,3185,3071],{"class":641},[364,3187,3188],{"class":641}," nginx",[364,3190,840],{"class":839},[364,3192,3194,3196,3198],{"class":366,"line":3193},18,[364,3195,3081],{"class":641},[364,3197,3084],{"class":641},[364,3199,840],{"class":839},[364,3201,3203,3206,3209],{"class":366,"line":3202},19,[364,3204,3205],{"class":641},"  -p",[364,3207,3208],{"class":641}," 
80:80",[364,3210,840],{"class":839},[364,3212,3214,3216],{"class":366,"line":3213},20,[364,3215,3101],{"class":641},[364,3217,840],{"class":839},[364,3219,3221,3223,3226],{"class":366,"line":3220},21,[364,3222,3108],{"class":641},[364,3224,3225],{"class":641}," \u002Fvar\u002Fcache\u002Fnginx",[364,3227,840],{"class":839},[364,3229,3231,3233,3236],{"class":366,"line":3230},22,[364,3232,3108],{"class":641},[364,3234,3235],{"class":641}," \u002Fvar\u002Frun",[364,3237,840],{"class":839},[364,3239,3241,3243,3245],{"class":366,"line":3240},23,[364,3242,845],{"class":641},[364,3244,754],{"class":641},[364,3246,840],{"class":839},[364,3248,3250,3252,3254],{"class":366,"line":3249},24,[364,3251,854],{"class":641},[364,3253,857],{"class":641},[364,3255,840],{"class":839},[364,3257,3259,3261,3263,3265],{"class":366,"line":3258},25,[364,3260,3125],{"class":641},[364,3262,1837],{"class":641},[364,3264,1840],{"class":387},[364,3266,840],{"class":839},[364,3268,3270],{"class":366,"line":3269},26,[364,3271,3272],{"class":641},"  nginx:latest\n",[140,3274,3276],{"id":3275},"docker-compose",[47,3277,3278],{},"Docker Compose",[355,3280,3282],{"className":357,"code":3281,"language":359,"meta":360,"style":360},"services:\n  nextjs:\n    image: my-nextjs-app:latest\n    user: \"1000:1000\"\n    read_only: true\n    tmpfs:\n      - \u002Ftmp\n    cap_drop:\n      - ALL\n    security_opt:\n      - no-new-privileges:true\n    pids_limit: 100\n    deploy:\n      resources:\n        limits:\n          cpus: \"1.0\"\n          memory: 512M\n\n  nginx:\n    image: nginx:latest\n    ports:\n      - \"80:80\"\n    read_only: true\n    tmpfs:\n      - \u002Fvar\u002Fcache\u002Fnginx\n      - \u002Fvar\u002Frun\n    cap_drop:\n      - ALL\n    cap_add:\n      - NET_BIND_SERVICE\n    security_opt:\n      - 
no-new-privileges:true\n",[47,3283,3284,3290,3296,3304,3314,3322,3328,3334,3340,3346,3352,3358,3366,3372,3378,3384,3392,3400,3404,3410,3418,3425,3432,3440,3446,3453,3460,3467,3474,3481,3488,3495],{"__ignoreMap":360},[364,3285,3286,3288],{"class":366,"line":367},[364,3287,1002],{"class":370},[364,3289,375],{"class":374},[364,3291,3292,3294],{"class":366,"line":378},[364,3293,1009],{"class":370},[364,3295,375],{"class":374},[364,3297,3298,3300,3302],{"class":366,"line":665},[364,3299,1016],{"class":370},[364,3301,384],{"class":374},[364,3303,1021],{"class":641},[364,3305,3306,3309,3311],{"class":366,"line":671},[364,3307,3308],{"class":370},"    user",[364,3310,384],{"class":374},[364,3312,3313],{"class":641}," \"1000:1000\"\n",[364,3315,3316,3318,3320],{"class":366,"line":1031},[364,3317,1444],{"class":370},[364,3319,384],{"class":374},[364,3321,388],{"class":387},[364,3323,3324,3326],{"class":366,"line":1095},[364,3325,1612],{"class":370},[364,3327,375],{"class":374},[364,3329,3330,3332],{"class":366,"line":1103},[364,3331,1035],{"class":1034},[364,3333,1621],{"class":641},[364,3335,3336,3338],{"class":366,"line":1111},[364,3337,1026],{"class":370},[364,3339,375],{"class":374},[364,3341,3342,3344],{"class":366,"line":1119},[364,3343,1035],{"class":1034},[364,3345,1038],{"class":641},[364,3347,3348,3350],{"class":366,"line":1132},[364,3349,1876],{"class":370},[364,3351,375],{"class":374},[364,3353,3354,3356],{"class":366,"line":1142},[364,3355,1035],{"class":1034},[364,3357,1885],{"class":641},[364,3359,3360,3362,3364],{"class":366,"line":1150},[364,3361,2631],{"class":370},[364,3363,384],{"class":374},[364,3365,2636],{"class":387},[364,3367,3368,3370],{"class":366,"line":1158},[364,3369,2510],{"class":370},[364,3371,375],{"class":374},[364,3373,3374,3376],{"class":366,"line":1166},[364,3375,2517],{"class":370},[364,3377,375],{"class":374},[364,3379,3380,3382],{"class":366,"line":1359},[364,3381,2524],{"class":370},[364,3383,375],{"class":374},[364,3385,3386,3388,339
0],{"class":366,"line":1367},[364,3387,2701],{"class":370},[364,3389,384],{"class":374},[364,3391,2706],{"class":641},[364,3393,3394,3396,3398],{"class":366,"line":3183},[364,3395,2531],{"class":370},[364,3397,384],{"class":374},[364,3399,2536],{"class":641},[364,3401,3402],{"class":366,"line":3193},[364,3403,1672],{"emptyLinePlaceholder":1671},[364,3405,3406,3408],{"class":366,"line":3202},[364,3407,1214],{"class":370},[364,3409,375],{"class":374},[364,3411,3412,3414,3416],{"class":366,"line":3213},[364,3413,1016],{"class":370},[364,3415,384],{"class":374},[364,3417,1225],{"class":641},[364,3419,3420,3423],{"class":366,"line":3220},[364,3421,3422],{"class":370},"    ports",[364,3424,375],{"class":374},[364,3426,3427,3429],{"class":366,"line":3230},[364,3428,1035],{"class":1034},[364,3430,3431],{"class":641}," \"80:80\"\n",[364,3433,3434,3436,3438],{"class":366,"line":3240},[364,3435,1444],{"class":370},[364,3437,384],{"class":374},[364,3439,388],{"class":387},[364,3441,3442,3444],{"class":366,"line":3249},[364,3443,1612],{"class":370},[364,3445,375],{"class":374},[364,3447,3448,3450],{"class":366,"line":3258},[364,3449,1035],{"class":1034},[364,3451,3452],{"class":641}," \u002Fvar\u002Fcache\u002Fnginx\n",[364,3454,3455,3457],{"class":366,"line":3269},[364,3456,1035],{"class":1034},[364,3458,3459],{"class":641}," 
\u002Fvar\u002Frun\n",[364,3461,3463,3465],{"class":366,"line":3462},27,[364,3464,1026],{"class":370},[364,3466,375],{"class":374},[364,3468,3470,3472],{"class":366,"line":3469},28,[364,3471,1035],{"class":1034},[364,3473,1038],{"class":641},[364,3475,3477,3479],{"class":366,"line":3476},29,[364,3478,1242],{"class":370},[364,3480,375],{"class":374},[364,3482,3484,3486],{"class":366,"line":3483},30,[364,3485,1035],{"class":1034},[364,3487,1251],{"class":641},[364,3489,3491,3493],{"class":366,"line":3490},31,[364,3492,1876],{"class":370},[364,3494,375],{"class":374},[364,3496,3498,3500],{"class":366,"line":3497},32,[364,3499,1035],{"class":1034},[364,3501,1885],{"class":641},[140,3503,3505],{"id":3504},"kubernetes",[47,3506,3507],{},"Kubernetes",[355,3509,3511],{"className":357,"code":3510,"language":359,"meta":360,"style":360},"apiVersion: apps\u002Fv1\nkind: Deployment\nmetadata:\n  name: nextjs\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: nextjs\n  template:\n    metadata:\n      labels:\n        app: nextjs\n    spec:\n      containers:\n        - name: nextjs\n          image: my-nextjs-app:latest\n          securityContext:\n            runAsNonRoot: true\n            allowPrivilegeEscalation: false\n            readOnlyRootFilesystem: true\n            capabilities:\n              drop:\n                - ALL\n          resources:\n            requests:\n              cpu: \"250m\"\n              memory: \"256Mi\"\n            limits:\n              cpu: \"1\"\n              memory: \"512Mi\"\n          volumeMounts:\n            - name: tmp\n              mountPath: \u002Ftmp\n      volumes:\n        - name: tmp\n          emptyDir:\n            medium: 
Memory\n",[47,3512,3513,3521,3529,3535,3543,3549,3559,3566,3573,3582,3588,3595,3602,3611,3617,3623,3633,3641,3647,3656,3665,3674,3680,3686,3692,3699,3706,3716,3725,3732,3740,3748,3755,3767,3777,3785,3796,3804],{"__ignoreMap":360},[364,3514,3515,3517,3519],{"class":366,"line":367},[364,3516,1053],{"class":370},[364,3518,384],{"class":374},[364,3520,1058],{"class":641},[364,3522,3523,3525,3527],{"class":366,"line":378},[364,3524,1063],{"class":370},[364,3526,384],{"class":374},[364,3528,1068],{"class":641},[364,3530,3531,3533],{"class":366,"line":665},[364,3532,1073],{"class":370},[364,3534,375],{"class":374},[364,3536,3537,3539,3541],{"class":366,"line":671},[364,3538,1080],{"class":370},[364,3540,384],{"class":374},[364,3542,1085],{"class":641},[364,3544,3545,3547],{"class":366,"line":1031},[364,3546,1090],{"class":370},[364,3548,375],{"class":374},[364,3550,3551,3554,3556],{"class":366,"line":1095},[364,3552,3553],{"class":370},"  replicas",[364,3555,384],{"class":374},[364,3557,3558],{"class":387}," 1\n",[364,3560,3561,3564],{"class":366,"line":1103},[364,3562,3563],{"class":370},"  selector",[364,3565,375],{"class":374},[364,3567,3568,3571],{"class":366,"line":1111},[364,3569,3570],{"class":370},"    matchLabels",[364,3572,375],{"class":374},[364,3574,3575,3578,3580],{"class":366,"line":1119},[364,3576,3577],{"class":370},"      app",[364,3579,384],{"class":374},[364,3581,1085],{"class":641},[364,3583,3584,3586],{"class":366,"line":1132},[364,3585,1098],{"class":370},[364,3587,375],{"class":374},[364,3589,3590,3593],{"class":366,"line":1142},[364,3591,3592],{"class":370},"    metadata",[364,3594,375],{"class":374},[364,3596,3597,3600],{"class":366,"line":1150},[364,3598,3599],{"class":370},"      labels",[364,3601,375],{"class":374},[364,3603,3604,3607,3609],{"class":366,"line":1158},[364,3605,3606],{"class":370},"        
app",[364,3608,384],{"class":374},[364,3610,1085],{"class":641},[364,3612,3613,3615],{"class":366,"line":1166},[364,3614,1106],{"class":370},[364,3616,375],{"class":374},[364,3618,3619,3621],{"class":366,"line":1359},[364,3620,1114],{"class":370},[364,3622,375],{"class":374},[364,3624,3625,3627,3629,3631],{"class":366,"line":1367},[364,3626,1122],{"class":1034},[364,3628,1125],{"class":370},[364,3630,384],{"class":374},[364,3632,1085],{"class":641},[364,3634,3635,3637,3639],{"class":366,"line":3183},[364,3636,1135],{"class":370},[364,3638,384],{"class":374},[364,3640,1021],{"class":641},[364,3642,3643,3645],{"class":366,"line":3193},[364,3644,1145],{"class":370},[364,3646,375],{"class":374},[364,3648,3649,3652,3654],{"class":366,"line":3202},[364,3650,3651],{"class":370},"            runAsNonRoot",[364,3653,384],{"class":374},[364,3655,388],{"class":387},[364,3657,3658,3661,3663],{"class":366,"line":3213},[364,3659,3660],{"class":370},"            allowPrivilegeEscalation",[364,3662,384],{"class":374},[364,3664,1910],{"class":387},[364,3666,3667,3670,3672],{"class":366,"line":3220},[364,3668,3669],{"class":370},"            readOnlyRootFilesystem",[364,3671,384],{"class":374},[364,3673,388],{"class":387},[364,3675,3676,3678],{"class":366,"line":3230},[364,3677,1153],{"class":370},[364,3679,375],{"class":374},[364,3681,3682,3684],{"class":366,"line":3240},[364,3683,1161],{"class":370},[364,3685,375],{"class":374},[364,3687,3688,3690],{"class":366,"line":3249},[364,3689,1169],{"class":1034},[364,3691,1038],{"class":641},[364,3693,3694,3697],{"class":366,"line":3258},[364,3695,3696],{"class":370},"          resources",[364,3698,375],{"class":374},[364,3700,3701,3704],{"class":366,"line":3269},[364,3702,3703],{"class":370},"            requests",[364,3705,375],{"class":374},[364,3707,3708,3711,3713],{"class":366,"line":3462},[364,3709,3710],{"class":370},"              cpu",[364,3712,384],{"class":374},[364,3714,3715],{"class":641}," 
\"250m\"\n",[364,3717,3718,3721,3723],{"class":366,"line":3469},[364,3719,3720],{"class":370},"              memory",[364,3722,384],{"class":374},[364,3724,2569],{"class":641},[364,3726,3727,3730],{"class":366,"line":3476},[364,3728,3729],{"class":370},"            limits",[364,3731,375],{"class":374},[364,3733,3734,3736,3738],{"class":366,"line":3483},[364,3735,3710],{"class":370},[364,3737,384],{"class":374},[364,3739,2752],{"class":641},[364,3741,3742,3744,3746],{"class":366,"line":3490},[364,3743,3720],{"class":370},[364,3745,384],{"class":374},[364,3747,2585],{"class":641},[364,3749,3750,3753],{"class":366,"line":3497},[364,3751,3752],{"class":370},"          volumeMounts",[364,3754,375],{"class":374},[364,3756,3758,3761,3763,3765],{"class":366,"line":3757},33,[364,3759,3760],{"class":1034},"            -",[364,3762,1125],{"class":370},[364,3764,384],{"class":374},[364,3766,1649],{"class":641},[364,3768,3770,3773,3775],{"class":366,"line":3769},34,[364,3771,3772],{"class":370},"              mountPath",[364,3774,384],{"class":374},[364,3776,1621],{"class":641},[364,3778,3780,3783],{"class":366,"line":3779},35,[364,3781,3782],{"class":370},"      volumes",[364,3784,375],{"class":374},[364,3786,3788,3790,3792,3794],{"class":366,"line":3787},36,[364,3789,1122],{"class":1034},[364,3791,1125],{"class":370},[364,3793,384],{"class":374},[364,3795,1649],{"class":641},[364,3797,3799,3802],{"class":366,"line":3798},37,[364,3800,3801],{"class":370},"          emptyDir",[364,3803,375],{"class":374},[364,3805,3807,3810,3812],{"class":366,"line":3806},38,[364,3808,3809],{"class":370},"            medium",[364,3811,384],{"class":374},[364,3813,1666],{"class":641},[21,3815,3816],{},"These examples aren't intended to be copied verbatim into every production environment. Every workload has different requirements. 
Instead, they demonstrate the security mindset we've followed throughout this article: remove privileges you don't need, expose only the resources your application actually requires, and assume the application may eventually be compromised.",[16,3818,3820],{"id":3819},"conclusion","Conclusion",[21,3822,3823,3824,3826],{},"Docker security is defense in depth, not a silver bullet. Non-root users, capabilities, read-only filesystems, ",[47,3825,1803],{},", seccomp, LSMs, cgroups, and device restrictions each remove a slice of attack surface. Individually they're useful. Together they make post-exploitation significantly harder.",[21,3828,3829],{},"Docker won't prevent your application from being compromised. What it does is limit what an attacker can do after they get in, and that distinction matters. Every unnecessary permission, writable directory, or exposed device is an opportunity you didn't need to give them.",[21,3831,3832],{},"Security isn't about making compromise impossible. It's about ensuring that when it happens, the attacker has as few options as possible.",[3834,3835],"hr",{},[21,3837,3838],{},[3839,3840,3841],"em",{},"This post focused on Docker's runtime security mechanisms. 
If I have time, I might write follow-ups on supply chain security, secrets management, network policies, and runtime detection.",[3843,3844,3845],"style",{},"html pre.shiki code .si09J, html code.shiki .si09J{--shiki-light:#89B4FA;--shiki-default:#89B4FA;--shiki-dark:#89B4FA}html pre.shiki code .sG44b, html code.shiki .sG44b{--shiki-light:#94E2D5;--shiki-default:#94E2D5;--shiki-dark:#94E2D5}html pre.shiki code .srg_i, html code.shiki .srg_i{--shiki-light:#FAB387;--shiki-default:#FAB387;--shiki-dark:#FAB387}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .seEE7, html code.shiki 
.seEE7{--shiki-light:#89B4FA;--shiki-light-font-style:italic;--shiki-default:#89B4FA;--shiki-default-font-style:italic;--shiki-dark:#89B4FA;--shiki-dark-font-style:italic}html pre.shiki code .swpoh, html code.shiki .swpoh{--shiki-light:#A6E3A1;--shiki-default:#A6E3A1;--shiki-dark:#A6E3A1}html pre.shiki code .seFKw, html code.shiki .seFKw{--shiki-light:#F5C2E7;--shiki-default:#F5C2E7;--shiki-dark:#F5C2E7}html pre.shiki code .sKAjW, html code.shiki .sKAjW{--shiki-light:#9399B2;--shiki-default:#9399B2;--shiki-dark:#9399B2}",{"title":360,"searchDepth":378,"depth":378,"links":3847},[3848,3849,3850,3852,3853,3854,3855,3857,3859,3861,3863,3864,3865,3867,3868,3869,3870,3872,3874,3876,3877,3878,3879,3880,3881,3882,3883,3884,3886,3888,3889,3891,3892,3894,3895,3896,3897],{"id":142,"depth":378,"text":143},{"id":309,"depth":378,"text":310},{"id":346,"depth":378,"text":3851},"Why Kubernetes Recommends runAsNonRoot",{"id":394,"depth":378,"text":395},{"id":497,"depth":378,"text":498},{"id":604,"depth":378,"text":605},{"id":697,"depth":378,"text":3856},"Dropping Capabilities with --cap-drop",{"id":812,"depth":378,"text":3858},"Adding Capabilities with --cap-add",{"id":872,"depth":378,"text":3860},"Why CAP_SYS_ADMIN Is Considered \"The New Root\"",{"id":930,"depth":378,"text":3862},"CAP_NET_ADMIN: More Powerful Than It Sounds",{"id":971,"depth":378,"text":972},{"id":1478,"depth":378,"text":1479},{"id":1511,"depth":378,"text":3866},"Temporary Writable Storage with tmpfs",{"id":1724,"depth":378,"text":1725},{"id":1756,"depth":378,"text":1757},{"id":1783,"depth":378,"text":1784},{"id":1919,"depth":378,"text":3871},"Understanding setuid",{"id":1966,"depth":378,"text":3873},"File Capabilities (setcap)",{"id":2006,"depth":378,"text":3875},"The Role of 
execve()",{"id":2038,"depth":378,"text":2039},{"id":2137,"depth":378,"text":2138},{"id":2153,"depth":378,"text":2154},{"id":2189,"depth":378,"text":2190},{"id":2202,"depth":378,"text":2203},{"id":2239,"depth":378,"text":2240},{"id":2256,"depth":378,"text":2257},{"id":2273,"depth":378,"text":2274},{"id":2340,"depth":378,"text":3885},"The docker Group Is Root",{"id":2385,"depth":378,"text":3887},"Mounting \u002Fvar\u002Frun\u002Fdocker.sock Into a Container",{"id":2471,"depth":378,"text":2472},{"id":2591,"depth":378,"text":3890},"Process Limits (pids_limit)",{"id":2642,"depth":378,"text":2643},{"id":2920,"depth":378,"text":3893},"What --privileged Actually Does",{"id":2936,"depth":378,"text":2937},{"id":638,"depth":378,"text":8},{"id":3275,"depth":378,"text":3278},{"id":3504,"depth":378,"text":3507},1783032848276]