Breaking Code · 01

How curl GETs it?

What really happens between you entering a curl command and the bytes landing in your terminal.

10 min read
Breaking Code — a Revibe series

Type this in your terminal:

curl https://example.com

Hit Enter, and a moment later some HTML scrolls past. It looks instant, but it isn't.

Between your keystroke and those bytes, there's roughly 200 milliseconds of careful, layered work. A name gets looked up, a phone line gets opened, a secret handshake happens, and a polite request goes out. After that, a tight little loop catches bytes as the kernel hands them up, one batch at a time, until the server says it's done.

This is the walk: seven stops, in order, with no skipping ahead, and then the whole trip in one diagram.


1. Finding the server

The internet doesn't move data using names — it uses numbers, called IP addresses. So curl's first job is to figure out the number behind example.com. That lookup is what DNS — the Domain Name System — is for. It's basically the internet's phone book.

It's not one server somewhere — it's thousands of them, organized in layers and run by different people. When curl asks "where does example.com live?", the question travels through a chain:

sequenceDiagram
    participant You as Your computer
    participant Resolver as ISP resolver
    participant Root as Root servers
    participant TLD as .com servers
    participant Auth as example.com nameserver
    You->>You: check local cache (miss)
    You->>Resolver: where is example.com?
    Resolver->>Root: where is example.com?
    Root-->>Resolver: ask the .com servers
    Resolver->>TLD: where is example.com?
    TLD-->>Resolver: ask example.com's nameserver
    Resolver->>Auth: where is example.com?
    Auth-->>Resolver: 93.184.216.34
    Resolver-->>You: 93.184.216.34

Each step caches the answer. The next request skips ahead.

Your computer asks itself first. If it looked up example.com recently, it uses the cached answer and is done in microseconds. If not, it asks a resolver — usually a server run by your internet provider, or one you picked yourself like Google's 8.8.8.8 or Cloudflare's 1.1.1.1. These are real machines in real data centres, and they do nothing all day except answer "what's the IP for ___?" billions of times.

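If the resolver doesn't know either, it asks the root servers: thirteen named root server identities (a.root-servers.net through m.root-servers.net), answered by well over a thousand machines around the world, and among the oldest pieces of the internet's plumbing. The roots point to the .com servers, which point to example.com's authoritative nameserver, which finally answers: 93.184.216.34.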

Think of it like asking your way around a new city. The hotel receptionist doesn't know — she calls a tourist office. That office doesn't know the exact street — they call someone in the neighborhood, who knows the actual house. Everyone in the chain remembers the answer, so the next tourist gets there faster.
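To make "everyone remembers the answer" concrete, here is a minimal sketch of a time-to-live (TTL) cache in C. The names (cache_put, cache_get, CACHE_SLOTS) are hypothetical, not taken from curl or any real resolver; real caches keep each record for the TTL the authoritative server attached to it, which is the idea this imitates. The current time is passed in explicitly so the behaviour is easy to check.

```c
/* Sketch only: a hypothetical TTL cache, not curl's or any resolver's code. */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 8

/* One remembered answer: a hostname, its address, and when it goes stale. */
struct cache_entry {
    char host[64];
    char addr[46];      /* room for the longest textual IPv6 address */
    time_t expires;     /* absolute time from which the entry is stale */
};

static struct cache_entry cache[CACHE_SLOTS];

/* Remember host -> addr for ttl seconds, counting from `now`. */
static void cache_put(const char *host, const char *addr, int ttl, time_t now)
{
    int slot = -1;

    assert(host && addr && ttl >= 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (strcmp(cache[i].host, host) == 0) {
            slot = i;           /* refresh an existing answer */
            break;
        }
        if (slot < 0 && (cache[i].host[0] == '\0' || cache[i].expires <= now))
            slot = i;           /* remember the first empty or stale slot */
    }
    if (slot < 0)
        slot = 0;               /* cache full: crude eviction, fine for a sketch */

    snprintf(cache[slot].host, sizeof cache[slot].host, "%s", host);
    snprintf(cache[slot].addr, sizeof cache[slot].addr, "%s", addr);
    cache[slot].expires = now + ttl;
}

/* Return the cached address, or NULL on a miss or a stale entry. */
static const char *cache_get(const char *host, time_t now)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (strcmp(cache[i].host, host) == 0 && now < cache[i].expires)
            return cache[i].addr;
    return NULL;
}
```

A real caller would pass time(NULL) as now. A hit inside the TTL skips the whole resolver chain; once the entry goes stale, the question has to travel again.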

In curl, the whole dance is hidden behind one function:

// lib/hostip.c
CURLcode Curl_resolv(struct Curl_easy *data,
                     uint8_t dns_queries,
                     const char *hostname,
                     uint16_t port,
                     uint8_t transport,
                     timediff_t timeout_ms,
                     uint32_t *presolv_id,
                     struct Curl_dns_entry **pdns);
Hand it a hostname, get back an address.

Behind that signature, curl can use the OS's resolver (getaddrinfo(), which curl usually runs on a helper thread so it doesn't stall everything else), the c-ares async library, or even DNS-over-HTTPS, where the lookup itself is encrypted so your ISP can't read your DNS queries. The caller doesn't care which.
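For the plain OS-resolver path, the standard POSIX call is getaddrinfo(). The sketch below resolves a host and port and formats the first address it gets back. resolve_first is a hypothetical helper name; this is not how curl's own resolver code is structured, only the system call such code can sit on top of.

```c
/* Sketch only: resolve_first is a hypothetical helper, not curl's API. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host:port with the OS resolver and write the first address
 * it returns, as text, into out. Returns that address's family
 * (AF_INET or AF_INET6), or -1 if the lookup failed. */
static int resolve_first(const char *host, const char *port,
                         char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int family = -1;

    assert(host && port && out && outlen > 0);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 answers */
    hints.ai_socktype = SOCK_STREAM;  /* addresses usable for TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0 || res == NULL)
        return -1;

    /* NI_NUMERICHOST: format the address itself, no reverse lookup */
    if (getnameinfo(res->ai_addr, res->ai_addrlen, out, (socklen_t)outlen,
                    NULL, 0, NI_NUMERICHOST) == 0)
        family = res->ai_family;

    freeaddrinfo(res);                /* getaddrinfo allocated the list */
    return family;
}
```

Calling resolve_first("example.com", "443", buf, sizeof buf) would go through the chain described above. Passing an IP literal such as "127.0.0.1" short-circuits it, because getaddrinfo() parses numeric addresses without asking anyone.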


2. Knocking on the door

curl now has the address, so it's time to make contact.

It opens a socket, which is just the handle your program uses to talk to the network, the same way a phone is what you use to talk to a person. curl picks up the phone and dials 93.184.216.34 on port 443 (the standard port for HTTPS). The kernel does the hard work, picking a route out of your machine and running TCP's three-way handshake (SYN, SYN-ACK, ACK) with the other side, while curl just makes the call.

Underneath all the abstraction layers, the moment of contact is a single C function call:

// lib/cf-socket.c
rc = connect(ctx->sock,
             &ctx->addr.curl_sa_addr,
             (curl_socklen_t)ctx->addr.addrlen);
The line where curl hands the conversation to the kernel.

It takes three arguments: the socket (the phone), the address (who to call), and a length.

Wait — why does connect() need a length?

IP addresses come in two sizes, and so do the structures that hold them: an IPv4 socket address (struct sockaddr_in) is 16 bytes on common platforms, and an IPv6 one (struct sockaddr_in6) is 28. The address argument is just a pointer, a note saying "the address is over there", so the kernel also needs to know how big it is to read it correctly. That's all the length is: the size of the address blob in bytes.

Imagine handing someone a slip of paper with a phone number and saying "it's 10 digits long." Without that, they wouldn't know if it was a local number or an international one. length does the same job for the kernel.
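Here is the length argument in action: a small sketch (make_addr is a hypothetical helper) that fills a generic struct sockaddr_storage from an IP literal and returns how many bytes of it connect() should read. The 16 and 28 in the comments are the usual sizes on Linux and the BSDs; the code itself relies only on sizeof.

```c
/* Sketch only: make_addr is a hypothetical helper, not curl's code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *ss with ip:port and return how many bytes of it connect()
 * should read, or 0 if ip is not an IPv4 or IPv6 literal. */
static socklen_t make_addr(const char *ip, unsigned short port,
                           struct sockaddr_storage *ss)
{
    struct sockaddr_in *v4 = (struct sockaddr_in *)ss;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;

    assert(ip && ss);
    memset(ss, 0, sizeof *ss);
    if (inet_pton(AF_INET, ip, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        v4->sin_port = htons(port);
        return sizeof *v4;      /* 16 bytes on Linux and the BSDs */
    }

    memset(ss, 0, sizeof *ss);  /* start clean for the IPv6 attempt */
    if (inet_pton(AF_INET6, ip, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(port);
        return sizeof *v6;      /* 28 bytes on Linux and the BSDs */
    }
    return 0;
}
```

The pointer and the returned length then go straight into connect(fd, (struct sockaddr *)&ss, len), which is exactly the "slip of paper plus how long it is" pairing.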

If the server is awake on the other end, it picks up, and curl now has an open connection to it — a two-way line where either side can speak and the other will hear. Everything that follows happens over this one line.
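Put together, the "pick up the phone and dial" step looks roughly like this blocking sketch. tcp_connect is a hypothetical helper; it walks every address getaddrinfo() returns and passes each one's ai_addrlen as the length. curl itself is more elaborate: it connects in non-blocking mode and can race IPv6 and IPv4 attempts (Happy Eyeballs) instead of trying them one by one.

```c
/* Sketch only: a hypothetical blocking connect helper, not curl's code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a blocking TCP connection to host:port. Tries each address
 * getaddrinfo() returns, in order, until one connect() succeeds.
 * Returns the connected socket descriptor, or -1. */
static int tcp_connect(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res, *ai;
    int fd = -1;
    int rc;

    assert(host && port);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        /* ai_addrlen is the length argument: the size of this address */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;              /* the kernel completed the TCP handshake */
        close(fd);              /* this address failed; try the next one */
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```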


3. The whispered handshake

The line is open, but it's not yet private. Anyone sitting between you and the server — your coffee shop's Wi-Fi router, your ISP, a curious sysadmin somewhere — can see every byte going across. For an HTTPS request, that's not okay, so before any real conversation happens, curl and the server do a little dance to set up encryption. This dance is called the TLS handshake.

It looks like this:

sequenceDiagram
    participant C as curl
    participant S as server
    C->>S: hello, here are the ciphers I support
    S-->>C: hello, here's my certificate
    C->>C: verify certificate against trusted CAs
    C->>S: here's a shared secret, encrypted with your public key
    S-->>C: ok, switching to encrypted
    Note over C,S: everything from here is encrypted

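One round trip on modern TLS (1.3), two on older versions (1.2). The diagram shows the classic exchange; in TLS 1.3 both sides send key shares in their hellos instead, which is what saves the round trip.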

Two things matter here. First, the server has to prove it is who it claims to be, and it does that by showing a certificate: essentially an ID card signed by a trusted authority (like Let's Encrypt or DigiCert). curl checks that signature against a bundle of trusted authorities, usually the one your operating system ships with, and checks that the certificate was actually issued for example.com. If either check fails, the request fails before any of it is sent.

Second, both sides need to agree on a secret key that only the two of them know, so that everything they say afterwards can be scrambled. They do this with some clever public-key math, and the upshot is that from this point on, every byte going across the wire is encrypted. Anyone watching the connection just sees noise.
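The "clever public-key math" is easiest to see with Diffie-Hellman key agreement, the family of techniques TLS 1.3 uses to agree on its keys. The sketch below is a toy finite-field version with tiny numbers: it shows that both sides reach the same secret while only ever sending public values. Real handshakes use elliptic-curve variants such as X25519 with enormous numbers, and the function names here are hypothetical.

```c
/* Toy Diffie-Hellman with tiny numbers: illustration only, never secure. */
#include <assert.h>
#include <stdint.h>

/* (base^exp) mod m by square-and-multiply. m must be below 2^32 so
 * that every intermediate product fits in 64 bits. */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1;

    assert(m > 1 && m < (UINT64_C(1) << 32));
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* What each side sends in the clear: g^secret mod p. */
static uint64_t dh_public_value(uint64_t g, uint64_t secret, uint64_t p)
{
    return modpow(g, secret, p);
}

/* What each side computes privately from the other side's public value. */
static uint64_t dh_shared_secret(uint64_t their_public, uint64_t my_secret,
                                 uint64_t p)
{
    return modpow(their_public, my_secret, p);
}
```

With p = 23 and g = 5, a client that picks 6 sends 8, a server that picks 15 sends 19, and both sides independently compute 2. An eavesdropper sees 23, 5, 8, and 19, and with numbers this small could brute-force the rest, which is exactly why real keys are hundreds of bits long.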

If you typed http:// instead of https://, this whole step is skipped, and your data goes out in plain text. That's why nearly every website on the modern internet uses HTTPS.


4. Saying what you want

The line is open and encrypted, so curl can finally say what it came to say. The request itself is surprisingly simple: it's just plain text. For our curl https://example.com command, the request that goes out looks roughly like this:

GET / HTTP/1.1
Host: example.com
User-Agent: curl/8.x
Accept: */*
The actual bytes curl writes to the socket (before TLS encrypts them).

The first line says "give me whatever lives at path /, using HTTP version 1.1." The other three are polite metadata: which host we're talking to, what client we are, and what kinds of content we'll accept back. On the wire, every line ends with a carriage return and line feed (\r\n), and the request finishes with one empty line. That empty line is HTTP's way of saying "headers are done, the body would start here", but a GET request has no body, so that's the whole message.

Inside curl, this request is built up header by header in a function called Curl_http() in lib/http.c. It picks the method, sets the User-Agent, adds any cookies or auth headers, walks through a list of standard headers in a loop, and once the whole thing is assembled into a buffer, sends it down the connection, where the TLS layer encrypts it on the way out. The server reads it on the other end and starts preparing a response.
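Assembling those four lines yourself takes one snprintf(). The sketch below (build_get is a hypothetical helper, not curl's Curl_http()) produces the same request, with the CRLF line endings and closing blank line HTTP/1.1 requires, and refuses to hand back a truncated buffer.

```c
/* Sketch only: build_get is a hypothetical helper, not curl's code. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assemble a minimal HTTP/1.1 GET request into buf. Every line ends
 * in CRLF and the final blank line marks the end of the headers.
 * Returns the request length, or -1 if buf is too small. */
static int build_get(char *buf, size_t size, const char *host,
                     const char *path, const char *user_agent)
{
    int n;

    assert(buf && host && path && user_agent);
    n = snprintf(buf, size,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "User-Agent: %s\r\n"
                 "Accept: */*\r\n"
                 "\r\n",
                 path, host, user_agent);
    if (n < 0 || (size_t)n >= size)
        return -1;              /* truncated: never send half a request */
    return n;
}
```

The returned length is what a caller would hand to send() (or to a TLS library's write function), so the request goes out as one buffer.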


5. The pump

With the request out the door, curl now sits and waits for the response. But "waits" is the wrong word for what actually happens, because curl doesn't passively sit there. It actively asks the kernel for bytes, over and over, until the server has nothing left to say.

Here's the surprising bit: curl doesn't pull bytes out of the network. The network card receives them, the kernel buffers them, and curl scoops them up the moment the kernel says they're ready. It's like standing under a faucet with a cup. You don't pull the water out of the tap — you just hold the cup there, and the water lands in it.
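The "hold the cup under the tap" part maps onto poll() on POSIX systems: the process sleeps inside the kernel until a descriptor is readable, then reads whatever has arrived. The sketch below uses a hypothetical helper (scoop) to show the shape; it is not curl's code, although curl does wrap poll()/select() in its own helpers.

```c
/* Sketch only: scoop is a hypothetical helper, not curl's code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for data on fd, then take whatever is there.
 * Returns bytes read (> 0), 0 at end of stream or on timeout, -1 on
 * error. *timed_out tells a timeout apart from end of stream. */
static ssize_t scoop(int fd, char *buf, size_t len, int timeout_ms,
                     int *timed_out)
{
    struct pollfd pfd;
    int rc;
    ssize_t n;

    assert(buf && len > 0 && timed_out);
    *timed_out = 0;
    pfd.fd = fd;
    pfd.events = POLLIN;        /* "tell me when there is something to read" */
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);   /* sleeps in the kernel, no busy loop */
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    if (rc == 0) {
        *timed_out = 1;         /* the cup is still empty */
        return 0;
    }
    do {
        n = read(fd, buf, len); /* data (or EOF) is ready: read returns at once */
    } while (n < 0 && errno == EINTR);
    return n;
}
```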

The scooping happens inside a tight little loop in lib/transfer.c, and once you strip away the error handling and the rate limiting, it really is this small:

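// lib/transfer.c (simplified)
int maxloops = 10;                  // fairness cap, explained below
do {
    result = xfer_recv_resp(data, buf, bytestoread,
                            is_multiplex, &blen);
    if(result == CURLE_AGAIN)       // "nothing right now, try later"
        break;                      // wait until the socket is readable again
    if(result)
        return result;              // a real error

    is_eos = (blen == 0);           // 0 bytes = server said "done"

    if(is_eos) {
        Curl_req_stop_send_recv(data);
        if(k->eos_written) break;   // end already passed on to the writers
    }

    result = Curl_xfer_write_resp(data, buf, blen, is_eos);
    if(result)
        return result;

    if(!CURL_REQ_WANT_RECV(data)) break;
} while(maxloops--);                // bounded number of reads, then yield
The whole receive engine, in about twenty lines.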

Three details in this loop are worth pointing out, because they show up everywhere else in libcurl.

xfer_recv_resp is the moment curl asks the kernel for bytes. The kernel answers in one of three ways: "here, take these," "nothing right now, try later" (the EAGAIN case), or "zero bytes — the connection's done" (the end-of-stream case). curl handles all three branches inside the same loop without changing shape.
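Those three answers map directly onto what POSIX recv() returns on a non-blocking socket. This sketch (recv_once and set_nonblocking are hypothetical helpers, not curl's) sorts a single read into them, plus the fourth case any real loop also has to handle: an actual error.

```c
/* Sketch only: hypothetical helpers, not curl's code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum recv_outcome { GOT_BYTES, TRY_LATER, END_OF_STREAM, RECV_ERROR };

/* Put a socket into non-blocking mode so recv() never waits. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* One non-blocking read, sorted into the kernel's possible answers. */
static enum recv_outcome recv_once(int fd, char *buf, size_t len,
                                   size_t *nread)
{
    ssize_t n;

    assert(buf && nread && len > 0);  /* len 0 would look like end of stream */
    *nread = 0;
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* interrupted by a signal: retry */

    if (n > 0) {
        *nread = (size_t)n;              /* "here, take these" */
        return GOT_BYTES;
    }
    if (n == 0)
        return END_OF_STREAM;            /* peer closed its side */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return TRY_LATER;                /* nothing buffered right now */
    return RECV_ERROR;
}
```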

The maxloops = 10 cap is a small piece of fairness engineering. After about ten quick reads, curl breaks out of the loop and gives other ongoing transfers a turn. Without this, a single fast download could starve every other transfer attached to the same multi handle, which matters a lot when you're running ten LLM streams in parallel and don't want one of them to monopolise the thread.
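The effect of the cap is easy to simulate. This is a toy model, not curl code: each "read" drains one pretend chunk, and each transfer gets at most MAX_READS_PER_TURN reads before the next one gets a turn. The constant mirrors the idea of maxloops; everything else (struct transfer, run_rounds) is hypothetical.

```c
/* Toy fairness model: hypothetical names, not curl's scheduler. */
#include <assert.h>
#include <stddef.h>

#define MAX_READS_PER_TURN 10   /* mirrors the idea of curl's maxloops */

struct transfer {
    int chunks_left;            /* each read drains one pretend chunk */
    int finished_round;         /* round it completed in, -1 while running */
};

/* Round-robin over transfers, capping reads per turn. Returns the
 * number of rounds until every transfer has finished. */
static int run_rounds(struct transfer *t, size_t n)
{
    int round = 0;
    int running = 1;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        t[i].finished_round = -1;

    while (running) {
        running = 0;
        round++;
        for (size_t i = 0; i < n; i++) {
            if (t[i].finished_round != -1)
                continue;                       /* already done */
            for (int r = 0; r < MAX_READS_PER_TURN && t[i].chunks_left > 0; r++)
                t[i].chunks_left--;             /* one capped "read" */
            if (t[i].chunks_left == 0)
                t[i].finished_round = round;
            else
                running = 1;                    /* still busy: gets another turn */
        }
    }
    return round;
}
```

With a 25-chunk download and a 3-chunk one, the small transfer finishes in the first round instead of waiting behind all 25 reads of the big one.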

The bytestoread value gets clamped before each read if curl already knows the total response size. If the server said Content-Length: 1256, curl never asks the kernel for more than the bytes that remain. That way it can't accidentally over-read past the end of one response and into the start of the next one on the same connection.
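The clamp itself is only a few comparisons. A minimal sketch: next_read_size is a hypothetical name, and using -1 for "size unknown" is an assumption of this sketch.

```c
/* Sketch only: next_read_size is hypothetical, not curl's code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How many bytes to ask the kernel for next. content_length is -1
 * when the size is unknown (chunked or read-until-close). */
static size_t next_read_size(size_t buf_size, int64_t content_length,
                             int64_t received)
{
    int64_t left;

    assert(received >= 0);
    if (content_length < 0)
        return buf_size;        /* size unknown: fill the buffer */
    left = content_length - received;
    if (left <= 0)
        return 0;               /* body complete: stop reading */
    return (uint64_t)left < buf_size ? (size_t)left : buf_size;
}
```

With Content-Length: 1256 and a 1000-byte buffer, the first read asks for 1000 bytes, the second for the remaining 256, and after that for nothing.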


6. Knowing when to stop

That raises an interesting question. If curl is just looping and asking for bytes, how does it actually know when to stop? The server has to signal it somehow, and HTTP/1.1 allows three different ways.

flowchart TD
    A[Response headers arrive] --> B{What did the server say?}
    B -->|Content-Length: 1256| C[Read exactly 1256 bytes, then stop]
    B -->|Transfer-Encoding: chunked| D[Read chunks until one has size 0]
    B -->|Neither header set| E[Read until the socket closes]
    C --> F[Done]
    D --> F
    E --> F

Three protocols for "I'm finished." curl handles all of them.

The first and most common is Content-Length. The server sends this header up front saying exactly how big the response body will be — Content-Length: 1256 means "read 1256 bytes after the headers, then we're done." curl counts down from that number, and when the count hits zero, the response is complete.

The second is Transfer-Encoding: chunked. This is used when the server doesn't know the total size up front, which happens a lot in practice — an LLM API streaming a response token by token, for example, or a server that's generating output on the fly. Each chunk is prefixed with its own size in hex, and a chunk with size zero is the terminator. curl reads chunk after chunk until it sees that zero.
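Decoding chunked framing is a small state walk: hex size, CRLF, that many bytes, CRLF, repeat until the size is zero. The sketch below (decode_chunked is hypothetical) handles a complete body already in memory. curl's real decoder works incrementally, because chunks can be split across any number of reads, and it also handles trailers, which this version simply ignores.

```c
/* Sketch only: decode_chunked is hypothetical, not curl's decoder. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a complete chunked body into out. Returns the payload length,
 * or -1 if the input is malformed, truncated, or out is too small. */
static long decode_chunked(const char *in, size_t inlen,
                           char *out, size_t outlen)
{
    size_t pos = 0, used = 0;

    assert(in && out);
    for (;;) {
        size_t size = 0;
        int digits = 0, v;

        /* 1. chunk size in hex */
        while (pos < inlen && (v = hexval(in[pos])) >= 0) {
            if (size > (SIZE_MAX >> 4))
                return -1;                  /* size would overflow */
            size = size * 16 + (size_t)v;
            pos++;
            digits++;
        }
        if (digits == 0)
            return -1;
        /* 2. skip optional chunk extensions, then require CRLF */
        while (pos < inlen && in[pos] != '\r')
            pos++;
        if (pos + 1 >= inlen || in[pos + 1] != '\n')
            return -1;
        pos += 2;

        if (size == 0)
            return (long)used;              /* last chunk: trailers ignored */

        /* 3. the chunk data itself, followed by CRLF */
        if (size > inlen - pos || inlen - pos - size < 2)
            return -1;
        if (size > outlen - used)
            return -1;
        memcpy(out + used, in + pos, size);
        used += size;
        pos += size;
        if (in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

For example, 4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n decodes to the 9 bytes Wikipedia.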

The third is the oldest and simplest: the server just hangs up. If neither Content-Length nor Transfer-Encoding: chunked is set, curl reads until the kernel returns zero bytes from recv(), which means the server closed the connection. That zero is the goodbye. HTTP/1.0 servers leaned on this heavily, and it still works today as the fallback when nothing else is specified.


7. Where the bytes leave the building

None of the bytes curl receives go directly to your terminal. Each batch passes through a small chain of helpers first, and each helper does one transformation. One strips the chunk-size markers if the response was chunked, another decompresses the bytes if the server gzipped them, another might handle Brotli or zstd. The last helper in the chain is the one that finally hands the bytes out of the library.

flowchart LR
    A[Raw bytes from socket] --> B[Chunked decoder: strips chunk sizes]
    B --> C[gzip / brotli decoder: if Content-Encoding set]
    C --> D[Your write callback: CURLOPT_WRITEFUNCTION]
    D --> E[fwrite → stdout]

A composable stack. Same shape for plain, compressed, and chunked responses.

This chain is what makes libcurl flexible. The same architecture handles plain responses, compressed responses, chunked responses, and any combination of them — each transformation is just another link, added or skipped based on what the response headers said. All of it is orchestrated through one function:

// lib/sendf.c
CURLcode Curl_client_write(struct Curl_easy *data, int type,
                           const char *buf, size_t len);
Every byte you see in your terminal passed through this function.
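The chaining idea doesn't depend on anything HTTP-specific. Here is a hypothetical miniature in C (struct writer and both stages are invented for illustration, not libcurl's internal API): every stage has the same write signature, transforms what it is given, and forwards the result to the next stage, so stages can be added or skipped freely. An upper-casing stage stands in for a real decoder.

```c
/* Hypothetical writer chain: illustration only, not libcurl's internals. */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct writer {
    int (*write)(struct writer *self, const char *buf, size_t len);
    struct writer *next;        /* where transformed bytes go */
    void *ctx;                  /* per-stage state */
};

/* Stand-in "decoder": upper-cases bytes (a real chain would un-gzip here). */
static int upper_write(struct writer *self, const char *buf, size_t len)
{
    char tmp[256];

    assert(self->next != NULL);
    while (len > 0) {
        size_t n = len < sizeof tmp ? len : sizeof tmp;
        for (size_t i = 0; i < n; i++)
            tmp[i] = (char)toupper((unsigned char)buf[i]);
        int rc = self->next->write(self->next, tmp, n);
        if (rc != 0)
            return rc;          /* propagate a downstream failure */
        buf += n;
        len -= n;
    }
    return 0;
}

struct sink {
    char data[256];
    size_t len;
};

/* Final stage: append into a buffer (where a write callback would run). */
static int sink_write(struct writer *self, const char *buf, size_t len)
{
    struct sink *s = self->ctx;

    if (len > sizeof s->data - s->len)
        return -1;              /* out of room: fail the write */
    memcpy(s->data + s->len, buf, len);
    s->len += len;
    return 0;
}
```

Adding a stage means pointing one more struct writer at the existing chain; nothing downstream has to change.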

The last link in the chain is the actual write callback, and that one is a function the caller provides. For the curl command-line tool, the callback is essentially one fwrite to standard output, which is your terminal. If you were using libcurl from your own program, you'd set the callback yourself via CURLOPT_WRITEFUNCTION and decide where the bytes go — to a file, into a buffer, straight into a database column, or wherever you need them.
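On the library side, a write callback that collects the body in memory instead of printing it is a common first libcurl program. The function below has the signature libcurl expects for CURLOPT_WRITEFUNCTION; struct membuf and the function name are hypothetical, and the registration calls appear only in a comment so the sketch compiles without linking libcurl.

```c
/* Sketch: a libcurl-style write callback; names are hypothetical.
 * Registration (needs <curl/curl.h> and linking libcurl):
 *   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_to_memory);
 *   curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buf);
 * Returning a count other than size * nmemb makes libcurl abort the
 * transfer with an error. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable buffer the callback appends into. */
struct membuf {
    char *data;
    size_t len;
};

static size_t write_to_memory(char *ptr, size_t size, size_t nmemb,
                              void *userdata)
{
    struct membuf *mb = userdata;
    size_t n = size * nmemb;    /* libcurl always passes size == 1 */
    char *grown;

    assert(mb != NULL);
    grown = realloc(mb->data, mb->len + n + 1);
    if (grown == NULL)
        return 0;               /* out of memory: make the transfer fail */
    memcpy(grown + mb->len, ptr, n);
    mb->data = grown;
    mb->len += n;
    mb->data[mb->len] = '\0';   /* keep it usable as a C string */
    return n;                   /* "I took all of it" */
}
```

libcurl may call this many times per response, once per batch that comes out of the chain, so the callback only ever appends; the caller frees mb.data when the transfer is done.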


8. The whole journey, one diagram

That's the whole machine. Let's run it back from the top in one diagram, with every actor on stage:

sequenceDiagram
    participant U as You
    participant T as curl CLI
    participant L as libcurl
    participant K as Kernel
    participant S as Server
    U->>T: curl https://example.com
    T->>L: curl_easy_perform()
    L->>K: DNS lookup
    K-->>L: 93.184.216.34
    L->>K: connect(socket, addr, len)
    K->>S: TCP SYN
    S-->>K: SYN-ACK
    L->>S: TLS handshake
    S-->>L: TLS ready
    L->>S: GET / HTTP/1.1
    S-->>L: HTTP response (streamed)
    loop until done
        L->>K: recv()
        K-->>L: bytes (or EAGAIN)
        L->>L: decode + write callback
        L-->>U: fwrite to stdout
    end
    L->>K: close(socket)

Around 200 milliseconds, end to end.

About two hundred milliseconds, a handful of system calls that really matter (connect() and recv() chief among them), and one write callback at the end. Everything else is bookkeeping: keeping track of which bytes have arrived, which decoders to run, and when to stop asking. The library is doing a lot of work, but the moves are simple, and they happen in the same order every time.


9. Why this matters now

The next time your AI agent fires off a hundred curl calls in parallel against Anthropic, OpenAI, and whatever vector DB you spun up last week, you're standing on top of this exact machine. A DNS lookup, an encrypted phone call, a polite request, and a tight little loop that catches the answer as it streams back: the same steps, repeated a hundred times over, all in one process and all in the time it takes you to blink.

The fact that any of this is fast, reliable, and runs on almost every operating system on the planet is the result of decades of careful work, much of it done by one person, Daniel Stenberg, alongside a large community of contributors, and all of it given away for free. The next time you see curl/8.x in a User-Agent string, you'll know what's happening underneath. If you want to see the rest of what curl does (every protocol handler, the connection-filter chain, the Easy vs Multi interfaces, the whole concurrency model), the full interactive breakdown of the curl codebase is on Revibe.

Want to go deeper into curl?

This post walked one GET end to end. curl's source covers a lot more — FTP, SCP, WebSockets, every protocol handler in lib/, and the multi-interface concurrency that lets one process drive hundreds of transfers at once. The complete interactive analysis lives on Revibe: modules, flows, system design Q&A, all explorable.