Primer Chapter 5: Viruses and Their Strange Relatives


This chapter is part of the companion primer to The Inhabited Body. It explores the biology of viruses in depth — with particular attention to bacteriophages, the viruses that infect bacteria. Phages are among the most important and least appreciated players in the human microbiome. This chapter also covers retroviruses, the concept of the virome, and the recently discovered obelisks.


The Most Abundant Entities on Earth

If biology were a popularity contest, viruses would win by a margin so vast it defies comprehension. The estimated number of viral particles on Earth — roughly 10³¹, or ten million trillion trillion — exceeds the combined total of every other biological entity [suttle2007?]. There are more viruses in a litre of seawater than there are people on the planet. There are more viruses in the human gut alone than there are stars in the Milky Way.

And yet, as we noted in Primer Chapter 1, viruses are not "alive" by most standard definitions. They have no metabolism. They cannot grow. They cannot reproduce on their own. A virus is a set of instructions — a genome made of DNA or RNA — wrapped in a protein shell. It is a biological message in a bottle, inert and purposeless until it reaches the right cell. Then it becomes something else entirely: a hijacker of extraordinary efficiency, commandeering the cell's own machinery to produce copies of itself.

In Primer Chapter 1, we introduced viruses briefly and placed them on the map of life alongside bacteria, archaea, and eukaryotes. Now it is time to look inside the bottle.


Anatomy of a Virus

Despite their dizzying diversity — there are viruses that infect every domain of cellular life, and probably every species — all viruses share a few basic structural features.

The Genome

Every virus has a genome: a set of genetic instructions encoded in nucleic acid. But unlike cells, which invariably use double-stranded DNA, viruses have explored almost every possible nucleic acid configuration. Some carry double-stranded DNA (like the herpes simplex virus). Some carry single-stranded DNA (like the parvoviruses). Some carry double-stranded RNA (like the rotaviruses). And some carry single-stranded RNA, which can be "positive-sense" (ready to be read directly by the host's ribosomes, like SARS-CoV-2) or "negative-sense" (needing to be copied into a complementary strand first, like the influenza virus).

This variety of genome types is one reason viruses are so diverse and so difficult to classify. In 1971, the virologist David Baltimore proposed a classification system based not on what viruses look like but on how they handle their genetic information — specifically, how they produce messenger RNA. The Baltimore classification groups all viruses into seven classes based on their genome type and replication strategy. It remains one of the most widely used frameworks in virology.
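Because the Baltimore scheme is defined by genome type and by whether reverse transcription is involved, it can be written as a simple lookup table. The sketch below is illustrative only. The function name `baltimore_class` and the short string codes for genome types are our own hypothetical conventions. The example viruses are the ones named in this chapter, plus hepatitis B virus for class VII.

```python
from typing import Optional

# Baltimore classes keyed by (nucleic acid, strandedness, sense, reverse transcription).
# Hypothetical encoding for this sketch: sense is recorded only for single-stranded
# RNA, where it separates class IV from class V.
BALTIMORE = {
    ("DNA", "ds", None, False): ("I", "mRNA transcribed directly from the DNA genome", "herpes simplex virus"),
    ("DNA", "ss", None, False): ("II", "genome converted to dsDNA, then transcribed", "parvoviruses"),
    ("RNA", "ds", None, False): ("III", "mRNA transcribed from the dsRNA by a viral polymerase", "rotaviruses"),
    ("RNA", "ss", "+", False): ("IV", "genome itself serves as mRNA", "SARS-CoV-2"),
    ("RNA", "ss", "-", False): ("V", "genome copied into a complementary (+) strand that serves as mRNA", "influenza virus"),
    ("RNA", "ss", "+", True): ("VI", "genome reverse-transcribed into DNA, integrated, then transcribed", "HIV"),
    ("DNA", "ds", None, True): ("VII", "mRNA transcribed from DNA; genome replicated via an RNA intermediate", "hepatitis B virus"),
}


def baltimore_class(nucleic_acid: str, strands: str,
                    sense: Optional[str] = None,
                    reverse_transcribes: bool = False) -> str:
    """Return the Baltimore class (I to VII) for a simple genome description."""
    if (nucleic_acid, strands) != ("RNA", "ss"):
        sense = None  # sense does not distinguish any of the other classes
    key = (nucleic_acid, strands, sense, reverse_transcribes)
    if key not in BALTIMORE:
        raise ValueError(f"no Baltimore class for {key}")
    return BALTIMORE[key][0]


print(baltimore_class("RNA", "ss", "-"))  # V (e.g. influenza virus)
print(baltimore_class("RNA", "ss", "+", reverse_transcribes=True))  # VI (retroviruses such as HIV)
```

The table stores the route to messenger RNA alongside each class because that route, not the virus's shape, is what the classification is built on.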

The Capsid

Surrounding the genome is the capsid — a shell made entirely of protein. Capsid proteins are typically encoded by the virus's own genome and self-assemble into one of a few highly symmetrical shapes.

Think of virus shapes like packaging options. Some viruses are icosahedral: roughly spherical, with twenty flat triangular faces, like a microscopic football. This is an extraordinarily economical design. Its symmetry lets the virus build a closed, nearly spherical shell from many identical copies of a small protein, so a tiny genome can encode a container large enough to hold it. Many common viruses use this design, from those that cause the common cold (rhinoviruses) to poliovirus.

Other viruses are helical — their capsid proteins wrap around the genome in a spiral, producing a tube-like or rod-like structure. The tobacco mosaic virus, the first virus ever identified (in 1892 by Dmitri Ivanovsky), is a classic example.

Then there are the complex viruses, which combine elements of both designs or adopt entirely unique architectures. The most striking examples are the bacteriophages — the viruses that infect bacteria — many of which look like nothing else in biology: an icosahedral head sitting on top of a cylindrical tail, with spidery tail fibres splayed out at the base, like a lunar lander built from proteins. This structure is not decorative. It is a precision injection device, engineered by billions of years of evolution to dock onto a bacterial surface and deliver a viral genome with mechanical efficiency.

The Envelope

Some viruses — but not all — surround their capsid with an additional outer layer called the envelope. The envelope is a lipid bilayer, similar to a cell membrane, that the virus acquires by budding through the membrane of its host cell on the way out. Studded in this stolen membrane are viral proteins — often glycoproteins — that the virus uses to recognise and bind to new host cells.

The spike protein of SARS-CoV-2, which became famous during the COVID-19 pandemic, is one such envelope protein. It is the molecular key that the virus uses to unlock the ACE2 receptor on human cells.

Enveloped viruses tend to be more fragile outside the body than non-enveloped ones, because their lipid membrane is easily disrupted by soap, alcohol, and drying. This is why alcohol-based hand sanitiser works well against influenza and coronaviruses (both enveloped) but poorly against norovirus (non-enveloped), which is better removed by thorough handwashing with soap and water.


The Viral Life Cycle: Two Strategies

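Once a virus has found its target cell, it can follow one of two broad strategies, and the difference has profound consequences for the microbiome. Some phages, called virulent phages, use only the first strategy. Others, called temperate phages, can switch between the two.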

The Lytic Cycle: Smash and Grab

In the lytic cycle, the virus is an unambiguous predator. It attaches to the cell surface, injects (or otherwise delivers) its genome, and immediately takes over the cell's machinery. The host cell becomes a virus factory, churning out copies of the viral genome and viral proteins. These components assemble into new virions inside the cell. When enough new virions have accumulated, the cell is lysed — broken open — releasing a burst of progeny viruses into the environment. The host cell is destroyed.

The analogy here is a factory raid. The virus breaks in, fires the management, reprograms the production lines to manufacture copies of itself, and then demolishes the building on the way out.

The lytic cycle is fast and violent. A single bacteriophage infecting an E. coli cell can produce 100 to 200 new phage particles in as little as 20 to 30 minutes. Multiply this across the trillions of phage infections happening every day in your gut, and you begin to appreciate the scale of viral predation within the microbiome.
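To see how quickly those numbers compound, treat each lytic cycle as one round of multiplication. The minimal sketch below assumes an idealised culture in which every progeny phage immediately finds a fresh, uninfected host. The function and parameter names are hypothetical. The figures used are the low end of the burst size and cycle time given above.

```python
def progeny_from_one_virion(minutes: int, burst_size: int, cycle_minutes: int) -> int:
    """Phage particles descended from one virion in an idealised culture.

    Assumes (unrealistically) that every progeny phage immediately infects a
    fresh host, so each completed lytic cycle multiplies the count by burst_size.
    """
    completed_cycles = minutes // cycle_minutes
    return burst_size ** completed_cycles


# Low end of the figures above: 100 progeny per cell, 30-minute cycle.
print(progeny_from_one_virion(120, 100, 30))  # 100**4 = 100000000 after two hours
```

Real infections fall far short of this ceiling. Susceptible hosts run out, bacteria evolve resistance, and many phage particles never find a target. The sketch only shows how fast repeated bursts compound.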

The Lysogenic Cycle: The Sleeper Agent

The second strategy is subtler and, in many ways, more interesting. In the lysogenic cycle, the virus does not immediately destroy its host. Instead, it integrates its genome into the host cell's DNA and goes quiet.

The integrated viral genome — called a prophage in bacteria or a provirus in other organisms — is replicated along with the host's own DNA every time the cell divides. The host cell is unharmed. It may not even "know" it is infected. The viral genome rides along as a silent passenger, copied faithfully from one generation to the next, potentially for thousands of bacterial generations.

Think of this as a sleeper agent embedded in a foreign government. The agent lives a normal life, does normal work, raises no suspicion. But the instructions for a mission are still encoded, waiting. Under certain conditions — ultraviolet light, DNA damage, starvation, exposure to certain chemicals — the prophage can reactivate, excise itself from the bacterial chromosome, and switch to the lytic cycle, producing new virions and killing the host cell.

This switch from lysogeny to lysis is called induction. A small fraction of prophages induce spontaneously, but induction is far from purely random. Many of its triggers are signals of stress, conditions in which the host bacterium is likely to die anyway. From the virus's perspective, it makes sense to abandon a sinking ship and seek a new host.

Why Lysogeny Matters for the Microbiome

Lysogeny is not merely a curiosity of virology. It is central to the microbiome story, for three reasons.

First, prophages are everywhere. Surveys of bacterial genomes have revealed that many, perhaps most, sequenced bacteria carry at least one prophage, and many carry several. Your gut bacteria are riddled with dormant viral genomes. Some of these prophages have been silent for so long that they have decayed, accumulating mutations that prevent them from ever reactivating. They are now permanent fixtures of their host's genome, indistinguishable from "normal" bacterial DNA except by careful sequence analysis.

Second, prophages can change what their host does. When a virus integrates into a bacterium's chromosome, it sometimes brings genes that have nothing to do with viral replication, and these genes can give the host new capabilities. This is called lysogenic conversion, and its consequences can be dramatic. The cholera toxin, which causes the devastating diarrhoea of cholera, is encoded not by Vibrio cholerae itself but by a prophage (CTXφ) integrated into its genome. Strains that lack the phage cannot make the toxin and do not cause cholera. Similarly, the Shiga toxin produced by pathogenic strains of E. coli is carried by a prophage. In a very real sense, some of the most dangerous bacterial toxins are viral inventions.

Third, prophage induction reshapes microbial communities. When environmental conditions change — a shift in diet, a course of antibiotics, an inflammatory flare — dormant prophages across the gut can be induced simultaneously, triggering a wave of bacterial lysis. This mass killing event releases a flood of bacterial DNA, cell contents, and new phage particles into the gut environment, altering the composition of the microbial community. Researchers are beginning to suspect that prophage induction may be one of the mechanisms through which antibiotics cause lasting disruption to the microbiome — not just by directly killing bacteria, but by activating the viral time bombs already embedded in their genomes.


Bacteriophages: The Invisible Regulators

Bacteriophages, the viruses that infect bacteria, are among the most abundant biological entities in the human body. They are often said to outnumber the bacteria they infect by roughly ten to one, although that ratio comes largely from marine studies and estimates for the human gut are considerably lower. Yet until recently, phages were almost entirely overlooked in microbiome research, which tended to focus exclusively on bacteria [shkoporov2019?].

The reason for this neglect is partly technical. Standard metagenomics pipelines (covered in Primer Chapter 6 and discussed in detail in Chapter 4 of the main book) were built around bacterial DNA. Phages have no universal marker gene comparable to the bacterial 16S rRNA gene. Their genomes are small and extraordinarily diverse, and reference databases capture only a fraction of that diversity, so phages often slip through the cracks. Many phage sequences in metagenomic datasets are classified as "unknown" or "uncharacterised": dark matter in the microbial universe.

But the picture is changing rapidly. Advances in viral metagenomics have revealed an astonishing diversity of phages in the human gut — a community now referred to as the phageome (or, more broadly, the gut virome, which includes all viruses, not just phages). Some findings:

The human gut virome is highly individual. No two people, not even identical twins, share the same phage community. Your phageome is as unique as your fingerprint, arguably more so.

The phageome is relatively stable over time within an individual, despite the rapid turnover of individual phage particles. This suggests that phages and their bacterial hosts co-exist in a dynamic equilibrium — a perpetual arms race in which neither side wins outright.

Phages shape bacterial community structure through what ecologists call "kill the winner" dynamics. When a particular bacterial species becomes very abundant (the "winner"), it becomes an easier target, because the phages that specialise in infecting that species encounter its cells more often. The phages multiply, drive down the winner's numbers, and create space for other bacterial species to expand. This prevents any single species from dominating and helps maintain the diversity of the microbiome. It is a biological thermostat, keeping the ecosystem in balance.
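One way to make "kill the winner" concrete is a toy model. It is our own illustration, not taken from this chapter or its sources. Two bacterial species share a limited amount of space and grow at different rates, and each is attacked by its own specialist phage. All parameter values are hypothetical. Without phages, the faster grower ends up with almost all of the space. With phages, setting the phage equation to zero shows that each host settles at the density B* = m / (beta × a). That density does not depend on how fast the host grows, so the winner's extra growth is converted into phages rather than into more of itself.

```python
"""Toy 'kill the winner' comparison (hypothetical model and parameters).

Species i grows logistically at rate r[i] within a shared carrying capacity K
and is attacked by its own specialist phage P[i] (adsorption rate a, burst
size beta, phage decay rate m):
    dB_i/dt = r_i * B_i * (1 - sum(B) / K) - a * B_i * P_i
    dP_i/dt = beta * a * B_i * P_i - m * P_i
"""


def shares_without_phage(r, K=1000.0, b0=1.0, dt=0.01, t_end=50.0):
    """Euler-integrate the host equations with no phage; return each species' share."""
    b = [b0 for _ in r]
    for _ in range(int(t_end / dt)):
        room = 1.0 - sum(b) / K  # fraction of the shared capacity still free
        b = [bi + ri * bi * room * dt for bi, ri in zip(b, r)]
    total = sum(b)
    return [bi / total for bi in b]


def phage_equilibrium(r, K=1000.0, a=1e-4, beta=50.0, m=0.1):
    """Coexistence equilibrium with phages present (both derivatives set to zero)."""
    # dP_i/dt = 0 with P_i > 0  =>  B_i* = m / (beta * a), whatever r_i is.
    b_star = m / (beta * a)
    room = 1.0 - len(r) * b_star / K
    # dB_i/dt = 0 with B_i > 0  =>  P_i* = r_i * room / a.
    p_star = [ri * room / a for ri in r]
    return [b_star] * len(r), p_star


r = [1.0, 0.5]  # species 0 grows twice as fast: the would-be "winner"
print(shares_without_phage(r))  # species 0 ends up with roughly 97% of the cells
hosts, phages = phage_equilibrium(r)
print(hosts)   # both hosts held near 20 cells
print(phages)  # species 0's phage is twice as abundant as species 1's
```

Whether a real community settles at this equilibrium depends on factors the toy model ignores, such as host resistance and spatial structure. Treat it as an illustration of the logic rather than a prediction.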


Retroviruses: Information Flowing Backwards

Most viruses follow the standard flow of genetic information described in Primer Chapter 3: DNA → RNA → protein. But one group of viruses breaks this rule in a spectacular way.

Retroviruses carry their genome as single-stranded RNA. When a retrovirus infects a cell, it uses a special enzyme called reverse transcriptase to convert its RNA genome into DNA — the reverse of the normal flow. This viral DNA is then integrated into the host cell's genome by another viral enzyme called integrase. Once integrated, the viral DNA (now called a provirus) is transcribed and translated using the host cell's own machinery, producing new viral RNA genomes and viral proteins that assemble into new virions.
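At the level of base pairing, reverse transcription can be sketched as a string operation. Reverse transcriptase builds a DNA strand complementary to the RNA template: U in the RNA pairs with A in the DNA, and A pairs with T. A second DNA strand, complementary to the first, then completes the double-stranded DNA that is integrated as the provirus. The helper names below are hypothetical. The sketch deliberately ignores priming, the enzyme's RNase H activity that degrades the RNA template, and the high error rate of reverse transcriptase.

```python
# Simplified sketch: base-pairing rules only (no priming, RNase H, or copying errors).
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA template -> DNA base
DNA_TO_DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}  # DNA template -> DNA base


def first_strand(rna: str) -> str:
    """Complementary DNA copy of an RNA template, written 5'->3'.

    Complementary strands are antiparallel, so the template is read in reverse.
    """
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(rna.upper()))


def second_strand(dna: str) -> str:
    """DNA strand complementary to a DNA template, written 5'->3'."""
    return "".join(DNA_TO_DNA_PAIR[base] for base in reversed(dna.upper()))


viral_rna = "AUGGCUUAA"         # a made-up 5'->3' RNA fragment
cdna = first_strand(viral_rna)  # "TTAAGCCAT"
coding = second_strand(cdna)    # "ATGGCTTAA": the RNA sequence with T in place of U
print(cdna, coding)
```

The round trip returns the original sequence with T in place of U. This is why the integrated provirus carries the same genetic information as the RNA genome it came from.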

The discovery of reverse transcriptase in 1970, made independently by David Baltimore and by Howard Temin and Satoshi Mizutani, was a landmark in molecular biology [baltimore1970?] [temin1970?]. It overturned the assumption that genetic information could only flow from DNA to RNA, never the reverse. Baltimore and Temin shared the 1975 Nobel Prize in Physiology or Medicine with Renato Dulbecco.

The most infamous retrovirus is HIV (human immunodeficiency virus), the cause of AIDS. HIV specifically infects a type of immune cell called the CD4+ T cell — a critical coordinator of the immune response. By destroying these cells over years, HIV progressively dismantles the immune system, leaving the body vulnerable to infections that a healthy immune system would normally control [barresinoussi1983?].

HIV illustrates the insidiousness of the retroviral strategy. Because the provirus is integrated into the host cell's DNA, it is copied every time the cell divides and becomes a permanent part of that cell's genome. This is why HIV cannot be cured by conventional antiviral drugs. Even when treatment suppresses viral replication to undetectable levels, the proviral DNA persists in long-lived "reservoir" cells, ready to reactivate if treatment is stopped.

Endogenous Retroviruses: Your Viral Ancestry

As we mentioned briefly in Primer Chapter 1, retroviral integration is not always a dead end. If a retrovirus happens to infect a germ cell — a sperm or egg — its proviral DNA can be passed to the next generation. And the next. And the next. Over millions of years, the human genome has accumulated thousands of such insertions. These ancient sequences are called human endogenous retroviruses (HERVs), and they make up roughly 8 per cent of the human genome — far more than the 1.5 per cent that codes for proteins.

Most HERVs are fossils: mutated beyond recognition, incapable of producing functional viruses. But some fragments have been domesticated — co-opted by the human genome for its own purposes. The protein syncytin, derived from an ancient retroviral envelope gene, is essential for the formation of the human placenta. Without this repurposed viral protein, human pregnancy as we know it would not be possible.

This is a profound twist in the story of host and parasite. Over evolutionary time, the boundary between "our" DNA and "their" DNA has become thoroughly blurred. We carry viral genomes within us not as parasites but as integral components of our own biology.


The Virome: Thinking Beyond Bacteria

For most of the history of microbiome research, "microbiome" effectively meant "bacteriome" — the community of bacteria living in and on us. Viruses, fungi, archaea, and protists were afterthoughts. But as sequencing technologies have improved, it has become clear that the complete picture must include the virome — the total community of viruses associated with the human body.

The human virome includes:

Bacteriophages — by far the largest component, numbering in the trillions and targeting the bacteria of the microbiome.

Eukaryotic viruses — viruses that infect human cells. Many of these are present in healthy people without causing symptoms. Torque teno virus (TTV), for example, is found in more than 90 per cent of the human population and appears to cause no disease. Some researchers speculate that these persistent, seemingly benign viral passengers may actually modulate the immune system in ways we do not yet understand.

Endogenous viruses — the ancient retroviral sequences embedded in the human genome, discussed above.

Plant-derived and dietary viruses — sequences from viruses that infect the foods we eat, which pass through the gut without infecting human cells but can still interact with the immune system.

The virome is not static. It changes with diet, age, geography, and health status. Alterations in the virome have been associated with inflammatory bowel disease, type 1 diabetes, and malnutrition, though in most cases it remains unclear whether the viral changes are a cause or a consequence of disease. Untangling cause from correlation is one of the major challenges facing virome research — and microbiome science more broadly.


Phage Therapy: An Old Idea Whose Time May Have Come

The idea of using phages to treat bacterial infections is almost as old as the discovery of phages themselves. Félix d'Hérelle, who coined the term "bacteriophage" in 1917, began experimenting with phage therapy almost immediately. In the decades before antibiotics became widely available, phage therapy was practised in parts of Europe and the Soviet Union, particularly at the Eliava Institute in Tbilisi, Georgia, where it continues to this day.

The rise of antibiotics in the 1940s pushed phage therapy to the margins of Western medicine. Antibiotics were easier to manufacture, store, and administer. They worked against a broad range of bacteria, while phages were highly specific — a single phage might infect only one strain of one species. This specificity, which seemed like a disadvantage in the age of antibiotics, is now being reconsidered as a strength.

The logic is straightforward. Broad-spectrum antibiotics are blunt instruments. They kill pathogenic bacteria, but they also devastate the beneficial microbiome, causing side effects ranging from diarrhoea to life-threatening Clostridioides difficile infection. Phages, by contrast, are precision weapons. A phage that targets pathogenic E. coli will leave Bacteroides, Lactobacillus, and the rest of the gut community untouched.

As antibiotic resistance continues to erode the effectiveness of our existing drugs, phage therapy is experiencing a renaissance. Clinical trials are underway for phage treatments of wound infections, urinary tract infections, and lung infections in cystic fibrosis patients. Regulatory frameworks are being developed. And new techniques for engineering phages — modifying their host range or enhancing their killing ability — are expanding the therapeutic toolkit.

We will discuss phage therapy in depth in Chapters 16–18 of the main book. For now, the key point is that the very same specificity that makes phages so important for regulating the natural microbiome also makes them promising tools for medicine — if we can learn to deploy them wisely.


Obelisks, Viroids, and the Unknown Unknowns

At the margins of virology — indeed, at the margins of biology itself — lie entities that are simpler than viruses, stranger than anything in a textbook, and, in some cases, discovered so recently that we have barely begun to understand them.

Viroids, introduced briefly in Primer Chapter 1, are tiny circular RNA molecules — typically just 250 to 400 nucleotides long — that infect plant cells and cause disease. They have no capsid, no envelope, no proteins of any kind. They are naked RNA: the smallest known infectious agents. Despite their simplicity, viroids can replicate inside host cells by hijacking the cell's own RNA polymerase. They were discovered in 1971 by Theodor Diener, and for decades they were thought to be exclusively a problem for plants — causing diseases in potatoes, coconut palms, avocados, and other crops.

Then came the obelisks. In 2024, Ivan Zheludev and colleagues reported the discovery of a new class of viroid-like RNA elements in the human gut microbiome [zheludev2024]. These agents — which the researchers named obelisks for their predicted rod-shaped RNA secondary structure — are small circular RNA molecules of roughly 1,000 nucleotides. Like viroids, they have no protein coat. Unlike classical viroids, they encode a protein — a single, novel protein the researchers called "Oblin," which belongs to no known protein family.

Obelisks are not rare. They were detected in roughly 7 per cent of gut metatranscriptomes and in about 50 per cent of oral metatranscriptomes analysed by the team. Large-scale searches identified nearly 30,000 distinct obelisk sequences from samples spanning all seven continents. In at least one case, obelisks were shown to persist in an individual for more than 300 days, and one obelisk was traced to a specific bacterial host species, Streptococcus sanguinis, a common member of the oral microbiome.

What do obelisks do? We do not know. Whether they are parasites, commensals, or something else entirely remains an open question. They do not fit neatly into any existing category: they are not viruses (no capsid), not classical viroids (they encode a protein), and not plasmids (they are RNA, not DNA). They are, for now, a category unto themselves — a reminder that even in the most intensively studied environment on Earth (the human body), there remain biological entities that we have only just begun to notice.


Why This All Matters for the Microbiome

At first glance, a chapter on viruses might seem tangential to a book about the human microbiome. It is not. Viruses — particularly bacteriophages — are inseparable from the story of microbial communities.

Phages regulate bacterial populations through predation, keeping any single species from monopolising resources. They drive bacterial evolution by transferring genes between hosts — including genes for antibiotic resistance, toxin production, and metabolic capabilities. Prophages embedded in bacterial genomes can alter what bacteria do, turning harmless commensals into dangerous pathogens — or, occasionally, conferring advantages that help their hosts thrive.

The virome interacts with the immune system in ways we are only beginning to understand. Phage particles can cross the gut epithelium and enter the bloodstream, where they are recognised by immune cells. Phage DNA, once inside immune cells, can trigger innate immune responses through pattern recognition receptors — the same receptors discussed in the context of mRNA vaccines in Primer Chapter 3.

And at the very edge of our knowledge, entities like obelisks hint at an entire stratum of biological complexity that we have not yet mapped. The microbiome is not just bacteria. It is an ecosystem — and like all ecosystems, it includes predators, parasites, symbionts, and entities that defy easy classification.


Where This Matters in The Inhabited Body


Chapter References