Combinatorial Biochemistry

Consider the methods of molecular biology and recombinant DNA chemistry for a moment. Perhaps the most striking and central notion in these methods is that it is possible to generate a new strain of an organism and to produce as much of it as we want simply by arranging it so that one DNA molecule of the desired type enters one cell with the right background and results in a viable organism under some set of selection conditions. You only need one. Yet we might start with billions of cells and many more DNA molecules -- in the end we can select from this large number of possibilities the one combination that fits our criteria. You have all done this in the lab. It works.

Why does this work? Two things are essential. We must have an amplification capability (in this case cell growth and reproduction of the plasmid) and we must have the ability to select the cells that contain the genetic information that we want. Whenever we have these two things, we can find the needle in the haystack simply by allowing the needle to reproduce and not the hay. Eventually we end up with a needle-stack.

In the last decade or two, many clever chemists, biochemists and geneticists have realized the generality of these ideas and have used them to generate a variety of techniques for generating functional molecules or cells of interest simply by arranging it so that the molecules or cells of interest are amplified, but not all the other ones. These days this approach goes under various names from combinatorial chemistry to directed evolution, but it is really all the same thing. A large part of what chemists/biochemists/biologists/geneticists will be doing in the next few decades is going to revolve around using this approach to generate molecular and cellular functions of use to mankind.

I said above that to find our needle in the hay stack we needed an amplification scheme and a selection scheme. To do combinatorial chemistry or directed evolution, we need to couple the amplification and selection scheme to a mechanism for generating variation in the population of molecules/cells. The general algorithm is as follows:

1) Generate a large number of different molecules/cells from some starting point.

2) Select those molecules/cells that meet some set of criteria.

3) Amplify the selected molecules/cells.

4) Go to 1) and repeat.
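The four steps above can be sketched as a short loop. This is only a toy illustration, not a model of any real experiment: the "molecules" are strings of bases, and the selection criterion (matching a hypothetical target string) along with all of the names and parameter values are assumptions made up for the example.

```python
import random

ALPHABET = "ACGT"

def mutate(seq, rate, rng):
    """Copy seq, redrawing each position at random with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base for base in seq
    )

def fitness(seq, target):
    """Toy selection criterion: positions that match a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def evolve(start, target, rounds=30, library_size=200, keep=10, rate=0.05, seed=0):
    rng = random.Random(seed)
    pool = [start]
    for _ in range(rounds):
        # 1) Generate variation from the current pool.
        library = [mutate(rng.choice(pool), rate, rng) for _ in range(library_size)]
        library.extend(pool)  # keep the parents so the best score never drops
        # 2) Select the members that best meet the criterion.
        library.sort(key=lambda s: fitness(s, target), reverse=True)
        # 3) Amplify: only the survivors seed the next round.
        pool = library[:keep]
        # 4) Go back to step 1.
    return pool[0]

target = "GATTACAGATTACAGATTAC"  # hypothetical "ideal" sequence
start = "A" * 20
best = evolve(start, target)
print(best, fitness(best, target))
```

Because the parents are carried forward into each new library, the best score can only stay the same or improve from round to round.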

Let's consider a couple of examples. The key, however, is to realize that these are only examples and there are a nearly infinite number of possibilities.

Simple Biological mutation/selection/amplification methods. Geneticists have been doing this type of thing for many decades (even hundreds of years, depending on how you want to look at it). Of course, evolution has been doing this sort of thing for the past several billion years. You start with a population of cells or even organisms. You then either depend on the natural background mutation rate or you increase the rate of mutations by using some mutagenesis method (UV irradiation, chemical mutagens, etc.). This generates the required variation. Now you either screen for or select for the cells/organisms of interest. Screening means you simply look at a lot of different organisms and pick the ones that meet your criteria (in animals this is called selective breeding). With bacterial cells, traditional methods of screening are limited to about 10,000 different cells simply because it is physically difficult to look at many more than that. Selection is much better. Here you have set things up so that the cells or organisms you want have a selective growth advantage. Now you can start with 10^12 or more bacterial cells and let your conditions (a simple example is antibiotic resistance) get rid of the bacteria that you do not want. Amplification is just growth in this case. You can then irradiate the selected cells again to get additional mutagenic variation and repeat the process as many times as you like. This is just an accelerated form of evolution. It is very powerful, but lacks direction at the molecular level. We do not know what, exactly, we are changing and the cell may come up with a solution we don't want. For example, let's say we were selecting for a cell that produced a molecule that could break down a particular toxin in the environment. Clearly we could select for this. However, the solution the cell may well come up with is not to break the molecule down, but rather to make its cell membrane or wall impermeable to the molecule. Fine for the cell, but we don't end up with the molecule to break down the toxin. We need a more directed approach in general.

The SELEX procedure (Systematic Evolution of Ligands by Exponential Enrichment). This particular version of combinatorial chemistry was invented in the laboratory of Larry Gold. It happens to be a biochemical form of combinatorial chemistry; there are also much more chemical forms. To learn more details of SELEX see Larry Gold's web page and publications at

One of the things that modern molecular biology has done for us is to provide us with ways of amplifying specific molecules of DNA outside of the cell. This has opened up the possibility of performing the variation/selection/amplification process without having any living things involved at all. So, let's say we want to make a molecule that binds to some protein we need to assay for in small quantities. We could raise antibodies (a process that is also done by combinatorial chemistry inside your cells), but this is limited and time-consuming. Another approach is to generate a bunch of random DNA molecules. Let's say that we make random 20-mers of DNA. There are 4^20 possible molecules, which is about a trillion. A trillion molecules is only a couple of picomoles. We can make micromoles of DNA molecules of this size synthetically without much trouble, so we can easily represent the entire possible population of 20-mers in a tube. Now we bind our protein of interest to something like a bead in a column. We then wash the DNA over this column. Everything goes through except the DNA molecules that stick to the protein. We take these DNA molecules and we amplify them by PCR (polymerase chain reaction). This is a repetitive method that uses DNA polymerase to make many copies of a DNA molecule when you provide it with a pair of primers that match either end of the DNA molecule. This amplifies the DNA molecules that stick to the protein on the column. We take this amplified DNA and run it back over our column. Again the ones that stick best are amplified most, and the process continues, usually until only the very best sticking molecule remains.
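To get a feel for the numbers, here is a rough sketch of the library size and of why repeated rounds enrich so quickly. The retention probabilities (`keep_binder`, `keep_other`) are invented for illustration and are not measured values; the model also assumes a single perfect binder in the pool and that PCR copies everything it is given equally well.

```python
AVOGADRO = 6.022e23  # molecules per mole

def library_size(alphabet_size, length):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length

def enrich(fraction, keep_binder, keep_other):
    """Fraction of the pool that is binder after one column pass.

    PCR afterwards restores the total amount of DNA but, in this simple
    model, does not change the fraction.
    """
    bound = fraction * keep_binder
    background = (1 - fraction) * keep_other
    return bound / (bound + background)

n = library_size(4, 20)
print(n, "possible 20-mers")
print(n / AVOGADRO, "mol for one copy of each")

# Start with one binder among all possible 20-mers (an assumption).
fraction = 1 / n
rounds = 0
while fraction < 0.5:
    # Hypothetical column: keeps half the binder, 0.1% of everything else.
    fraction = enrich(fraction, keep_binder=0.5, keep_other=0.001)
    rounds += 1
print(rounds, "rounds until the binder is most of the pool")
```

With these made-up numbers each round improves the odds of the binder by a factor of 500, so even a one-in-a-trillion molecule takes over the pool in a handful of rounds. That is the "exponential enrichment" in the name.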

Think about this. In a sense, any idiot could do this (of course, that is the beauty of the technique -- its conceptual simplicity). There is no real chemical knowledge involved. We did not really design our new binding molecule. We just made all the possible molecules and found the one that worked. Of course, we could get more sophisticated. We could start making much larger DNA molecules with more complex properties by starting with an initial guess of a molecule we think is close and allowing the procedure to zero in on the best possibility from our initial guess. For those of you that are familiar with non-linear least squares curve fitting or finding potential minima in a complex multidimensional surface, this is exactly the same process. It is simply a physical manifestation of it. In the example described above, we set the system up so that we could actually look at every single point on the surface and find the lowest minimum. However, we could work with a more complicated surface with a larger number of dimensions, and then we may not be able to make enough molecules to sample every point on the surface. We just have to make enough so that we sample all the features on the surface or make our molecules close to the right thing to begin with and then optimize the molecules in subsequent rounds of changes and amplification. This is just like making a good initial guess in a complex curve fitting algorithm.

Directed Evolution. Back to living things for a moment. There are a couple of problems. First, as discussed above, you have no control over what nature does to solve the problem you have given it (that can be good or bad, depending on what you want). Second, mutation frequencies cannot be very high when you are applying them to the whole cell. High mutation frequencies would just kill the cell since they would affect all the genes in it. There is another way, and it's called directed evolution.

Let's say I want a new enzyme activity that does not exist in nature. Natural enzymes are very specific, but often you can find that an enzyme will catalyze a reaction similar to but different from the one that it is designed to catalyze, but at a thousand or ten thousand times slower rate. OK, if we can make up a sensitive assay for the enzyme activity then we can tell if a group of cells on a plate is generating an enzyme that performs the desired activity or not (it needs to be sensitive enough to see very low levels of activity). Now, let's put the gene for the enzyme that does not work very well for our desired activity on a plasmid. Next, let's replace the DNA that codes for the active site of the enzyme with a random library of mutant sequences. These sequences may each contain a few mutations on average in the active site region. We can synthetically put random bases into oligonucleotides when we make them, and then splice these into our gene to generate these libraries. Now we put the library into E. coli, for example, and express the gene. We screen many colonies and look for the few that are making enzymes with a larger amount of our desired activity. We take the plasmid out of these, use error-prone PCR (just PCR with a high rate of sticking in the wrong base) to amplify the region of the gene involved in the active site, make a new library by splicing this back into our plasmid with the original enzyme sequence and do it all again. We can keep doing this until we get an enzyme with a high activity for our desired reaction.

This overcomes the problems with natural selection procedures. We are specifically selecting for an enzyme with an activity of interest. When we get it, we know it will be encoded on our plasmid (easy to recover and move to expression systems for production). Finally, we can make the rate of mutation as high as we want without killing the cell since the high rate of mutation only applies to the gene on the plasmid, not all the essential genes in the cell.
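The point about confining mutations to one region can be sketched in a few lines. This is a minimal illustration under stated assumptions: the gene sequence, the position of the "active site" window, and the per-base mutation rate are all hypothetical, and the function names are invented for the example.

```python
import random

BASES = "ACGT"

def randomize_region(gene, start, end, rate, rng):
    """Mutate only gene[start:end]; everything outside is copied unchanged."""
    region = "".join(
        # Pick one of the other three bases so a "mutation" is a real change.
        rng.choice(BASES.replace(base, "")) if rng.random() < rate else base
        for base in gene[start:end]
    )
    return gene[:start] + region + gene[end:]

def make_library(gene, start, end, size, rate=0.2, seed=0):
    """Build `size` variants that differ from gene only inside the window."""
    rng = random.Random(seed)
    return [randomize_region(gene, start, end, rate, rng) for _ in range(size)]

gene = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # hypothetical 30-base gene fragment
library = make_library(gene, start=12, end=21, size=1000)

# Every variant keeps the flanking sequence intact.
print(all(v[:12] == gene[:12] and v[21:] == gene[21:] for v in library))
print(len(set(library)), "distinct variants")
```

The mutation rate inside the window can be pushed as high as we like without touching the rest of the sequence, which mirrors why a heavily mutated plasmid gene does not harm the essential genes of the host cell.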

This, of course, is only the simplest version. There is a whole field of people now working on ways to do this better. Again, instead of just making little mutations in one place in the enzyme, how about starting with lots of enzyme domains and swapping whole domains around? It's been done. Also, since in general the sequence space is too large to cover in a test tube (consider the number of possible 20 amino acid sequences there are -- 20^20 or about 10^26 -- well over a hundred moles!), we need to get better at making good guesses, so people are doing computer simulations and using chemical models to predict a good starting sequence for the directed evolution process. Finally, looking at every colony to find the best ones is laborious and is limited to about 10,000 for manual searching. Other methods of searching through much larger numbers of cells have been created (FACS - fluorescence activated cell sorting, for example) and still others are in the works. This will ultimately be limited by how large a library we can actually make (this is limited by transformation efficiencies), but again, there are methods being generated to overcome this problem as well.
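A quick check of the arithmetic shows how badly the sequence space outruns what can be screened. The only inputs are the numbers in the text (20 amino acids, 20 positions, about 10,000 colonies screened by hand) plus Avogadro's number; the variable names are just for illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole

def sequence_space(alphabet_size, length):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length

peptides = sequence_space(20, 20)       # every possible 20-residue peptide
moles_needed = peptides / AVOGADRO      # one copy of each
screened_fraction = 10_000 / peptides   # manual colony screening

print(f"{peptides:.2e} possible sequences")
print(f"{moles_needed:.0f} mol to hold one copy of each")
print(f"{screened_fraction:.1e} of the space seen by screening 10,000 colonies")
```

Even a short peptide is far beyond exhaustive search, which is why a good starting sequence matters so much.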

I would say that within the next decade or so, it will pretty much be the case that if you can come up with a sensitive assay for your activity, you will be able to select an enzyme that catalyzes it. The consequences of that are staggering. How about an enzyme that catalyzes the conversion of water into hydrogen and oxygen in the presence of light? Run the whole world off of hydrogen. No pollution. Pretty cool. There are a number of people working on this. Your imagination is the limit.

And there's more… These are actually fairly simple examples of what people are doing or thinking about. Why stick with biological materials at all? There are disadvantages (and advantages) to dealing with materials that can rot (be broken down by other biological organisms). If we can make DNA molecules that have almost any property we like, how about other polymers that are not degraded by common bugs and have side groups with chemical properties that DNA and protein molecules do not have? Can we make "smart materials" that react to their environment in specific ways? A lining for dishes that catalyzes the breakdown of grease and other food substances (no more detergents to wash into the ecosystem)? Walls that catalyze the destruction of household toxins or allergens? How can we generate these new polymers when we don't have an enzyme to polymerize them? Design one and select for it via directed evolution! The material becomes the device. Perhaps we can someday select for faster, more powerful computer chips that we "grow" from a variety of polymers with different chemical/physical properties. These ideas are going to change the world. You are going to do it. What are you going to do first?