Workflow of the HPA project

Goal and current status

The overall aim of the Human Protein Atlas project is to map the expression of human proteins in normal tissues, cancers and cell lines through antibody-based proteomics. The strategy to meet this aim is large-scale, high-throughput generation and validation of antibodies against at least one isoform of each of the roughly 20,000 protein-coding genes in the human genome, and the use of these antibodies in a variety of applications, e.g. immunohistochemical staining of normal tissues and cancers, immunofluorescence on cell lines, and Western blots. This effort to map the human proteome can be considered a natural progression of the Human Genome Project, and the Human Protein Atlas project is to a large extent based on the information derived from the human genome.

The Human Protein Atlas project was initiated in 2003 and is funded by the Knut and Alice Wallenberg Foundation. The first online version of the Atlas was made public in 2005 and contained data from a little over 700 antibodies. In the latest version of the Human Protein Atlas (v.13, released November 2014), the expression profiles of nearly 17,000 human proteins have been analyzed using over 24,000 antibodies, which corresponds to 84% of the human protein-coding genes.

Antigens and antibodies

The antibody generation process starts by identifying a protein-coding sequence that translates into a stretch of 50-150 amino acids, chosen to be as unique as possible with respect to all other protein-coding genes in the genome. This region, which is called a PrEST (Protein Epitope Signature Tag), is cloned by standard molecular methods (primer design, cDNA library, RT-PCR, ligation, transformation, etc.) into bacteria that produce the recombinant PrEST peptide. The PrEST is then used i) as the antigen for immunization, ii) to construct PrEST arrays (see below), and iii) for affinity purification to generate so-called mono-specific antibodies (msAbs).
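The search for a suitable region can be pictured as a sliding-window scan over the protein sequence. The sketch below is only an illustration and not the project's actual design software: it assumes that uniqueness is approximated by counting short peptide words (k-mers) shared with other proteins, and the function names, the default word length and the toy sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_unique_window(target, others, window=50, k=8):
    """Return (start, shared) for the window of `target` that shares
    the fewest k-mers with the sequences in `others`.

    Ties are broken in favour of the earliest start position.
    Assumption: shared k-mers stand in for sequence similarity.
    """
    if len(target) < window:
        raise ValueError("target is shorter than the window")
    # Collect every k-mer seen in the other proteins.
    background = set()
    for seq in others:
        background |= kmers(seq, k)
    best_start, best_shared = 0, None
    for start in range(len(target) - window + 1):
        candidate = target[start:start + window]
        shared = len(kmers(candidate, k) & background)
        if best_shared is None or shared < best_shared:
            best_start, best_shared = start, shared
    return best_start, best_shared


# Toy example: the first half of the target is identical to another protein.
target = "AAAAAAAAAA" + "CDEFGHIKLM"
start, shared = pick_unique_window(target, ["AAAAAAAAAA"], window=5, k=3)
print(start, shared)
```

A real design would work with windows of 50-150 residues and a proper similarity search against the whole proteome; the toy parameters here only keep the example easy to follow.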

Following immunization, the polyclonal antiserum is processed to eliminate unwanted and/or unspecific antibodies through affinity purification using the PrEST as bait. The antibodies that have bound to the PrEST are saved and used in further applications. These affinity-purified msAbs are first tested on PrEST arrays, where 384 different PrESTs have been spotted in a microarray format, and the ability of each antibody to detect its corresponding PrEST specifically among the other PrESTs is analyzed. Antibodies that are approved are then used for Western blots and immunohistochemical test staining on tissues. Antibodies that are approved in the test staining are then used for large-scale protein profiling (sharp staining) and annotation.
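The array test can be thought of as a comparison between the signal at the antibody's own PrEST spot and the strongest signal at any other spot. The following is a minimal sketch under that assumption; the function name, the identifiers and the ratio cut-off of 10 are hypothetical and do not represent the project's actual approval criteria.

```python
def is_specific(signals, target, min_ratio=10.0):
    """Return True if the target spot is at least `min_ratio` times
    stronger than the strongest off-target spot.

    signals: dict mapping a PrEST identifier to its spot intensity.
    min_ratio: hypothetical cut-off, chosen only for illustration.
    """
    if target not in signals:
        raise KeyError(target)
    target_signal = signals[target]
    off_target = [v for name, v in signals.items() if name != target]
    if not off_target:
        return target_signal > 0
    strongest_other = max(off_target)
    if strongest_other <= 0:
        # No measurable off-target binding at all.
        return target_signal > 0
    return target_signal / strongest_other >= min_ratio


# Toy intensities for three of the 384 spots.
spots = {"PrEST_A": 5000.0, "PrEST_B": 120.0, "PrEST_C": 80.0}
print(is_specific(spots, "PrEST_A"))  # strong own spot, weak others
print(is_specific(spots, "PrEST_B"))  # another spot is much stronger
```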

Schematic figure of PrEST production, from bioinformatic design, via cloning, to the protein factory.

Protein profiling in tissues and cells

All antibodies that have been approved for protein profiling are used to stain a series of 9 different tissue microarrays (TMAs) that together contain samples from 48 normal human tissues (in triplicate), 20 different forms of cancer (typically 12 different patients per cancer form, in duplicate), 47 different human cell lines and 12 different leukemias. This amounts to a total of 708 different stained samples for each antibody, which are then scanned as high-definition digital images.

The cell line images are annotated digitally using image analysis software, whereas the tissue images are annotated manually by certified pathologists. After the annotation, the result is evaluated and the antibody is given a reliability score based on, e.g., how well the observed expression pattern correlates with previously known data for the particular protein. Finally, all images, along with the annotation data and antibody information, are published in the upcoming version of The Human Protein Atlas.

In parallel with the immunohistochemical analysis of cells and tissues, immunofluorescence analysis of the sub-cellular expression pattern detected by each antibody is performed on several cell lines using confocal microscopy. The sub-cellular localization is annotated manually, and the data and images are published together with the other information obtained with that antibody.

Tissue microarray in a paraffin block and as a hematoxylin-eosin stained section.

The website

All data is freely and publicly available on The Human Protein Atlas website, which is currently updated about once a year. Browsing the data starts with a search. One can either simply type the name of the protein of interest (simple search) or use the query interface to build advanced queries that include or exclude tissue types, cell types, cell lines, protein classes, sub-cellular localizations, etc.
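The logic of such an include/exclude query can be illustrated with a small filter over protein records. This is a sketch of the general idea only and not the website's interface or data model; the record layout, field names and function names are hypothetical.

```python
def matches(record, include=None, exclude=None):
    """Return True if `record` has every included value and none of
    the excluded ones.

    record, include and exclude map a field name to a set of values.
    """
    include = include or {}
    exclude = exclude or {}
    for field, wanted in include.items():
        # Every wanted value must be present in the record's field.
        if not wanted <= record.get(field, set()):
            return False
    for field, unwanted in exclude.items():
        # No unwanted value may be present in the record's field.
        if unwanted & record.get(field, set()):
            return False
    return True


def search(records, include=None, exclude=None):
    """Return the names of all records that satisfy the query."""
    return [r["name"] for r in records if matches(r, include, exclude)]


# Toy records with hypothetical fields.
records = [
    {"name": "P1", "tissue": {"liver", "kidney"}, "location": {"nucleus"}},
    {"name": "P2", "tissue": {"liver"}, "location": {"cytoplasm"}},
]
print(search(records, include={"tissue": {"liver"}},
             exclude={"location": {"nucleus"}}))
```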

The website is centered on a summary page for every protein, which provides a general, large-scale overview of the protein's expression pattern. From the summary page, one can browse deeper into the specific expression patterns in all tissues, cancers and cells, and navigate among the images that form the basis for the annotation, as if one were looking through a microscope.