Bioinformatics

Applied bioinformatics is one of the key disciplines of the life sciences in the 21st century. In particular, state-of-the-art bioinformatics approaches are indispensable for developing highly innovative, individualized therapies that focus on the disease-specific molecular alterations of each individual patient.

TRON's Bioinformatics department comprises the groups Computational Medicine, Data Management and Personalized Integrative Computational Genomics. Our multidisciplinary team of bioinformaticians, biotechnologists, mathematicians and physicists acts as a central technology platform and works closely with the various specialist groups at TRON.

The Computational Medicine and Personalized Integrative Computational Genomics groups develop novel methods for data processing and analysis. They thereby ensure the conceptual development and implementation of innovative new structures, therapies and diagnostics for cancer, autoimmunity and infectious diseases.

The Data Management group provides the infrastructure and technical foundation for the work of the other departments at TRON. Its established, high-performance database, laboratory information and management systems enable

  • the smooth operation of high-throughput experiments such as next-generation sequencing (NGS).
  • the support of our high-performance computing (HPC) infrastructure.
  • the monitoring of our quality assurance standards, which allow software to be developed for use in GxP-regulated areas, such as those in the food and pharmaceutical industries.

Through our collaboration with the Zentrum für Datenverarbeitung (ZDV, the data center) of Johannes Gutenberg University Mainz, we have access to one of the most powerful HPC clusters in the world.

TRON CELL LINE PORTAL

TCLP: an online cancer cell line catalogue integrating HLA type, predicted neo-epitopes, virus and gene expression.

Human cancer cell lines are an important resource for research and drug development. However, the available annotations of cell lines are sparse, incomplete and distributed across multiple repositories. Re-analyzing publicly available raw RNA-Seq data, we determined the human leukocyte antigen (HLA) type and abundance, identified expressed viruses and calculated gene expression for 1,082 cancer cell lines. Using the determined HLA types, public databases of cell line mutations and existing HLA binding prediction algorithms, we predicted antigenic mutations in each cell line. We integrated the results into a comprehensive knowledgebase. Using the Django web framework, we provide an interactive user interface with advanced search capabilities to find and explore cell lines, as well as an application programming interface (API) to extract cell line information. The portal is available at http://celllines.tron-mainz.de.
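
Predicting antigenic mutations starts from the mutant peptides themselves. As a minimal sketch (not TCLP's actual code), the following Python function enumerates all k-mers of a mutated protein that span a single amino-acid substitution; such peptides would then be scored against a cell line's HLA alleles by an existing binding predictor. The function name `mutant_kmers` and the default length of 9 residues (a common length for HLA class I ligands) are assumptions made for illustration.

```python
from typing import List


def mutant_kmers(protein: str, pos: int, ref: str, alt: str, k: int = 9) -> List[str]:
    """Return every k-mer of the mutated protein that contains the substituted residue.

    Hypothetical helper. `pos` is 1-based, as in protein-level notation
    such as p.G12D (pos=12, ref="G", alt="D").
    """
    i = pos - 1
    if not 0 <= i < len(protein):
        raise ValueError(f"position {pos} is outside the protein")
    if protein[i] != ref:
        raise ValueError(f"reference mismatch at {pos}: found {protein[i]}, expected {ref}")
    mutated = protein[:i] + alt + protein[i + 1:]
    # A window starting at s covers the mutation if s <= i <= s + k - 1,
    # and it must also fit inside the sequence (s + k <= len(mutated)).
    first = max(0, i - k + 1)
    last = min(i, len(mutated) - k)
    return [mutated[s:s + k] for s in range(first, last + 1)]


if __name__ == "__main__":
    # First 20 residues of KRAS with the G12D substitution.
    for peptide in mutant_kmers("MTEYKLVVVGAGGVGKSALT", 12, "G", "D"):
        print(peptide)
```

For a substitution far enough from both protein ends this yields k overlapping peptides; near the termini fewer windows fit, which the `first`/`last` bounds handle.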

Scholtalbers J, Boegel S, Bukur T, Byl M, Goerges S, Sorn P, Loewer M, Sahin U, Castle JC: TCLP: an online cancer cell line catalogue integrating HLA type, predicted neo-epitopes, virus and gene expression. Genome medicine 2015, 7:118.

SEQ2HLA

HLA typing from RNA-Seq sequence reads.
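
Read-based HLA typing tools, seq2HLA included, start by aligning reads to known HLA allele sequences. The sketch below only illustrates a much simplified final step and is not seq2HLA's published algorithm (see the references below for the actual method). It assumes per-allele read counts for one locus are already available, sums them to two-digit allele groups, and uses a hypothetical `min_ratio` threshold to decide between a heterozygous and a homozygous call.

```python
from collections import Counter
from typing import Dict, Tuple


def allele_group(allele: str) -> str:
    """Reduce an allele name to its first field, e.g. 'A*02:01' -> 'A*02' (two-digit resolution)."""
    return allele.split(":")[0]


def call_locus(read_counts: Dict[str, int], min_ratio: float = 0.3) -> Tuple[str, str]:
    """Call a two-digit genotype for one HLA locus from per-allele read counts.

    Hypothetical heuristic: the second allele group is reported only if it has
    at least `min_ratio` times the reads of the top group; otherwise the locus
    is called homozygous.
    """
    groups: Counter = Counter()
    for allele, n_reads in read_counts.items():
        groups[allele_group(allele)] += n_reads
    ranked = groups.most_common(2)
    if not ranked or ranked[0][1] == 0:
        raise ValueError("no reads mapped to this locus")
    top_group, top_reads = ranked[0]
    if len(ranked) == 1 or ranked[1][1] < min_ratio * top_reads:
        return top_group, top_group
    return top_group, ranked[1][0]
```

Summing counts to allele groups before ranking makes the two-digit call robust to reads being split among closely related four-digit alleles.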

Boegel S, Löwer M, Schäfer M, Bukur T, Graaf J de, Boisguérin V, Türeci O, Diken M, Castle JC, Sahin U: HLA typing from RNA-Seq sequence reads. Genome medicine 2012, 4:102.

Boegel S, Löwer M, Bukur T, Sahin U, Castle JC: A catalog of HLA type, HLA expression, and neo-epitope candidates in human cancer cell lines. Oncoimmunology 2014, 3:e954893.

Boegel S, Scholtalbers J, Löwer M, Sahin U, Castle JC: In Silico HLA Typing Using Standard RNA-Seq Sequence Reads. Methods in molecular biology (Clifton, N.J.) 2015, 1310:247-258.

DOWNLOAD

Galaxy workflow

The latest seq2HLA code can be found in our GitHub repository.

GALAXY LIMS

Galaxy LIMS for next-generation sequencing

We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians with standard and customizable sample information forms, barcoded submission forms, tracking of input sample quality, multiplex-capable automatic flow cell design and automatically generated sample sheets to aid physical flow cell preparation. In addition, the platform provides researchers with a user-friendly interface to create a request, submit accompanying samples, upload sample quality measurements and access the sequencing results. Because the LIMS is part of the Galaxy platform, researchers have access to all Galaxy analysis tools and workflows. The system reports requests and associated information to a message queuing system, so that information can be posted to and stored in external systems such as a wiki. Through an API, raw sequencing results can be automatically pre-processed and uploaded to the appropriate request folder. Although the system was developed for the Illumina HiSeq 2500, many of its features are directly applicable to other instruments.
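
To illustrate what multiplex-capable flow cell design involves, here is a minimal greedy sketch (hypothetical, not the Galaxy LIMS implementation): samples are sorted by requested read count and placed into the first lane that still has capacity and does not already contain the same barcode. The default of 8 lanes matches a HiSeq 2500 high-output flow cell; the per-lane capacity of 200 million reads and all class and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    name: str
    barcode: str
    reads_m: float  # requested reads, in millions


@dataclass
class Lane:
    number: int
    capacity_m: float  # read capacity of the lane, in millions (assumed value)
    samples: List[Sample] = field(default_factory=list)

    def used_m(self) -> float:
        return sum(s.reads_m for s in self.samples)

    def accepts(self, sample: Sample) -> bool:
        # Barcodes must be unique within a lane so reads can be demultiplexed.
        barcode_free = all(s.barcode != sample.barcode for s in self.samples)
        return barcode_free and self.used_m() + sample.reads_m <= self.capacity_m


def design_flow_cell(samples: List[Sample], n_lanes: int = 8,
                     capacity_m: float = 200.0) -> Tuple[List[Lane], List[Sample]]:
    """Greedy first-fit-decreasing assignment of multiplexed samples to lanes."""
    lanes = [Lane(number=i + 1, capacity_m=capacity_m) for i in range(n_lanes)]
    unplaced: List[Sample] = []
    for sample in sorted(samples, key=lambda s: s.reads_m, reverse=True):
        lane = next((candidate for candidate in lanes if candidate.accepts(sample)), None)
        if lane is None:
            unplaced.append(sample)
        else:
            lane.samples.append(sample)
    return lanes, unplaced
```

Placing the largest requests first tends to leave smaller gaps for the remaining samples. A production design would also require barcodes within a lane to differ at enough positions to tolerate sequencing errors, not merely to be distinct.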

Scholtalbers J, Rößler J, Sorn P, Graaf J de, Boisguérin V, Castle J, Sahin U: Galaxy LIMS for next-generation sequencing. Bioinformatics (Oxford, England) 2013, 29:1233-1234.

DOWNLOAD

The latest Galaxy LIMS code can be found in our Bitbucket repository:

SOMATIC MUTATION FDR

Confidence-based somatic mutation evaluation and prioritization.

Next-generation sequencing (NGS) has enabled high-throughput discovery of somatic mutations. Detection depends on experimental design, lab platforms, parameters and analysis algorithms. However, NGS-based somatic mutation detection is prone to erroneous calls, with reported validation rates near 54% and congruence between algorithms of less than 50%. Here, we developed an algorithm to assign a single statistic, a false discovery rate (FDR), to each somatic mutation identified by NGS. This FDR confidence value accurately discriminates true mutations from erroneous calls. With sequencing data from triplicate exome profiling of C57BL/6 mice and B16-F10 melanoma cells, we applied the existing algorithms GATK, SAMtools and SomaticSNiPer to identify somatic mutations. For each identified mutation, our algorithm assigned an FDR. We selected 139 mutations for validation, including 50 somatic mutations assigned a low FDR (high confidence) and 44 mutations assigned a high FDR (low confidence). All of the high-confidence somatic mutations validated (50 of 50), none of the 44 low-confidence somatic mutations validated, and 15 of 45 mutations with an intermediate FDR validated. Furthermore, assigning a single FDR to each mutation enables statistical comparisons of lab and computational methodologies, including ROC curves and AUC metrics. Using the HiSeq 2500, single-end 50-nt reads from replicates generate the highest-confidence somatic mutation call set.
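
The general idea of turning call scores into an FDR can be illustrated with an empirical estimator. The sketch below assumes that a same-versus-same comparison (for example, calling "somatic" mutations between two replicates of the same normal sample) yields only false positives, and estimates the FDR at each tumor-versus-normal call's score as the number of null calls at or above that score divided by the number of tumor calls at or above it. This is a generic estimator written for illustration, not the published algorithm, and it assumes both comparisons have comparable sequencing depth.

```python
from bisect import bisect_left
from typing import List


def count_at_or_above(sorted_scores: List[float], threshold: float) -> int:
    """Number of scores >= threshold in an ascending list."""
    return len(sorted_scores) - bisect_left(sorted_scores, threshold)


def empirical_fdr(tumor_scores: List[float], null_scores: List[float]) -> List[float]:
    """Estimate an FDR for each tumor-vs-normal call from same-vs-same (null) calls.

    FDR(s) = #null calls with score >= s / #tumor calls with score >= s,
    capped at 1.0. Higher scores are assumed to mean higher confidence.
    """
    tumor_sorted = sorted(tumor_scores)
    null_sorted = sorted(null_scores)
    fdrs = []
    for score in tumor_scores:
        n_null = count_at_or_above(null_sorted, score)
        n_tumor = count_at_or_above(tumor_sorted, score)  # >= 1: includes this call
        fdrs.append(min(1.0, n_null / n_tumor))
    return fdrs
```

Because this raw estimate need not decrease monotonically with the score, a q-value-style step (taking, for each call, the minimum FDR over all thresholds at or below its score) is often applied on top.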

Löwer M, Renard BY, Graaf J de, Wagner M, Paret C, Kneip C, Türeci O, Diken M, Britten C, Kreiter S, Koslowski M, Castle JC, Sahin U: Confidence-based somatic mutation evaluation and prioritization. PLoS computational biology 2012, 8:e1002714.

rapmad

Robust analysis of peptide microarray data.

Peptide microarrays offer enormous potential as a screening tool for peptidomics experiments and have recently found an increasing range of applications, from immunological studies to systems biology. By allowing the parallel analysis of thousands of peptides in a single run, they are suitable for high-throughput settings. Since the data characteristics of peptide microarrays differ from those of DNA oligonucleotide microarrays, computational methods need to be tailored to these characteristics to allow robust and automated data analysis. While follow-up experiments can ensure the specificity of results, sensitivity cannot be recovered in later steps. Providing sensitivity is thus a primary goal of data analysis procedures. To this end, we created rapmad (Robust Alignment of Peptide MicroArray Data), a novel computational tool implemented in R. We evaluated rapmad in antibody reactivity experiments for several thousand peptide spots and compared it to two existing algorithms for the analysis of peptide microarrays. rapmad performs competitively with, and in part better than, existing software solutions. In particular, it shows substantially improved sensitivity for low-intensity settings without sacrificing specificity. It thereby contributes to increasing the effectiveness of high-throughput screening experiments. rapmad allows robust, sensitive and automated analysis of high-throughput peptide array data.
The rapmad R package and the data sets can be retrieved from http://www.tron-mainz.de/compmed.
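
rapmad itself is an R package. As an illustration of the kind of robust statistics such tools rely on, the Python sketch below (a generic technique swapped in for illustration, not rapmad's alignment method) converts spot intensities, assumed to be already log-transformed, into robust z-scores based on the median and the median absolute deviation (MAD), so that a few very bright spots do not distort the scale. Spots above a hypothetical z threshold of 3 are flagged as reactive.

```python
import statistics
from typing import List

MAD_TO_SD = 1.4826  # makes the MAD consistent with the standard deviation for normal data


def robust_z(values: List[float]) -> List[float]:
    """Robust z-scores: (x - median) / (1.4826 * MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust scale undefined")
    scale = MAD_TO_SD * mad
    return [(x - med) / scale for x in values]


def reactive_spots(log_intensities: List[float], z_threshold: float = 3.0) -> List[int]:
    """Indices of spots whose robust z-score exceeds a (hypothetical) threshold."""
    return [i for i, z in enumerate(robust_z(log_intensities)) if z > z_threshold]
```

Applying `robust_z` separately to each array puts different arrays on a comparable scale before their spots are compared, without letting outlier spots dominate the normalization.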

Renard BY, Löwer M, Kühne Y, Reimer U, Rothermel A, Türeci O, Castle JC, Sahin U: rapmad: Robust analysis of peptide microarray data. BMC bioinformatics 2011, 12:324.

DOWNLOAD THE RAPMAD R-PACKAGE AS WELL AS THE DATA SETS

CT26

Characterization of the CT26 colorectal carcinoma genome, transcriptome and immunome.

Tumor models are critical for our understanding of cancer and the development of cancer therapeutics. Here, we present an integrated map of the genome, transcriptome and immunome of an epithelial mouse tumor, the CT26 colon carcinoma cell line. We found that Kras is homozygously mutated at p.G12D, Apc and Tp53 are not mutated, and Cdkn2a is homozygously deleted. Proliferation and stem-cell markers, including Top2a, Birc5 (Survivin), Cldn6 and Mki67, are highly expressed, while the differentiation and top-crypt markers Muc2, Ms4a8a (MS4A8B) and Epcam are not. Myc, Trp53 (tp53), Mdm2, Hif1a and Nras are highly expressed, while Egfr and Flt1 are not. MHC class I, but not MHC class II, is expressed. Several known cancer-testis antigens are expressed, including Atad2, Cep55 and Pbk. The most highly expressed gene is a mutated form of the mouse tumor-specific antigen gp70. Of the 1,688 non-synonymous somatic point mutations, 154 are both in expressed genes and in peptides predicted to bind MHC, and are thus potential targets for immunotherapy development. Based on its molecular signature, we predicted that CT26 is refractory to anti-EGFR mAbs and sensitive to MEK and MET inhibitors, as has been previously reported. CT26 cells share molecular features with aggressive, undifferentiated, refractory human colorectal carcinoma cells. As CT26 is one of the most extensively used syngeneic mouse tumor models, our data provide a map for the rational design of mode-of-action studies for the preclinical evaluation of targeted therapies and immunotherapies.
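
Narrowing 1,688 non-synonymous mutations down to 154 candidates combines two filters: expression of the mutated gene and predicted MHC binding of a mutation-spanning peptide. A minimal sketch of such a filter follows; the record fields, the expression cutoff and the 500 nM IC50 cutoff (a commonly used convention for predicted binders) are assumptions, not the criteria used in the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SomaticMutation:
    gene: str
    protein_change: str   # e.g. "p.G12D"
    expression: float     # expression of the gene (unit depends on the pipeline, e.g. RPKM)
    best_ic50_nm: float   # strongest predicted MHC binding over all mutation-spanning peptides


def immunotherapy_candidates(mutations: List[SomaticMutation],
                             min_expression: float = 1.0,
                             max_ic50_nm: float = 500.0) -> List[SomaticMutation]:
    """Keep mutations in expressed genes whose peptides are predicted MHC binders,
    ranked from strongest to weakest predicted binding (lowest IC50 first)."""
    selected = [m for m in mutations
                if m.expression >= min_expression and m.best_ic50_nm <= max_ic50_nm]
    return sorted(selected, key=lambda m: m.best_ic50_nm)
```

Both cutoffs are parameters so that the trade-off between the number of candidates and their expected quality can be tuned per study.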

Castle JC, Loewer M, Boegel S, Graaf J de, Bender C, Tadmor AD, Boisguerin V, Bukur T, Sorn P, Paret C, Diken M, Kreiter S, Türeci Ö, Sahin U: Immunomic, genomic and transcriptomic characterization of CT26 colorectal carcinoma. BMC genomics 2014, 15:190.