CRPC-TR99810-S
August 1999

Title: Gene Trees: Phylogenetic Trees from Amino Acid Sequences
Author: Beckie Chan
Submitted November 1999

The goal of this program is to classify genes into families based on their nucleotide and amino acid sequences. From one gene, the program will find related genes, sort them into subfamilies, families, and superfamilies, and organize them into phylogenetic trees. Svjetlana Miocinovic, an undergraduate at the California Institute of Technology, and I worked with biologists Dr. George A. Gutman and Dr. K. George Chandy of the University of California, Irvine.

To maximize efficiency, the program has been divided into two parts. Svjetlana used similarity search programs to find related genes given one gene sequence. My part takes these related gene sequences and creates a phylogenetic tree. The whole program has been written in Java, so it can easily be made web accessible in the future. Once this website has been created, biologists, doctors, and many others can take advantage of it.

The current process used to generate sequence alignments and trees can be extremely long and tedious. One begins by obtaining sequence files from a genetic sequence database (e.g., NCBI's GenBank [14]). Then, coding-region nucleotide sequence files and amino acid sequence files are created. These sequences are then formatted into another file for alignment. Once the sequences are aligned, the output is converted into NEXUS format. This NEXUS file must be manually edited to include only the regions where significant alignment exists among the sequences. Finally, a phylogenetic tree program is used to create the tree.

To make this process less laborious, a Java application is now available that goes directly from the given sequences to the phylogenetic tree, without the need for manual tasks. The program is composed of four classes: Alignment, GroupMatch, MakeTree, and MainTree.
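
The manual editing step described above, keeping only the regions where the sequences align well, can be illustrated with a small sketch. This is not the report's Alignment class: the class name AlignmentTrimmer, the method trim, and the rule used here (keep a column when its fraction of gap characters '-' is at most a chosen threshold) are assumptions made for illustration only, and the program itself may define "significant alignment" differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the program described in this report.
public class AlignmentTrimmer {

    /**
     * Returns the aligned rows restricted to the columns whose gap fraction
     * is at most maxGapFraction. Assumes '-' marks a gap and that all rows
     * have the same length, as in an aligned sequence block.
     */
    public static List<String> trim(List<String> aligned, double maxGapFraction) {
        List<String> result = new ArrayList<>();
        if (aligned.isEmpty()) {
            return result;
        }
        int length = aligned.get(0).length();
        for (String row : aligned) {
            if (row.length() != length) {
                throw new IllegalArgumentException("aligned rows must have equal length");
            }
        }

        // One output buffer per input sequence.
        List<StringBuilder> kept = new ArrayList<>();
        for (int i = 0; i < aligned.size(); i++) {
            kept.add(new StringBuilder());
        }

        for (int col = 0; col < length; col++) {
            // Count the gaps in this column across all sequences.
            int gaps = 0;
            for (String row : aligned) {
                if (row.charAt(col) == '-') {
                    gaps++;
                }
            }
            // Copy the column only if it is sufficiently gap-free.
            if ((double) gaps / aligned.size() <= maxGapFraction) {
                for (int r = 0; r < aligned.size(); r++) {
                    kept.get(r).append(aligned.get(r).charAt(col));
                }
            }
        }

        for (StringBuilder sb : kept) {
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy amino acid alignment; the sequences are made up for the example.
        List<String> aligned = Arrays.asList("MK-LV", "M--LV", "MKTL-");
        for (String row : trim(aligned, 0.5)) {
            System.out.println(row);
        }
    }
}
```

With the threshold set to 0.5, the third column of the toy alignment is dropped because two of its three entries are gaps, and the remaining columns are kept. A stricter threshold of 0.0 would keep only the columns with no gaps at all.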
This document explains the process of going from amino acid sequence to phylogenetic tree.

------------------------------------------------------------------------------
Beckie Chan
pchan543@cs.caltech.edu
Department of Computer Science
California Institute of Technology