Chapter

Cover Concepts in Bioinformatics and Genomics

Advanced Probability for Bioinformatics Applications  

This chapter starts with continuous random variables, then moves to the extreme value distribution and its use in analysing the significance of an alignment. It looks into the computation and interpretation of P- and E-values to evaluate sequence alignments. The chapter also discusses the main characteristic of a Markov process (the probability of the current state depends only on the previous state), then examines how to translate information about a Markov process into a state diagram and the associated transition matrix. It also shows how to compute the probability that a particular sequence of states results from a Markov process. Next, the chapter analyses stochastic processes, specifically Markov chains and hidden Markov models, and presents a mathematical derivation of the Jukes-Cantor model.
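
As a rough illustration of the Markov property described in this abstract, the sketch below multiplies an initial probability by successive transition probabilities to get the probability of a state sequence. The function name, the two states "A" and "B", and all numeric values are hypothetical and are not taken from the chapter.

```python
def sequence_probability(states, initial, transition):
    """Probability of a state sequence under a first-order Markov chain.

    initial[s] is the probability of starting in state s;
    transition[p][c] is the probability of moving from state p to state c.
    """
    prob = initial[states[0]]
    # Each step depends only on the immediately preceding state.
    for prev, curr in zip(states, states[1:]):
        prob *= transition[prev][curr]
    return prob


# Hypothetical two-state chain; each row of the transition matrix sums to 1.
initial = {"A": 0.5, "B": 0.5}
transition = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.4, "B": 0.6},
}

# P(A, A, B) = 0.5 * 0.9 * 0.1
print(sequence_probability("AAB", initial, transition))
```

The nested dictionary plays the role of the transition matrix, with one row per current state.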

Chapter

Cover Introduction to Bioinformatics

Alignments and phylogenetic trees  

This chapter examines the concept of sequence alignment, which is the identification of residue-residue correspondences. It is the basic tool of bioinformatics. The chapter presents a comparison of pairwise sequence alignments and multiple sequence alignments. Multiple sequence alignments are much more informative than pairwise sequence alignments, in terms of revealing patterns of conservation. The chapter then looks at the process of constructing and interpreting dot plots, before considering the use of the Hamming distance and Levenshtein distance as measures of dissimilarity of character strings. It also explains the basis of scoring schemes for string alignment, including substitution matrices and gap penalties. Finally, the chapter studies the applications of multiple sequence alignments to database searching, before exploring the contents and significance of phylogenetic trees, and the methods available for deriving them.
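
The two string dissimilarity measures named in this abstract can be sketched in a few lines. This is a minimal illustration with hypothetical function names, not the chapter's own code. The Levenshtein version assumes unit cost for every insertion, deletion, and substitution.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(hamming("GATTACA", "GACTATA"))
print(levenshtein("kitten", "sitting"))
```

Unlike the Hamming distance, the Levenshtein distance allows gaps, so it also applies to strings of different lengths.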

Chapter

Cover Introduction to Bioinformatics

Artificial intelligence and machine learning  

This chapter addresses the topics of artificial intelligence and machine learning. The idea of artificial intelligence is that a computer is doing something that, at least casually, can be regarded as ‘thinking’. However, the sophistication of the program might be a translation of some aspect of the intelligence of the programmer. Machine learning goes beyond this; for machine learning, the program has to take the initiative to improve the performance of the algorithm. The chapter then looks at the difference between classification and clustering, as well as between supervised learning and unsupervised learning. The Receiver Operating Characteristic (ROC) curves reveal how the quality of a prediction depends on the choice of separation threshold. The chapter also considers the structure and activity of artificial neurons, and how they can be combined into artificial neural networks; decision trees; and support vector machines.
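
To make the dependence of prediction quality on the separation threshold concrete, the following sketch computes one point of an ROC curve: the false positive rate and true positive rate at a given threshold. The function name, scores, and labels are hypothetical. The sketch assumes that higher scores indicate the positive class.

```python
def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) at one threshold.

    labels are 1 for positive and 0 for negative examples; an example is
    predicted positive when its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    return fp / negatives, tp / positives


# Hypothetical classifier scores with their true labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

# Sweeping the threshold traces out the curve point by point.
for t in (0.95, 0.5, 0.35):
    print(t, roc_point(scores, labels, t))
```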

Chapter

Cover Concepts in Bioinformatics and Genomics

Basic Local Alignment Search Tool (BLAST) and Multiple Sequence Alignment  

This chapter looks at pairwise sequence comparison by describing the Basic Local Alignment Search Tool (BLAST). It discusses multiple sequence alignment programs with an emphasis on the first popular program of this class, Cluster Alignment Weighted (ClustalW). In the 1990s, advances in DNA sequencing technology led to a significant expansion of the number of sequences deposited in databases. The chapter examines the logic of how BLAST aligns a query sequence with a subject sequence from a database. It emphasizes the significant advantages of BLAST over other sequence alignment programs: its increased speed and its use of a statistical measurement, the E-value, to assess the significance of the similarity score. The chapter then surveys the major BLAST programs available through NCBI. Finally, it analyses the logic of how ClustalW aligns multiple sequences and considers other multiple sequence alignment programs.

Chapter

Cover Concepts in Bioinformatics and Genomics

Basic Probability  

This chapter concentrates on probability, a requisite component of bioinformatics research, with an emphasis on counting methods, dependence, Bayesian inference, and random variables. It develops the basic tools of probability needed to understand bioinformatics applications such as the interpretation of the E-value in the output of a BLAST search, hidden Markov models in algorithms for multiple sequence alignment, and the derivation of the Jukes-Cantor model of evolutionary distance. The chapter opens with a discussion of the basic operations on sets, such as union, intersection, and complement. It discusses the meanings of dependence and independence, then examines how to compute conditional probabilities from the definition or by using Bayes' law. Next, the chapter investigates how to use Bayes' law to compute posterior probabilities from prior probabilities and observed data. It then analyses probability mass functions and probability density functions in relation to the distribution of a random variable.
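
The step from prior to posterior probability mentioned in this abstract can be sketched for a single binary hypothesis. The function and parameter names and the numbers are hypothetical. They are chosen only to show the arithmetic of Bayes' law.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(H | data) from P(H), P(data | H), and P(data | not H)."""
    # Total probability of the data over both hypotheses (the evidence).
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Hypothetical values: a rare hypothesis (prior 1%) and data that is much
# more likely when the hypothesis holds (95%) than when it does not (5%).
print(posterior(prior=0.01, likelihood_if_true=0.95, likelihood_if_false=0.05))
```

Even with strongly supporting data, a small prior keeps the posterior modest, which is why the prior has to be stated explicitly.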

Chapter

Cover Introduction to Protein Science

Bioinformatics of protein sequence and structure  

This chapter examines bioinformatics, a new field which is a hybrid of biology and computer science. Biology, especially high-throughput data streams such as DNA sequencing and structural genomics projects, provides its input. Computer science permits the effective use of information-processing equipment to support research based on these data. Databases organize knowledge and make it accessible. Algorithms then allow analysis of the information, thereby producing additional data streams to be incorporated into the repository. The chapter then looks at the concept of sequence alignment, the methods for aligning sequences, and facilities for sequence database searching based on them, notably BLAST (Basic Local Alignment Search Tool) and PSI-BLAST. It also considers the characteristics of pairwise sequence alignment, multiple sequence alignment, and structural alignment.

Book

Cover Concepts in Bioinformatics and Genomics

Jamil Momand, Alison McCurdy, Silvia Heubach, and Nancy Warter-Perez

Concepts in Bioinformatics and Genomics starts with a review of molecular biology and looks at its relevance to the topic. It then goes on to consider information organization and sequence databases, molecular evolution, substitution matrices, and pairwise sequence alignment. Other topics covered include the Basic Local Alignment Search Tool (BLAST) and multiple sequence alignment, protein structure prediction, phylogenetics, genomics, transcript and protein expression analysis, and basic probability. There are also chapters on advanced probability for bioinformatics applications, programming basics and applications to bioinformatics, and how to develop a basic bioinformatics tool.

Chapter

Cover Introduction to Bioinformatics

Control of organization and organization of control  

This chapter studies more general control mechanisms, including gene expression. Life is a dynamic process, requiring robust control mechanisms. The chapter then looks at the goals of transcriptomics and proteomics—the measurement of amounts and distributions of RNAs and proteins within a cell or organism. Two major techniques for exploring the transcriptome are microarrays and RNA sequencing (RNAseq). DNA microarrays, or DNA chips, are devices for checking a sample simultaneously for the presence of many sequences. The chapter also considers the importance of protein–protein interaction networks, the methods available for generating them, and some of the databases that collect and present them. It details the structures and some of the building blocks of regulatory networks.

Chapter

Cover Concepts in Bioinformatics and Genomics

Developing a Bioinformatics Tool  

This chapter guides the reader through the development of a pairwise sequence alignment tool that implements global, ends-free global, and local alignment. It focuses on the algorithms needed to implement the tool. However, because Python is a commonly used programming language for bioinformatics, the chapter encourages readers to apply Python concepts to implement their own pairwise sequence alignment tool. The chapter begins by analysing the output report of an existing local sequence alignment tool, EMBOSS Water, to introduce its inputs, outputs, and functionality. It then gives an overview of simple pairwise alignment (SPA) and introduces the concept of algorithms, looking at different ways to express them. Towards the end, the chapter explains the longest common subsequence (LCS) algorithm and how it can be extended to implement local and global pairwise alignment. It then assesses the complexity of algorithms, that is, how much memory and time they require.
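
The longest common subsequence idea mentioned in this abstract is commonly implemented with dynamic programming. The sketch below is one such textbook formulation with a hypothetical function name. It is not the chapter's own code, and it returns only the length of the LCS.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    # table[i][j] is the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                # Matching characters extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop one character from either string.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


print(lcs_length("AGGTAB", "GXTXAYB"))
```

For strings of lengths m and n, this version fills an (m + 1) by (n + 1) table, so both its running time and its memory use grow in proportion to m times n. Replacing the match and mismatch increments with scores from a substitution matrix and gap penalties is the usual route from this table to global and local alignment.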

Chapter

Cover Introduction to Protein Science

Evolution of protein structure and function  

This chapter looks at the evolution of protein structure and function. Protein evolution is characterized by the exploration by a set of genomes of the space of amino acid sequences in search of selectively advantageous variants. Evolution acts at the level of protein functions, in a feedback cycle that selects gene sequences. Two or more proteins are homologous if they are descended from a common ancestor. The chapter then distinguishes between two types of homologues: orthologues and paralogues. Orthologues are homologous proteins in different species, descended from a single ancestral protein, while paralogues are homologues in the same species arising from gene duplication, and their descendants. The chapter also looks at evolutionary variations in protein families, including globins, NAD-binding domains, serine proteases, and opsins. Finally, it explores how proteins can develop new functions during evolution: the mechanisms, pathways, and limitations.

Chapter

Cover Introduction to Bioinformatics

From genetics to genomes  

This chapter provides a background on genetics and genomes, and the development of DNA sequencing by biochemist Frederick Sanger. It begins by outlining some of the important landmarks in the history of genomics, from the classical work of Charles Darwin and Gregor Mendel, through Thomas Hunt Morgan and Alfred Sturtevant, to the discovery of the double-helical structure of DNA and the development of the human genome project. The chapter then distinguishes different types of maps: genetic linkage maps, chromosome banding patterns, restriction maps, and DNA sequences. It also looks at the basic computational problems of pattern matching. DNA sequence data can be used for personal identification, including the verification of family relationships, and crime investigation. Finally, the chapter considers the ethical, legal, and social problems associated with DNA sequence databases.

Chapter

Cover Concepts in Bioinformatics and Genomics

Genomics  

This chapter presents genomic analysis with an emphasis on next-generation sequencing (NGS) and the annotation of bacterial genomes. It explains the underlying principles of dideoxy sequencing and selected next-generation sequencing technologies. The chapter also examines the theory of the polymerase chain reaction (PCR) and the general categories of DNA that constitute the human genome (the genomic landscape). It then investigates how DNA fingerprinting is performed and the degree of synteny shared between genomes. It reviews the whole-genome shotgun strategy for genome sequencing, then examines how gene prediction software programs predict and annotate genes in genomes. Finally, the chapter describes the haplotype, the HapMap Project and its significance, and the benefits and concerns of having knowledge of personal genome data.

Chapter

Cover Concepts in Bioinformatics and Genomics

Information Organisation and Sequence Databases  

This chapter describes the major public databases that are repositories for sequence data. It describes in detail one of these databases, GenBank, the grandparent of all nucleic acid sequence databases. GenBank stores the vast amounts of DNA and RNA sequence data crucial for bioinformatics research. The chapter also looks into Reference Sequence (RefSeq), a database that contains natural (wild-type) sequences, and the UniProt Knowledgebase (UniProtKB), a well-annotated database focused on the protein products of the genes found in GenBank. The chapter then explores the basic organization of a gene, in other words, how the sequence segments are arranged. It also discusses alternative splicing, a mechanism that adds variability to messenger RNAs and their protein products. Finally, it reviews the nomenclature associated with gene organization and alternative splicing to deepen the reader's understanding of bioinformatics and genomics and their applications.

Chapter

Cover Concepts in Bioinformatics and Genomics

Introduction  

This introductory chapter provides a definition of bioinformatics and a brief exploration of the origins of the term. The chapter then provides examples of the application of bioinformatics in society, academia, and the workplace. Finally, the chapter highlights the importance of bioinformatics and its application to genomics.

Chapter

Cover Introduction to Protein Science

Introduction  

This introductory chapter provides an overview of proteins, which are a family of biological macromolecules that provide a variety of three-dimensional structures exquisitely shaped for their many different individual functions. They include structural proteins, enzymes, antibodies, regulatory proteins, sensors, transporters and pumps, and transducers. The common features of the chemical structures of proteins make possible a common synthetic mechanism: ribosomes assemble the great variety of proteins under the direction of different messenger RNA (mRNA) sequences. Both the DNA sequences of genes and the amino acid sequences of proteins are one-dimensional. The chapter asks: how then do proteins achieve their three-dimensional biologically active states? The three-dimensional structures of proteins are inherent in their amino acid sequences. It is the combination of a common synthetic machinery — the gene sequence dictating the amino acid sequence — with spontaneous folding to the native three-dimensional structure, that underlies the mechanism of molecular evolution.

Chapter

Cover Introduction to Bioinformatics

Introduction  

This introductory chapter presents the major components of bioinformatics: DNA and protein sequences and structures, genomes and proteomes, databases and information retrieval, the World Wide Web, and computer programming. Before the advent of modern technologies and the internet, biological observations were fundamentally anecdotal and fragmentary. In recent generations, the data have become not only much more quantitative, but also more precise and comprehensive. Biological databases have recently supplemented the archives of nucleic acid sequences, amino acid sequences of proteins, and structures of proteins and protein–nucleic acid complexes. Given the data streams, analysis has become ever more challenging. Not only has bioinformatics developed powerful tools, but its methods are becoming more deeply integrated into the biomedical enterprise.

Book

Cover Introduction to Bioinformatics

Arthur M. Lesk

Introduction to Bioinformatics starts off by introducing the topic. It then looks at genetics and genomes. It moves on to consider the panorama of life. The text also considers alignments and phylogenetic trees. There is a chapter on structural bioinformatics and drug discovery. The text also examines scientific publications and archives, particularly media, content, access, and presentation. Artificial intelligence is considered as well, in addition to machine learning. There is an introduction to systems biology that follows towards the end. The book's final chapters look at metabolic pathways and control of organization.

Book

Cover Introduction to Protein Science

Introduction to Protein Science first outlines the topics ahead. The first main topic is protein structure and protein structure determination. The next subject the text considers is the bioinformatics of protein sequence and structure. Proteins as catalysts are examined after that, with particular attention to enzyme structure, kinetics, and mechanisms. The text then moves on to describe proteins with partners, the evolution of protein structure and function, and protein folding and design. Finally, it looks at proteomics and systems biology.

Chapter

Cover Introduction to Bioinformatics

Introduction to systems biology  

This chapter describes systems biology. The key idea of systems biology is integration. Indeed, an initial goal of systems biology is to identify the active networks in cells, organisms, and ecosystems, and to understand the properties of their components and the interactions among them. The integrated activities of components of cells depend on networks of interactions. The chapter then looks at the general features of graphs, and the representation of networks by graphs. It considers which kinds of biological interaction patterns can profitably be thought of as networks. The chapter also identifies the distinction between static and dynamic properties of networks, before assessing the concepts of entropy and complexity and how to apply them to biological data. Finally, it outlines the properties of the Burrows-Wheeler transform and its applications.

Chapter

Cover Introduction to Bioinformatics

Metabolic pathways  

This chapter explores metabolic pathways, which are the road maps defining the possible transformations of metabolites. They form a network, representable as a graph, usually with the metabolites as nodes, and reactions connecting them as edges. The enzyme that catalyses each reaction labels the edge. The chapter then looks at the defining principles of the Enzyme Commission and the Gene Ontology Consortium™ classifications of the functions of biological molecules. It considers the importance of accurate annotation of enzyme function in databases, before outlining the databases of metabolic networks. The chapter also discusses the physicochemical basis of enzymatic catalysis, and the quantities needed to characterize enzyme kinetics. Finally, it examines how the algorithms for comparison of nucleic acid and amino acid sequences can be generalized to compare and align metabolic pathways.