ABSTRACT

This article continues a series on educational websites and web tools, and
covers sites that are valuable resources for many research needs in genomics
and proteomics. Research has never moved as fast as it does today, and
bioinformatics is one of the newly emerging fields that makes this possible
through the use of computers.
Bioinformatics has become a laboratory tool to map sequences to databases,
develop models of molecular interactions, evaluate structural compatibilities,
describe differences between normal and disease-associated DNA, identify
conserved motifs within proteins, and chart extensive signaling networks, all
in silico. It is gaining popularity because it can analyze huge amounts of
biological data quickly and cost-effectively. Bioinformatics helps a biologist
extract valuable information from biological data by providing various web-
and/or computer-based tools, most of which are freely available. The present
review summarizes some of the tools available to a life scientist for analyzing
biological data. It focuses on areas of biological research that such tools can
greatly assist: analyzing DNA and protein sequences to identify various
features, predicting the 3D structure of protein molecules, studying molecular
interactions, and performing simulations that mimic a biological phenomenon in
order to extract useful information from biological data. The functioning of
tools such as Entrez, I-TASSER, GENSCAN, ORF Finder, and Modeller is discussed
in this review.

Introduction

Bioinformatics is an interdisciplinary science that emerged from the
combination of disciplines such as biology, mathematics, computer science, and
statistics to develop methods for the storage, retrieval, and analysis of
biological data [1]. Paulien Hogeweg, a Dutch systems biologist, first used the
term “Bioinformatics” in 1970 to refer to the use of information technology for
studying biological systems [2,3]. The launch of user-friendly, interactive
automated modeling with the creation of the SWISS-MODEL server around 18 years
ago [4] resulted in massive growth of the discipline. Since then,
bioinformatics has become an essential part of the biological sciences,
processing biological data at a much faster rate with databases and informatics
working at the back end.

Computational tools are routinely used for characterizing genes, determining
structural and physicochemical properties of proteins, performing phylogenetic
analyses, and running simulations to study how biomolecules interact in a
living cell. A number of reviews on specialized aspects of bioinformatics have
been written [5, 6]. However, none of these articles is well suited to a
scientist from outside computational biology. Here, we take the opportunity to
introduce various bioinformatics tools to the non-specialist reader, to help
such readers extract useful information for their own projects.

  

i. I-TASSER

I-TASSER (Iterative Threading ASSEmbly Refinement) is a bioinformatics method
for predicting three-dimensional structure models of protein molecules from
amino acid sequences [6].

Specificity

I-TASSER detects structure templates from the Protein Data Bank (PDB) by a
technique called fold recognition, or threading. The full-length structure
models are constructed by reassembling structural fragments from the threading
templates using replica-exchange Monte Carlo simulations. I-TASSER is one of
the most successful protein structure prediction methods in the community-wide
CASP experiments. It has been extended to structure-based protein function
prediction, providing annotations on ligand-binding sites, Gene Ontology terms,
and Enzyme Commission numbers by structurally matching models of the target
protein to known proteins in protein function databases [9,10]. An online
server, built in the Yang Zhang Lab at the University of Michigan, Ann Arbor,
allows users to submit sequences and obtain structure and function
predictions. A standalone package of I-TASSER is available for download from
the I-TASSER website.
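
To give a feel for the replica-exchange Monte Carlo idea mentioned above, the following minimal sketch shows the standard Metropolis criterion for swapping two replicas that run at different temperatures. It is a generic textbook illustration, not I-TASSER's actual code; the function name and the example energies are hypothetical.

```python
import math


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    beta_* are inverse temperatures (1 / kT) and energy_* are the current
    energies of the two conformations. Generic replica-exchange rule:
    p = min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    if exponent >= 0:
        return 1.0  # the swap moves the lower-energy state to the colder replica
    return math.exp(exponent)


# Hypothetical example: a cold replica (beta = 1.0) and a warm one (beta = 0.5).
print(swap_probability(1.0, 0.5, -10.0, -12.0))  # warm replica found a better state
print(swap_probability(1.0, 0.5, -12.0, -10.0))  # cold replica already holds the better state
```

Swaps of this kind let a conformation trapped in a local energy minimum at low temperature escape through a warmer replica, which is why the scheme is useful for assembling models from fragments.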

Functioning

The I-TASSER server allows users to generate protein structure and function
predictions automatically.

· Input
    · Mandatory:
        · Amino acid sequence with a length from 10 to 1,500 residues
    · Optional (users can optionally provide restraints and templates to assist I-TASSER modeling):
        · Contact restraints
        · Distance maps
        · Inclusion of special templates
        · Exclusion of special templates
        · Secondary structures
· Output
    · Structure prediction:
        · Secondary structure prediction
        · Solvent accessibility prediction
        · Top 10 threading alignments from LOMETS
        · Top 5 full-length atomic models (ranked based on cluster density)
        · Top 10 proteins in PDB that are structurally closest to the predicted models
        · Estimated accuracy of the predicted models (including a confidence score of all models, predicted TM-score and RMSD for the first model, and per-residue error of all models)
        · B-factor estimation
    · Function prediction:
        · Enzyme Commission (EC) numbers and the confidence score
        · Gene Ontology (GO) terms and the confidence score
        · Ligand-binding sites and the confidence score
        · An image of the predicted ligand-binding sites
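
Before submitting a job, it can help to check a sequence against the mandatory input requirement listed above (10 to 1,500 residues). The sketch below is a hypothetical local pre-check, not part of the I-TASSER server; restricting the alphabet to the 20 standard one-letter amino acid codes is an assumption made here for illustration.

```python
# The 20 standard amino acids in one-letter code (an assumption of this sketch;
# the server's own rules for unusual residue codes are not covered here).
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 10, 1500  # length limits quoted in the list above


def check_sequence(sequence):
    """Return a list of problems; an empty list means the sequence passes."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    problems = []
    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        problems.append(
            f"length {len(seq)} is outside {MIN_LENGTH}-{MAX_LENGTH} residues"
        )
    unknown = sorted(set(seq) - STANDARD_RESIDUES)
    if unknown:
        problems.append("non-standard residue codes: " + ", ".join(unknown))
    return problems


print(check_sequence("MKTAYIAKQRQISFVKSHFSRQ"))  # passes: empty list
print(check_sequence("MKT"))                     # too short
```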

 

ii. GenBank

The GenBank sequence database is an open-access, annotated collection of all
publicly available nucleotide sequences and their protein translations. It is
produced and maintained by the National Center for Biotechnology Information
(NCBI) as part of the International Nucleotide Sequence Database Collaboration
(INSDC). NCBI is part of the National Institutes of Health (NIH) in the United
States.

Specificity

GenBank and its collaborators receive sequences produced in laboratories
throughout the world from more than 100,000 distinct organisms. The database
was started in 1982 by Walter Goad at Los Alamos National Laboratory. GenBank
has become an important database for research in biological fields and has
grown at an exponential rate in recent years, doubling roughly every 18
months [7].

Release 194, produced in February 2013,
contained over 150 billion nucleotide bases in more than 162 million sequences.
GenBank is built by direct submissions from individual laboratories, as well as
from bulk submissions from large-scale sequencing centers.
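
The arithmetic behind a doubling time can be written as N(t) = N0 · 2^(t/T), where T is the doubling time. The short sketch below applies it to the figures quoted above (about 150 billion bases, doubling roughly every 18 months). It is an illustration of the formula only, not a forecast of actual GenBank releases, and the function names are hypothetical.

```python
import math


def projected_bases(start_bases, months_elapsed, doubling_months=18.0):
    """Size after months_elapsed, assuming steady doubling every doubling_months."""
    return start_bases * 2 ** (months_elapsed / doubling_months)


def months_to_reach(start_bases, target_bases, doubling_months=18.0):
    """Months needed to grow from start_bases to target_bases at the same rate."""
    return doubling_months * math.log2(target_bases / start_bases)


start = 150e9  # roughly the base count quoted for Release 194
print(projected_bases(start, 36))      # two doublings
print(months_to_reach(start, 1.2e12))  # three doublings
```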

Functioning

The GenBank database is designed to provide and
encourage access within the scientific community to the most up-to-date and
comprehensive DNA sequence information. Therefore, NCBI places no restrictions
on the use or distribution of the GenBank data. However, some submitters may
claim patent, copyright, or other intellectual property rights in all or a
portion of the data they have submitted. NCBI is not able to assess the
validity of such claims, and therefore cannot provide comment or unrestricted
permission concerning the use, copying, or distribution of the information
contained in GenBank.

UniProt

UniProt is a freely accessible database of protein sequence and functional
information, with many entries derived from genome sequencing projects. It
contains a large amount of information about the biological function of
proteins, derived from the research literature.

Specificity

UniProt is a protein sequence database. Many of its protein sequence entries
are derived from genome sequencing projects, while much of the information on
protein function is derived from the research literature [8].

Functioning

The UniProt Reference Clusters (UniRef) consist of three databases of clustered
sets of protein sequences from UniProtKB and selected UniParc records [9]. The
UniRef100 database combines identical sequences and sequence fragments into a
single UniRef entry. Each entry displays the sequence of a representative
protein, the accession numbers of all the merged entries, and links to the
corresponding UniProtKB and UniParc records. UniRef100 sequences are clustered
using the CD-HIT algorithm to build UniRef90 and UniRef50, in which each
cluster is composed of sequences that have at least 90% or 50% sequence
identity, respectively, to the longest sequence. Clustering sequences
significantly reduces database size, enabling faster sequence searches.
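
The clustering idea can be sketched as a greedy, longest-first procedure: sequences are processed from longest to shortest, and each one either joins the first cluster whose representative it matches above the identity threshold or starts a new cluster. The toy code below follows that scheme in the spirit of CD-HIT but is not its implementation; in particular, the ungapped identity measure is a simplifying assumption standing in for a real alignment, and all names and sequences are hypothetical.

```python
def ungapped_identity(seq_a, seq_b):
    """Best fraction of the shorter sequence matching the longer one
    over all ungapped offsets (a toy stand-in for alignment identity)."""
    short, long_ = sorted((seq_a, seq_b), key=len)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(1 for a, b in zip(short, long_[offset:]) if a == b)
        best = max(best, matches)
    return best / len(short)


def greedy_cluster(sequences, threshold):
    """Cluster sequences longest-first; the first member is the representative."""
    clusters = []  # each cluster is a list whose first element is the representative
    for seq in sorted(sequences, key=len, reverse=True):
        for members in clusters:
            if ungapped_identity(seq, members[0]) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


proteins = [
    "MKTAYIAKQRQISFVKSHFSRQ",  # full-length sequence
    "MKTAYIAKQRQISFVKSHFSRA",  # one substitution
    "MKTAYIAKQR",              # fragment of the first sequence
    "GGGGGGGGGGGGGGGGGGGG",    # unrelated sequence
]
print(len(greedy_cluster(proteins, 1.0)))  # identical sequences and fragments only
print(len(greedy_cluster(proteins, 0.9)))  # 90% identity threshold
```

Lowering the threshold merges more sequences into each cluster, which is the trade-off between database size and resolution that the three UniRef levels offer.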

UniRef is available from the UniProt FTP site [10].

UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence
database. It combines information extracted from the scientific literature with
biocurator-evaluated computational analysis. The aim of UniProtKB/Swiss-Prot is
to provide all known relevant information about a protein. Annotation is
regularly reviewed to keep up with current scientific findings. The manual
annotation of an entry involves detailed analysis of the protein sequence and
of the scientific literature [11].

ExPASy

ExPASy is a bioinformatics resource portal operated by the Swiss Institute of
Bioinformatics (SIB) and the SIB Web Team [12].

Specificity

A single web portal provides a common entry point to a wide range of resources
developed and operated by many different SIB groups and external institutions.
The portal features a search function across selected resources. Internally,
the availability and usage of resources are monitored. The portal is aimed at
both expert users and people who are not familiar with a specific domain in
the life sciences; the new web interface provides visual guidance for
newcomers to ExPASy.

Functioning

ExPASy is an extensible and integrative portal that gives access to many
scientific resources, databases, and software tools in different areas of the
life sciences. Scientists can access a wide range of resources in many
different domains, such as proteomics, genomics, phylogeny/evolution, systems
biology, population genetics, and transcriptomics. Individual resources are
hosted in a decentralized way by different groups of the SIB Swiss Institute
of Bioinformatics and partner institutions.

 

DDBJ

The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA
sequences. It is located at the National Institute of Genetics (NIG) in the
Shizuoka prefecture of Japan. It is also a member of the International
Nucleotide Sequence Database Collaboration (INSDC) [13].

Specificity

DDBJ exchanges its data daily with the European Molecular Biology Laboratory at
the European Bioinformatics Institute and with GenBank at the National Center
for Biotechnology Information. Thus, these three databanks contain the same
data at any given time.

Functioning

DDBJ began data bank activities in 1986 at NIG and remains the only nucleotide
sequence data bank in Asia. Although DDBJ mainly receives its data from
Japanese researchers, it accepts data from contributors in any other country.
DDBJ is primarily funded by the Japanese Ministry of Education, Culture,
Sports, Science and Technology (MEXT). DDBJ has an international advisory
committee of nine members, three each from Europe, the US, and Japan. This
committee advises DDBJ once a year on its maintenance, management, and plans.
In addition, DDBJ has an international collaborative committee, made up of
working-level participants, which advises on various technical issues related
to international collaboration.
FASTA

FASTA is a DNA and protein sequence alignment software package first described
(as FASTP) by David J. Lipman and William R. Pearson in 1985. Its legacy is the
FASTA format, which is now ubiquitous in bioinformatics [14].

FASTA is pronounced “fast A” and stands for “FAST-All”, because it works with
any alphabet; it is an extension of the “FAST-P” (protein) and “FAST-N”
(nucleotide) alignment programs.

Specificity

The current FASTA package contains programs for protein:protein, DNA:DNA, and
protein:translated DNA (with frameshifts) comparisons, as well as ordered or
unordered peptide searches. Recent versions of the FASTA package include
special translated search algorithms that correctly handle frameshift errors
(which six-frame-translated searches do not handle very well) when comparing
nucleotide to protein sequence data [15].

In addition to rapid heuristic search methods, the FASTA package provides
SSEARCH, an implementation of the optimal Smith-Waterman algorithm.
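
The Smith-Waterman algorithm finds the best-scoring local alignment between two sequences by dynamic programming. The sketch below computes only the optimal score and uses a simple match/mismatch scheme with a linear gap penalty; these scoring parameters are assumptions chosen for illustration, whereas tools such as SSEARCH work with substitution matrices and affine gap penalties.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between seq_a and seq_b.

    Each cell holds the best score of an alignment ending at that position;
    scores are floored at zero so an alignment can start anywhere.
    """
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            substitution = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            current[j] = max(
                0,                                # start a new alignment
                previous[j - 1] + substitution,   # align the two residues
                previous[j] + gap,                # gap in seq_b
                current[j - 1] + gap,             # gap in seq_a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # shared core "ACGT"
```

Keeping only two rows of the matrix is enough when just the score is needed; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.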

Functioning

A major focus of the package is the calculation of accurate similarity
statistics, so that biologists can judge whether an alignment is likely to have
occurred by chance or whether it can be used to infer homology. The FASTA
package is available from fasta.bioch.virginia.edu.

A web interface for submitting sequences and searching the online databases of
the European Bioinformatics Institute (EBI) with the FASTA programs is also
available.

The FASTA file format used as input for this software is now widely used by
other sequence database search tools (such as BLAST) and sequence alignment
programs (Clustal, T-Coffee, etc.).
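
In the FASTA format, each record starts with a header line beginning with ">" and is followed by one or more lines of sequence. The minimal parser below illustrates this layout; the function name and the example records are hypothetical, and real projects would normally rely on an established library instead.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # drop the leading ">"
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA fragment
ACGTACGT
"""
for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```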

BLAST

In bioinformatics, BLAST (Basic Local Alignment Search Tool) is an algorithm
for comparing primary biological sequence information, such as the amino acid
sequences of proteins or the nucleotides of DNA sequences.

Specificity

Different types of BLAST are available depending on the query sequence. For
example, following the discovery of a previously unknown gene in the mouse, a
scientist will typically perform a BLAST search of the human genome to see
whether humans carry a similar gene; BLAST will identify sequences in the human
genome that resemble the mouse gene based on sequence similarity. The BLAST
algorithm and program were designed by Stephen Altschul, Warren Gish, Webb
Miller, Eugene Myers, and David J. Lipman at the National Institutes of Health.
The work was published in the Journal of Molecular Biology in 1990 and has been
cited over 50,000 times [16].

Functioning

A BLAST search enables a researcher to compare a query sequence with a library
or database of sequences, and to identify library sequences that resemble the
query sequence above a certain threshold [17].
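
The idea of screening a library against a query and keeping only hits above a threshold can be illustrated with a toy word-matching search. The sketch below counts short words (k-mers) shared between the query and each library sequence and reports those meeting a minimum count. It is a deliberately simplified illustration of word-based seeding, not BLAST's actual extension or scoring procedure; all names, sequences, and parameter values are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all overlapping words of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def seed_search(query, library, k=3, min_shared=2):
    """Return (name, shared_word_count) for library sequences at or above
    the min_shared threshold, best hits first."""
    query_words = kmers(query, k)
    hits = []
    for name, sequence in library.items():
        shared = len(query_words & kmers(sequence, k))
        if shared >= min_shared:
            hits.append((name, shared))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


library = {
    "seqA": "TTACGTACGG",
    "seqB": "GGGGCGTAAA",
    "seqC": "TTTTTTTT",
}
print(seed_search("ACGTACGT", library))
```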

Entrez

The Entrez Global Query Cross-Database Search System is a federated search
engine, or web portal, that allows users to search many discrete health
sciences databases at the National Center for Biotechnology Information (NCBI)
website [18].

Specificity

The NCBI is a part of the National Library of Medicine (NLM), which is itself a
department of the National Institutes of Health (NIH) [19], which in turn is a
part of the United States Department of Health and Human Services. The name
“Entrez” (a greeting meaning “Come in!” in French) was chosen to reflect the
spirit of welcoming the public to search the content available from the
NLM [20].

Functioning

Entrez Global Query is an integrated search and retrieval system that provides
access to all databases simultaneously with a single query string and user
interface. Entrez can efficiently retrieve related sequences, structures, and
references. The Entrez system can provide views of gene and protein sequences
and chromosome maps. Some textbooks are also available online through the
Entrez system [21].
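
Entrez databases can also be queried programmatically through NCBI's E-utilities, which this review does not otherwise cover. As a minimal sketch, the code below only builds an ESearch request URL for a single database and does not contact the server; the helper name, the example query, and the choice of parameters are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database, term, max_results=20):
    """Build an ESearch URL that asks one Entrez database for matching record IDs."""
    params = {
        "db": database,         # Entrez database name, e.g. "pubmed"
        "term": term,           # query string, as typed into the Entrez search box
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",      # response format
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


print(build_esearch_url("pubmed", "bioinformatics[Title]", max_results=5))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; NCBI's usage guidelines for request rates apply to automated access.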

 

Conclusion and Prospects

Bioinformatics is a comparatively young discipline that has progressed very
fast in the last few years. It has made it possible to test hypotheses
virtually and therefore allows researchers to make better-informed decisions
before launching costly experiments. Although more and more tools for analyzing
genomes and proteomes, predicting structures, rational drug design, and
molecular simulation are being developed, none of them is ‘perfect’. The hunt
for better packages to solve the given problems will therefore continue. It is
clear that future research will be guided largely by the availability of
databases, whether generic or specific. Based on developments in the field, it
can also be safely assumed that bioinformatics tools and software packages will
give more accurate results and thus more reliable interpretations. Prospects
for the field include its future contribution to the functional understanding
of the human genome, leading to enhanced discovery of drug targets and
individualized therapy. Bioinformatics and other scientific disciplines
therefore have to move hand in hand to flourish for the welfare of humanity.