Modern Genome Annotation: The Biosapiens Network

D. Frishman, Alfonso Valencia
Springer Science & Business Media, Oct 2, 2009 - Science - 490 pages

This book presents current scientific developments in bioinformatics and its computational implementation, drawing on the research of the BioSapiens Network of Excellence. Bioinformatics is essential for annotating the structure and function of genes and proteins, for analysing complete genomes, and for molecular biology and biochemistry.

Included are an overview of bioinformatics and the full spectrum of genome annotation approaches: genome analysis and gene prediction, gene regulation analysis and expression, genome variation and QTL analysis, large-scale protein annotation of function and structure, annotation and prediction of protein interactions, and the organization and annotation of molecular networks and biochemical pathways. Also covered are a technical framework for organizing and representing genome data using the DAS technology, and work on the annotation of two large genomic sets: HIV/HCV viral genomes and the splicing alternatives potentially encoded in 1% of the human genome.

 


Contents

43 FLORA method
250
5 An integrated pipeline for functional prediction
252
52 The ProFunc server
253
521 Sequence-based searches
254
522 Structure-based searches
255
53 Case studies
257
function unclear
259
Harvesting the information from a family of proteins
263

322 Generalized hidden Markov models
21
324 PhyloHMMs or evolutionary HMMs
23
33 Discriminative learning
24
332 Semi-Markov conditional random fields
25
4 Training
26
5 Evaluation of gene prediction methods
27
52 Systematic evaluation
28
53 The community experiments
29
532 EGASP
30
6 Discussion
32
63 Outstanding challenges to gene annotation
33
64 What is the right gene prediction strategy?
34
Quality control of gene predictions
41
2 Quality control of gene predictions
42
2111 Conflict between the presence of extracellular Pfam-A domains in a protein and the absence of appropriate sequence signals
43
2114 Domain size deviation
44
321 Analysis of the TrEMBL section of UniProtKB
46
322 Analysis of sequences predicted by the EnsEMBL and GNOMON gene prediction pipelines
47
4 Alternative interpretations of the results of MisPred analyses
50
43 MisPred detects errors of biological processes
51
5 Conclusions
52
Evaluating the prediction of cis-acting regulatory elements in genome sequences
55
2 Transcription factor binding sites and motifs
58
3 Scanning a sequence with a position-specific scoring matrix
59
31 Background probability
61
32 Probability of a sequence segment given the motif
62
33 Scanning profiles
63
4 Evaluating pattern matching results
64
42 Accuracy profiles
66
43 Avoiding circularity in the evaluation
67
45 Difficulties for the evaluation of pattern matching
68
5 Discovering motifs in promoter sequences
69
51 Example of pattern discovery result
70
52 Evaluation statistics
72
53 Correctness of predicted motifs for a collection of annotated regulons
73
54 Distributions of motif scores in positive and negative testing sets
77
55 The Receiver Operating Characteristics (ROC) curve
81
56 Using ROC curves to find optimal parameters
83
7 Good practices for evaluating predictive tools
84
71 Use comprehensive data sets
85
8 What has not been covered in this chapter
86
9 Materials
87
A biophysical approach to large-scale protein-DNA binding data
91
1 Binding site predictions
92
2 Affinity model: TRAP
95
3 Affinity statistics
99
4 Applications
101
5 Summary
102
From gene expression profiling to gene regulation
105
2 Generating sets of coexpressed genes
106
3 Finding putative regulatory regions using comparative genomics
109
4 Detecting common transcription factors for coexpressed gene sets
111
5 Combining transcription factor information
114
6 De novo prediction of transcription factor binding motifs
115
Annotation, genetics and transcriptomics
123
2 Genetics and gene function
125
3 Use of animal models
128
gene expression microarrays
130
5 Gene annotation
132
Resources for functional annotation
139
2 Resources for functional annotation: protein sequence databases
140
3 UniProt: The Universal Protein Resource
141
4 The UniProt Knowledgebase (UniProtKB)
142
411 Sequence curation in UniProtKB/Swiss-Prot
145
412 Computational sequence annotation in UniProtKB/Swiss-Prot
147
414 Annotation of protein structure in UniProtKB/Swiss-Prot
149
416 Annotation of protein interactions and pathways in UniProtKB/Swiss-Prot
150
42 UniProtKB/TrEMBL
151
5 Protein family classification for functional annotation
152
512 Profiles and the PRINTS database
153
514 Structure-based protein signature databases
154
53 Using InterProScan for sequence classification and functional annotation
155
532 Interpreting InterProScan results
156
533 Large-scale automatic annotation
159
6 From genes and proteins to genomes and proteomes
160
7 Summary
161
Annotating bacterial genomes
165
2 Global sequence properties
170
3 Identifying genomic objects
172
4 Functional annotation
174
5 A recursive view of genome annotation
176
parallel analysis and comparison of multiple bacterial genomes
178
new developments for the construction of genome databases, metagenome analyses and user-friendly platforms
180
databases and platforms for annotating bacterial genomes
182
Data mining in genome annotation
191
2 An overview of large biological databases
193
the Swiss-Prot example
196
the PEDANT example
198
3 Data mining in genome annotation
200
32 Supervised learning
201
34 Clustering
202
35 Association rule mining
203
4 Applying association rule mining to the Swiss-Prot database
205
5 Applying association rule mining to the PEDANT database
207
6 Conclusion
210
Modern genome annotation: the BioSapiens network
213
12 Homologs, orthologs, paralogs
216
13 The HAMAP resource for the annotation of prokaryotic protein sequences and their orthologues
219
14 CATH, Gene3D, GeMMA
222
diverse tools for deducing function from sequence
228
16 General approaches for inheriting functions between homologous proteins
230
17 Nonhomologous methods for predicting protein function from sequence
234
Structure to function
239
2 FireDB and firestar: the prediction of functionally important residues
241
22 FireDB
242
23 Firestar
244
3 Modelling local function conservation in sequence and structure space for predicting molecular function
246
33 Application
247
4 Structural templates for functional characterization
249
11 Information transfer
264
2 Molecular class-specific information systems
265
21 G protein-coupled receptors
266
3 Extracting information from sequences
267
31 Correlated mutation analysis
268
4 Correlation studies on GPCRs
269
41 Evolutionary trace method
271
42 Entropy-variability analysis
273
43 Sequence harmony
274
Structure prediction of globular proteins
283
2 The evolution of protein structures and its implications for protein structure prediction
286
3 Template based modelling
287
31 Homologybased selection of the template
288
33 Using sequence based tools for selecting the template
289
34 Completing and refining the model
291
35 Current state of the art in template based methods
292
4 Template-free protein structure prediction
293
41 Energy functions for protein structure prediction
296
42 Lattice methods
297
43 Fragment assembly methods
298
44 Practical considerations
299
5 Automated structure prediction
300
51 Practical lessons from benchmarking experiments
302
6 Conclusions and future outlook
304
The state of the art of membrane protein structure prediction: from sequence to 3D structure
309
2 Many functions
311
4 Predicting the topology of membrane proteins
312
5 How many methods to predict membrane protein topology?
314
6 Benchmarking the predictors of transmembrane topology
316
62 Topological experimental data
317
63 Validation towards experimental data
318
7 How many membrane proteins in the Human genome?
319
PhD-SNP at work
320
3D MODELLING of membrane proteins
322
10 What can currently be done in practice?
323
11 Can we improve?
324
Computational analysis of metabolic networks
329
2 Computational resources on metabolism
331
212 BioCyc
332
214 Querying and exporting data
333
221 From annotated genomes to metabolic networks
334
3 Basic notions of graph theory
335
32 Node degree
336
41 Node degree distribution
337
411 Robustness to random deletions and targeted attacks
339
412 Generative models for power-law networks
340
42 Paths and distances in metabolic networks
341
5 Assessing reconstructed metabolic networks against physiological data
342
51 Constraints-based models of metabolism
343
512 Modelling the growth medium
344
513 Biomass function
345
522 Predicting gene essentiality
346
53 Assessing and correcting models using experimental data
347
55 Working with constraints-based models
348
Protein-protein interactions: analysis and prediction
353
2 Experimental methods
354
3 Protein interaction databases
356
5 The IntAct molecular interaction database
360
6 Interaction networks
362
7 Visualization software for molecular networks
365
8 Estimates of the number of protein interactions
371
9 Multiprotein complexes
372
10 Network modules
373
11 Diseases and protein interaction networks
376
12 Sequence-based prediction of protein interactions
380
121 Phylogenetic profiling
381
122 Similarity of phylogenetic trees
383
123 Gene neighbourhood conservation
384
124 Gene fusion
385
14 Domain-domain interactions
389
15 Biomolecular docking
395
151 Protein-ligand docking
396
152 Protein-protein docking
398
Infrastructure for distributed protein annotation
413
2 The Distributed Annotation System (DAS)
415
31 DASTY2, a protein sequence-oriented DAS client
418
33 Ensembl
420
34 DAS servers
422
5 Conclusion
425
Viral bioinformatics
429
2 Viral evolution in the human population
430
22 Vaccine strain selection for endemic influenza
431
23 Pandemic influenza
433
24 Conclusion
434
32 Epitopes
436
33 Prediction of epitopes
437
34 Epitope prediction in viral pathogens in a vaccine perspective
441
4 Viral evolution in the human host
442
42 Replication cycle of HIV
443
43 Targets for antiviral drug therapy
444
45 Data sets for learning viral resistance
445
46 Computational procedures for predicting resistance
446
47 Clinical impact of bioinformatical resistance testing
449
48 Bioinformatical support for applying coreceptor inhibitors
450
Alternative splicing in the ENCODE protein complement
453
2 Prediction of variant location
455
3 Prediction of variant function: analysis of the role of alternative splicing in changing function by modulation of functional residues
458
321 Tafazzin
459
322 Phosphoribosylglycinamide formyltransferase (GARS-AIRS-GART)
461
33 Analysis across the ENCODE dataset
462
4 Prediction of variant structure
463
5 Summary of effects of alternative splicing
467
6 Prediction of principal isoforms
472
61 A series of automatic methods for predicting the principal isoform
473
611 Methods
474
7 The ENCODE pipeline: an automated workflow for analysis of human splice isoforms
477
IFN alpha/beta receptor protein
479
73 Future perspectives
480