Tuesday, November 27, 2012

Domain Motif

1. Motif: a repeating structural unit formed by several secondary structure elements.

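A motif is a regular structural unit made up of several secondary structure elements.
When such a unit occurs repeatedly within the same protein,
or is found in many different proteins,
it can be called a specific motif.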

At the motif level, the emphasis is on structure rather than function,
so many different proteins can contain motifs with the same name.

The textbook (Campbell, 5th ed.) treats a motif as a repetitive supersecondary structure.

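Note: supersecondary structure
A supersecondary structure is a level of spatial organization between protein secondary and tertiary structure.
Adjacent secondary structure elements pack together and interact to form regular, spatially recognizable
combinations of secondary structure, which serve as building blocks of the tertiary structure.
The basic forms include αα, βαβ and βββ. In most cases only the side chains of nonpolar residues
take part in these interactions, while hydrophilic side chains lie mostly on the outer surface of the molecule.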




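2. Domain: a "local" three-dimensional structure within a protein that forms an independent functional unit.
For example, an enzyme may contain:
a. a catalytic domain, the region where the activity is carried out;
b. a regulatory domain, the region responsible for regulation.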

A domain is a "local" tertiary structure within a protein, and the emphasis is on the functional unit,
so domains are usually named after their function.

Friday, November 2, 2012

Protein aggregation

Cause

Protein aggregation can occur for a variety of reasons. Individuals may carry mutations that encode proteins particularly prone to misfolding and aggregation. Alternatively, disruption of the pathways that refold proteins (chaperones) or degrade misfolded proteins (the ubiquitin-proteasome pathway) may lead to protein aggregation. Because many of the diseases associated with protein aggregation become more frequent with age, it seems that cells lose the ability to clear misfolded proteins and aggregates over time. Several recent studies suggest that protein aggregation is a second line of the cellular response to imbalanced protein homeostasis rather than a harmful, random process [7]. A groundbreaking study [8] showed that sequestering misfolded, aggregation-prone proteins into inclusion sites is an active, organized cellular process that depends on quality control components such as HSPs and co-chaperones.

Moreover, it was shown that eukaryotic cells can sort misfolded proteins into two quality control compartments: 1. the JUNQ (JUxta Nuclear Quality control compartment) and 2. the IPOD (Insoluble Protein Deposit). The two compartments reflect the different handling and processing of different kinds of misfolded, aggregation-prone proteins. The IPOD serves as a sequestration site for non-ubiquitinated, terminally aggregated proteins such as the huntingtin protein. Under stress conditions such as heat, when the cellular quality control machinery is saturated, ubiquitinated proteins are sorted to the JUNQ compartment, where they are eventually degraded. Thus, aggregation is a regulated, controlled process.



Exposed hydrophobicity is a key determinant of nuclear quality control degradation

Protein quality control (PQC) degradation protects the cell by preventing the toxic accumulation of misfolded proteins. In eukaryotes, PQC degradation is primarily achieved by ubiquitin ligases that attach ubiquitin to misfolded proteins for proteasome degradation. To function effectively, PQC ubiquitin ligases must distinguish misfolded proteins from their normal counterparts by recognizing an attribute of structural abnormality commonly shared among misfolded proteins. However, the nature of the structurally abnormal feature recognized by most PQC ubiquitin ligases is unknown. Here we demonstrate that the yeast nuclear PQC ubiquitin ligase San1 recognizes exposed hydrophobicity in its substrates. San1 recognition is triggered by exposure of as few as five contiguous hydrophobic residues, which defines the minimum window of hydrophobicity required for San1 targeting. We also find that the exposed hydrophobicity recognized by San1 can cause aggregation and cellular toxicity, underscoring the fundamental protective role for San1-mediated PQC degradation of misfolded nuclear proteins. 
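The five-residue window can be turned into a simple sequence scan. A minimal sketch: it flags every run of at least five consecutive hydrophobic residues. The hydrophobic set (A, V, I, L, M, F, W) is an assumption, not the definition used in the San1 study, and the scan ignores whether a stretch is actually exposed, so in practice it would be combined with RSA or another exposure measure. `hydrophobic_stretches` is a hypothetical helper.

```python
# Assumed hydrophobic set; the San1 study may define hydrophobicity differently.
HYDROPHOBIC = set("AVILMFW")

def hydrophobic_stretches(seq, min_len=5):
    """Return 0-based, half-open (start, end) spans of >= min_len
    consecutive hydrophobic residues in a one-letter sequence."""
    spans = []
    start = None
    for i, aa in enumerate(seq.upper()):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i          # a new hydrophobic run begins
        else:
            if start is not None and i - start >= min_len:
                spans.append((start, i))
            start = None           # run broken by a non-hydrophobic residue
    # close a run that continues to the end of the sequence
    if start is not None and len(seq) - start >= min_len:
        spans.append((start, len(seq)))
    return spans

print(hydrophobic_stretches("KDELIVLAFGKDE"))  # [(3, 9)]
```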



Amyloidogenic Regions and Interaction Surfaces Overlap in Globular Proteins Related to Conformational Diseases

Protein aggregation underlies a wide range of human disorders. The polypeptides involved in these pathologies might be intrinsically unstructured or display a defined 3D-structure. Little is known about how globular proteins aggregate into toxic assemblies under physiological conditions, where they display an initially folded conformation. Protein aggregation is, however, always initiated by the establishment of anomalous protein-protein interactions. Therefore, in the present work, we have explored the extent to which protein interaction surfaces and aggregation-prone regions overlap in globular proteins associated with conformational diseases. Computational analysis of the native complexes formed by these proteins shows that aggregation-prone regions do frequently overlap with protein interfaces. The spatial coincidence of interaction sites and aggregating regions suggests that the formation of functional complexes and the aggregation of their individual subunits might compete in the cell. Accordingly, single mutations affecting complex interface or stability usually result in the formation of toxic aggregates. It is suggested that the stabilization of existing interfaces in multimeric proteins or the formation of new complexes in monomeric polypeptides might become effective strategies to prevent disease-linked aggregation of globular proteins.
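One simple way to quantify "aggregation-prone regions overlap with protein interfaces" is to compare the two residue sets directly. A minimal sketch, not the paper's actual analysis: it assumes the aggregation-prone residues come from some predictor and the interface residues from the complex structure (neither step is shown), and `overlap_stats` is a hypothetical helper.

```python
def overlap_stats(apr, interface):
    """Compare aggregation-prone residues (APR) with interface residues.

    apr, interface: iterables of residue identifiers (e.g. residue numbers).
    Returns (fraction of APR residues that lie in the interface,
             Jaccard index of the two sets).
    """
    apr, interface = set(apr), set(interface)
    shared = apr & interface
    frac_apr_at_interface = len(shared) / len(apr) if apr else 0.0
    union = apr | interface
    jaccard = len(shared) / len(union) if union else 0.0
    return frac_apr_at_interface, jaccard

# Hypothetical example: APR = residues 10-19, interface = residues 15-29
print(overlap_stats(range(10, 20), range(15, 30)))  # (0.5, 0.25)
```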

Tuesday, October 30, 2012

Write 1 Bin

From a paper that cites WCN - Using catalytic atom maps to predict .........

Seven bins for each data set 
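The note does not say how the seven bins are defined. One simple reading is equal-width bins spanning each data set's own range; the sketch below assumes that (quantile bins would be an equally plausible choice), and `assign_bins` is a hypothetical helper.

```python
def assign_bins(values, n_bins=7):
    """Map each value to a bin index 0..n_bins-1 using equal-width bins
    spanning [min(values), max(values)] of this data set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)   # degenerate data set: everything in one bin
    width = (hi - lo) / n_bins
    # min() keeps the maximum value in the last bin instead of index n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

print(assign_bins([0, 1, 2, 3, 4, 5, 6, 7]))  # [0, 1, 2, 3, 4, 5, 6, 6]
```

Binning each data set against its own range makes the bin indices comparable across data sets with different scales, at the cost of being sensitive to outliers at either end.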

Wednesday, October 24, 2012

Tuesday, October 23, 2012

aggregation region

Find aggregation regions in thermophilic proteins (PDB structures)

1. WCN (weighted contact number)
2. Hydrophobicity
3. Conservation
4. B-factor
5. CN (contact number, cutoff 9 Å)
6. RSA (relative solvent accessibility)
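WCN and CN from the list above can both be computed from per-residue coordinates. A minimal sketch, assuming one representative atom per residue (C-alpha, an assumption) and coordinates already parsed from the PDB file; `contact_numbers` is a hypothetical helper. WCN uses the usual definition, the sum of 1/r² over all other residues; CN counts residues closer than the 9 Å cutoff (a strict `<` is assumed).

```python
import math

def contact_numbers(coords, cutoff=9.0):
    """Per-residue CN and WCN from a list of (x, y, z) coordinates in Å.

    CN_i  = number of residues j != i with distance r_ij < cutoff
    WCN_i = sum over all j != i of 1 / r_ij**2 (no cutoff)
    Assumes no two residues share identical coordinates (r_ij > 0).
    """
    n = len(coords)
    cn = [0] * n
    wcn = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):          # each pair visited once
            r = math.dist(coords[i], coords[j])
            w = 1.0 / (r * r)
            wcn[i] += w
            wcn[j] += w
            if r < cutoff:
                cn[i] += 1
                cn[j] += 1
    return cn, wcn

cn, wcn = contact_numbers([(0, 0, 0), (3, 0, 0), (20, 0, 0)])
print(cn)  # [1, 1, 0]
```

Because WCN weights every neighbour by 1/r², it needs no cutoff, whereas CN depends on the choice of the 9 Å threshold.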


Catalytic site

Structure-related factors:
pocket, RSA, rigidity, packing density (WCN),
fixed geometry (structurally conserved distances and angles -> function)

Lowers the activation energy (free energy barrier)
Speeds up the chemical reaction
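To put a number on "lowering the free energy barrier": if a catalyst lowers the activation free energy by ΔΔG‡ and the pre-exponential factor is unchanged (transition-state theory), the rate increases by a factor of exp(ΔΔG‡/RT). A minimal sketch; `rate_enhancement` is a hypothetical helper and the 298.15 K default is an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def rate_enhancement(ddG_barrier_kcal, T=298.15):
    """Fold increase in rate when the activation free energy is lowered by
    ddG_barrier_kcal (kcal/mol), assuming an unchanged prefactor:
    k_cat / k_uncat = exp(ddG / (R * T))."""
    return math.exp(ddG_barrier_kcal / (R_KCAL * T))

print(f"{rate_enhancement(10.0):.2e}")
```

Lowering the barrier by 10 kcal/mol at 25 °C gives roughly a 2 × 10^7-fold speed-up, which is why even modest barrier reductions matter.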




ref.
Coupling between Catalytic Site and Collective Dynamics: A Requirement for Mechanochemical Activity of Enzymes

Tuesday, October 2, 2012

Zhiping Weng

Structure, function, and evolution of transient and obligate protein–protein interactions

Recent analyses of high-throughput protein interaction data coupled with large-scale investigations of evolutionary properties of interaction networks have left some unanswered questions. To what extent do protein interactions act as constraints during evolution of the protein sequence? How does the type of interaction, specifically transient or obligate, play into these constraints? Are the mutations in the binding site of an interacting protein correlated with mutations in the binding site of its partner? We address these and other questions by relying on a carefully curated dataset of protein complex structures. Results point to the importance of distinguishing between transient and obligate interactions. We conclude that residues in the interfaces of obligate complexes tend to evolve at a relatively slower rate, allowing them to coevolve with their interacting partners. In contrast, the plasticity inherent in transient interactions leads to an increased rate of substitution for the interface residues and leaves little or no evidence of correlated mutations across the interface.
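Correlated mutations across an interface are often quantified with mutual information (MI) between alignment columns. A minimal sketch of that measure, not necessarily the method used in the paper: it assumes paired alignments in which row k of both columns comes from the same organism, and raw MI ignores gaps, small-sample noise and phylogenetic bias, which real coevolution analyses correct for. `column_mutual_information` is a hypothetical helper.

```python
import math
from collections import Counter

def column_mutual_information(col_a, col_b):
    """Mutual information (natural log) between two alignment columns.

    col_a, col_b: equal-length strings, one character per sequence, e.g. the
    residues at a binding-site position in protein A and in its partner B.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    p_a = Counter(col_a)                 # residue counts in column A
    p_b = Counter(col_b)                 # residue counts in column B
    p_ab = Counter(zip(col_a, col_b))    # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in p_ab.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((p_a[a] / n) * (p_b[b] / n)))
    return mi

print(column_mutual_information("AAGG", "CCTT"))  # perfectly covarying: ln 2
print(column_mutual_information("AAGG", "CTCT"))  # independent: 0.0
```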


Monday, October 1, 2012

R shapes and line types


Note that the filled symbols 15-18 often render without proper anti-aliasing; they can appear jagged, pixelated, and not properly centered. Use symbols 19 and 21-25 to avoid these problems. For symbols 21-25 to appear solid, you will also need to specify a fill (bg) color that is the same as the outline color (col); otherwise they will be hollow.





Use the pch option to set the shape, and use lty and lwd to set the line type and width. The line type can be specified by name or by number.
 
  
# Make the random y-values below reproducible
set.seed(331)

# Plot some points with lines
# Set up the plotting area
plot(NA, xlim=c(1,4), ylim=c(0,1))

# Plot solid circles with solid lines
points(1:4, runif(4), type="b", pch=19)
# Add open squares with dashed line, with heavier line width
points(1:4, runif(4), type="b", pch=0,  lty=2, lwd=3)

points(1:4, runif(4), type="b", pch=23,   # Diamond shape
       lty="dotted", cex=2,               # Dotted line, double-size shapes
       col="#000099", bg="#FF6666")       # blue line, red fill
 
 
 
 

ggplot2 syntax, part 1

Source: original post

1. The example below uses the car test data set bundled with ggplot2 (mpg). It uses three variables:
     engine displacement (displ),
     highway miles per gallon (hwy),
     number of cylinders (cyl)
     
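First load the ggplot2 package, then call ggplot() to define the first layer, i.e. the data source. The aes argument is the key part: it maps displ to the x axis and hwy to the y axis, and converts cyl to a categorical variable (factor) whose levels are mapped to different colours. Two new layers are then added with +: the second layer adds the scatter points and the third adds a loess smoothing curve.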

library(ggplot2)
p <- ggplot(data=mpg, aes(x=displ, y=hwy, colour=factor(cyl)))
p + geom_point() + geom_smooth()





















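The plot above smooths each cylinder group separately. To smooth all the data together, set the colour aesthetic in the point layer rather than in the first layer; the third (smoothing) layer is then unaffected by colour.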

p <- ggplot(mpg, aes(x=displ, y=hwy))
p + geom_point(aes(colour=factor(cyl))) + geom_smooth()








Sunday, September 2, 2012

MSA - multiple sequence alignment

MUSCLE

Usage:

MUSCLE v3.8.31 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.


Basic usage

    muscle -in <inputfile> -out <outputfile>

Common options (for a complete list please see the User Guide):

    -in <inputfile>    Input file in FASTA format (default stdin)
    -out <outputfile>  Output alignment in FASTA format (default stdout)
    -diags             Find diagonals (faster for similar sequences)
    -maxiters <n>      Maximum number of iterations (integer, default 16)
    -maxhours <h>      Maximum time to iterate in hours (default no limit)
    -html              Write output in HTML format (default FASTA)
    -msf               Write output in GCG MSF format (default FASTA)
    -clw               Write output in CLUSTALW format (default FASTA)
    -clwstrict         As -clw, with 'CLUSTAL W (1.81)' header
    -log[a] <logfile>  Log to file (append if -loga, overwrite if -log)
    -quiet             Do not write progress messages to stderr
    -version           Display version information and exit

Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
Fastest possible (nucleotides): -maxiters 1 -diags
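A minimal sketch of driving MUSCLE from Python with the basic usage line above, assuming a MUSCLE v3.8 executable named `muscle` on PATH; `muscle_cmd` and `run_muscle` are hypothetical helpers, and only flags from the help text above are used (`-in`, `-out`, `-maxiters`, `-clw`). Newer MUSCLE 5 releases use a different command-line syntax, so this would need adapting there.

```python
import subprocess

def muscle_cmd(infile, outfile, fast=False, clustal=False):
    """Build a MUSCLE v3.8 command line from the options listed above."""
    cmd = ["muscle", "-in", infile, "-out", outfile]
    if fast:
        cmd += ["-maxiters", "2"]   # no refinement: very fast
    if clustal:
        cmd.append("-clw")          # CLUSTALW output instead of FASTA
    return cmd

def run_muscle(infile, outfile, **opts):
    # Raises CalledProcessError if muscle exits with a non-zero status.
    subprocess.run(muscle_cmd(infile, outfile, **opts), check=True)

print(" ".join(muscle_cmd("seqs.fa", "aln.fa", fast=True)))
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems with file names.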





T-Coffee

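T-Coffee (Tree-based Consistency Objective Function For alignment Evaluation) is a multiple sequence alignment program based on a progressive alignment algorithm. It uses the information from pairwise sequence alignments to build the multiple alignment. In the latest version (3D-Coffee), structural information can also be combined into the alignment. The package can also evaluate the quality of an alignment and find special motifs in it (Mocca). The default output format is aln (Clustal), but it can also produce PIR, MSF, FASTA and other formats. Common input formats (FASTA, PIR) are supported.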
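The "consistency" in T-Coffee comes from extending the library of pairwise residue matches through third sequences: a match between residue x in one sequence and residue y in another gains weight whenever both are aligned to the same residue z in a third sequence, by the smaller of the two supporting weights. A simplified sketch of that library extension, assuming the primary library is already given as weighted residue pairs; `extend_library` is a hypothetical helper, and the real program adds more steps (building the primary library from pairwise alignments, then progressive alignment with the extended weights).

```python
from collections import defaultdict

def extend_library(primary):
    """Consistency-based library extension (sketch).

    primary: dict mapping frozenset({(seq, pos), (seq2, pos2)}) -> weight,
    i.e. residue pairs aligned in some pairwise alignment and their weights.
    Returns extended weights: the direct weight plus, for every residue z in
    a third sequence aligned to both x and y, min(w(x, z), w(z, y)).
    """
    neighbours = defaultdict(dict)
    for pair, w in primary.items():
        x, y = tuple(pair)
        neighbours[x][y] = w
        neighbours[y][x] = w
    extended = {}
    residues = list(neighbours)
    for i, x in enumerate(residues):
        for y in residues[i + 1:]:
            if x[0] == y[0]:
                continue  # only pair residues from different sequences
            w = neighbours[x].get(y, 0)          # direct support
            for z, w_xz in neighbours[x].items():
                # z must come from a third sequence and also match y
                if z[0] not in (x[0], y[0]) and y in neighbours[z]:
                    w += min(w_xz, neighbours[z][y])
            if w > 0:
                extended[frozenset((x, y))] = w
    return extended

lib = {
    frozenset({("A", 0), ("B", 0)}): 50,
    frozenset({("A", 0), ("C", 0)}): 80,
    frozenset({("C", 0), ("B", 0)}): 60,
}
print(extend_library(lib)[frozenset({("A", 0), ("B", 0)})])  # 50 + min(80, 60) = 110
```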

Sub-methods

M-Coffee

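M-Coffee is a special mode of T-Coffee that combines the output of many popular multiple sequence alignment programs, such as MUSCLE, ClustalW, MAFFT and ProbCons. The combined result tends to be somewhat better than any individual method; more importantly, M-Coffee highlights the regions of the alignment on which the methods agree, and those regions are usually reliable.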


Linux/Unix users
  1. download the installer package available here (or the latest beta here);
  2. open a terminal window, move to the download path, and grant execute permission to the installer typing the following command:
    chmod +x T-COFFEE_installer_Version_9.03.r1318.bin
  3. launch the installer
    ./T-COFFEE_installer_Version_9.03.r1318.bin
  4. when the installation procedure has finished, open a new terminal window (so that the changes made by the installer take effect) and type the following command to verify your installation:
    t_coffee -version

Error: >> t_coffee: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by t_coffee)

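Code:
find / -name 'libc.so*'
This will list files such as /lib/libc.so.4 or /lib/libc.so.5, i.e. libc.so with some number at the end.

Then, just make a soft link.
Code:
ln -s /lib/libc.so.4 /lib/libc.so.6
This has worked for me several times and should work for you too.
Note that ln -s fails if the link name already exists, and a link to an older libc cannot provide the GLIBC_2.7 symbol version the error asks for; in that case a t_coffee build that matches the installed glibc (or a system with glibc 2.7 or later) is needed.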


Command line

 t_coffee ZF.txt -output aln,score_html -mode mcoffee

Friday, August 24, 2012

Cp vs G

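ΔCp (deltaCp) is the heat capacity change of unfolding. It is reported much like ΔG, but its units differ:
ΔCp is the difference in heat capacity between the unfolded and folded protein, in kcal/mol/°C (equivalently kcal/mol/K, since a 1 °C change equals a 1 K change).

ΔG is an energy per mole, given in kJ/mol (or kcal/mol); kJ and kcal are interchangeable energy units (1 kcal = 4.184 kJ).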
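ΔCp and ΔG are linked through the temperature dependence of ΔG. Assuming two-state unfolding and a temperature-independent ΔCp, the Gibbs-Helmholtz relation gives the protein stability curve ΔG(T) = ΔHm(1 - T/Tm) + ΔCp[(T - Tm) - T ln(T/Tm)], where Tm is the melting temperature and ΔHm the unfolding enthalpy at Tm. A minimal sketch that evaluates it in kcal/mol and converts between kcal and kJ; the helper names and the example parameter values are hypothetical.

```python
import math

KJ_PER_KCAL = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ

def kcal_to_kj(x):
    return x * KJ_PER_KCAL

def kj_to_kcal(x):
    return x / KJ_PER_KCAL

def stability_curve(T, dH_m, T_m, dCp):
    """Unfolding free energy dG(T) in kcal/mol (positive = folded state stable).

    T, T_m in kelvin; dH_m in kcal/mol (enthalpy of unfolding at T_m);
    dCp in kcal/(mol*K), assumed independent of temperature.
    """
    return dH_m * (1 - T / T_m) + dCp * ((T - T_m) - T * math.log(T / T_m))

# Hypothetical protein: Tm = 60 °C, dH_m = 100 kcal/mol, dCp = 1.5 kcal/(mol*K)
dG_25C = stability_curve(298.15, 100.0, 333.15, 1.5)
print(f"dG(25 °C) = {dG_25C:.2f} kcal/mol = {kcal_to_kj(dG_25C):.1f} kJ/mol")
```

By construction ΔG is zero at Tm; the ΔCp term bends the curve, so ΔCp controls how quickly stability falls away from its maximum.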

Tuesday, July 24, 2012

Index of posts by category

2011 - 11  Selected sentences
2011 - 12  Protein structure

2012 - 01  Performance
2012 - 02  Free energy
2012 - 03  Salt-bridge
2012 - 04
2012 - 05  Delphi
2012 - 06  Thermophilic and interface
2012 - 07  Miscellaneous (to be re-categorized)

Conservation score write-up


Conservation scores of residues

Conservation of residues is identified by comparing the sequence of PDB (14) entries with sequences deposited in Swiss-Prot (15) using a local implementation of the public server ConSurf (10) (http://consurf.tau.ac.il). The ClustalW (17) aligned homologous sequences found by PSI-BLAST (19) are used to calculate the measure of conservation by the Rate4Site algorithm (20). Residues are classified into nine categories according to their real conservation score. A score of 1 represents the most variable residues and a score of 9 represents the most conserved ones.

In our study of TIM-barrel proteins (11), we have used the following conditions to predict the SRs: (i) HP ≥ 20 kcal/mol; (ii) LRO ≥ 0.02; (iii) SC ≥ 1; and (iv) conservation score ≥ 6.
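The four conditions above combine into a single rule. A minimal sketch; reading HP as surrounding hydrophobicity, LRO as long-range order, SC as the number of stabilization centers a residue takes part in, and SR as stabilizing residue is an interpretation not spelled out in the text, and `ResidueFeatures` / `is_stabilizing` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class ResidueFeatures:
    hp: float          # surrounding hydrophobicity, kcal/mol
    lro: float         # long-range order
    sc: int            # stabilization-center count (interpretation assumed)
    conservation: int  # conservation grade, 1 (most variable) .. 9 (most conserved)

def is_stabilizing(f: ResidueFeatures) -> bool:
    """Apply the four thresholds quoted above from the TIM-barrel study."""
    return f.hp >= 20 and f.lro >= 0.02 and f.sc >= 1 and f.conservation >= 6

print(is_stabilizing(ResidueFeatures(hp=22.5, lro=0.03, sc=2, conservation=7)))  # True
```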

Sunday, July 22, 2012

Surrounding hydrophobicity

The experimental values are given below:

Ala   Asp   Cys   Glu   Phe   Gly   His   Ile   Lys   Leu   Met   Asn
0.87  0.66  1.52  0.67  2.87  0.10  0.87  3.15  1.64  2.17  1.67  0.09

Pro   Gln   Arg   Ser   Thr   Val   Trp   Tyr
2.77  0.00  0.85  0.07  0.07  1.87  3.77  2.67
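One common formulation of surrounding hydrophobicity sums these values over the residues around residue i. A minimal sketch, assuming one C-alpha coordinate per residue, one-letter residue codes, and an 8 Å sphere (the radius is an assumption, not given in the note); `surrounding_hydrophobicity` is a hypothetical helper.

```python
import math

# Values from the table above, keyed by one-letter residue code.
H = {
    "A": 0.87, "D": 0.66, "C": 1.52, "E": 0.67, "F": 2.87, "G": 0.10,
    "H": 0.87, "I": 3.15, "K": 1.64, "L": 2.17, "M": 1.67, "N": 0.09,
    "P": 2.77, "Q": 0.00, "R": 0.85, "S": 0.07, "T": 0.07, "V": 1.87,
    "W": 3.77, "Y": 2.67,
}

def surrounding_hydrophobicity(seq, coords, radius=8.0):
    """Hp(i) = sum of H[seq[j]] over residues j != i whose C-alpha lies
    within `radius` Å of residue i's C-alpha."""
    hp = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if j != i and math.dist(ci, cj) <= radius:
                total += H[seq[j]]
        hp.append(total)
    return hp

print(surrounding_hydrophobicity("AFG", [(0, 0, 0), (5, 0, 0), (20, 0, 0)]))
```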



From:

1. Book (Google Books)

    Protein Bioinformatics: From Sequence to Function

2. Paper: Manavalan, Nature 1978