Sequence and structural features of carbohydrate binding in proteins and assessment of predictability using a neural network.

(2007) BMC Struct Biol 7

PubMed: 17201922 | PubMedCentral: PMC1780050 | DOI: 10.1186/1472-6807-7-1

PDB ID | Pfam ID | Ligand name | Ligand formula | Ligand ID
1A7C | Serpins | ALPHA-D-MANNOSE; N-ACETYL-D-GALACTOSAMINE; N-METHYLCARBONYLTHREONINE | C6 H12 O6; 3(C8 H15 N1 O6); 2(C6 H11 N1 O4) | MAN; NGA; THC
1AU1 | Interfero... | ZINC ION; FUCOSE; GLUCOSE; D-GALACTOSE; ALPHA-D-MANNOSE | ZN1 2+; 2(C6 H12 O6); 4(C6 H12 O6); C6 H12 O6; 2(C1 H12 O6) | ZN; FUC; GLC; GAL; MAN
1AXM | Fibroblast Growth Factors | SELENOMETHIONINE; O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE | 6(C5 H11 N1 O2 SE1); 7(C6 H10 O10 S1); 8(C6 H13 N1 O11 S2) | MSE; IDS; SGN
1CVN | Lectin legB | CALCIUM ION; MANGANESE (II) ION; ALPHA-D-MANNOSE | 4(CA1 2+); 4(MN1 2+); 12(C1 H12 O6) | CA; MN; MAN
1E6N | CBM_5_12; Glycol_hydro_18 | GLYCEROL; SULFATE ION; N-ACETYL-D-GLUCOSAMINE | C3 H8 O3; 12(O4 S1 2-); 10(C8 H15 N1 O6) | GOL; SO4; NAG
1FV3 | Toxin_R_bind_C; Toxin_R_bind_N; Toxin_trans | GLUCOSE; PHOSPHATE ION; D-GALACTOSE; ETHYL-TRIMETHYL-SILANE; N-ACETYL-D-GALACTOSAMINE; 5-N-ACETYL-BETA-D-NEURAMINIC ACID; 5-N-ACETYL-ALPHA-D-NEURAMINIC ACID | 2(C6 H12 O6); O4 P1 3-; 4(C6 H12 O6); 2(C5 H14 SI1); 2(C8 H15 N1 O6); 2(C11 H19 N1 O9); 4(C11 H19 N1 O9) | GLC; PO4; GAL; CEQ; NGA; SLB; NAN
1FWU | Ricin_B_lectin | FUCOSE; O3-SULFONYLGALACTOSE; ALPHA-METHYL-N-ACETYL-D-GLUCOSAMINE | C6 H12 O5; C6 H12 O9 S1; C9 H17 N1 O6 | FUC; SGA; MAG
1G1T | EGF; Lectin C | FUCOSE; CALCIUM ION; D-GALACTOSE; O-SIALIC ACID; N-ACETYL-O-METHYL-D-GLUCOSAMINE | C6 H12 O5; CA1 2+; C6 H12 O6; C11 H19 N1 O9; C9 H17 N1 O6 | FUC; CA; GAL; SIA; 1NA
1G5N | Annexin | CALCIUM ION; N,O6-DISULFO-GLUCOSAMINE; 1,4-DIDEOXY-O2-SULFO-GLUCURONIC ACID; 1,4-DIDEOXY-5-DEHYDRO-O2-SULFO-GLUCURONIC ACID | 9(CA1 2+); 4(C6 H13 N1 O11 S2); 2(C6 H10 O8 S1); 2(C6 H8 O8 S1) | CA; SGN; IDU; UAP
1GMN | Kringle; PAN | O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE; 4-(2-HYDROXYETHYL)-1-PIPERAZINE ETHANESULFONIC ACID | 3(C6 H10 O10 S1); 2(C6 H13 N1 O11 S2); 2(C8 H18 N2 O4 S1) | IDS; SGN; EPE
1GUI | CBM4/9 | CALCIUM ION; GLYCEROL; BETA-D-GLUCOSE | CA1 2+; 5(C3 H8 O3); 6(C6 H12 O6) | CA; GOL; BGC
1GWM | Family 29 carbohydrate binding module | GLUCOSE; COBALT (II) ION; 1,2-ETHANEDIOL; BETA-D-GLUCOSE | C6 H14 O6; CO1 2+; 8(C2 H6 O2); 5(C6 H12 O6) | GLC; CO; EDO; BGC
1IW6 | Bac_rhodopsin | GLUCOSE; RETINAL; D-GALACTOSE; ALPHA-D-MANNOSE; 2,3-DI-PHYTANYL-GLYCEROL; 2,3-DI-O-PHYTANLY-3-SN-GLYCERO-1-PHOSPHORYL-3'-SN-GLYCEROL-1'-PHOSPHATE | C6 H12 O6; C20 H28 O1; C6 H12 O6; C6 H12 O6; C43 H88 O3; 4(C46 H94 O11 P2 2-) | GLC; RET; GAL; MAN; L2P; L3P
1J8R | PapG_N | GLUCOSE; D-GALACTOSE; N-ACETYL-D-GLUCOSAMINE; SELENOMETHIONINE | C6 H12 O6; 2(C6 H12 O6); C8 H15 N1 O6; 3(C5 H11 N1 O2 SE1) | GLC; GAL; NAG; MSE
1JPC | B_lectin D-mannose binding lectin | ALPHA-D-MANNOSE | 8(C1 H12 O6) | MAN
1LGB | Lectin_legB; Transferrin | FUCOSE; CALCIUM ION; D-GALACTOSE; MANGANESE (II) ION; ALPHA-D-MANNOSE; N-ACETYL-D-GLUCOSAMINE | C6 H12 O6; CA1 2+; C6 H12 O6; MN1 2+; 3(C1 H12 O6); 4(C8 H15 N1 O6) | FUC; CA; GAL; MN; MAN; NAG
1M5J | - | ALPHA-D-MANNOSE; O1-PENTYL-MANNOSE; 2-[N-CYCLOHEXYLAMINO]ETHANE SULFONIC ACID | 8(C6 H12 O6); C11 H22 O6; C8 H17 N1 O3 S1 | MAN; OPM; NHE
1OH4 | - | CALCIUM ION; GLYCEROL; SULFATE ION; BETA-D-MANNOSE; ALPHA D-GALACTOSE | CA1 2+; 2(C3 H8 O3); O4 S1 2-; 5(C6 H12 O6); 2(C6 H12 O6) | CA; GOL; SO4; BMA; GLA
1Q8V | Lectin_legB | CALCIUM ION; MANGANESE (II) ION; ALPHA-D-MANNOSE; PYROGLUTAMIC ACID | 2(CA1 2+); 2(MN1 2+); 5(C6 H12 O6); 2(C5 H7 N1 O3) | CA; MN; MAN; PCA
1QFO | V-set | GLUCOSE; D-GALACTOSE; O-SIALIC ACID | 2(C6 H12 O6); 2(C6 H12 O6); 3(C11 H19 N1 O9) | GLC; GAL; SIA
1RID | Sushi | O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE | 8(C6 H10 O10 S1); 8(C6 H13 N1 O11 S2) | IDS; SGN
1SE3 | Stap_Strp_tox_C; Stap_Strp_toxin | GLUCOSE; D-GALACTOSE; O-SIALIC ACID | C6 H12 O6; C6 H12 O6; C11 H19 N1 O9 | GLC; GAL; SIA
1SL4 | Lectin_C | CALCIUM ION; ALPHA-D-MANNOSE | 3(CA1 2+); 4(C6 H12 O6) | CA; MAN
1SLC | Gal-bind_lectin | D-GALACTOSE; ALPHA-D-MANNOSE; N-ACETYL-D-GLUCOSAMINE | 4(C6 H12 O6); 6(C1 H12 O6); 6(C8 H15 N1 O6) | GAL; MAN; NAG
1T0W | Chitin_bind_1 | AMINO GROUP; N-ACETYL-D-GLUCOSAMINE | H2 N1; 3(C8 H15 N1 O6) | NH2; NAG
1T8U | Sulfotransfer_1 | SODIUM ION; SULFATE ION; O2-SULFO-GLUCURONIC ACID; N,O6-DISULFO-GLUCOSAMINE; ADENOSINE-3'-5'-DIPHOSPHATE; 1,4-DIDEOXY-5-DEHYDRO-O2-SULFO-GLUCURONIC ACID | 2(NA1 1+); O4 S1 2-; C6 H10 O10 S1; 2(C6 H13 N1 O11 S2); 2(C10 H15 N5 O10 P2); C6 H8 O8 S1 | NA; SO4; IDS; SGN; A3P; UAP
1ULE | - | D-GALACTOSE; N-ACETYL-D-GLUCOSAMINE | 4(C6 H12 O6); 2(C8 H15 N1 O6) | GAL; NAG
1UX7 | CBM_6 | CALCIUM ION; SULFATE ION; BETA-D-XYLOPYRANOSE | 2(CA1 2+); O4 S1 2-; 3(C5 H10 O5) | CA; SO4; XYP
1UY4 | - | SODIUM ION; CALCIUM ION; GLYCEROL; BETA-D-XYLOPYRANOSE | NA1 1+; CA1 2+; C3 H8 O3; 4(C5 H10 O5) | NA; CA; GOL; XYP
1UYY | CBM_6 | CALCIUM ION; BETA-D-GLUCOSE | 4(CA1 2+); 7(C6 H12 O6) | CA; BGC
1VBO | - | ALPHA-D-MANNOSE; N-ACETYLALANINE | 20(C6 H12 O6); 8(C5 H9 N1 O3) | MAN; AYA
1VPS | Polyoma Coat | D-GALACTOSE; O-SIALIC ACID; N-ACETYL-D-GLUCOSAMINE | 5(C6 H12 O6); 10(C11 H19 N1 O9); 5(C8 H15 N1 O6) | GAL; SIA; NAG
1W9T | CBM_6 | SODIUM ION; XYLOPYRANOSE; BETA-D-XYLOPYRANOSE | 6(NA1 1+); 2(C5 H10 O5); 8(C5 H10 O5) | NA; XYS; XYP
1XT3 | Toxin_1 | CITRIC ACID; N,O6-DISULFO-GLUCOSAMINE; 1,4-DIDEOXY-O2-SULFO-GLUCURONIC ACID | C6 H8 O7; 3(C6 H13 N1 O11 S2); 3(C6 H10 O8 S1) | CIT; SGN; IDU
2BOS | SLT beta | BUTYL GROUP; GLUCOSE; D-GALACTOSE | 3(C4 H9); 5(C6 H12 O6); 14(C6 H12 O6) | BUT; GLC; GAL
2FCP | Plug; TonB_dep_Rec | GLUCOSE; PHOSPHATE ION; D-GALACTOSE; NICKEL (II) ION; 3-OXO-BUTYRIC ACID; 3-OXO-PENTADECANOIC ACID; GLUCOSAMINE 1-PHOSPHATE; GLUCOSAMINE 4-PHOSPHATE; ETHANOL AMINE PYROPHOSPHATE; L-GLYCERO-D-MANNO-HEPTOPYRANOSE; 3-DEOXY-D-MANNO-OCT-2-ULOSONIC ACID; 2-TRIDECANOYLOXY-PENTADECANOIC ACID | C6 H12 O6; O4 P1 3-; 2(C6 H12 O6); 2(NI1 2+); C4 H6 O3; C15 H28 O3; C6 H14 N1 O8 P1; C6 H14 N1 O8 P1; C2 H9 N1 O7 P2; 2(C7 H14 O7); 2(C8 H14 O8); 2(C28 H54 O4) | GLC; PO4; GAL; NI; LIN; LIM; GP1; GP4; EA2; GMH; KDO; LIL
2MPR | LamB | GLUCOSE; CALCIUM ION | C6 H12 O6; CA1 2+ | GLC; CA
3CHB | Enterotoxin b | GLUCOSE; D-GALACTOSE; O-SIALIC ACID; N-ACETYL-D-GALACTOSAMINE; N-(EHTYLSULFITE)MORPHOLINE | 5(C6 H12 O6); 10(C6 H12 O6); 5(C11 H19 N1 O9); 5(C8 H15 N1 O6); 2(C6 H14 N1 O4 S1) | GLC; GAL; SIA; NGA; MES
3MAN | Cellulase | ALPHA-D-MANNOSE | 3(C6 H12 O6) | MAN
3MBP | - | GLUCOSE | 3(C6 H12 O6) | GLC

Table 2 Propensities of PROCARB40, PDNA62 and PLD116 along with their binding and non-binding data

Residue | PROCARB40 Propensity | PROCARB40 BS | PROCARB40 NBS | PDNA62 Propensity | PDNA62 BS | PDNA62 NBS | PLD116 Propensity | PLD116 BS | PLD116 NBS
A | 0.43 | 9 | 494 | 0.64 | 42 | 389 | 0.79 | 109 | 2684
C | 0.00 | 0 | 29 | 0.34 | 7 | 143 | 1.07 | 24 | 436
D | 1.41 | 27 | 433 | 0.36 | 18 | 292 | 0.79 | 84 | 2009
E | 1.81 | 29 | 356 | 0.39 | 32 | 510 | 0.92 | 92 | 1952
F | 0.66 | 9 | 318 | 0.77 | 33 | 245 | 1.09 | 70 | 1346
G | 0.80 | 20 | 581 | 0.71 | 46 | 372 | 1.26 | 176 | 2633
H | 1.58 | 8 | 114 | 1.08 | 39 | 194 | 2.09 | 81 | 712
I | 0.12 | 2 | 392 | 0.48 | 30 | 373 | 0.72 | 70 | 1837
K | 1.40 | 26 | 419 | 1.95 | 180 | 423 | 0.59 | 65 | 2053
L | 0.34 | 8 | 561 | 0.38 | 39 | 624 | 0.81 | 120 | 2872
M | 0.19 | 1 | 124 | 0.54 | 14 | 149 | 1.11 | 42 | 716
N | 1.96 | 38 | 429 | 1.45 | 74 | 260 | 1.17 | 92 | 1485
P | 0.40 | 5 | 297 | 0.66 | 35 | 307 | 0.45 | 38 | 1597
Q | 1.54 | 18 | 263 | 1.19 | 61 | 272 | 0.74 | 46 | 1123
R | 2.77 | 32 | 246 | 2.41 | 208 | 360 | 1.80 | 139 | 1450
S | 0.43 | 9 | 499 | 1.33 | 91 | 355 | 1.03 | 112 | 2049
T | 0.70 | 15 | 499 | 1.36 | 85 | 325 | 0.87 | 90 | 2030
V | 0.00 | 0 | 472 | 0.59 | 40 | 399 | 0.73 | 92 | 2315
W | 3.31 | 23 | 144 | 1.40 | 22 | 81 | 2.30 | 67 | 518
Y | 1.68 | 25 | 333 | 1.19 | 43 | 189 | 1.88 | 125 | 1189

BS and NBS: numbers of binding-site and non-binding-site residues of each type.

Table 3 Comparison of Binary and PSSM prediction results using the jackknife (leave-one-out) method (binding sites were labeled using a 3.5 Å cut-off distance between carbohydrate and protein atoms).
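The propensity formula is not stated alongside Table 2. A common definition compares a residue type's share of binding-site residues with its share of all residues; applied to the PROCARB40 counts it reproduces every tabulated PROCARB40 propensity to two decimals (for example W 3.31, R 2.77, A 0.43), but it does not exactly reproduce all PDNA62 and PLD116 values, so treat it as an assumption rather than the authors' method. The function name `propensities` is hypothetical.

```python
"""Residue propensity from binding (BS) and non-binding (NBS) counts.

Assumed definition (not stated in the source):
    P_i = (BS_i / sum(BS)) / ((BS_i + NBS_i) / sum(BS + NBS))
"""

# PROCARB40 counts copied from Table 2: residue -> (BS, NBS)
PROCARB40 = {
    "A": (9, 494), "C": (0, 29), "D": (27, 433), "E": (29, 356),
    "F": (9, 318), "G": (20, 581), "H": (8, 114), "I": (2, 392),
    "K": (26, 419), "L": (8, 561), "M": (1, 124), "N": (38, 429),
    "P": (5, 297), "Q": (18, 263), "R": (32, 246), "S": (9, 499),
    "T": (15, 499), "V": (0, 472), "W": (23, 144), "Y": (25, 333),
}


def propensities(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return the (assumed) propensity of each residue type."""
    total_bs = sum(bs for bs, _ in counts.values())
    total_all = sum(bs + nbs for bs, nbs in counts.values())
    result = {}
    for res, (bs, nbs) in counts.items():
        frac_binding = bs / total_bs           # share of binding-site residues
        frac_overall = (bs + nbs) / total_all  # share of all residues
        result[res] = frac_binding / frac_overall
    return result


if __name__ == "__main__":
    for res, p in sorted(propensities(PROCARB40).items()):
        print(f"{res} {p:.2f}")
```

Running it prints one line per residue type; the values match the PROCARB40 propensity column. A value above 1 means the residue type is over-represented at carbohydrate-binding sites relative to the whole dataset.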

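Table 3 labels binding sites with a 3.5 Å cut-off between carbohydrate and protein atoms. A minimal sketch of such labeling, assuming a residue counts as binding if any one of its atoms lies within the cut-off of any carbohydrate atom, and that coordinates are already parsed into plain (x, y, z) tuples keyed by residue ID. The function `label_binding_residues`, the input layout, and the choice of an inclusive (<=) comparison are all hypothetical.

```python
import math

Coord = tuple[float, float, float]

CUTOFF = 3.5  # Å, cut-off distance used to label binding sites


def label_binding_residues(
    residues: dict[str, list[Coord]],
    carb_atoms: list[Coord],
    cutoff: float = CUTOFF,
) -> dict[str, bool]:
    """Map each residue ID to True if any of its atoms lies within `cutoff`
    of any carbohydrate atom (inclusive comparison is an assumption)."""
    labels = {}
    for res_id, atoms in residues.items():
        labels[res_id] = any(
            math.dist(a, c) <= cutoff for a in atoms for c in carb_atoms
        )
    return labels


if __name__ == "__main__":
    # Toy coordinates, not taken from any PDB entry.
    residues = {"RES10": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                "RES11": [(10.0, 0.0, 0.0)]}
    carb = [(4.0, 0.0, 0.0)]
    print(label_binding_residues(residues, carb))
```

In the toy input, the second atom of RES10 is 3.0 Å from the carbohydrate atom, so RES10 is labeled binding; RES11 is 6.0 Å away and is not. The brute-force double loop is fine for a single protein; a spatial index would be the usual choice for large structures.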
Publication Year: 2007

PubMed ID is not available.

Published in 2014

PubMedCentral: PMC4184157

Table 2 List of Bound and Unbound Protein Conformations Used for Docking Prediction as well as Rank and RMSD of Best Prediction and the Rank of the Binding Site

bound PDB ID | unbound PDB ID | chain | best ... robe rank | RMSD (Å) | binding site rank
1T8U | 1T8T | B | 5th | 7.995 | 1st
3QMK | 3Q7L | A | 5th | 5.961 | 1st
1GMN | 1NK1 | A | 3rd | 8.959 | 1st
3DY0 | 1LQ8 | A | 1st | 7.510 | 1st
1BFC | 1BFG | A | 2nd | 3.499 | 1st

Docking Results for the Test Set

As shown in Table 2, docking the selected heparin probe allows the heparin binding site to be predicted from the consensus of the calculated heparin poses; the binding site is ranked first for all five test proteins.
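The text says the binding site is predicted from the consensus of the calculated heparin poses but does not describe the procedure. One simple way to form such a consensus, shown purely as an illustrative assumption and not as the authors' method, is to count how often each residue is contacted across the docked poses and keep the residues contacted in at least a given fraction of them. The function name `consensus_site`, the contact-set input, the 0.5 threshold, and the residue IDs in the example are all hypothetical.

```python
from collections import Counter


def consensus_site(pose_contacts: list[set[str]], min_fraction: float = 0.5) -> set[str]:
    """Return residues contacted in at least `min_fraction` of the poses.

    pose_contacts: one set of contacted residue IDs per docked pose
    (hypothetical input, e.g. residues within some distance of the probe).
    """
    if not pose_contacts:
        return set()
    # Count, for each residue, the number of poses that contact it.
    counts = Counter(res for contacts in pose_contacts for res in contacts)
    needed = min_fraction * len(pose_contacts)
    return {res for res, n in counts.items() if n >= needed}


if __name__ == "__main__":
    poses = [{"res1", "res2", "res3"}, {"res1", "res2"},
             {"res1", "res4"}, {"res2", "res3"}]
    print(sorted(consensus_site(poses)))
```

With four made-up poses and the default threshold, a residue must appear in at least two poses, so res4 (one pose) is dropped and res1, res2 and res3 form the consensus site.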

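The RMSD (Å) column of the docking table measures how far a predicted heparin pose lies from the crystallographic one. Assuming the two poses list the same atoms in the same order and are already in a common frame (no superposition, which is typical when scoring ligand poses against a fixed receptor), RMSD is the square root of the mean squared per-atom distance. How the authors matched atoms is not stated here, and the function name `rmsd` is hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(pred: list[Coord], ref: list[Coord]) -> float:
    """Root-mean-square deviation between matched atom lists (no fitting)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must have the same, non-zero number of atoms")
    # Sum of squared distances between corresponding atoms.
    sq = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(sq / len(pred))


if __name__ == "__main__":
    # Toy poses: every atom shifted by 2 Å along z.
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 2), (1, 0, 2)]))
```

Shifting every atom by the same 2 Å gives an RMSD of exactly 2.0 Å.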
Table 1 List of Heparin Bound Protein Structures and their Unbound Forms

protein | unbound PDB | bound PDB | heparin length (a)
human 3-O-Sulfotransferase-3 | 1T8T (40) | 1T8U (40) | 2
E2 domain of amyloid precursor-like protein 1 | 3Q7L (41) | 3QMK (42) | 2
NK1 fragment of human hepatocyte growth factor/scatter factor (HGF/SF) | 1NK1 (43) | 1GMN (44) | 2.5
plasma serine protease inhibitor | 1LQ8 (45) | 3DY0 (46) | 2.5
basic fibroblast growth factor | 1BFG (47) | 1BFC (48) | 3

(a) Length of the heparin chain present in the bound structures, given in disaccharide units.

Publication Year: 2014