bayesProtQuant

Analysis of FG.NormalizedMS2PeakArea = 1

Information regarding FG.NormalizedMS2PeakArea = 1.0 from Roland.

Values of 1 can come from two sources: small noise peaks and local normalization effects.

In Spectronaut, small quantities (<1) are set to 1.

In both cases the signal was noise or close to noise.

This mostly arises because the dynamic ranges of MS1 and MS2 are not necessarily the same.

Spectronaut uses both MS1 and MS2 information for identification, and one layer can be enough on its own. The quantitative information in the other layer can therefore be very low.
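The flooring described above can be illustrated with a minimal sketch. The peak areas here are made up, and `clip` only mimics the described behaviour; it is not taken from Spectronaut itself.

```python
import numpy as np
import pandas as pd

# Hypothetical raw peak areas; values below 1 stand in for noise-level signals.
raw_area = pd.Series([7.3e5, 0.4, 2.3e4, 0.0, 0.97])

# Set every value below 1 to exactly 1, mimicking the behaviour described above.
floored = raw_area.clip(lower=1.0)

# Floored values are easy to spot afterwards: they equal 1 exactly,
# and they collapse to 0 on a log scale.
log_area = np.log10(floored)
n_floored = int((floored == 1.0).sum())

print(floored.tolist())
print("floored values:", n_floored)
```

Because the floored values are exactly 1, an equality test such as `intensity == 1` is enough to find them later on.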

Import relevant modules.

import pandas as pd 
from utils import *

Load the pickled raw data set.

df = pd.read_pickle("../../data/500-PSSS3-equ decoy_Report_nonShared_20190507.pkl")

Load the triqler-formatted data set (formatted from the same raw data set).

df_triqler = pd.read_pickle("../../data/PSSS3_triqlerFormatted_nonShared.pkl")

Find proteins where intensity (FG.NormalizedMS2PeakArea) = 1.

proteins = df_triqler[df_triqler["intensity"] == 1].proteins
proteins = proteins[proteins.str[:5] != "decoy"]
proteins = proteins.str.split("_", expand = True)
proteins = proteins.rename(index = str, columns = {0:"protein", 1:"specie"})
unique_proteins = pd.Series(proteins.protein.unique())
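To make the selection above easier to follow, here is a small sketch on made-up rows. It assumes that the `proteins` column holds strings of the form `accession_species` and that decoys carry a `decoy` prefix; the toy accessions and species tags are illustrative only.

```python
import pandas as pd

# Hypothetical rows in the triqler-formatted layout (only the two columns used here).
toy = pd.DataFrame({
    "intensity": [1.0, 1.0, 5.2e4, 1.0],
    "proteins": [
        "A0A023T4K3_CAEEL",
        "decoy_A0A023T4K3_CAEEL",
        "P06732_HUMAN",
        "P06732_HUMAN",
    ],
})

# Keep the protein strings of rows whose intensity is exactly 1.
floor_rows = toy.loc[toy["intensity"] == 1, "proteins"]

# Drop decoys, then split "accession_species" into two columns.
floor_rows = floor_rows[~floor_rows.str.startswith("decoy")]
parts = floor_rows.str.split("_", expand=True)
parts.columns = ["protein", "specie"]

unique_toy = parts["protein"].unique().tolist()
print(unique_toy)
```

Decoys are removed before the split, so the decoy prefix never produces a third column.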


Show these proteins.

unique_proteins
0        A0A023T4K3
1        A0A061ACK4
2        A0A061ACL3
3        A0A061ACR1
4        A0A061ACU2
5        A0A061ACU6
6        A0A061ACY0
7        A0A061AD21
8        A0A061AD39
9        A0A061AD47
10       A0A061AE05
11       A0A061AJK8
12       A0A061AKY5
13       A0A061AL89
14       A0A078BPH9
15       A0A078BPJ4
16       A0A078BPM1
17       A0A0H3W5N0
18       A0A0K3AQM0
19       A0A0K3AQN5
20       A0A0K3AQN9
21       A0A0K3AQV8
22       A0A0K3AQY3
23       A0A0K3AR10
24       A0A0K3ARF3
25       A0A0K3ARM6
26       A0A0K3ARN9
27       A0A0K3ARR9
28       A0A0K3ARY0
29       A0A0K3ARZ7
            ...    
20675        P06732
20676        P22352
20677        P27216
20678        P55082
20679        P56381
20680        Q13233
20681      Q13555-6
20682        Q15291
20683        Q16611
20684        Q19185
20685        Q1G3M2
20686      Q6PJT7-9
20687        Q7XA86
20688        Q84UC7
20689        Q8L4R0
20690        Q8N8J7
20691        Q8NG11
20692        Q8VYK9
20693        Q9BS34
20694        Q9H1E5
20695        Q9LFA4
20696        Q9MBH1
20697        Q9NWT8
20698        Q9STL2
20699        Q9SZY4
20700        Q9UAX1
20701        Q9UBM7
20702      Q9UPN9-2
20703        Q9XIM0
20704        Q9ZVA5
Length: 20705, dtype: object

Look up the first of these proteins in the raw data set.

query_prot = unique_proteins[0]
query = df[df["PG.ProteinAccessions"] == query_prot]

query
R.Condition R.FileName PG.Organisms PG.ProteinAccessions PG.Cscore PG.NrOfStrippedSequencesIdentified PG.Qvalue PG.Quantity EG.StrippedSequence EG.IsDecoy EG.PrecursorId EG.PEP EG.Qvalue EG.Cscore FG.NormalizedMS2PeakArea
0 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 SFYYLVQDLK False _SFYYLVQDLK_.2 1.479293e-02 0.003878 2.355977 7.319080e+05
1 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 EATVSESVLSELKR False _EATVSESVLSELKR_.3 2.210408e-02 0.005114 2.073540 2.299625e+04
2 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 EATVSESVLSELKR False _EATVSESVLSELKR_.2 1.000000e+00 0.762912 -2.174942 4.274121e+04
3 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 AADFYVR False _AADFYVR_.2 9.465682e-03 0.015673 1.283862 3.631738e+04
4 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 IGALADVNNSKDPDGLR False _IGALADVNNSKDPDGLR_.3 4.225811e-01 0.102376 0.036027 2.253101e+04
5 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 IGALADVNNSKDPDGLR False _IGALADVNNSKDPDGLR_.2 1.000000e+00 0.760221 -2.151609 2.341229e+04
6 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 NEHISFTTGK False _NEHISFTTGK_.2 4.517035e-01 0.117293 -0.067812 1.782665e+04
7 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 NEHISFTTGK False _NEHISFTTGK_.3 6.079661e-01 0.219510 -0.606611 8.547810e+04
8 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 9.981995e-01 0.702184 -1.814023 8.090994e+03
9 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.3 9.884009e-01 0.693141 -1.773851 2.529676e+04
10 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 1.000000e+00 0.584152 -1.466288 1.000000e+00
11 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 QELEILYK False _QELEILYK_.2 5.828639e-01 0.268762 -0.809965 1.876812e+05
12 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 IIEDSEIMQEDDDNWPEPDKIGR False _IIEDSEIMQEDDDNWPEPDKIGR_.3 7.583292e-01 0.375077 -1.108799 1.882612e+04
13 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 EATVSESVLSELK False _EATVSESVLSELK_.2 1.000000e+00 0.758897 -2.142972 4.276881e+04
14 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 IGALADVNNSK False _IGALADVNNSK_.2 9.720947e-01 0.773937 -2.279790 1.754513e+04
15 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 IIEDSEIMQEDDDNWPEPDK False _IIEDSEIMQEDDDNWPEPDK_.2 1.000000e+00 0.736726 -1.985881 2.863420e+03
16 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 QELEILYKNEHISFTTGK False _QELEILYKNEHISFTTGK_.4 1.000000e+00 0.766065 -2.205700 4.767215e+05
17 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.15625 QELEILYKNEHISFTTGK False _QELEILYKNEHISFTTGK_.3 1.000000e+00 0.742685 -2.022538 1.182180e+04
18 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN NSEIGFTTHK True _NSEIGFTTHK_.3 NaN NaN -0.886623 3.543555e+04
19 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN NSEIGFTTHK True _NSEIGFTTHK_.2 NaN NaN -1.928180 5.298748e+04
20 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN VADFYAR True _VADFYAR_.2 NaN NaN -1.296488 3.413020e+04
21 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN SAVVLEETSESLKR True _SAVVLEETSESLKR_.3 NaN NaN -1.588380 1.039064e+04
22 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN SAVVLEETSESLKR True _SAVVLEETSESLKR_.2 NaN NaN -1.314687 7.577624e+05
23 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN FGHENLFFLEREFGSPR True _FGHENLFFLEREFGSPR_.4 NaN NaN -1.283839 6.206880e+02
24 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN FGHENLFFLEREFGSPR True _FGHENLFFLEREFGSPR_.3 NaN NaN -1.757634 7.628955e+03
25 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN EELQYLIK True _EELQYLIK_.2 NaN NaN -1.280042 4.756001e+04
26 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN QEEEILLKNIYHSFTTGK True _QEEEILLKNIYHSFTTGK_.4 NaN NaN -2.222821 1.015951e+05
27 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN QEEEILLKNIYHSFTTGK True _QEEEILLKNIYHSFTTGK_.3 NaN NaN -0.119591 7.760912e+04
28 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN GKFFHEFSEGEFRPNGLLR True _GKFFHEFSEGEFRPNGLLR_.4 NaN NaN -1.395250 3.577354e+03
29 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN VGNLANIDASK True _VGNLANIDASK_.2 NaN NaN -2.343939 4.672448e+04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
52155410 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 NEHISFTTGK False _NEHISFTTGK_.2 3.332570e-06 0.000132 2.548576 1.445004e+05
52155411 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 NEHISFTTGK False _NEHISFTTGK_.3 1.681911e-03 0.000467 2.016687 1.781270e+04
52155412 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 2.766263e-09 0.000060 2.865193 6.142457e+04
52155413 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.3 1.977536e-10 0.000057 2.962228 8.107272e+04
52155414 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 2.559424e-09 0.000060 2.868159 2.840809e+04
52155415 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 QELEILYK False _QELEILYK_.2 3.783442e-05 0.000191 2.401054 2.886927e+04
52155416 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 IIEDSEIMQEDDDNWPEPDKIGR False _IIEDSEIMQEDDDNWPEPDKIGR_.3 3.861019e-08 0.000075 2.759681 2.008109e+04
52155417 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 EATVSESVLSELK False _EATVSESVLSELK_.2 3.696719e-02 0.003889 0.774860 2.714550e+04
52155418 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 IGALADVNNSK False _IGALADVNNSK_.2 9.931060e-03 0.001268 1.462434 3.666480e+04
52155419 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 IIEDSEIMQEDDDNWPEPDK False _IIEDSEIMQEDDDNWPEPDK_.2 3.624652e-01 0.051595 -1.179870 8.187363e+03
52155420 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 QELEILYKNEHISFTTGK False _QELEILYKNEHISFTTGK_.4 6.368914e-01 0.109106 -1.759308 4.004741e+05
52155421 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 11 0.000646 190713.62500 QELEILYKNEHISFTTGK False _QELEILYKNEHISFTTGK_.3 1.611248e-01 0.018855 -0.347197 3.279107e+04
52155422 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN NSEIGFTTHK True _NSEIGFTTHK_.3 NaN NaN -1.625697 8.006405e+04
52155423 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN NSEIGFTTHK True _NSEIGFTTHK_.2 NaN NaN -1.655590 1.387353e+04
52155424 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN VADFYAR True _VADFYAR_.2 NaN NaN -1.541574 6.440874e+04
52155425 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN SAVVLEETSESLKR True _SAVVLEETSESLKR_.3 NaN NaN -2.266281 2.088234e+04
52155426 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN SAVVLEETSESLKR True _SAVVLEETSESLKR_.2 NaN NaN -2.667791 1.217765e+05
52155427 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN FGHENLFFLEREFGSPR True _FGHENLFFLEREFGSPR_.4 NaN NaN -2.190037 1.841496e+03
52155428 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN FGHENLFFLEREFGSPR True _FGHENLFFLEREFGSPR_.3 NaN NaN -2.551600 1.682023e+04
52155429 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN EELQYLIK True _EELQYLIK_.2 NaN NaN -1.859780 1.620637e+03
52155430 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN QEEEILLKNIYHSFTTGK True _QEEEILLKNIYHSFTTGK_.4 NaN NaN -1.595791 1.904184e+04
52155431 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN QEEEILLKNIYHSFTTGK True _QEEEILLKNIYHSFTTGK_.3 NaN NaN -0.315062 1.340976e+06
52155432 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN GKFFHEFSEGEFRPNGLLR True _GKFFHEFSEGEFRPNGLLR_.4 NaN NaN -2.517616 1.472539e+03
52155433 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN VGNLANIDASK True _VGNLANIDASK_.2 NaN NaN -2.681926 6.866233e+03
52155434 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN VDALADPNNSKIGDGLR True _VDALADPNNSKIGDGLR_.3 NaN NaN -2.507371 1.992760e+04
52155435 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN VDALADPNNSKIGDGLR True _VDALADPNNSKIGDGLR_.2 NaN NaN -1.980533 3.229269e+04
52155436 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN LQYYLVSDFK True _LQYYLVSDFK_.2 NaN NaN -1.511397 2.168764e+05
52155437 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN LTAVSESELSVEK True _LTAVSESELSVEK_.2 NaN NaN -2.657463 1.782522e+05
52155438 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN IIDDSEMEWEDDDNQPEPKIGIR True _IIDDSEMEWEDDDNQPEPKIGIR_.3 NaN NaN -2.562214 5.152218e+04
52155439 S500-PSSS3-S01 G_D180330_S500-PSSS3-S01_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN MIIDSEDIEEDDQPWNEPDK True _MIIDSEDIEEDDQPWNEPDK_.2 NaN NaN -2.688025 5.576035e+03

1800 rows × 15 columns

Many of these rows do in fact have peptide intensities. Let's look at the rows where the intensity is 1.

query[query["FG.NormalizedMS2PeakArea"] == 1]
R.Condition R.FileName PG.Organisms PG.ProteinAccessions PG.Cscore PG.NrOfStrippedSequencesIdentified PG.Qvalue PG.Quantity EG.StrippedSequence EG.IsDecoy EG.PrecursorId EG.PEP EG.Qvalue EG.Cscore FG.NormalizedMS2PeakArea
10 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 2 0.000646 377452.156250 GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 1.000000 0.584152 -1.466288 1.0
2128802 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 0.971182 0.877314 -1.756694 1.0
3193196 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.731580 -1.305529 1.0
3193202 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN IGALADVNNSK False _IGALADVNNSK_.2 1.000000 0.913514 -2.087202 1.0
4257592 S500-PSSS3-S10 G_D180330_S500-PSSS3-S10_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.882508 -1.837320 1.0
5321988 S500-PSSS3-S08 G_D180330_S500-PSSS3-S08_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.749338 -1.102499 1.0
7450780 S500-PSSS3-S08 G_D180330_S500-PSSS3-S08_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 0.949748 0.900993 -1.883789 1.0
7450795 S500-PSSS3-S08 G_D180330_S500-PSSS3-S08_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN FGHENLFFLEREFGSPR True _FGHENLFFLEREFGSPR_.4 NaN NaN -1.485949 1.0
9579572 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.744226 -1.351521 1.0
9579573 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.3 1.000000 0.646199 -1.209836 1.0
10643968 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 0.982366 0.922066 -2.057751 1.0
13837149 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 0.978127 0.832960 -2.124984 1.0
13837156 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.749722 -1.772931 1.0
14901545 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 1.000000 0.948016 1.891271 1.0
14901554 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R04_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 1.000000 0.948983 1.806365 1.0
17030343 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN NEHISFTTGK False _NEHISFTTGK_.3 0.948465 0.832369 -1.417118 1.0
17030344 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.801456 -1.343402 1.0
18094733 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 1.000000 0.779697 -2.380094 1.0
18094742 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 0.956955 0.758794 -2.127311 1.0
19159131 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN AADFYVR False _AADFYVR_.2 0.950289 0.733853 -1.887230 1.0
19159135 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN NEHISFTTGK False _NEHISFTTGK_.3 0.937416 0.726019 -1.865410 1.0
20223532 S500-PSSS3-S08 G_D180330_S500-PSSS3-S08_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.889777 -1.878795 1.0
21287921 S500-PSSS3-S10 G_D180330_S500-PSSS3-S10_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 1.000000 0.923936 -1.457002 1.0
21287923 S500-PSSS3-S10 G_D180330_S500-PSSS3-S10_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN AADFYVR False _AADFYVR_.2 0.945463 0.957276 -1.718887 1.0
21287927 S500-PSSS3-S10 G_D180330_S500-PSSS3-S10_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN NEHISFTTGK False _NEHISFTTGK_.3 0.964483 0.940092 -1.562498 1.0
21287928 S500-PSSS3-S10 G_D180330_S500-PSSS3-S10_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 0.820509 0.656198 -0.931232 1.0
22352330 S500-PSSS3-S03 G_D180330_S500-PSSS3-S03_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 3 0.000646 224372.781250 IGALADVNNSK False _IGALADVNNSK_.2 0.823999 0.376484 -1.475100 1.0
23416720 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 1 0.000646 830202.687500 FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.703447 -1.285880 1.0
23416722 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 1 0.000646 830202.687500 GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 1.000000 0.915586 -2.226666 1.0
26609901 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 0.923761 0.922159 -1.667343 1.0
26609910 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 0.948613 0.948630 -1.890678 1.0
26609912 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R05_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN IIEDSEIMQEDDDNWPEPDKIGR False _IIEDSEIMQEDDDNWPEPDKIGR_.3 1.000000 0.769534 -1.065896 1.0
29803091 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN AADFYVR False _AADFYVR_.2 1.000000 0.946887 -1.492815 1.0
29803095 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R03_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN NEHISFTTGK False _NEHISFTTGK_.3 1.000000 0.888488 -1.251080 1.0
30867485 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 0.942934 0.912783 -1.570019 1.0
30867487 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN AADFYVR False _AADFYVR_.2 0.852968 0.897701 -1.457832 1.0
30867492 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.926072 -1.735365 1.0
30867493 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.3 1.000000 0.841734 -1.236124 1.0
31931881 S500-PSSS3-S09 G_D180330_S500-PSSS3-S09_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 1.000000 0.832340 -1.661479 1.0
34060675 S500-PSSS3-S08 G_D180330_S500-PSSS3-S08_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN AADFYVR False _AADFYVR_.2 0.956659 0.952245 -1.919342 1.0
34060680 S500-PSSS3-S08 G_D180330_S500-PSSS3-S08_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 0.906760 0.926398 -1.619079 1.0
34060697 S500-PSSS3-S08 G_D180330_S500-PSSS3-S08_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN EELQYLIK True _EELQYLIK_.2 NaN NaN -0.977456 1.0
35125069 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 0.962650 0.896980 -1.603957 1.0
35125071 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN AADFYVR False _AADFYVR_.2 0.959751 0.873907 -1.456922 1.0
36189465 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 0.947165 0.865787 -1.914139 1.0
36189472 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN FGHEFLEFEFRPNGSLR False _FGHEFLEFEFRPNGSLR_.4 1.000000 0.894660 -2.123795 1.0
36189489 S500-PSSS3-S07 G_D180330_S500-PSSS3-S07_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN EELQYLIK True _EELQYLIK_.2 NaN NaN -1.575424 1.0
37253861 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 1.000000 0.687519 -1.132478 1.0
37253874 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN IGALADVNNSK False _IGALADVNNSK_.2 0.875654 0.499425 -0.858331 1.0
38318257 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 1.000000 0.900043 -2.153550 1.0
38318266 S500-PSSS3-S06 G_D180330_S500-PSSS3-S06_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 0.979426 0.832022 -1.763272 1.0
39382655 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 1 0.000646 735648.250000 AADFYVR False _AADFYVR_.2 0.889480 0.733052 -1.420900 1.0
40447049 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN EATVSESVLSELKR False _EATVSESVLSELKR_.3 0.945992 0.616186 -1.354987 1.0
40447055 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN NEHISFTTGK False _NEHISFTTGK_.3 0.921434 0.785328 -1.711489 1.0
40447059 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN QELEILYK False _QELEILYK_.2 1.000000 0.854983 -2.048480 1.0
40447079 S500-PSSS3-S05 G_D180330_S500-PSSS3-S05_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 NaN 13 NaN NaN VDALADPNNSKIGDGLR True _VDALADPNNSKIGDGLR_.2 NaN NaN -1.610883 1.0
41511458 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 1 0.000646 667178.000000 IGALADVNNSK False _IGALADVNNSK_.2 0.989736 0.502940 -1.490796 1.0
42575850 S500-PSSS3-S04 G_D180330_S500-PSSS3-S04_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 0 0.000646 NaN GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 0.950841 0.643791 -1.965023 1.0
44704642 S500-PSSS3-S03 G_D180330_S500-PSSS3-S03_MHRM_R02_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 5 0.000646 246017.718750 GKFGHEFLEFEFRPNGSLR False _GKFGHEFLEFEFRPNGSLR_.4 0.995765 0.584585 -2.150846 1.0
48962232 S500-PSSS3-S02 G_D180330_S500-PSSS3-S02_MHRM_R01_T0 Caenorhabditis elegans OX=6239 A0A023T4K3 0.182511 8 0.000646 97127.828125 QELEILYKNEHISFTTGK False _QELEILYKNEHISFTTGK_.4 0.997542 0.368788 -2.107628 1.0

Most of the PG.Quantity values are NaN. However, the table above shows that for S04:R03, for example, PG.Quantity = 377452 even though FG.NormalizedMS2PeakArea = 1 for one of its precursors. This is because multiple peptide sequences (EG.StrippedSequence) belong to the protein A0A023T4K3. Spectronaut quantifies proteins using a combination of the top 3 intensities and the Cscore.

In cases where all samples and runs have 1.0 intensities, PG.Quantity is simply registered as NaN, which means the 1.0 intensities do not introduce bias into downstream analyses based on PG.Quantity. These low intensities are therefore ignored in PG.Quantity.
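One way to check this observation is to group rows by run and protein and look for groups where every intensity is 1.0 but PG.Quantity is still present. The sketch below uses a made-up four-row table with the report's column names. The grouping level (one group per run and protein) is an assumption about how the statement should be read.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt: run1 has one real intensity, run2 has only floored ones.
toy = pd.DataFrame({
    "R.FileName": ["run1", "run1", "run2", "run2"],
    "PG.ProteinAccessions": ["A0A023T4K3"] * 4,
    "PG.Quantity": [377452.2, 377452.2, np.nan, np.nan],
    "FG.NormalizedMS2PeakArea": [7.3e5, 1.0, 1.0, 1.0],
})

# Row-level flags, reduced per (run, protein) group with all().
toy["is_floor"] = toy["FG.NormalizedMS2PeakArea"] == 1.0
toy["quantity_missing"] = toy["PG.Quantity"].isna()
per_group = toy.groupby(["R.FileName", "PG.ProteinAccessions"])[
    ["is_floor", "quantity_missing"]
].all()

# Groups that would contradict the observation: only floored intensities,
# yet a protein quantity was still reported.
contradictions = per_group[per_group["is_floor"] & ~per_group["quantity_missing"]]
print(per_group)
print("contradicting groups:", len(contradictions))
```

Running the same grouping on `df` would show whether the pattern holds beyond the single protein inspected here.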

The 1.0 intensities are not handled in the triqler data parsing module. (The code below is not executed; it only shows snippets from the triqler data parsing module.)


from __future__ import print_function

import sys
import numpy as np
import math
import csv
import os
import itertools
import re
from collections import defaultdict, namedtuple


def getTsvReader(filename):
  # Python 3
  if sys.version_info[0] >= 3:
    return csv.reader(open(filename, 'r', newline = ''), delimiter = '\t')
  # Python 2
  else:
    return csv.reader(open(filename, 'rb'), delimiter = '\t')

TriqlerSimpleInputRowHeaders = "run condition charge searchScore intensity peptide proteins".split(" ")
TriqlerInputRowHeaders = "run condition charge spectrumId linkPEP featureClusterId searchScore intensity peptide proteins".split(" ")
TriqlerInputRowBase = namedtuple("TriqlerInputRow", TriqlerInputRowHeaders)

class TriqlerInputRow(TriqlerInputRowBase):
  def toList(self):
    l = list(self)
    return l[:-1] + l[-1]
  
  def toSimpleList(self):
    l = list(self)
    return l[:3] + l[6:-1] + l[-1]
  
  def toString(self):
    return "\t".join(map(str, self.toList()))

def parseTriqlerInputFile(triqlerInputFile):
  reader = getTsvReader(triqlerInputFile)
  headers = next(reader)
  hasLinkPEPs = "linkPEP" in headers
  getUniqueProteins = lambda x : list(set([p for p in x if len(p.strip()) > 0]))
  intensityCol = 7 if hasLinkPEPs else 4
  seenPeptChargePairs = dict()
  for i, row in enumerate(reader):
    if i % 1000000 == 0:
      print("  Reading row", i)
    
    intensity = float(row[intensityCol])
    if intensity > 0.0: # <-------------------- 1.0 intensities pass through triqler
      if hasLinkPEPs:
        proteins = getUniqueProteins(row[9:])
        yield TriqlerInputRow(row[0], row[1], int(row[2]), int(row[3]), float(row[4]), int(row[5]), float(row[6]), intensity, row[8], proteins)
      else:
        key = (int(row[2]), row[5])
        if key not in seenPeptChargePairs:
          seenPeptChargePairs[key] = len(seenPeptChargePairs)
        proteins = getUniqueProteins(row[6:])
        yield TriqlerInputRow(row[0], row[1], int(row[2]), (i+1) * 100, 0.0, seenPeptChargePairs[key], float(row[3]), intensity, row[5], proteins)

Note the line “if intensity > 0.0:”, which lets 1.0 intensities pass into triqler. We could set the non-decoy 1.0 intensities to 0.0 and let triqler filter them out, since Spectronaut also ignores them.
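A minimal sketch of that idea follows, on made-up rows. It assumes the triqler-formatted frame has `intensity` and `proteins` columns and that decoys carry a `decoy` prefix. The last line only imitates triqler's `intensity > 0.0` check; it does not call triqler.

```python
import pandas as pd

# Hypothetical triqler-formatted rows.
toy = pd.DataFrame({
    "intensity": [1.0, 1.0, 5.2e4],
    "proteins": ["A0A023T4K3_CAEEL", "decoy_A0A023T4K3_CAEEL", "P06732_HUMAN"],
})

is_floor = toy["intensity"] == 1.0
is_decoy = toy["proteins"].str.startswith("decoy")

# Zero out floored intensities for non-decoy rows only.
toy.loc[is_floor & ~is_decoy, "intensity"] = 0.0

# Imitate the parser's filter: rows with intensity 0.0 are dropped.
kept = toy[toy["intensity"] > 0.0]
print(kept)
```

Decoy rows are left untouched here, following the suggestion above. Whether decoys should be treated the same way is a separate question.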

print("triqler input rows: " + str(len(df_triqler)))

# I don't think decoy intensities are used, so these counts (which include decoys) are commented out.
#print("triqler input rows with intensity = 1: " + str(len(df_triqler[df_triqler["intensity"] == 1])))
#print("ratio of noise rows to all rows: " + str(len(df_triqler[df_triqler["intensity"] == 1])/len(df_triqler)))

# Therefore count only the non-decoy rows with intensity = 1 (the `proteins` frame built above).
print("triqler input rows with intensity = 1: " + str(len(proteins)))
print("ratio of noise rows to all rows: " + str(len(proteins)/len(df_triqler)))
triqler input rows: 52354084
triqler input rows with intensity = 1: 1131973
ratio of noise rows to all rows: 0.021621484199780862

About 2.16% of the triqler input rows are non-decoy rows with intensity = 1, i.e. noise. I am not certain whether this is a large or a small fraction in this case.
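One way to put the overall ratio into context is to break it down per condition. The sketch below uses made-up rows and assumes the triqler-formatted frame has a `condition` column alongside `intensity` and `proteins`. The condition labels and values are illustrative only.

```python
import pandas as pd

# Hypothetical triqler-formatted rows from two conditions.
toy = pd.DataFrame({
    "condition": ["S01"] * 4 + ["S04"] * 4,
    "intensity": [3.1e4, 8.0e4, 1.0, 2.2e4, 1.0, 1.0, 4.7e4, 1.0],
    "proteins": [
        "P1_X", "P2_X", "P3_X", "P4_X",
        "P1_X", "decoy_P2_X", "P3_X", "P4_X",
    ],
})

# A noise row is a non-decoy row whose intensity is exactly 1.
is_noise = (toy["intensity"] == 1.0) & ~toy["proteins"].str.startswith("decoy")

# The mean of a boolean series per condition gives the noise fraction.
noise_fraction = is_noise.groupby(toy["condition"]).mean()
print(noise_fraction)
```

If the fraction differs strongly between conditions, the noise rows are unlikely to be spread evenly over the data.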