Computational Assessment of Single-Molecule Protein Sequencing



A single-molecule protein sequencer that labels only 2 of the 20 amino acids and uses single-molecule TIRF microscopy to read out the order of the labeled residues (the protein's fingerprint) opens the door to identifying proteins with high fidelity from only a small quantity of sample. A key challenge is to determine, from the fingerprint alone, which protein was measured. We present a first tool that efficiently retrieves the protein sequences by comparing fingerprints only, even in the presence of a high error rate. A clustering method is first employed to reduce the redundancy of the database. Given a query fingerprint, our algorithm uses an efficient filtering strategy to identify potential matches and dynamic programming to verify them. The verified matches are then mapped back to the original fingerprint database to obtain the final proteins. We analyzed the detection behavior on simulated data and investigated how additional information may improve performance. In addition, we tested whether fingerprint information suffices to solve other problems, such as determining whether a human cell sample contains bacterial or viral proteins.
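The abstract does not specify the clustering criterion, the filter, or the exact dynamic-programming formulation, so the following Python sketch only illustrates the general filter-and-verify idea under explicit assumptions. A fingerprint is modeled as the ordered string of labeled residues (the choice of C and K is hypothetical). Redundancy reduction is simplified to grouping proteins with identical fingerprints. The filter is a q-gram count filter based on the q-gram lemma (a string within edit distance k of a length-m query shares at least m - q + 1 - kq of its q-grams), and verification uses Levenshtein edit distance. The names `FingerprintIndex`, `fingerprint`, and `edit_distance` are illustrative, not the authors' API.

```python
from collections import Counter, defaultdict

# Hypothetical choice of the two labeled residues; the source only says 2 of 20.
LABELED = frozenset("CK")

def fingerprint(protein: str) -> str:
    """Simplified fingerprint model: the labeled residues, in sequence order."""
    return "".join(aa for aa in protein if aa in LABELED)

def qgrams(s: str, q: int) -> Counter:
    """Multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

class FingerprintIndex:
    """Hypothetical filter-and-verify index over a protein database."""

    def __init__(self, proteins: dict[str, str], q: int = 3):
        self.q = q
        # Simplified "clustering": proteins with identical fingerprints share one representative.
        self.members = defaultdict(list)
        for name, seq in proteins.items():
            self.members[fingerprint(seq)].append(name)
        self.reps = list(self.members)
        # Inverted q-gram index: q-gram -> [(representative id, occurrence count)].
        self.index = defaultdict(list)
        for rid, fp in enumerate(self.reps):
            for g, c in qgrams(fp, self.q).items():
                self.index[g].append((rid, c))

    def query(self, fp: str, k: int) -> list[str]:
        """Names of proteins whose fingerprint is within edit distance k of fp."""
        q = self.q
        threshold = len(fp) - q + 1 - k * q  # q-gram lemma lower bound
        if threshold <= 0:
            candidates = range(len(self.reps))  # filter cannot prune; verify all
        else:
            shared = Counter()
            for g, cq in qgrams(fp, q).items():
                for rid, cr in self.index.get(g, ()):
                    shared[rid] += min(cq, cr)  # multiset intersection size
            candidates = [rid for rid, n in shared.items() if n >= threshold]
        hits = []
        for rid in candidates:
            rep = self.reps[rid]
            # Cheap length check first, then the exact DP verification.
            if abs(len(rep) - len(fp)) <= k and edit_distance(rep, fp) <= k:
                hits.extend(self.members[rep])  # map back to the original proteins
        return sorted(hits)

# Toy database (made-up sequences); P1 and P3 have the same fingerprint.
db = {
    "P1": "MKACKLLCKAKC",
    "P2": "AKKCKAKCCKMA",
    "P3": "MKACKLLCKAKCG",
}
idx = FingerprintIndex(db, q=3)
print(idx.query("KCKCKC", k=1))  # a P1/P3 fingerprint with one missed K
```

In this toy run, P2 passes the q-gram filter but is rejected by the length check, while P1 and P3 are both reported because they collapse to the same representative fingerprint. A real error model for TIRF readouts would likely need weighted edit costs rather than unit costs.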