Computing GC Content

In Rosalind’s implementation, a string in FASTA format will be labeled by the ID “Rosalind_xxxx”, where “xxxx” denotes a four-digit code between 0000 and 9999.
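The label format described above can be checked with a regular expression. The following is only a minimal sketch: the names `ROSALIND_ID` and `is_rosalind_id` are hypothetical helpers introduced here for illustration and are not part of Rosalind.

```python
import re

# Hypothetical pattern: "Rosalind_" followed by exactly four digits (0000-9999).
ROSALIND_ID = re.compile(r"Rosalind_[0-9]{4}")

def is_rosalind_id(label):
    # fullmatch requires the whole label to fit the pattern,
    # so a leading ">" or a shorter code is rejected.
    return ROSALIND_ID.fullmatch(label) is not None

print(is_rosalind_id("Rosalind_6404"))
print(is_rosalind_id("Rosalind_64"))
```

Note that the check applies to the label itself, so the ">" that starts a FASTA header line would need to be stripped first.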

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
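The 0.001 tolerance can be illustrated with a small comparison. This sketch assumes the tolerance is applied to the plain absolute difference between the submitted and the expected value; `within_tolerance` is a hypothetical helper, not something Rosalind provides.

```python
def within_tolerance(answer, expected, tol=0.001):
    # Absolute error: the magnitude of the difference between the two values.
    return abs(answer - expected) <= tol

# Compare a few candidate answers against the sample value 60.919540.
print(within_tolerance(60.9195, 60.919540))  # True
print(within_tolerance(60.92, 60.919540))    # True
print(within_tolerance(60.9, 60.919540))     # False
```

Under this assumption, rounding the percentage to two or three decimal places would still be accepted, while rounding to one decimal place might not be.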

Sample Dataset

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output

Rosalind_0808
60.919540
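The sample output can be reproduced by hand: Rosalind_0808 has 87 bases, 53 of which are G or C, and 53 / 87 × 100 ≈ 60.919540. The sketch below runs the same calculation over the whole sample dataset. The names `parse_fasta` and `gc_content` are hypothetical and chosen only for this illustration.

```python
SAMPLE = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""

def parse_fasta(text):
    # Map each label (header line without ">") to its joined sequence.
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            label = line[1:]
            records[label] = ""
        elif label is not None:
            records[label] += line
    return records

def gc_content(seq):
    # Percentage of bases in seq that are G or C.
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

records = parse_fasta(SAMPLE)
best = max(records, key=lambda label: gc_content(records[label]))
print(best)                                # Rosalind_0808
print("%.6f" % gc_content(records[best]))  # 60.919540
```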

Code

import os
# Author's working directory; change this to the folder that holds rosalind_gc.txt.
os.chdir("/home/owht/R/Rosalind")

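def calcuGC(dna):
    # GC-content: percentage of bases in dna that are G or C.
    # (The parameter was renamed so it no longer shadows the built-in list.)
    noCG = dna.count("G") + dna.count("C")
    GCcon = float(noCG) / len(dna)
    return GCcon * 100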

index = []
seqlist = []
longseq = ""

# Read the whole FASTA file; the with-block closes it automatically.
with open("rosalind_gc.txt") as file:
    line = file.readlines()

for seq in line:
    if seq.startswith(">"):
        # Header line: close off the previous record, if any, then store the new ID.
        if index:
            seqlist.append(longseq)
        index.append(seq)
        longseq = ""
    else:
        # Sequence line: join it onto the current record without the line break.
        longseq = longseq + seq.strip()

# Store the final record, which has no header line after it.
if index:
    seqlist.append(longseq)
result = []
for longseq in seqlist:
    result.append(calcuGC(longseq))

SeqGC = max(result)
# Header of the highest-scoring record, without the leading ">" and the line break.
SeqID = index[result.index(SeqGC)].replace(">", "").strip()

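# Write the ID and its GC-content on separate lines.
# "\n" replaces the original "\r", which is a carriage return rather than a newline.
with open("result.txt", "w") as file:
    file.write(SeqID)
    file.write("\n")
    file.write(str(SeqGC))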