From: Chris Fields <cjfields <at> uiuc.edu>
Subject: Re: Get nucleotide sequence when expecting protein from genpept
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Tuesday 11th July 2006 22:47:38 UTC
Okay, now try this:

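use strict;
use warnings;
use Bio::DB::GenPept;
use Bio::SeqIO;

# Ask GenPept for FASTA and fetch every record matching the accession
my $factory = Bio::DB::GenPept->new(-format => 'fasta');
my $seqin   = $factory->get_Stream_by_acc('T16005');
my $seqout  = Bio::SeqIO->new(-fh     => \*STDOUT,
                              -format => 'fasta');

# Write out each record in the stream, not just the first one
while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}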

This returns both the nucleotide sequence and the correct protein sequence.
The protein was returned second for some reason, so get_Seq_by_acc, which
returns only the first record, misses it, while get_Stream_by_acc doesn't.
I have notified NCBI about this issue, but they will likely just tell me to
use the GI number for searches, since GI numbers are unique. That's probably
a good warning for anyone using accessions for all their work (I use the GI
myself).
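If you would rather stay with accessions, one option is to read the whole stream and keep the first record whose alphabet is 'protein'. This is only a sketch under a few assumptions: BioPerl is installed, the helper name first_protein is hypothetical, and a hypothetical two-record FASTA string (nucleotide first, then protein) stands in for the NCBI response so the example runs offline. With a live connection you would pass the stream from $factory->get_Stream_by_acc('T16005') instead. It relies on BioPerl guessing the alphabet from residue composition, so very short or ambiguous sequences could be misclassified.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return the first sequence in a Bio::SeqIO stream
# whose alphabet is 'protein', or undef if the stream holds none.
sub first_protein {
    my ($stream) = @_;
    while (my $seq = $stream->next_seq) {
        return $seq if $seq->alphabet eq 'protein';
    }
    return;
}

# Hypothetical FASTA standing in for the two records NCBI returned
# for T16005: a nucleotide record first, then the protein record.
my $fasta = <<'END';
>nuc_record hypothetical nucleotide sequence
ATGGCGTACGTTAGCATCGATCGATCGTACGATCGTAGCTAGCTAGCATCG
>prot_record hypothetical protein sequence
MKVLAAGIVGLLLAQPAMAEKTHWQRSTYDEFGHIKLMNPQRSTVWY
END

# Read the string through an in-memory filehandle, as if it were a stream
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my $stream = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my $protein = first_protein($stream);
print defined $protein ? $protein->display_id . "\n" : "no protein found\n";
```

Filtering on the alphabet keeps the accession-based workflow, but looking the record up by GI, as suggested in this thread, avoids the ambiguity altogether.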

Chris

> -----Original Message-----
> From: [email protected] [mailto:bioperl-l-
> [email protected]] On Behalf Of Chris Fields
> Sent: Tuesday, July 11, 2006 5:05 PM
> To: 'Frederick Partridge'; [email protected]
> Subject: Re: [Bioperl-l] Get nucleotide sequence when expecting
> protein from genpept
> 
> It's an imported PIR record, so there is probably no accession recorded in
> the database. I think NCBI falls back to nucleotide if it can't find a
> particular accession via protein. Using the primary ID (the GI#, 7498730)
> works.
> 
> Chris
> 
> > -----Original Message-----
> > From: bioperl-[email protected] [mailto:bioperl-l-
> > [email protected]] On Behalf Of Frederick Partridge
> > Sent: Tuesday, July 11, 2006 4:23 PM
> > To: [email protected]
> > Subject: [Bioperl-l] Get nucleotide sequence when expecting protein
> > from genpept
> >
> >
> >
> > I am trying to retrieve various protein sequences from genpept using
> > get_Seq_by_acc. All of them work OK except one, T16005:
> >
> >
> > If I try to retrieve it with a reduced program:
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> > use Bio::DB::GenPept;
> >
> > my $genpept = Bio::DB::GenPept->new;
> > my $seq = $genpept->get_Seq_by_acc('T16005');
> >
> > print $seq->seq(), "\n";
> >
> > I get back a nucleotide sequence, which is another sequence at NCBI with
> > the same accession number. (I thought these were meant to be unique, but
> > evidently not.)
> >
> >
> > I am using bioperl 1.5.1, perl 5.8.1, Mac OS 10.3
> >
> >
> > Could anyone help me to get this protein sequence with my program?
> >
> >
> > Many thanks,
> >
> >
> >
> > Freddie Partridge
> >
> > University of Oxford
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > [email protected]
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
 