GFF file mismatch. #55

Closed
a14578 opened this issue Jun 16, 2021 · 6 comments

@a14578

a14578 commented Jun 16, 2021

Hello, I'm getting a GFF file mismatch error:
*** ERROR ***
gff_check.cpp: Protein id "--output_00002" is not in the .gff-file

I generated the GFF file with Prokka and removed the whole-genome assembly FASTA sequence from the bottom of the file. I also checked all of the files I am passing to the AMRFinderPlus command (protein, nucleotide, and GFF files), and they all contain "--output_00002", so I'm not sure what I'm missing.

The link for "Using Prokka or RAST GFF files with AMRFinderPlus" doesn't work either.

Do you have any advice please?
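
A minimal Python sketch (not from the thread) for narrowing down this kind of mismatch: instead of checking only that the protein ID occurs somewhere in the GFF file, it reports which attribute keys in column 9 carry it. The workarounds later in this thread copy the `ID=` value into `Name=`, which suggests `Name=` is the attribute compared by default; treat that as an inference from the thread. The function names and the sample data are hypothetical.

```python
def fasta_ids(fasta_text):
    """Return the first whitespace-delimited token of each FASTA header."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def gff_attribute_values(gff_text):
    """Map each attribute key in GFF column 9 to the set of values seen."""
    values = {}
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # Prokka may append the assembly after this marker
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue
        for pair in cols[8].split(";"):
            key, sep, value = pair.partition("=")
            if sep:
                values.setdefault(key.strip(), set()).add(value.strip())
    return values


def where_ids_appear(fasta_text, gff_text):
    """For each protein ID, list the GFF attribute keys that hold it."""
    attrs = gff_attribute_values(gff_text)
    return {
        pid: sorted(key for key, vals in attrs.items() if pid in vals)
        for pid in fasta_ids(fasta_text)
    }


# Hypothetical sample data, not the files from this issue.
example_faa = ">sample_00002 hypothetical protein\nMKT\n"
example_gff = (
    "##gff-version 3\n"
    "contig_1\tProdigal:2.6\tCDS\t82\t486\t.\t+\t0\t"
    "ID=sample_00002;locus_tag=sample_00002;product=hypothetical protein\n"
)
print(where_ids_appear(example_faa, example_gff))
```

If `Name` is missing from the list printed for an ID, a plain text search would still find the ID in the file, which would be consistent with the situation described above.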

@julianzaugg

julianzaugg commented Jun 17, 2021

Possibly unrelated, but I also had issues using the GFF output from Prokka. My hacky solution was to edit the attributes field of each feature line (lines starting with "contig" in my case) in the GFF file. Example below.

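# Create the output directory if it does not exist yet
mkdir -p amrfinder_out
# Keep only feature lines, rename the original Name= attribute, and expose the Prokka ID= value as Name=
grep "^contig" prokka_out.gff | sed "s/Name=/OtherName=/g" | sed "s/ID=/Name=/g" > amrfinder_out/prokka_out_clean.gff
# Run AMRFinderPlus on the Prokka proteins with the cleaned GFF
amrfinder \
  --protein prokka_out.faa \
  --gff amrfinder_out/prokka_out_clean.gff \
  --plus \
  --threads 30 > amrfinder_out/amrfinder.tsv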

@evolarjun
Contributor

Thanks @julianzaugg!

It appears that a link to the instructions on our wiki wasn't working. Please let me know where the broken link appears, if you know. Here's the Perl one-liner from the wiki help that does what @julianzaugg's nice sed pipeline does. It also trims off the FASTA section at the end, because we've seen Prokka GFFs with FASTA sequence included in them (this might not be necessary anymore):

perl -pe '/^##FASTA/ && exit; s/(\W)Name=/$1OldName=/i; s/ID=([^;]+)/ID=$1;Name=$1/' <prokka_output.gff>  > <amrfinder.gff>
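
For readers who prefer Python, here is a rough port of the one-liner above. It is a sketch of the same three steps (stop at `##FASTA`, rename the first `Name=` to `OldName=`, copy the `ID=` value into a new `Name=`). The function name and sample lines are hypothetical. One assumed difference: it strips the trailing newline before substituting, so an `ID=` that is the last attribute on a line does not capture the line break.

```python
import re


def prokka_gff_to_amrfinder(lines):
    """Rewrite Prokka GFF lines so the ID= value is also exposed as Name=."""
    out = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # drop the embedded assembly and everything after it
        line = line.rstrip("\n")
        # Rename the first existing Name= attribute (case-insensitive).
        line = re.sub(r"(\W)Name=", r"\1OldName=", line,
                      count=1, flags=re.IGNORECASE)
        # Copy the ID value into a new Name attribute.
        line = re.sub(r"ID=([^;]+)", r"ID=\1;Name=\1", line, count=1)
        out.append(line)
    return out


# Hypothetical input, not taken from the files in this issue.
example_lines = [
    "##gff-version 3\n",
    "contig_1\tProdigal:2.6\tCDS\t82\t486\t.\t+\t0\t"
    "ID=sample_00003;Name=tufB;gene=tufB\n",
    "##FASTA\n",
    ">contig_1\n",
    "ACGT\n",
]
for converted in prokka_gff_to_amrfinder(example_lines):
    print(converted)
```

To convert a real file, the same function could be fed an open file handle and the returned lines written out joined with newlines.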

@chaoyanggu

Hi, I ran into the same error:
GFF file mismatch.
*** ERROR ***
gff_check.cpp: Protein FASTA id "CRE.p1-1_clust1_00003" is not in the GFF file

But I checked my three input files, and all of them contain CRE.p1-1_clust1_00003.
gff:
CRE.p1-1_clust1_00003 Prodigal:2.6 CDS 82 486 . + 0 ID=CRE.p1-1_clust1_00003;Name=CRE.p1-1_clust1_00003;Name=tufB;gene=tufB;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P0CE48;locus_tag=CRE.p1-1_clust1_00003;product=Elongation factor Tu 2

faa:
>CRE.p1-1_clust1_00003
MFRKLLDEGRAGENVGVLLRGIKREEIERGQVLAKPGSIKPHTQFESEVYILSKDEGGRH
TPFFKGYRPQFYFRTTDVTGTIELPEGVEMVMPGDNVNMVVTLIHPIAMDDGLRFAIREG
GRTVGAGVVAKVIA

ffn:
>CRE.p1-1_clust1_00003
ATGTTCCGCAAACTGCTGGACGAAGGCCGTGCTGGTGAGAACGTAGGTGTTCTGCTGCGT
GGTATCAAACGTGAAGAAATCGAACGTGGTCAGGTACTGGCGAAGCCAGGCTCTATCAAG
CCACACACCCAGTTCGAATCTGAAGTGTACATCCTGAGCAAAGATGAAGGTGGTCGTCAC
ACTCCATTCTTCAAAGGCTACCGTCCACAGTTCTACTTCCGTACCACTGACGTGACCGGT
ACCATCGAACTGCCAGAAGGCGTAGAGATGGTAATGCCAGGCGACAACGTGAACATGGTT
GTAACCCTGATTCACCCAATCGCGATGGACGACGGTCTGCGTTTCGCAATCCGTGAAGGC
GGCCGTACCGTTGGCGCAGGTGTTGTTGCTAAAGTTATCGCTTAA

I don't know why this happens. Any advice would be appreciated, thank you.
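
One detail in the GFF line quoted above is that its attribute column contains `Name=` twice. Whether that contributed to the error is not established in this thread, but a check like the following hypothetical Python sketch makes such repeats easy to spot. It assumes the columns are tab-separated in the actual file (the tabs are not preserved in the text above).

```python
from collections import Counter


def duplicated_attribute_keys(gff_line):
    """Return attribute keys that occur more than once in GFF column 9."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return []
    keys = [
        pair.partition("=")[0].strip()
        for pair in cols[8].split(";")
        if "=" in pair
    ]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# The feature line quoted above, rebuilt with tab separators (an assumption).
quoted_line = "\t".join([
    "CRE.p1-1_clust1_00003", "Prodigal:2.6", "CDS", "82", "486", ".", "+", "0",
    "ID=CRE.p1-1_clust1_00003;Name=CRE.p1-1_clust1_00003;Name=tufB;gene=tufB;"
    "inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:"
    "UniProtKB:P0CE48;locus_tag=CRE.p1-1_clust1_00003;"
    "product=Elongation factor Tu 2",
])
print(duplicated_attribute_keys(quoted_line))
```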

@vbrover
Contributor

vbrover commented Nov 14, 2022

We have implemented the following amrfinder option:

-a ANNOTATION_FORMAT, --annotation_format ANNOTATION_FORMAT
    Type of GFF file: bakta, genbank, microscope, patric, pgap, prokka, pseudomonasdb, rast

What is the format of your GFF file?
Please attach your protein, nucleotide and GFF files for us to test.

@evolarjun
Contributor

Hi @chaoyanggu,

I would also point you at the section of our documentation on input formats. Most likely you just need to use one of the --annotation_format options that Slava mentioned above.

Hope that helps,
Arjun

@chaoyanggu

I added the option -a prokka and now the command runs normally. My GFF file was generated by Prokka, so it is not in the default genbank format.
Thank you all very much.
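
As a recap of the fix, the following Python sketch assembles an amrfinder command line that passes the annotation format explicitly. It uses only flags that appear in this thread (`--protein`, `--gff`, `--annotation_format`, `--plus`). The helper name and file names are hypothetical, and the list of accepted formats is copied from the option help quoted above, so it may differ between AMRFinderPlus versions.

```python
import shlex

# Formats listed in the option help quoted in this thread; may vary by version.
KNOWN_FORMATS = {
    "bakta", "genbank", "microscope", "patric",
    "pgap", "prokka", "pseudomonasdb", "rast",
}


def amrfinder_command(protein, gff, annotation_format="prokka", plus=False):
    """Build an argument list for amrfinder with an explicit GFF format."""
    if annotation_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown annotation format: {annotation_format}")
    cmd = [
        "amrfinder",
        "--protein", protein,
        "--gff", gff,
        "--annotation_format", annotation_format,
    ]
    if plus:
        cmd.append("--plus")
    return cmd


cmd = amrfinder_command("prokka_out.faa", "prokka_out.gff", plus=True)
print(shlex.join(cmd))
# To actually run it (requires amrfinder on PATH):
#   subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems with unusual file names, such as the "--output" prefix seen earlier in this thread.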
