Database creation

Workflow for database creation

1) Download the plasmid sequences available in NCBI RefSeq.

2) Extract the fasta files from the tar.gz archive.

3) Download and extract NCBI taxonomy, which will be fed to pATLAS.

4) Clone this repository:

git clone https://github.com/tiagofilipe12/pATLAS

5) Install its dependencies.

6) Configure the database:

createdb <database_name>
pATLAS/patlas/db_manager/db_create.py <database_name>

7) Run MASHix.py. The output will include a filtered fasta file (master_fasta_*.fas).

8) Run ABRicate on the filtered fasta file with the CARD, ResFinder, PlasmidFinder and VFDB databases.

# e.g.
abricate --db card <master_fasta*.fas> > abr_card.tsv
abricate --db resfinder <master_fasta*.fas> > abr_resfinder.tsv
abricate --db vfdb <master_fasta*.fas> > abr_vfdb.tsv
abricate --db plasmidfinder <master_fasta*.fas> > abr_plasmidfinder.tsv

9) Download the CARD index (aro_index.csv) required by the abricate2db.py script.

10) Update the git submodules (git submodule update --init --recursive) and run abricate2db.py, using all of the previous TSV files as input.

# e.g.
abricate2db.py -i abr_plasmidfinder.tsv -db plasmidfinder \
    -id 80 -cov 90 -csv aro_index.csv -db_psql <database_name>

11) Dump the database to an SQL file.
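
Steps 8 and 11 can be wrapped in two small shell functions. This is only a sketch: the function names (run_abricate_all, dump_database) are hypothetical, the output names follow the abr_<db>.tsv pattern used in the examples above, and pg_dump is assumed to be the dump tool because step 6 creates the database with PostgreSQL's createdb.

```shell
#!/bin/sh
# Sketch of steps 8 and 11. Assumes abricate and pg_dump are on PATH.
set -eu

# Step 8: screen the filtered fasta against each ABRicate database,
# writing one abr_<db>.tsv file per database into the current directory.
run_abricate_all() {
    fasta="$1"
    for db in card resfinder vfdb plasmidfinder; do
        abricate --db "$db" "$fasta" > "abr_${db}.tsv"
    done
}

# Step 11: dump the PostgreSQL database to a plain SQL file
# (assumption: pg_dump with its default plain-text output is what is wanted).
dump_database() {
    db_name="$1"
    out_sql="$2"
    pg_dump "$db_name" > "$out_sql"
}

# Dispatch on the first argument so the two steps stay separate:
# steps 9 and 10 still have to run between them.
case "${1:-}" in
    abricate) run_abricate_all "$2" ;;
    dump)     dump_database "$2" "$3" ;;
esac
```

Saved under a name of your choice, the script would be called once with abricate and the master fasta path after step 7, and once with dump, the database name and the output file name after step 10.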

Automation of these steps

These steps are fully automated in the Nextflow pipeline pATLAS-db-creation.

Creating a custom version of pATLAS

If you need to add your own plasmids to the pATLAS database without requesting their addition to the pATLAS website, you can provide custom fasta files through the -i option of MASHix.py when building the database. Then follow the steps described above.
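
One way to assemble the -i argument is to list the RefSeq fasta first and then append every custom fasta file found in a directory. The sketch below is illustrative only: the helper name mashix_inputs, the .fasta extension and the directory layout are assumptions, and it also assumes that -i accepts several fasta paths in a single call, which the text above implies but does not show. The other MASHix.py options are not covered here.

```shell
#!/bin/sh
# Sketch: build the "-i" argument list for MASHix.py from the RefSeq fasta
# plus all *.fasta files in a custom directory (hypothetical layout).
set -eu

mashix_inputs() {
    refseq_fasta="$1"
    custom_dir="$2"
    printf '%s' "-i $refseq_fasta"
    for f in "$custom_dir"/*.fasta; do
        # Skip the literal pattern left behind when nothing matches.
        [ -e "$f" ] || continue
        printf ' %s' "$f"
    done
    printf '\n'
}
```

The printed string can be reviewed first and then pasted into the MASHix.py call alongside the remaining options. Paths containing spaces would need extra quoting, which this sketch does not handle.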
