Import
Import functions allow users to import one or more JSON files (extension .json) containing their own plasmid data. These files are generated by pATLASflow, a pipeline that runs mapping, mash screen and assembly methods for pATLAS. They can also be generated through FlowCraft recipes.
The JSON files can be imported using the Upload file... button or by dragging and dropping the files onto the text box to the right of this button.
To generate these files, you can use two different programs:
- pATLASflow: for data on which QC analysis, assemblies and every other required analysis have already been performed. It provides the mash dist, mash screen and mapping approaches.
- FlowCraft: the pipeline will handle QC analysis and trimming with the default parameters described in the FlowCraft documentation and then perform the desired analysis (either mash dist / assembly, mash screen or mapping).
pATLASflow is run with Nextflow, which can be installed through conda:
conda install nextflow
The mapping pipeline can be run with the following command:
nextflow run tiagofilipe12/pATLASflow --mapping --reads "your_folder/*.fastq"
The resulting JSON file can then be provided to pATLAS in the Mapping menu.
The mash screen pipeline can be run with the following command:
nextflow run tiagofilipe12/pATLASflow --mash_screen --reads "your_folder/*.fastq"
The resulting JSON file can then be provided to pATLAS in the Mash screen menu.
The sequence (assembly) pipeline can be run with the following command:
nextflow run tiagofilipe12/pATLASflow --assembly --fasta "your_folder/*.fasta"
The resulting JSON file can then be provided to pATLAS in the Assembly menu.
The Consensus import uses a consensus approach between the Mash screen and Mapping results. To generate this JSON input, users must run the following command:
nextflow run tiagofilipe12/pATLASflow --mapping --mash_screen --reads "your_folder/*.fastq"
The resulting JSON file can then be provided to pATLAS in the Consensus menu.
To use pATLAS recipes with FlowCraft, there are four recipes to choose from:
- Mapping
First build the pipeline script with this command:
flowcraft.py build -r plasmids_mapping -o pipeline
And then execute the pipeline by running Nextflow on the generated script:
nextflow run pipeline.nf
- Assembly / Mash Dist
First build the pipeline script with this command:
flowcraft.py build -r plasmids_assembly -o pipeline
And then execute the pipeline by running Nextflow on the generated script:
nextflow run pipeline.nf
- Mash Screen
First build the pipeline script with this command:
flowcraft.py build -r plasmids_mash -o pipeline
And then execute the pipeline by running Nextflow on the generated script:
nextflow run pipeline.nf
- All
This will run all the above pipelines in the same command and generate different outputs for each one of the approaches.
First build the pipeline script with this command:
flowcraft.py build -r plasmids -o pipeline
And then execute the pipeline by running Nextflow on the generated script:
nextflow run pipeline.nf
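The four recipes differ only in the name passed to -r, so the build-and-run sequence can be scripted. The following is a minimal Python sketch, assuming flowcraft.py and nextflow are available on the PATH. The RECIPES mapping and the helper names build_commands and run_recipe are hypothetical and are not part of FlowCraft or pATLAS; the script name with the .nf suffix is inferred from the commands listed above.

```python
import subprocess

# Recipe names taken from the commands listed above.
RECIPES = {
    "mapping": "plasmids_mapping",
    "assembly": "plasmids_assembly",
    "mash_screen": "plasmids_mash",
    "all": "plasmids",
}


def build_commands(approach, output="pipeline"):
    """Return the two commands (build, then run) for the given approach."""
    recipe = RECIPES[approach]  # raises KeyError for an unknown approach
    build = ["flowcraft.py", "build", "-r", recipe, "-o", output]
    # Assumption from the commands above: "-o pipeline" is followed by
    # "nextflow run pipeline.nf", so the script is taken to be <output>.nf.
    run = ["nextflow", "run", f"{output}.nf"]
    return [build, run]


def run_recipe(approach, output="pipeline"):
    """Execute both steps in order, stopping at the first failure."""
    for command in build_commands(approach, output):
        subprocess.run(command, check=True)
```

Calling build_commands("mash_screen") only returns the argument lists, which makes it possible to inspect the commands before anything is executed with run_recipe.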
Results will be available within the current working directory in a folder named results. These files can be uploaded to their respective menus within the pATLAS sidebar menu.
You can also use the flowcraft.py report module to generate interactive reports that can send requests to pATLAS directly, without importing a file to pATLAS.
After loading the files through any of these popup menus and setting the desired cutoffs, a new popup will appear asking whether the user wants to use the redundancy option for importing results into the pATLAS matrix.
This option was created because plasmids are highly chimeric and modular by nature, so results often contain redundant information. Consider the following examples:
- Two plasmids are highly related (and thus linked in pATLAS) and the results show that the HTS data has 100% identity with both, but one is larger than the other (say, 5 kb versus 50 kb). In this case the larger plasmid, having the same % identity, is the more likely one to be present in the data.
- HTS data suggest that we may have:
- one plasmid with 100% identity and sequence length of 5kb.
- another plasmid with 90% identity and sequence length of 50kb.
- both plasmids are highly related (and thus they are linked in the pATLAS matrix).
In the second example, although the first plasmid has a higher identity, the second plasmid has an overall larger sequence similarity, and thus the second plasmid should be the more likely one to be contained in the sequencing data.
Hence, this option was added to help deal with this problem and to make a "guess" of the most likely plasmids instead of reporting all hits from the pipelines described above.
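To put numbers on the two examples, each hit can be weighted by identity times length, which is the quantity used by the Mash screen comparison described further down. This is a minimal sketch in Python; the function name weighted_hit is hypothetical, and identities are written as fractions (1.0 for 100%).

```python
def weighted_hit(identity, length_bp):
    """Weight a hit by identity (as a fraction) times sequence length in bp."""
    return identity * length_bp


# Example 1: both plasmids at 100% identity, 5 kb versus 50 kb.
case1_small = weighted_hit(1.0, 5_000)
case1_large = weighted_hit(1.0, 50_000)

# Example 2: 100% identity over 5 kb versus 90% identity over 50 kb.
case2_small = weighted_hit(1.0, 5_000)
case2_large = weighted_hit(0.9, 50_000)

# Compare the weights of the larger and the smaller plasmid in each example.
print(case1_large > case1_small, case2_large > case2_small)
```

In both examples the 50 kb plasmid ends up with the larger weight, which matches the reasoning given for each case.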
All linked plasmids are compared with each other in order to know which one is the best hit from a given group of linked plasmids. If they are not linked, they will not be compared. So, if we have two different groups of plasmids it is likely that HTS data contain two plasmids.
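One way to picture this grouping is as the connected components of a graph whose edges are the pATLAS links. The sketch below is an illustration in Python, not the pATLAS implementation: treating groups as connected components is an assumption, as are the input shapes (a list of hit names and a list of linked pairs), the placeholder names A, B, C and the function name group_linked_hits.

```python
def group_linked_hits(hits, links):
    """Group hits so that plasmids joined by a chain of links share a group."""
    # Build an adjacency map restricted to plasmids that are actual hits.
    neighbours = {hit: set() for hit in hits}
    for a, b in links:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for hit in hits:
        if hit in seen:
            continue
        # Depth-first walk collecting every hit reachable from this one.
        stack, group = [hit], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical hits: A and B are linked, C is linked to neither.
groups = group_linked_hits(["A", "B", "C"], [("A", "B")])
print(len(groups))
```

Here two groups come out, which under the reasoning above would suggest that the HTS data contain two plasmids. Only hits within the same group would then be compared with each other.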
However, each import type uses a different calculation to "guess" the best hit for the plasmids within each group, since the results are generated by different approaches and pipelines.
Therefore, each pair of linked plasmids will be compared as described below for each one of the imports:
- Mapping
plasmid1 percentage * plasmid1 length - plasmid2 percentage * plasmid2 length
Interpretation: If this result is > 0, plasmid1 is the "best hit". If this result is < 0, plasmid2 is a "better hit" than plasmid1. However, if the result is = 0, both are "best hits".
Note: percentage is the percentage of the queried plasmid that is covered by HTS data, resulting from mapping.
- Mash screen
plasmid1 identity * plasmid1 length - plasmid2 identity * plasmid2 length
Note: identity is the percentage identity, from the mash screen output, between the queried plasmid and the HTS data.
Interpretation: If this result is > 0, plasmid1 is the "best hit". If this result is < 0, plasmid2 is a "better hit" than plasmid1. However, if the result is = 0, both are "best hits".
- Assembly
plasmid1 identity * plasmid1 shared hashes * plasmid1 length - plasmid2 identity * plasmid2 shared hashes * plasmid2 length
Note: identity is the percentage identity, from the mash dist output, between the queried plasmid and the HTS data.
Note 2: shared hashes is a measure of the percentage of sequence that is shared between the HTS data and the plasmid. This is useful because mash dist reports the identity of the smallest sequence against the larger sequence.
Interpretation: If this result is > 0, plasmid1 is the "best hit". If this result is < 0, plasmid2 is a "better hit" than plasmid1. However, if the result is = 0, both are "best hits".
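The three comparisons share one shape: a per-plasmid score, a subtraction, and a reading of the sign of the result. A minimal Python sketch follows, assuming each hit is represented as a dictionary. The keys (percentage, identity, shared_hashes, length), the function names score and compare, and the example values are hypothetical and do not reflect the actual layout of the pATLAS JSON files.

```python
def score(hit, import_type):
    """Per-plasmid score used on each side of the subtraction."""
    if import_type == "mapping":
        return hit["percentage"] * hit["length"]
    if import_type == "mash_screen":
        return hit["identity"] * hit["length"]
    if import_type == "assembly":
        return hit["identity"] * hit["shared_hashes"] * hit["length"]
    raise ValueError(f"unknown import type: {import_type}")


def compare(plasmid1, plasmid2, import_type):
    """Interpret the sign of score(plasmid1) - score(plasmid2)."""
    result = score(plasmid1, import_type) - score(plasmid2, import_type)
    if result > 0:
        return "plasmid1"
    if result < 0:
        return "plasmid2"
    return "both"


# Hypothetical assembly hits; values are fractions and lengths are in bp.
p1 = {"identity": 1.0, "shared_hashes": 0.5, "length": 10_000}
p2 = {"identity": 0.5, "shared_hashes": 0.5, "length": 40_000}
print(compare(p1, p2, "assembly"))
```

With these made-up values, plasmid1 scores 5000 and plasmid2 scores 10000, so the difference is negative and plasmid2 is the better hit, even though plasmid1 has the higher identity.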