FAME Metagenomics workshop 2022

SUPER-FOCUS

SUPER-FOCUS is a tool which allows us to determine the functions present in metagenomic sequencing data. SUPER-FOCUS makes use of Subsystems, a functional classification system which contains three hierarchical levels.

This tutorial will demonstrate how we can use SUPER-FOCUS to determine the functions a metagenome is performing.

Running SUPER-FOCUS

SUPER-FOCUS has also been downloaded and configured with a database. This means we are ready to run SUPER-FOCUS!
(If you need to install SUPER-FOCUS in the future, refer to the SUPER-FOCUS GitHub repository for instructions.)

We will run SUPER-FOCUS on just the R1 reads, as we did for FOCUS. These reads should still be in your good_out_R1 directory.

Run the command

superfocus -q good_out_R1/ -dir superfocus_out -a diamond

When we run this command, a new directory named superfocus_out will be created to hold the files generated by SUPER-FOCUS. The -q flag points SUPER-FOCUS at the folder of query reads, -dir sets the output directory, and -a sets which aligner SUPER-FOCUS uses. Here I’ve told it to use DIAMOND.
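A couple of quick checks before launching a long alignment run can save time. Below is a minimal sketch of a hypothetical wrapper, run_superfocus (it is not part of SUPER-FOCUS), that confirms the input directory exists and is not empty and that the superfocus executable is on your PATH, then runs the same command as above. The default directory names are the ones used in this tutorial.

```shell
# run_superfocus: hypothetical helper, not part of SUPER-FOCUS itself.
# Checks the input directory and the superfocus executable, then runs the
# same SUPER-FOCUS command used in this tutorial.
run_superfocus() {
  local in_dir="${1:-good_out_R1}"       # folder of query reads (-q)
  local out_dir="${2:-superfocus_out}"   # output folder (-dir)

  if [ ! -d "$in_dir" ]; then
    echo "Input directory '$in_dir' does not exist" >&2
    return 1
  fi
  if [ -z "$(ls -A "$in_dir")" ]; then
    echo "Input directory '$in_dir' is empty" >&2
    return 1
  fi
  if ! command -v superfocus >/dev/null 2>&1; then
    echo "superfocus was not found on your PATH" >&2
    return 127
  fi

  superfocus -q "$in_dir" -dir "$out_dir" -a diamond
}
```

Usage: run_superfocus good_out_R1 superfocus_out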

Great, now what?

We can start to look at the output from SUPER-FOCUS by looking in the output directory:

cd superfocus_out

You’ll notice a few files ending in .m8. These are alignment files generated by SUPER-FOCUS (the .m8 extension refers to BLAST's tab-separated tabular output format).
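To get a feel for these files, here is a short sketch that counts the lines in each .m8 file. It assumes the usual tabular (m8) convention of one alignment per line, so each count is the number of alignment hits in that file. count_m8_hits is a hypothetical helper name, not a SUPER-FOCUS command.

```shell
# count_m8_hits: hypothetical helper that prints "<file>\t<line count>"
# for every .m8 file in a directory (default: the current directory).
# Assumes one alignment per line, as in BLAST tabular (m8) output.
count_m8_hits() {
  local dir="${1:-.}"
  local f
  for f in "$dir"/*.m8; do
    [ -e "$f" ] || continue   # no .m8 files: the glob stays literal, so skip it
    printf '%s\t%s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
  done
}
```

From inside superfocus_out, run count_m8_hits . to list each alignment file with its number of hits.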

More importantly, you should notice the files output_subsystem_level_1.xls, output_subsystem_level_2.xls and output_subsystem_level_3.xls. Each of these files gives the prevalence of each function at the corresponding Subsystems level. All three levels are contained in the file output_all_levels_and_function.xls. Despite the .xls extension, these are tab-separated text files, which is why we can view them on the command line.

To look at the level 1 output, run the command

column -t -s $'\t' -n string output_subsystem_level_1.xls  | less

Here the first four columns correspond to the normalised read counts of each sample, and the second four columns contain the percent abundance of each function.

(Note that the read counts in the SUPER-FOCUS output have been normalised, which is why they have decimal values. If you would prefer raw, un-normalised read counts in the output, run SUPER-FOCUS with the flag -n 0.)

When you are done looking at the output, press the letter ‘q’ on your keyboard.

Visualising SUPER-FOCUS with Krona

We can build a Krona plot from our SUPER-FOCUS output, just like we did for our Kraken output earlier today.

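Again, we need to rearrange the output into a format that Krona can understand. Krona's ktImportText expects tab-separated lines, each consisting of a count followed by the levels of the hierarchy. We can rearrange the output using this bash command: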

tail -n +5 output_all_levels_and_function.xls | awk -F '\t' '{n=$4+$5+$6+$7; print n"\t"$1"\t"$2"\t"$3}' > superfocus_out_krona.tsv
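To see what this command does, here is a sketch that wraps the same tail/awk pipeline in a hypothetical function, superfocus_to_krona, and runs it on a tiny made-up file. The toy layout (four lines to skip, then three Subsystems levels and four sample counts per row) is an assumption that simply mirrors what the command expects; check it against the top of your real file.

```shell
# superfocus_to_krona: hypothetical wrapper around the tutorial's pipeline.
# Skips the first 4 lines, then for each tab-separated row prints the sum of
# columns 4-7 followed by columns 1-3 (count first, then the hierarchy).
superfocus_to_krona() {
  tail -n +5 "$1" | awk -F '\t' '{n=$4+$5+$6+$7; print n"\t"$1"\t"$2"\t"$3}' > "$2"
}

# Toy input with made-up values, mirroring the layout assumed above.
toy_in=$(mktemp)
toy_out=$(mktemp)
printf 'header 1\nheader 2\nheader 3\nheader 4\n' > "$toy_in"
printf 'LevelA\tLevelB\tLevelC\t1\t2\t3\t4\n' >> "$toy_in"
printf 'LevelA\tLevelB\tLevelD\t0.5\t0\t0\t0\n' >> "$toy_in"

superfocus_to_krona "$toy_in" "$toy_out"
cat "$toy_out"
# Prints two tab-separated lines:
# 10   LevelA  LevelB  LevelC
# 0.5  LevelA  LevelB  LevelD
```

On the real data, superfocus_to_krona output_all_levels_and_function.xls superfocus_out_krona.tsv does the same thing as the command above.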

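This skips the first four lines of the file (the header), adds up the counts in columns 4 to 7 of each row, and writes that total followed by the three Subsystems levels to a new file, superfocus_out_krona.tsv, which Krona can read.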
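Because ktImportText treats the first tab-separated column of each line as the count, a line that does not start with a number (for example, a leftover header row) is probably a mistake. The sketch below, check_krona_input (a hypothetical helper), flags lines whose first field is not a plain non-negative number or that have no hierarchy levels after it. It is a quick sanity check under those assumptions, not a full validation of Krona's input format.

```shell
# check_krona_input: hypothetical sanity check for a ktImportText input file.
# Reports (up to 5) lines whose first tab-separated field is not a
# non-negative number, or that have nothing after the count.
# Returns 1 if any such line is found, 0 otherwise.
check_krona_input() {
  awk -F '\t' '
    NF < 2 || $1 !~ /^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/ {
      bad++
      if (bad <= 5) printf "line %d looks wrong: %s\n", NR, $0 > "/dev/stderr"
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

Usage: check_krona_input superfocus_out_krona.tsv && echo "looks OK"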

Next, we can generate our Krona plot by running the command

ktImportText superfocus_out_krona.tsv -o superfocusKronaPlot.html

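Next, download the Krona HTML file to your own computer, using either WinSCP or scp. If you use scp, run the command below from a terminal on your own computer (not on the server), changing the username, IP address and path if your login details differ. You can open the HTML file in your favourite browser to reveal a plot of the distribution of functions in the samples, and zoom in and out to see the different levels of annotation.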

scp grig0076@115.146.84.253:/home/grig0076/superfocus_out/superfocusKronaPlot.html .

Congratulations on making it to the end of the tutorial! I hope you enjoyed it.
