Campylobacter analysis

Campylobacter are gram-negative bacteria responsible for the majority of cases of foodborne bacterial infection (Kirk et al. 2015, The European Union One Health 2019 Zoonoses Report). Although poultry has been identified as the major source of campylobacteriosis, several cases have been linked to other sources such as ruminants or the environment (Cody et al. 2019). Campylobacteriosis causes gastroenteritis, with symptoms that include diarrhea and fever, but it may also be responsible for a neurological disorder called Guillain-Barré syndrome. So far, 17 Campylobacter species and six subspecies have been described, of which Campylobacter jejuni and Campylobacter coli are the most commonly associated with human illness (ECDC 2018). Thermophilic Campylobacter species grow at temperatures between 37ºC and 42ºC (41.5ºC being the optimal temperature) (Silva et al. 2011), and they are the species of greatest concern for human illness.

In a WGS approach for Campylobacter spp., it may be important to identify the species. Determining the similarity between different isolates (from clinical, animal or environmental sources), and their respective virulence and antimicrobial resistance markers, is essential for proper disease surveillance. Campylobacter serotyping is based on the Penner serotyping scheme, which relies on a hemagglutination assay of lipooligosaccharides (LOS) and of a capsule polysaccharide (CPS), with CPS being the primary serodeterminant (Penner & Hennessy, 2000, Parkhill et al. 2000, Karlyshev et al. 2000, Pike et al. 2013). More than 40 Campylobacter serotypes have been described with this methodology. Nevertheless, as with other species, molecular typing has a higher discriminatory power, which is useful for epidemiological purposes.

Typing methods

An ideal typing method offers not only high discriminatory power, but also high reproducibility and the possibility of automation. For this reason, molecular typing is a constantly evolving field, always seeking better technologies. Currently, different techniques can be applied for Campylobacter molecular typing, namely:

  • Pulsed Field Gel Electrophoresis (PFGE) - PFGE is a restriction fragment length analysis that was long considered the most discriminatory typing method for Campylobacter in the pre-WGS era (Sabat et al. 2013, Frazão et al. 2020). It is currently the “gold standard” for the PulseNet network, and has been used by public health authorities and food regulators for outbreak investigations.
  • MLVA (Multiple-Locus Variable-number tandem repeat Analysis) - MLVA is a PCR-based typing method, and was another typing tool used by the PulseNet network before WGS. This method is able to differentiate fast-evolving bacteria even if they look similar with PFGE. Therefore, MLVA is usually performed as a complement to PFGE, providing a useful resource during outbreaks (Techaruvichit et al. 2015).
  • MLST (Multi-Locus Sequence Typing) - As for other bacteria, a 7-locus MLST scheme (aspA, glnA, gltA, glyA, pgm, tkt, and uncA) has been developed for Campylobacter (Dingle et al. 2001). MLST can provide faster results than PFGE, and it is highly reproducible. However, it shows lower discriminatory power than PFGE and MLVA (Sabat et al. 2013, Techaruvichit et al. 2015, Frazão et al. 2020), and it was therefore suggested that it should be used as a complement to PFGE (Frazão et al. 2020). A major advantage of MLST over, for instance, PFGE is the existence of a curated database with a common nomenclature (PubMLST), which allows results to be compared between studies and has led to this technique being widely used in epidemiological studies.
  • Sequencing of the short variable region (SVR) of the flaA gene - This method relies on comparing the flaA sequence with the alleles present in PubMLST, and has been described as a fast, discriminatory and reproducible tool to distinguish Campylobacter isolates. This technique is useful in combination with PFGE or MLST to differentiate outbreak-related isolates (Niederer et al. 2012, Mohan & Habib, 2019, Frazão et al. 2020). Nevertheless, PFGE and MLVA are more discriminatory and more widely used tools for Campylobacter typing in epidemiology (Frazão et al. 2020).
  • CRISPR - High-resolution DNA melt curve analysis (HRMA) of the CRISPR region can be used to differentiate among Campylobacter isolates (Price et al. 2007). This technique has been shown to be less discriminatory than PFGE, MLST or even SVR of the flaA gene (Frazão et al. 2020). Nevertheless, when used in combination with another typing method such as MLST, this technique has proven to be useful for epidemiological studies (Kovanen et al. 2014).
  • WGS (Whole-Genome Sequencing) - With the advent of NGS technologies, WGS has proven useful for Campylobacter outbreak investigation (Joensen et al. 2020). The Campylobacter genome size is approximately 1.8Mb with ~1,800 genes. By providing information at the genomic level, WGS not only allows highly discriminatory typing (cgMLST, wgMLST and SNP typing), but also maintains backward compatibility with the previously mentioned molecular typing methods, such as 7-locus MLST, which for this reason will likely continue to be used. Furthermore, it allows the analysis of specific genes, such as virulence factors and antimicrobial resistance genes, contributing to a better understanding of the different pathogenic populations. Genetic clustering using WGS can be performed with any distance measure (e.g. derived from allelic differences detected using cgMLST typing) or with evolutionary-model-based clustering (i.e. phylogenetics) relying on variant/SNP detection. The PulseNet network is making efforts to implement WGS as a routine tool to replace PFGE and MLVA. Nevertheless, this is still not routine in the case of Campylobacter.

“One Health” surveillance and WGS of Campylobacter

The identification of sources of infection and knowledge of the pathogens’ genomic features are essential for proper surveillance and outbreak monitoring. Hence, an integrated analysis of clinical, food and veterinary samples relying on the concept of One Health is the key to achieving a good surveillance system. As shown by the PulseNet network, the high discriminatory power of WGS increases the chances of finding the bacterial source of infection, and possibly reduces the time that it takes. Indeed, WGS analysis has proven to be an effective way to determine the genetic clustering of Campylobacter isolates, as well as the source of infections (Joensen et al. 2020). According to the European Union One Health 2019 Zoonoses report, surveillance systems for infections by Campylobacter are present in almost all member states, with the notification of campylobacteriosis being mandatory in 21 countries. Moreover, Campylobacter is monitored along the food chain. Nevertheless, WGS is not yet being implemented in routine Campylobacter surveillance.

WGS lab protocol

DNA extraction

Before DNA extraction, Campylobacter is cultured in the laboratory. These bacteria are microaerophilic, and for this reason they should be cultured under an oxygen-reduced atmosphere (Buss et al. 2019). Moreover, C. jejuni is usually cultured at 41.5ºC, as it only grows at temperatures between 30ºC and 42ºC (Duffy & Dykes, 2006). Regarding DNA extraction, there is no standard protocol or kit, but many studies use the DNeasy Blood & Tissue Kit or the QIAamp DNA Mini Kit (Qiagen, The Netherlands) (Meistere et al. 2019, Dunn et al. 2018, Dahl et al. 2020, Joensen et al. 2020).

Sequencing technology

There is no preferred WGS technology for sequencing Campylobacter. As in other fields, Illumina paired-end reads represent the most commonly used strategy. Due to the number of samples that can be handled in a single run and the longer read length available, MiSeq sequencing machines seem to be the choice of the majority of labs.

Bioinformatics protocol

Mapping or assembly

The first step after receiving the sequencing data is to evaluate the sequencing quality and to trim and clean the reads (see Data preprocessing).
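
As a rough illustration of what quality evaluation and trimming involve, the sketch below parses FASTQ records, computes the mean Phred score of a read, and trims low-quality bases from the 3' end. It assumes Phred+33 encoding and a cut-off of Q20; both are illustrative assumptions, and the function names are hypothetical. Dedicated tools (see Data preprocessing) should be used for real data.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string (Phred+33 is assumed here)."""
    return [ord(c) - offset for c in qual]


def mean_quality(qual: str) -> float:
    """Mean Phred score of a read; 0.0 for an empty read."""
    scores = phred_scores(qual)
    return sum(scores) / len(scores) if scores else 0.0


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

Reads that become too short after trimming, or whose mean quality stays low, would typically be discarded before assembly or mapping.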

The cleaned sequence data can then be used for downstream analysis following one of two approaches (or both in parallel, check Data production):

  • De novo genome assembly of the sample(s),
  • Read mapping of each sample on a reference sequence (obtained from a database or by de novo genome assembly of one of your samples)

It is important to note that both approaches have advantages and disadvantages. The decision on which of them to follow should be made according to what is more appropriate for the data at hand and the purpose of the analyses. De novo genome assembly of all sequenced isolates followed by their annotation seems to be a common approach in studies including Campylobacter genomes, which then perform a cgMLST analysis. A commonly used de novo genome assembler for Campylobacter is SPAdes (Dunn et al. 2018, Redondo et al. 2019, Kelley et al. 2020). It performs very well and is freely available. As for read mapping, when performed, it usually relies on Bowtie or BWA (Golz et al. 2020, Dunn et al. 2018, Mandal et al. 2017, Wallace et al. 2020, Chung et al. 2016). There are command-line pipelines, such as INNUca, which incorporate these programs and provide the opportunity to automatically perform all the analyses from quality control to genome assembly. If a platform with predefined pipelines (which usually does not require bioinformatics skills) is preferred, Enterobase is available for Campylobacter. In addition, the IRIDA system and CLC Genomics Workbench are in common use.
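
The two routes can be sketched as command lines for SPAdes and BWA. The helper below only builds the commands (it does not run them), so it can be adapted to any workflow manager. File names, thread counts and the choice of the `--careful` option are illustrative assumptions; option availability may differ between tool versions, so check the manual of the installed release.

```python
import shlex
from typing import List


def spades_command(r1: str, r2: str, outdir: str, threads: int = 4) -> List[str]:
    """De novo assembly of one paired-end sample with SPAdes."""
    return ["spades.py", "-1", r1, "-2", r2, "-o", outdir,
            "-t", str(threads), "--careful"]


def bwa_commands(reference: str, r1: str, r2: str, threads: int = 4) -> List[List[str]]:
    """Index a reference, then map paired-end reads with BWA-MEM.

    'bwa mem' writes SAM to standard output, so the caller is expected
    to redirect it to a file or pipe it into a downstream tool.
    """
    return [
        ["bwa", "index", reference],
        ["bwa", "mem", "-t", str(threads), reference, r1, r2],
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    commands = [spades_command("iso1_R1.fastq.gz", "iso1_R2.fastq.gz", "iso1_spades")]
    commands += bwa_commands("reference.fasta", "iso1_R1.fastq.gz", "iso1_R2.fastq.gz")
    for cmd in commands:
        print(shlex.join(cmd))
```

Building commands as lists rather than strings avoids quoting problems when sample names contain spaces or special characters.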

Choosing a reference genome

Should an analysis require the use of a reference genome, the choice of the reference genome is a crucial step. Analyses relying on read-mapping approaches might be strongly influenced by reference choice, as the genetic distance between the reference and the sample may influence the performance of downstream steps, namely SNPs/INDELs calling (Pightling et al. 2014, Pightling et al. 2015). This reference can be picked from the samples (after genome assembly), or from a public database. A read mapping approach is not commonly used in Campylobacter analysis, and for this reason there is not a specific reference genome in public databases that is in common use.
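
One simple way to reason about reference choice is to rank candidate references by their similarity to the sample. The sketch below uses exact k-mer Jaccard similarity on toy sequences; this is an illustrative assumption, not a method taken from the studies cited above, and it ignores the reverse strand. The function names and the value of k are hypothetical.

```python
from typing import Dict, Set


def kmers(seq: str, k: int = 5) -> Set[str]:
    """Set of all k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_reference(sample: str, candidates: Dict[str, str], k: int = 5) -> str:
    """Name of the candidate reference sharing most k-mers with the sample."""
    sample_kmers = kmers(sample, k)
    return max(candidates,
               key=lambda name: jaccard(sample_kmers, kmers(candidates[name], k)))
```

For whole genomes, k would be much larger than in this toy example, and sketching-based estimators are normally used instead of exact k-mer sets to keep memory usage manageable.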

Getting SNPs

How to detect SNPs is described earlier.

Briefly, there are three different approaches.

  • Perform de novo genome assembly of each sample and then align their genomic sequences (or gene sequences after annotation). Campylobacter analyses usually use MAUVE or PRANK to align the genomes (Clark et al. 2018, Weis et al. 2016, Fiedoruk et al. 2019, Parker et al. 2021). The latter aligner is mostly used as part of the pan-genome pipeline Roary.
  • Use a reference genome where the reads of all the samples will be mapped (check above), and then use a variant-calling pipeline to determine the polymorphic positions. CFSAN SNP is a commonly used pipeline which performs both processes (read mapping and variant calling). Snippy is also a commonly used alternative.
  • Determine the polymorphic positions in the sample by analyzing the k-mer pattern using kSNP. For this approach you can provide either the genome assembly or the cleaned genomic reads. This is the least frequently used approach for Campylobacter.

Each of these approaches provides you with information about the genetic variability of your dataset. This information can then be used to perform SNP-based clustering and phylogenetic analysis. Alternatively, if you follow a read mapping approach, you can replace the reference nucleotide by the observed allele, and consequently reconstruct the haplotype of each sample. This is the approach used by the CFSAN SNP pipeline.
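
The haplotype reconstruction described above can be sketched as follows. The example assumes that variant calling has already produced, for each sample, a map from 1-based reference position to the observed base; it handles substitutions only (no INDELs) and is not the actual implementation of any pipeline. All names are hypothetical.

```python
from typing import Dict, List


def apply_snps(reference: str, snps: Dict[int, str]) -> str:
    """Replace reference bases by the observed alleles (1-based positions)."""
    seq = list(reference)
    for pos, alt in snps.items():
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the reference")
        seq[pos - 1] = alt
    return "".join(seq)


def variable_sites(haplotypes: Dict[str, str]) -> List[int]:
    """1-based positions at which the haplotypes do not all agree."""
    columns = zip(*haplotypes.values())
    return [i + 1 for i, column in enumerate(columns) if len(set(column)) > 1]


if __name__ == "__main__":
    reference = "ACGTACGT"  # toy reference
    calls = {"iso1": {2: "T"}, "iso2": {2: "T", 8: "A"}, "iso3": {}}
    haplotypes = {name: apply_snps(reference, snps) for name, snps in calls.items()}
    print(haplotypes)
    print(variable_sites(haplotypes))
```

The resulting haplotypes all have the length of the reference, so they form an alignment that can be passed directly to SNP-based clustering or phylogenetic tools.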

Getting alleles and allele differences

The allele sequences of the samples can be retrieved by:

  • Replacing the nucleotide of the reference genome by the observed alternative allele, and then retrieve the sequence of each gene of interest considering the genome annotation of the reference.
  • Obtaining the de novo genome assembly of each sample, and performing the respective genome annotation. Prokka is a commonly used program for Campylobacter.
  • Some allele callers, such as chewBBACA, provide locus-specific alignments in an automated manner, being a good option to determine the allelic profile of samples.
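
To make the idea of an allelic profile concrete, the sketch below assigns allele numbers by exact sequence match against a scheme and registers unseen sequences as new alleles. This is a deliberately naive illustration under stated assumptions: real allele callers also handle partial matches, paralogs and quality checks, and the data structures and toy sequences used here are hypothetical.

```python
from typing import Dict, Optional


def call_alleles(sample_genes: Dict[str, str],
                 scheme: Dict[str, Dict[str, int]]) -> Dict[str, Optional[int]]:
    """Return the allelic profile of one sample.

    sample_genes maps locus name -> gene sequence found in the sample.
    scheme maps locus name -> {allele sequence: allele number}; it is
    updated in place when a novel allele is seen.
    Loci absent from the sample are reported as None.
    """
    profile: Dict[str, Optional[int]] = {}
    for locus, known in scheme.items():
        seq = sample_genes.get(locus)
        if seq is None:
            profile[locus] = None
            continue
        seq = seq.upper()
        if seq not in known:
            # Novel allele: assign the next free number for this locus.
            known[seq] = max(known.values(), default=0) + 1
        profile[locus] = known[seq]
    return profile
```

Because novel alleles are numbered locally, profiles produced this way are only comparable between laboratories if the scheme is shared, which is why curated public nomenclatures are so valuable.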

It is important to note that nowadays there are several platforms which can automatically do all this analysis. One of the more commonly used for Campylobacter is BIGSdb. These platforms provide assembly, serotyping and allele calling. Several of these platforms are mentioned in the xMLST section.

Allele based typing

Allele-based typing consists of retrieving clustering information considering the different alleles present in a population for a given set of genes (e.g. the core genome). With the advent of WGS, the 7-loci based MLST approach was broadened to the use of a cgMLST or a wgMLST approach. In this context, there is a public cgMLST scheme which has been used in Campylobacter jejuni/coli analysis considering an allele-based approach. This scheme comprises 1,343 loci (Cody et al. 2017).

Platforms available for cgMLST typing of Campylobacter include BIGSdb, BioNumerics, IRIDA, Pathogen Watch, and Ridom SeqSphere+. BIGSdb and BioNumerics seem to be commonly used by the community.
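
The clustering information used in allele-based typing ultimately derives from pairwise allelic distances. A minimal sketch is given below, assuming each profile is a map from locus name to allele number, with None marking a locus that could not be called. Ignoring missing loci is one common convention, but it is an assumption here; platforms differ in how they treat missing data.

```python
from typing import Dict, Optional

Profile = Dict[str, Optional[int]]


def allelic_distance(p1: Profile, p2: Profile) -> int:
    """Number of loci, called in both profiles, with different alleles."""
    shared = [locus for locus in p1
              if p1[locus] is not None and p2.get(locus) is not None]
    return sum(1 for locus in shared if p1[locus] != p2[locus])


if __name__ == "__main__":
    # Toy profiles over four hypothetical loci.
    iso1 = {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7}
    iso2 = {"locus1": 1, "locus2": 6, "locus3": None, "locus4": 7}
    print(allelic_distance(iso1, iso2))
```

With a scheme of more than a thousand loci, the number of loci actually compared should be reported alongside the distance, since many missing loci can make two isolates look artificially similar.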

SNP based typing

A SNP-based approach relies on the comparison of SNPs in a population. This strategy can be seen as an alternative to the allele-based approach, but many studies actually perform both and assess the overlap of the results. For a SNP-based analysis, all of the SNPs that are present in the samples need to be identified and used to obtain clustering information. Examples of publicly available pipelines for SNP-based typing are:

  • Center for Food Safety and Applied Nutrition (CFSAN) HqSNPs pipeline
  • Lyve-SET pipeline for HqSNPs typing
  • SNVPhyl (Public Health Agency of Canada)
  • PHEnix (The Public Health England SNP calling pipeline)
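
Whatever pipeline is used, the output is typically an alignment of the variable positions, from which pairwise SNP distances are computed. The sketch below shows that final step on toy sequences. It assumes equal-length aligned sequences and skips positions where either isolate has an ambiguous base (N) or a gap; that masking rule is an illustrative assumption, and the pipelines listed above apply their own filters.

```python
from itertools import combinations
from typing import Dict, Tuple

IGNORED = {"N", "-"}


def snp_distance(a: str, b: str) -> int:
    """Count differing positions, skipping N and gap characters."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x != y and x not in IGNORED and y not in IGNORED)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNP distances for every pair of isolates (names sorted)."""
    return {(s, t): snp_distance(alignment[s], alignment[t])
            for s, t in combinations(sorted(alignment), 2)}
```

The resulting distances can be used for clustering or compared with the allelic distances obtained from cgMLST to assess how well the two approaches agree.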

Outbreak definition

As defined by the World Health Organization, “a disease outbreak is the occurrence of cases of disease in excess of what would normally be expected in a defined community, geographical area or season”. WGS data provides a high discriminatory power, allowing clustering of different isolates (from different geographical areas, and clinical, animal or environmental sources) according to their genomic similarity. This contributes not only to an earlier detection of outbreaks and determination of contamination sources, but also to the detection of more outbreaks, as has been reported by the PulseNet network for Listeria. It is difficult to establish a clear outbreak cluster definition, that is, a threshold at which we decide whether two isolates belong to the same genetic cluster, thus linking two cases of infection. Epidemiologically related Campylobacter isolates can be distinguished from unrelated ones (Llarena et al. 2017). Nevertheless, the genomic variability within an outbreak-related clade varies depending not only on the dataset, but also on the methodology used (e.g. which MLST or cgMLST scheme is used). Furthermore, mixed infections may also influence the results (Llarena et al. 2017). In two independent outbreaks, a variation of 3 SNPs was found among the isolates (Revez et al. 2014a, Revez et al. 2014b). Using a 732-core-gene scheme, Clark et al. (2016) found 4 allele differences between isolates. Lahti et al. (2017) described a maximum of 1 allele difference between clinical isolates, considering a reference-based cgMLST with 1,271 loci. Therefore, so far, there is no specific threshold used to define Campylobacter outbreaks, and more studies on the genetic variation within and between Campylobacter populations would be a great contribution to the field.
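
To illustrate how a threshold translates into clusters, the sketch below groups isolates by single linkage: two isolates fall into the same cluster if they are connected by a chain of pairwise distances at or below the threshold. The threshold values and distances are arbitrary toy values, since no consensus threshold exists for Campylobacter, and single linkage is only one of several possible clustering rules.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(distances: Dict[Tuple[str, str], int],
                            threshold: int) -> List[List[str]]:
    """Group isolates linked by pairwise distances <= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        root_a, root_b = find(a), find(b)
        if dist <= threshold and root_a != root_b:
            parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for isolate in list(parent):
        clusters.setdefault(find(isolate), []).append(isolate)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy = {("A", "B"): 2, ("A", "C"): 15, ("B", "C"): 14, ("C", "D"): 3}
    print(single_linkage_clusters(toy, threshold=5))
    print(single_linkage_clusters(toy, threshold=14))
```

Running the same distances with different thresholds shows how sensitive cluster membership is to this choice, which is why genomic clusters should always be interpreted together with epidemiological evidence.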

Virulence and AMR

As with other pathogens, several genes are important for Campylobacter's ability to cause infection, and genes such as cadF and ciaB have therefore been described as medically relevant (e.g. Wu et al. 2016, Dasti et al. 2009, Fiedoruk et al. 2019, Chukwu et al. 2019). Moreover, although the majority of infections do not require antimicrobial drugs, antimicrobial therapy can be provided in severe cases of disease. In recent years, increased resistance to these drugs has been observed in Campylobacter, becoming a concern for public health authorities (CDC).

Several studies have determined genes and respective variations which are potentially related with increased virulence or specific antimicrobial resistance (e.g. Bravo et al. 2021, Lluque et al. 2017, Gahamanyi et al. 2021, Aksomaitiene et al. 2021). Moreover, the existence of several events of horizontal gene transfer may contribute to increase the list of relevant genes for this species (Aksomaitiene et al. 2021, Hull et al. 2021). For this reason, it is important to determine the presence of medically important genes/variations in the isolates. As mentioned in the Virulence and AMR detection section, where more details can be found, this is performed by comparing the genome to a database comprising a set of genes of interest. Examples of predefined resistome databases are mentioned in the same section.
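
The principle of comparing a genome to a database of genes of interest can be sketched as below. The example reports a gene as present only if its exact sequence occurs on either strand of the assembly; this is a strong simplification, since real screening tools rely on alignment with identity and coverage thresholds and can therefore detect variants. The gene names and sequences are toy placeholders, not real markers.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def screen_genes(genome: str, gene_db: Dict[str, str]) -> Dict[str, bool]:
    """Report, for each database gene, whether it occurs exactly in the genome."""
    forward = genome.upper()
    reverse = reverse_complement(forward)
    return {name: (seq.upper() in forward or seq.upper() in reverse)
            for name, seq in gene_db.items()}


if __name__ == "__main__":
    toy_genome = "TTTATGGCCAAATTT"
    toy_db = {"geneA": "ATGGCC", "geneB": "GGCCAT", "geneC": "CCCCCC"}
    print(screen_genes(toy_genome, toy_db))
```

Exact matching would miss the point mutations that underlie some resistance phenotypes, so variant-aware tools and curated databases remain necessary for routine use.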