Methodology

Genome Assembly and Annotation Pipeline

The Citrus reticulata var. Murcott genome was assembled to chromosome scale with a bioinformatics pipeline that integrates PacBio HiFi long reads, Illumina short reads, and Hi-C data. The workflow comprised quality control, read filtering and taxonomic classification, primary assembly with hifiasm, removal of organellar contaminants from the assembly, and Hi-C-based scaffolding to reach chromosome-scale contiguity.

Gene prediction and annotation followed a multi-phase strategy: repeat masking with RepeatModeler2 and RepeatMasker, ab initio gene prediction with the Helixer deep learning framework, protein domain characterization with InterProScan 5, signal peptide detection with Phobius, and functional annotation with Funannotate, including Gene Ontology (GO) term assignment and UniProt cross-referencing.

Assembly quality was assessed with BUSCO and QUAST; the resulting reference genome reaches 98.2% BUSCO completeness. This work is an effort by the Chilean biotechnology company Meristem to improve the identification of genes associated with agronomically relevant traits, providing a foundation for editing those genes and developing new, improved citrus varieties.
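As an illustration of how a BUSCO completeness figure is typically read, the sketch below parses the one-line score string that BUSCO v5 writes in its short summary file. The function name `parse_busco_summary` and the example line are hypothetical; the numbers are toy values, not results from this assembly, and other BUSCO versions may add fields to the line.

```python
import re

# Pattern for the one-line BUSCO score string, e.g.
# "C:90.0%[S:85.0%,D:5.0%],F:4.0%,M:6.0%,n:200"
# (assumed BUSCO v5 layout; newer releases may append extra fields).
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return BUSCO percentages (C, S, D, F, M) and the ortholog count n."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a BUSCO score string")
    fields = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    fields["n"] = int(match.group("n"))
    return fields


# Toy line for illustration only (not the Murcott assembly's scores).
toy_line = "C:90.0%[S:85.0%,D:5.0%],F:4.0%,M:6.0%,n:200"
result = parse_busco_summary(toy_line)
print(result["C"], result["n"])
```

The reported completeness corresponds to the `C` field, which is the sum of single-copy (`S`) and duplicated (`D`) complete orthologs.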

Assembly Statistics

Metric | Value | Description
Total Assembly Size | 322.8 Mbp | Total length of assembled sequences
Number of Contigs | 9 | Count of contiguous assembled sequences
Contig N50 | 38.1 Mbp | Contig length such that contigs of this length or longer contain at least 50% of the assembly
Longest Contig | 49.7 Mbp | Length of the largest assembled contig
GC Content | 36.4% | Proportion of guanine and cytosine bases in the genome
BUSCO Completeness | 98.2% | Percentage of BUSCO orthologs recovered as complete
Predicted Genes | 27,043 | Number of predicted protein-coding genes
Repeat Content | 53.65% | Proportion of the genome composed of repetitive elements
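To make the N50 and GC content definitions concrete, here is a minimal Python sketch of how both metrics can be computed. The function names and the toy inputs are illustrative assumptions; the individual contig lengths of this assembly are not given here, and tools such as QUAST report these values directly.

```python
def n50(lengths):
    """Return the N50: the contig length at which the running total of
    contigs, sorted longest first, reaches at least half the assembly."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def gc_content(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    return gc / acgt if acgt else 0.0


# Toy inputs for illustration only (not the Murcott contigs).
toy_lengths = [50, 40, 30, 20, 10]   # total 150, half is 75
toy_sequence = "ATGCGCNNAT"          # N bases are ignored

print(n50(toy_lengths))              # 50 + 40 = 90 >= 75, so N50 is 40
print(gc_content(toy_sequence))      # 4 GC bases out of 8 called bases
```

Excluding ambiguous bases from the GC denominator is one common convention; counting them would lower the reported fraction slightly.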

Hi-C Contact Maps of the 9 Chromosomes

These heatmaps display chromatin contact frequencies across the nine chromosomes, derived from Hi-C data. The color intensity of each cell reflects how often the two corresponding genomic regions interact, which makes genomic architecture and large-scale structural organization visible.
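A contact map of this kind is, at its core, a symmetric matrix of binned read-pair counts. The sketch below shows that idea under simplifying assumptions: a single chromosome, 0-based positions, and a hypothetical `contact_matrix` helper with toy read pairs. It is not the tool or the parameters used to produce the maps shown here.

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Bin Hi-C read pairs into a symmetric contact-count matrix.

    pairs: iterable of (pos_a, pos_b) 0-based positions on one chromosome,
           each smaller than chrom_length.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count to keep the matrix symmetric
    return matrix


# Toy read pairs for illustration only.
toy_pairs = [(5, 12), (7, 8), (25, 3), (14, 18)]
toy_matrix = contact_matrix(toy_pairs, chrom_length=30, bin_size=10)

for row in toy_matrix:
    print(row)
```

In a correctly scaffolded chromosome, counts are highest along the diagonal because nearby loci contact each other most often; strong off-diagonal blocks can point to misjoins or rearrangements.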

[Figure: Hi-C contact maps of the nine chromosomes]

Genome Browser

Explore the Citrus reticulata var. Murcott genome using the interactive genome browser below. Navigate across chromosomes, zoom into specific regions, and examine diverse genomic features such as genes, annotations, and variants.