
Version 1.0.0

This is a major update to:

  • ensure compatibility with R version 4 and Seurat version 5
  • ease the installation by suggesting dependencies instead of importing all of them
  • improve reusability with a better documentation website

Data

New data, corresponding to a subset of real 10X Genomics data:

  • sample “A” (817 cells): barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz
  • sample “B” (850 cells): barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz

Renamed to more general names:

  • color_cnv -> palette_BlWhRd
  • color_gene -> palette_GrOrBl

Removed because they were not used:

  • crb.genes
  • mrb.genes
  • mt.genes
  • str.genes

Functions

New functions, related to visualization:

  • plot_alluvial: plot an alluvial plot, showing changes between two annotations in a dataframe
  • plot_doublets_composition: plot cells colored by doublet status, according to their cell type scoring
  • plot_empty: plot an empty figure, for instance used by the plot_piechart function
  • plot_gsea_curve: a wrapper of enrichplot::gseaplot2 to add a title and subtitle with metrics
  • plot_pct: plot the result of Seurat::FindMarkers, i.e. genes in a (pct.1, pct.2) space
  • plot_piechart_subpopulation: plot cells and a pie chart to perform a pseudo-quantification analysis
  • plot_subpopultions: a wrapper of Seurat::DimPlot to color only specific cells while displaying the remaining cells as a background
  • plot_wordcloud: a wrapper of ggwordcloud::geom_text_wordcloud_area to make word clouds. It can be used to visualize (1) differentially expressed genes, (2) several gene set names together, (3) the genes belonging to a gene set, or (4) the over-represented words among a set of gene sets

Changes related to compatibility with recent versions of the dependencies:

  • get_gene_sets: change “gs_cat” and “gs_subcat” to “gs_collection” and “gs_subcollection”, for compatibility with new versions of the msigdbr package
  • filter_features: add an if/else statement to check whether data and scale.data are present in the assay; otherwise, subsetting a Seurat v5.1.0 object by feature fails with an error
  • find_doublets: modifications for compatibility with the scDblFinder package
  • plot_qc_density and plot_qc_facslike: replace ..density.. by ggplot2::after_stat(density)
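For code written against older ggplot2 releases, the change made in plot_qc_density and plot_qc_facslike follows the pattern below. This is a minimal sketch on a hypothetical dataframe, not the package code; it assumes ggplot2 3.4.0 or later, where the ..density.. notation is deprecated in favor of after_stat().

```r
library(ggplot2)

# Hypothetical QC values, e.g. number of UMIs per cell
qc <- data.frame(nCount_RNA = c(120, 340, 560, 800, 950, 1200, 1500))

# Before: aes(y = ..density..)           (deprecated dot-dot notation)
# After:  aes(y = after_stat(density))   (computed variable accessed explicitly)
p <- ggplot(qc, aes(x = nCount_RNA)) +
  geom_histogram(aes(y = after_stat(density)), bins = 5, fill = "grey80") +
  geom_density(colour = "black")

# Inspect the computed histogram layer: one row per bin, with its density
hist_data <- layer_data(p, 1)
head(hist_data[, c("x", "density")])
```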

Improvements:

  • get_gene_sets: simplify the output to return only the database, as a dataframe
  • integration_fastmnn: now includes the step that splits the Seurat object into a list. The input is now a Seurat object, not a list of Seurat objects
  • plot_piechart: add an if/else statement to check whether the input dataframe is empty, and plot an empty figure if so
  • repro_installation_order: generalization/simplification to any version of BiocManager and packages

Bug fixes:

  • plot_gsea_barplot: the issue with the color palette is solved

Minor changes:

  • add_cell_cycle: default value for verbose is now FALSE, and “print” messages are removed
  • cell_annot_custom: default value for newname is “annotation” instead of “newgroup”
  • find_doublets: default value for verbose is now FALSE, and “print” messages are removed
  • load_sc_data: default value for verbose is now FALSE
  • Throughout the package, removed the lines that set values in the sobj@misc slot

Renaming:

  • fig_plot_gb -> plot_df
  • gsea_run -> run_gsea
  • gsea_plot -> plot_gsea_barplot

Many functions have been removed (see version 0.1.5 to access them):

  • not used: add_genes_columns, binary_normalization, clustering_eval_mt, clustering_louvain, corr_features, dend_utils, dimensions_eval, find_markers_heterogeneity, find_markers_quick, find_projection, find_vtr, dimensions_reduction, filter_cells_QC, filter_doublets, find_expressing_clusters, grid_scalers, integration_combine_datasets_from_matrices, integration_combine_datasets_from_Seurat, integration_Seurat, integration_Seurat_check, open_rmd, plot_dendrogram, plot_mini_axes, plot_umap_ctrl, plot_umap_markers, plot_umap_QC, traj_root_cell, traj_root_cluster
  • no need for a wrapper: add_QC_metrics, clustering_kmeans, gsea_heatmap, gsea_score, integration_sample_palette, is_feature_present, open_genecards, plot_histogram_QC, plot_inertia, sc_normalization
  • too specific: cell_annot_tica, convert_genes, get_gene_id, paper_functions, plot_filtered_cells
  • resource-consuming: run_foldchange (better to use Seurat::FoldChange)

Version 0.1.5

This is a minor update.

Improvements

  • gsea_run:
    • fix the seed parameter
  • plot_split_dimred:
    • plot feature expression with the color_by parameter
    • better legend theme
    • new parameter order to order cells by intensities or by factor levels
  • run_enrichr and gsea_run:
    • generalize column names
  • run_foldchange:
    • optimize memory usage
    • fix errors when group1 and group2 are logical vectors

Version 0.1.4

General

Improvements

  • plot_split_dimred: better plot theme (axis, aspect ratio, colors)

Removed

  • gsea_scoring: replaced by gsea_score
  • run_enrichgo: replaced by run_enrichr

Renaming

All the functions to make figures for the paper have the prefix fig_:

  • barplot_sample_fun -> fig_plot_bp_sample
  • barplot_tt_fun -> fig_plot_bb_tt
  • dimplot_fun -> fig_plot_dimplot
  • dotplot_fun -> fig_plot_dotplot
  • gene_fun -> fig_plot_feature
  • make_gsea_barplot -> fig_plot_bp_gsea
  • make_heatmap -> fig_make_heatmap
  • plot_gb -> fig_plot_gb
  • quality_fun -> fig_plot_quality
  • split_tt_fun -> fig_plot_split_tt

Functional Enrichment

GSEA

  • run_foldchange: function to compute fold change, using a UMI count matrix and two groups of cells
  • gsea_run: function to perform a GSEA based on a ranked gene list and a list of gene sets, using the clusterProfiler package
  • gsea_plot: function to plot GSEA output (x-axis is NES and y-axis is gene set name). Known issue: the fill color is wrong when only positive or only negative NES values are represented.
  • gsea_score: function to score each cell for gene sets content
  • gsea_heatmap: function to make a heatmap of gene set scores x cells matrix
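As a rough illustration of what run_foldchange computes, the sketch below derives a log2 fold change per gene from a UMI count matrix and two groups of cells. The normalization (counts per 10,000 UMIs), the pseudocount of 1 and the helper name log2_fold_change are assumptions for illustration only; the package function may differ, and it was later removed in favor of Seurat::FoldChange.

```r
# Hypothetical helper, not the package implementation.
# counts: genes x cells UMI matrix; group1/group2: cell names or logical vectors
log2_fold_change <- function(counts, group1, group2, pseudocount = 1) {
  # Normalize each cell (column) to 10,000 total UMIs
  norm <- sweep(counts, 2, colSums(counts), "/") * 1e4
  # Average normalized expression per gene in each group
  mean1 <- rowMeans(norm[, group1, drop = FALSE])
  mean2 <- rowMeans(norm[, group2, drop = FALSE])
  # The pseudocount avoids division by zero for genes absent from one group
  log2((mean1 + pseudocount) / (mean2 + pseudocount))
}

counts <- matrix(c(10, 0, 90,
                   20, 0, 80,
                   0, 50, 50,
                   0, 60, 40),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4)))

lfc <- log2_fold_change(counts,
                        group1 = c("cell1", "cell2"),
                        group2 = c("cell3", "cell4"))
lfc
```

Logical vectors such as c(TRUE, TRUE, FALSE, FALSE) can be passed as groups too, since they index columns in the same way.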

Over-representation analysis

  • get_gene_id: function to convert gene name to gene ID
  • get_gene_sets: function to get gene sets from MSigDB R package (msigdbr)
  • run_enrichr: function to perform ORA using the clusterProfiler package

Visualization

  • plot_red_and_blue_dimplot: function to represent two groups of cells of interest, useful to check differential expression input

Copy Number analysis

  • read_infercnv: function to read inferCNV output tables
  • cnv_plot_heatmap: function to plot the CNV results as a heatmap
  • cnv_compute_intensity: function to compute CNV intensity as a bulk information
  • cnv_plot_intensity: function to plot CNV intensity as a bulk result
  • cnv_find_peaks: function to identify CNV based on the CN intensity
  • cnv_label_peaks: function to label CNV peaks based on their chromosomal location
  • cnv_show_peaks: function to represent peaks along the genome
  • cnv_add_score: function to score each cell for a set of genes in a genomic region (a peak), according to the CN values those genes carry
  • plot_external: function to visualize something coming from another dataframe, as if it was available using Seurat::FetchData. It is useful to represent CNV on a projection available in a Seurat object, using results from inferCNV package

Other functions

  • integration_combine_datasets_from_matrices: function to build a Seurat object from several count matrices
  • find_projection: function to identify the first 2D projection available in a Seurat object
  • plot_prop_heatmap: function to plot a heatmap of prop.table(table(.))
  • plot_qc_density: function to plot a histogram and density of a numeric vector
  • plot_qc_facslike: function to plot a FACS like quality control plot, based on @meta.data slot of a Seurat object, or another dataframe
  • plot_violin: function to draw a violin plot like Seurat::VlnPlot, where cells can be colored by the expression of another gene, the proportion of cells with positive expression is shown on top of each violin, and random noise spreads out cells with zero expression to better visualize their color
  • traj_inference: function wrapper around dynverse to perform trajectory inference using a TI method
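The input of plot_prop_heatmap is the kind of proportion table sketched below, built with base R on hypothetical sample and cell type labels. Using margin = 1 (proportions within each sample) is an assumption, since the entry does not say which margin the function uses.

```r
sample_id <- c("A", "A", "A", "B", "B", "B", "B")
cell_type <- c("T", "T", "B", "T", "B", "B", "B")

# Cross-tabulate cells, then convert counts to proportions within each sample (row)
tab <- table(sample_id, cell_type)
props <- prop.table(tab, margin = 1)
props
```

Each row of props sums to 1, so the heatmap compares cell type composition across samples regardless of their size.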

Other files

  • cell_markers (data): new names, same markers
  • color_cnv (data): custom colors to visualize CNV results
  • color_gene (data): custom colors to visualize gene expression
  • color_markers (data): new names, new colors
  • dotplot_markers (data): new names, new markers

Version 0.1.3

Improvements

  • plot_dendrogram: new parameters and better documentation

Integration

  • integration_combine_datasets: function to merge several Seurat objects from .rda and/or .rds files
  • integration_sample_palette: function to generate colors for each sample in a combined Seurat object
  • integration_fastmnn: function wrapper around FastMNN to remove batch effect

Dimensionality Reduction

  • run_diffusion_map: function wrapper around the destiny package to build a Diffusion Map and add it to a Seurat object

Trajectory

  • traj_tinga: function wrapper around dynverse and TInGa packages to perform trajectory inference with TInGa

Visualization

  • plot_barplot: function to make a nice barplot from a proportion dataframe (wrapper using ggplot2 package)
  • plot_c30_palette: function to visualize the c30_palette data
  • plot_label_dimplot: function wrapper around Seurat::DimPlot to label a few cells on the representation
  • plot_mini_axes (ongoing): function to draw small axes on a 2D representation of cells
  • plot_piechart: function to make a piechart from a proportion dataframe (wrapper using ggplot2 package)
  • plot_selector: function wrapper around Seurat::HoverLocator to select cells on a ggplot representation

Other functions

  • cell_annot_tica: function to annotate cells for cell type, using the Tumor Immune Atlas (DOI: 10.1101/gr.273300.120)
  • convert_genes: function wrapper around biomaRt and nichenetr packages to convert gene names between mouse and human
  • dend_utils: functions to work on dendrograms. This is a pending project that uses distances between cells to correct cell type annotation.
  • find_expressing_clusters: function to identify clusters of cells expressing a feature above an expression threshold
  • find_extrema_curve: function to find extrema in a given curve, from a dataframe with (x,y) coordinates
  • find_extrema_density: function to find extrema in a given density, from a numeric vector
  • find_features_order: function to identify a nice order of features along pseudotime, to make a heatmap afterwards
  • gsea_scoring: function to score cells for MSigDB gene sets, with Seurat::AddModuleScore
  • open_rmd: function to generate an Rmd file in a given location, with our favorite settings
  • print_to_copy_paste: function wrapper around write.table to print gene names in a format that is easy to copy and paste into MSigDB
  • run_rescale: function to rescale a numeric vector between two values
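Assuming run_rescale performs a linear min-max transformation, the idea fits in a few lines of base R. The helper name rescale_between is hypothetical, and the guard for constant input is a design choice of this sketch.

```r
# Hypothetical helper: linearly map x from [min(x), max(x)] to [new_min, new_max]
rescale_between <- function(x, new_min = 0, new_max = 1) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant input: avoid dividing by zero, map everything to the lower bound
    return(rep(new_min, length(x)))
  }
  new_min + (x - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min)
}

rescale_between(c(2, 4, 6), new_min = 0, new_max = 10)
# [1]  0  5 10
```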

Other files

  • c30_palette (data): a custom palette of 30 distinguishable colors
  • sample_summary (inst): Rmarkdown notebook to make an analysis summary (not flexible at all)
  • template (inst): Rmarkdown notebook with our favorite settings

Version 0.1.2

Naming convention

All functions and parameters follow the snake_case convention: my_function.

Trajectory

  • traj_root_cluster: function to identify a root cluster, before trajectory inference
  • traj_root_cell: function to identify a root cell inside a root cluster, to add a root to trajectory

Visualization

  • plot_split_dimred: function wrapper of Seurat::DimPlot() with the parameter split.by, but showing a grey background to help visualization

Version 0.1.1

Functions to help build a Singularity container with all R packages installed

  • get.dependency.tree: function to get a graph of dependencies for all installed R packages
  • get.installation.order: function to recover the URLs of all installed packages, in an order compatible with their dependencies

Automatic cell type annotation and extension to cluster

  • cells.annot.custom: function based on Seurat::AddModuleScore that scores cells according to the expression of markers of interest, and assigns to each cell the cell type with the highest score. A new dataset called cells_markers has been added to define the cell types
  • cluster_annot: after cell type annotation, an overclustering should be performed; this function then extends the cell annotation to clusters
  • see_dotplot: a wrapper around Seurat::DotPlot to make a dot plot showing the expression of a subset of the genes used for automatic annotation, for the cell types defined in a Seurat object. A new dataset called dotplot_markers has been added to define the genes shown in the dot plot
  • color_markers: a dataset containing custom colors for a UMAP legend showing cell types
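The two-step logic described for cells.annot.custom and cluster_annot (assign each cell the cell type with the highest score, then extend a label to each cluster) can be sketched in base R as follows. The score matrix stands in for Seurat::AddModuleScore output and is hypothetical, as is the majority-vote rule used here to extend labels to clusters.

```r
# Hypothetical module scores: one row per cell, one column per cell type
scores <- matrix(c(0.8, 0.1, 0.3, 0.3, 0.2,    # T_cell scores
                   0.2, 0.9, 0.1, 0.5, 0.6),   # B_cell scores
                 nrow = 5,
                 dimnames = list(paste0("cell", 1:5), c("T_cell", "B_cell")))

# Step 1: each cell gets the cell type with the highest score
cell_type <- colnames(scores)[max.col(scores, ties.method = "first")]

# Step 2 (assumed majority vote): each cluster gets its most frequent cell type
clusters <- c("c1", "c2", "c1", "c1", "c2")
cluster_label <- sapply(split(cell_type, clusters),
                        function(x) names(which.max(table(x))))

# Propagate the cluster label back to every cell
per_cell <- unname(cluster_label[clusters])
per_cell
```

Here cell4 is individually scored as B_cell but inherits T_cell from its cluster, which is the kind of correction an overclustering step is meant to enable.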

Clustering

  • dendroplot: make a dendrogram from an object of class hclust built on a subset of a Seurat object, to check whether cells of the same cell type are grouped together
  • do_kmeans: this is a wrapper around the kmeans function
  • plot_inertia: plots the inter-class inertia from an object of class hclust, to choose the number of groups to define with the do_kmeans function, according to the hclust dendrogram
  • sample_cell_barcode: subset a Seurat object according to active.ident by sampling cell barcodes, so that each identity has the same number of cells. This function is meant to be used before dendroplot
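The equal-size sampling done by sample_cell_barcode can be illustrated by downsampling every identity to the size of the smallest one. The identities and cell names below are hypothetical, and the real function works on the active.ident of a Seurat object rather than on a plain factor.

```r
set.seed(1)

# Hypothetical identities, named by cell barcode
idents <- factor(c(rep("T", 6), rep("B", 3), rep("NK", 4)))
names(idents) <- paste0("cell", seq_along(idents))

# Downsample each identity to the size of the smallest one
n_per_group <- min(table(idents))
sampled <- unlist(lapply(split(names(idents), idents), sample, size = n_per_group))

table(idents[sampled])
```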

Other functions

  • find.vtr: evaluates the correlation between each PCA dimension and some variables of interest, and returns the variables to regress out during normalization when the correlation is higher than a provided threshold
  • get.enrichgo: a wrapper around the clusterProfiler::enrichGO function
  • gg_color_hue: generates n colors following the ggplot2 default palette
  • read.gtf: reads a GTF file and returns a data.frame
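gg_color_hue mimics the default ggplot2 discrete palette, which spaces n hues evenly around the HCL color wheel at fixed luminance and chroma. The widely used sketch below assumes the ggplot2 defaults (hues from 15 to 375, luminance 65, chroma 100); the package implementation may differ in details.

```r
gg_color_hue <- function(n) {
  # n + 1 evenly spaced hues; the last one (375 = 15 + 360) repeats the first, so drop it
  hues <- seq(15, 375, length.out = n + 1)
  grDevices::hcl(h = hues, l = 65, c = 100)[1:n]
}

cols <- gg_color_hue(3)
cols
```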

Version 0.1.0

This package stores all functions of interest for single-cell RNA sequencing data analysis. The functions come from https://github.com/abeaude/M2_report. For now, only the functions in use are included in the package; all other functions are kept in the unused directory, so that nothing from the original repository is deleted. The package will be extended with new functions to help with analysis and the reproducibility of analyses. It is mainly based on the Seurat package.