High Resolution Alignment of Single-Cell and Spatial Transcriptomes
CytoSPACE is a computational tool from the Newman Lab for assigning single-cell transcriptomes to in situ spatial transcriptomics data, enabling high-resolution tissue cartography.
This site provides a web interface for running CytoSPACE with standard options for bulk ST data such as 10x Visium. Full source code with additional options, including single-cell mode, is available on GitHub.
Please note that this a beta version of the CytoSPACE website, and we are actively making improvements. We appreciate your patience and welcome any feedback via email at cytospaceteam@gmail.com.
© Stanford University 2023
About
Overview of CytoSPACE
CytoSPACE is a computational method for assigning single-cell transcriptomes to in situ spatial transcriptomics (ST) data, requiring as input spatial transcriptomic profiling of a tissue specimen and an annotated scRNA-seq atlas and yielding a reconstructed tissue specimen with both high gene coverage and spatially-resolved scRNA-seq data suitable for downstream analysis. CytoSPACE is offered in both bulk ST and single-cell ST modes.
Bulk ST mode
Bulk ST mode is suitable for any ST data lacking single-cell resolution, such as that from 10x Visium or legacy platforms. These data generally have high gene recovery but are not immediately suitable for single-cell or cell-type-dependent analyses given their resolution. For such data, CytoSPACE outputs can be used to understand cellular substructures, identify colocalization patterns of cell types, analyze differential expression of genes within a cell type according to location, and more.
Single-cell ST mode
Single-cell ST mode is suitable for any ST data with single-cell resolution, such as that from Vizgen MERSCOPE data. While these data have single-cell resolution, they generally have much lower gene recovery and can be prone to noise. For such data, CytoSPACE outputs can be used to enhance gene recovery.
CytoSPACE framework
CytoSPACE works by constrained global optimization to yield robust mappings of single cells to tissue. It operates over the full transcriptome without reduction to pre-selected marker genes or a shared embedding space, thereby retaining sensitivity to subtle cell states.
Optimization framework
Specifically, CytoSPACE constructs (1) a set of input scRNA-seq matching what is predicted to be present in the ST sample, and (2) a set of "sub-spots" available within the ST sample according to the predicted cellular density at each location. With these matched sets, CytoSPACE formulates the tissue reconstruction task as a linear assignment problem and optimally arranges the set of selected scRNA-seq across the set of sub-spots according to a cost function based on transcriptomic concordance between cells and spots.
Bulk ST mode
Given bulk ST data and an annotated scRNA-seq atlas, CytoSPACE first estimates overall cell type fractions in the tissue as well as the number of cells per spot, using these to construct the matched sets.
By default, CytoSPACE estimates the fractional abundance of cell types in the ST specimen using Spatial Seurat. Users may also provide fractional abundance estimates generated by an external deconvolution tool such as Spatial Seurat as well as RCTD, SPOTlight, cell2location, or CIBERSORTx, if desired.
Similarly, CytoSPACE accepts but does not require user-provided estimates for the number of cells per spot, calculating these constraints internally according to a model associating detectably expressed genes with the number of cells contributing mRNA content within a spot. If imaging data is available, users may also provide their own estimates, including those generated by visual cell segmentation methods such as VistoSeg and Cellpose.
Single-cell ST mode
CytoSPACE works similarly in single-cell mode, constructing matched sets of input scRNA-seq data and single-cell ST data and then mapping betewen them. In this case, no cellular density estimate is required. Cell type fractional abundances can be explicitly constrained by user-provided annotations of the single-cell ST data or estimated in the same way as for bulk. In single-cell ST mode, CytoSPACE can separate matched sets by cell type before mapping when single-cell ST annotations are available or map all cells together according only to overall fractions.
Website features
The site offers an interface for running CytoSPACE in bulk mode with basic parameters. The web implementation of CytoSPACE has been adjusted for efficiency and simplicity and must be run with the "lap_CSPR" solver option and with subsampling parallelization ("-sss -nosss 1000"; mapping of 1,000 cells at a time) enabled. Empirically, performance is generally comparable to mapping without subsampling. For single-cell mode, larger files, and extended options, CytoSPACE should be run from source code for the time being.
Tutorials for running CytoSPACE from the web app as well as from source code are provided, as are example datasets for download.
Additional resources
For further details regarding the method and its validation, please refer to the publication release.
To access the full source code, please see the GitHub repository.
To contact the CytoSPACE team, please direct emails to cytospaceteam@gmail.com.
By default, CytoSPACE requires 4 files as input, but this may depend on the type of file the users have. Please see below for further formatting details. All files should be provided in tab-delimited tabular input format (saved as .txt) with no double quotations, except for when providing a tar.gz Space Ranger output.
Due to storage considerations, we may not be able to take file uploads larger than 500MB. Please try to keep the input files under this size. It may be helpful to subset both gene expression files to the genes common to both scRNA-seq and spatial transcriptomics datasets, as they are used for CytoSPACE runs.
Users starting from Space Ranger outputs can substitute (3) ST gene expression and (4) ST coordinates with a single tar.gz file.
If a Space Ranger output is specified, CytoSPACE will automatically attempt to unzip the provided tarball and load the correponding ST expression and coordinates data.
If there are duplicate gene names in the gene expression file, they will be made unique prior to running CytoSPACE, as is done in Seurat.
The tarball should only include the following:
An example file tree for an unzipped tarball is shown below on the left; if downloading from the public 10X Visium data, users can download the files shown below on the right.
We provide helper scripts under the Prepare_input_files section of the source code, which users can use to generate the four .txt input files from Seurat objects.
To account for the disparity between scRNA-seq and ST data in the number of cells per cell type, CytoSPACE requires the fractional composition of each cell type in the ST tissue. By default, CytoSPACE will make estimations internally.
Alternatively, users can choose to provide their own estimated cell type composition. In this case, the provided file must be a table consisting of 2 rows with row names, where the first row contains the cell type labels, and the second row contains the cell fractions of each cell type represented as proportions between 0 and 1. The cell types on the first row should be alphabetically ordered.
If providing this file, please make sure that the cell type labels in the first row match the labels present in the cell type label file, and that the cell type fractions sum to one.
Rather than using the internal mechanism of CytoSPACE for estimating the number of cells per spot, users can provide their own estimates (from image segmentation, for example) in a two-column file with header, in which the first column contains spot IDs and the second contains the number of cells predicted per spot:
CytoSPACE will produce two plots and four text files by default.
cell_type_assignments_by_spot.pdf
(left) contains heatmaps of cell type assignments within the ST sample. Along with a plot showing the total number of cells mapped to each spot, these show the spatial distribution of cell type assignments. Color bars indicate the number of cells of the respective cell type inferred per spot.
cell_type_assignments_by_spot_jitter.pdf
(right) is a single scatterplot showing all assigned cells by their spot location. Each cell is colored based on its cell type.
The plots shown below are generated from our example Visium breast cancer dataset.