Technologies/Resources
The technologies and resources available to the Cedars-Sinai research community include high-performance computing (HPC) clusters, virtual servers, centrally managed disk space and enterprise-licensed software packages, such as EndNote, IPA, JMP, Prism and SPSS.
Enterprise Information Services (EIS) Research Informatics and Scientific Computing Core (RISCC) support includes providing enterprise-wide tools for study management, cohort identification, research biobanking, Institutional Review Board (IRB) review, grant management, etc. EIS RISCC provides research database and technology consultation, data extracts for research studies through an honest broker function, and pre- and post-award consulting services, including programming for special projects funded through the UCLA Clinical and Translational Science Institute (CTSI) or research grant chargebacks.
HPC resources are available at no charge to Cedars-Sinai staff. The enterprise research HPC cluster uses Grid Engine and Slurm software to manage bare-metal Linux compute nodes. All compute nodes are built on Intel-based hardware running CentOS.
With 3 teraflops of processing capability, this system is best suited for tasks that lend themselves to parallel processing. In total, the cluster has 1,012 CPU cores and 11.4TB of RAM available for data analysis, with an additional 1,000 cores, including graphics processing units (GPUs), coming online later this year.
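As an illustration of a task that lends itself to parallel processing, the following Python sketch generates a Slurm array-job script in which each task handles one independent input. It covers only the Slurm scheduler, not Grid Engine. The job name, the `process_sample.py` command and all resource values are hypothetical placeholders, not settings documented for this cluster; appropriate values should be confirmed during an infrastructure consultation.

```python
def build_array_job(job_name, n_tasks, command,
                    cpus_per_task=1, mem_gb=4, time_limit="01:00:00"):
    """Return the text of a Slurm array-job script with n_tasks independent tasks.

    All resource values are illustrative assumptions, not cluster defaults.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # Array indices run from 0 to n_tasks - 1, one per independent task.
        f"#SBATCH --array=0-{n_tasks - 1}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        # %x = job name, %A = array job ID, %a = array task index.
        "#SBATCH --output=%x_%A_%a.out",
        "",
        "# Each array task receives its own index in SLURM_ARRAY_TASK_ID.",
        f'{command} "$SLURM_ARRAY_TASK_ID"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical usage: 100 samples, each processed by a separate task.
    print(build_array_job("align_samples", 100, "python process_sample.py"))
```

The generated text could be saved to a file and submitted with `sbatch`. Because the tasks do not depend on one another, the scheduler is free to spread them across whatever cores are available.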
Applications installed on the cluster include:
- FSL and FreeSurfer (functional MRI, MRI and diffusion tensor imaging of the brain)
- NAMD (simulation and modeling of molecular systems)
- MATLAB/Octave (scientific programming and visualization tool)
- R (statistical computing and graphics)
- JAGS (Bayesian model analysis, Markov chain Monte Carlo simulation)
- BioPerl (Perl modules for bioinformatics computation)
- MACH (resolve long haplotypes)
- National Center for Biotechnology Information BLAST (sequence similarity search)
- ClustalW (sequence alignment)
- FASTA (search protein and DNA sequence data)
- GROMACS (molecular simulation)
- BreakDancer (genome-wide detection of variants)
- FFTW3 (discrete Fourier transform)
- QIIME (comparison and analysis of microbial communities)
- TMAP (programs to build genetic maps)
- EMBOSS (tools used for sequence analysis)
- Fortran (compiler)
Other software can be added as requested, subject to compatibility requirements as determined at the time of the request.
Depending on your specific project or activity, a few storage options are available, as listed below. If you are unsure of the type of storage you need, please schedule an infrastructure consultation with us to review options for your use case.
- Network-attached storage (NAS)—All investigators are provided access to centrally managed disk storage that is backed up daily. More than 700TB of NAS capacity is available.
Example use case: Your lab instruments generate large volumes of raw data that need short-term storage for analysis.
- High-performance storage—High-performance storage is provided by Isilon X400 disk systems. This storage is intended for use primarily with the HPC cluster, although it is also available for large datasets that exceed the 5TB architectural limit. A total of 4.7PB of storage with automatic replication to a secondary data center in Phoenix is available for research.
Example use case: You have an instrument that outputs large raw datasets that need to be analyzed by the HPC cluster. Once the results data have been generated, the raw data can be moved over to long-term archival storage.
- Cloud storage—Cloud storage on Amazon Web Services (AWS) is available as short-term and long-term (archival) storage. Data transfer automation from short-term to long-term is managed by Starfish Software, with reporting and visibility to labs on a per-user/per-lab basis.
Example use case: Short-term and long-term storage of your research results data.
- Box storage—Researchers have access to unlimited Box.com storage in a HIPAA-compliant environment. Box is ideal for long-term file storage, including sharing files with external collaborators. All data stored in Box is encrypted in transit and at rest, so it is safe to use Box to share data. The maximum size for upload is 150GB per file.
Example use case: You have a collaborative research project and wish to share datasets consisting of large files with an external collaborator. You can securely share the specific Box folder link for that project with your collaborator. The collaborator will need to self-register in Box to access the files.
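Before sharing a large dataset through Box, it can help to check for files that exceed the 150GB per-file upload limit mentioned above. The following Python sketch scans a folder and reports any such files. It assumes the limit is measured in decimal gigabytes (10^9 bytes), which is not stated in this document, and the function name and folder path are hypothetical.

```python
from pathlib import Path

# Assumption: "150GB" means 150 * 10**9 bytes; adjust if the limit is binary.
BOX_MAX_FILE_BYTES = 150 * 10**9


def files_over_limit(folder, limit_bytes=BOX_MAX_FILE_BYTES):
    """Return (path, size_in_bytes) for every file under folder above the limit."""
    oversized = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((path, size))
    return oversized


if __name__ == "__main__":
    # Hypothetical folder name, used for illustration only.
    for path, size in files_over_limit("my_dataset"):
        print(f"{path}: {size / 10**9:.1f} GB exceeds the per-file limit")
```

Files flagged by this check would need to be split or compressed before upload, or placed on one of the other storage options listed above.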
Biobank/Biorepository Management
- LabVantage—LabVantage is the enterprise-wide research biobanking system used for research specimen tracking and management. The system provides comprehensive sample management through the entire lifespan, including barcode generation in various formats and laboratory information management systems (LIMS) functionality for inventory management of samples, reagents, lab equipment, etc.
Clinical Trials Discovery
- Study Information Portal—The Study Information Portal is an online repository of all active/ongoing cancer clinical trials. Physicians can search for trials and refer patients to specific trials. Similarly, patients can search for clinical trials and, after consulting with their physician, request enrollment.
Cohort Identification
- Los Angeles Data Resource (LADR)—LADR is a tool for finding cohorts among a consortium of institutions (Cedars-Sinai, UCLA, University of Southern California, City of Hope and Children’s Hospital Los Angeles). The application is based on i2b2 and SHRINE and generates criteria-specific deidentified electronic medical record (EMR) data and counts of cohorts at each institution.
- Deep 6 AI—The Deep 6 Cohort Builder platform is a tool researchers can use to assist with identifying subjects for clinical trials. The platform uses natural language processing, a subset of AI, to search structured and unstructured data in the EMR. The system allows for building a search query, generating a list of matching subjects, reviewing source documentation within the tool for subjects, and fine-tuning search criteria to identify the optimal cohort for a clinical trial.
Research Study Management
- OnCore CTMS—The OnCore clinical trial management system (CTMS) is the enterprise-wide system used for clinical trials management. It captures protocol, subject and financial information and provides a systematic way of tracking, accessing, monitoring and reporting on overall clinical trial activity and productivity.
- REDCap—Research Electronic Data Capture (REDCap) is available as a self-service research database development tool. REDCap is best suited for research projects involving online data-entry forms, including surveys, where follow-up reminders may be required. REDCap enables researchers to quickly configure a database (known as a project) by uploading database fields via a spreadsheet or by using a wizard that guides the user through the process. Once the database is ready, the user can submit the database for review and eventually move it to a production status.
- SourceDrive—SourceDrive is the preferred regulatory-compliant, cloud-based solution for managing clinical trial documents and workflows. SourceDrive replaces paper subject and regulatory binders. Trial documents can be distributed, controlled and collected within SourceDrive, helping to ensure accuracy and completeness. SourceDrive is fully compliant with 21 CFR Part 11 and HIPAA, and it is well suited for investigator-initiated trials and Food and Drug Administration trials.
EIS provides a variety of enterprise-licensed software free of charge or at a discounted cost to the academic research community. All software can be requested through Service Center.
- Adobe Creative Cloud—Software for graphic design, video editing, web development and photography. Create scientific figures, diagrams and illustrations from a collection of more than 20 apps all under one umbrella.
- BioRender—Cloud software for creating scientific figures, diagrams and illustrations from a scientifically accurate image library. The software contains thousands of premade icons and templates from more than 30 fields of life science.
- CLC Genomics Workbench—Software for analyzing and visualizing next-generation sequencing, incorporating unique features and widely used algorithms to overcome bottleneck challenges associated with data analysis.
- EndNote—Reference management software package used to manage bibliographies and references when writing papers and grant proposals.
- IPA—Software used to identify related proteins within a pathway. This is helpful when studying differential expression of a gene in a disease. By examining the changes in gene expression in a pathway, the biological causes of a phenotype can be explored.
- JMP—Software for dynamic data visualization and analytics on the desktop. Interactive, comprehensive and highly visual, JMP includes capabilities for data access and processing, statistical analysis, design of experiments, multivariate analysis, quality and reliability analysis, scripting, graphing, and charting.
- MATLAB—MATLAB (matrix laboratory) is a multiparadigm numerical computing environment and fourth-generation programming language. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, Fortran and Python.
- Prism—GraphPad Prism combines scientific graphing, comprehensive curve fitting (nonlinear regression), understandable statistics and data organization. While it won't replace a heavy-duty statistics program, Prism lets you easily perform basic statistical tests commonly used by laboratory and clinical researchers.
- SAS—SAS Desktop allows you to prepare and undertake statistical analysis of your research (or other) data.
- SPSS—SPSS Statistics addresses the entire statistical analysis process—planning, data collection, analysis, reporting—for better decision-making and performance.
Additional software is available at a discounted cost through various research cores.
- FlowJo (available through the Flow Core)—FlowJo is a software package for analyzing flow cytometry data. For FlowJo analysis assistance and questions, or to cancel your license, please contact the Flow Core at flowcore@cshs.org.
- VMware cluster – On-premises VMware virtual servers running Red Hat Linux or Windows, with virtual graphics processing unit (vGPU) capability, are available for research use. There is no charge for a virtual machine, but architectural limits may apply to the number of CPUs and the amount of memory that can be allocated.
- Cloud cluster – Cloud servers running Red Hat Linux or Windows are available and feature transparent migration of workloads between on-premises and cloud servers. These servers make it easier to store larger datasets, including for long-term archival storage.
All server requests are subject to an EIS review prior to provisioning to ensure compliance with technical and security standards.
Contact the Research Informatics and Scientific Computing Core
6500 Wilshire Blvd.
Los Angeles, CA 90048