Gigascience. 2020 Apr 1;9(4):giaa033.
doi: 10.1093/gigascience/giaa033.

Laniakea: an open solution to provide Galaxy "on-demand" instances over heterogeneous cloud infrastructures

Marco Antonio Tangaro et al. Gigascience.

Abstract

Background: While the popular workflow manager Galaxy is currently made available through several publicly accessible servers, there are scenarios where users can be better served by full administrative control over a private Galaxy instance, including, but not limited to, concerns about data privacy, customisation needs, prioritisation of particular job types, tools development, and training activities. In such cases, a cloud-based Galaxy virtual instance represents an alternative that equips the user with complete control over the Galaxy instance itself without the burden of the hardware and software infrastructure involved in running and maintaining a Galaxy server.

Results: We present Laniakea, a complete software solution to set up a "Galaxy on-demand" platform as a service. Building on the INDIGO-DataCloud software stack, Laniakea can be deployed over common cloud architectures usually supported both by public and private e-infrastructures. The user interacts with a Laniakea-based service through a simple front-end that allows a general setup of a Galaxy instance, and then Laniakea takes care of the automatic deployment of the virtual hardware and the software components. At the end of the process, the user gains access with full administrative privileges to a private, production-grade, fully customisable, Galaxy virtual instance and to the underlying virtual machine (VM). Laniakea features deployment of single-server or cluster-backed Galaxy instances, sharing of reference data across multiple instances, data volume encryption, and support for VM image-based, Docker-based, and Ansible recipe-based Galaxy deployments. A Laniakea-based Galaxy on-demand service, named Laniakea@ReCaS, is currently hosted at the ELIXIR-IT ReCaS cloud facility.

Conclusions: Laniakea offers to scientific e-infrastructures a complete and easy-to-use software solution to provide a Galaxy on-demand service to their users. Laniakea-based cloud services will help in making Galaxy more accessible to a broader user base by removing most of the burdens involved in deploying and running a Galaxy service. In turn, this will facilitate the adoption of Galaxy in scenarios where classic public instances do not represent an optimal solution. Finally, the implementation of Laniakea can be easily adapted and expanded to support different services and platforms beyond Galaxy.

Keywords: PaaS; cloud; galaxy; on-demand; workflow.

Figures

Figure 1:
Laniakea architecture. The Laniakea Dashboard is the front-end that users access to configure and manage Galaxy instances. When a user requests a new Galaxy instance, the resulting TOSCA (Topology and Orchestration Specification for Cloud Applications) [45] template is sent to the platform-as-a-service (PaaS) layer, which employs INDIGO services to deploy the instance over the infrastructure as a service (IaaS), retrieving the needed virtual hardware, storage, and networking resources. Finally, the Galaxy instance is configured with the requested set of tools (flavour) and attached to a plain or Linux Unified Key Setup (LUKS) encrypted storage volume and to the CernVM-FS shared volume hosting reference data. At the end of the process, a public IP address is assigned to the newly created Galaxy instance and made available to the user.
Figure 2:
Laniakea Dashboard home page. Each tile provides a quick explanation of the corresponding application and links to the configuration panels (see Fig. 3). Deployments using virtual machine snapshots correspond to the tiles labelled as “Express.” Deployments using Docker containers correspond to the tiles labelled as “Docker.” Finally, deployments using Ansible recipes correspond to the tiles labelled as “Live Build.”
Figure 3:
Laniakea Dashboard configuration panels. The “Virtual hardware” tab (left) allows the selection of the virtual hardware in terms of number of virtual CPUs, amount of RAM, size of the data volume (encrypted or not), and number and hardware configuration of the worker nodes (only for cluster deployments), and it requires the public SSH key of the user. The “Galaxy” tab (right) is used to tweak the software configuration: Galaxy version, description of the instance, the e-mail address of the administrator, Galaxy flavour (see Galaxy flavours), and reference data repository.
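
The options listed in this caption can be pictured as a single request record. The following is a minimal sketch in Python, not the Dashboard's actual data model: the class name, field names, and validation rules are all hypothetical and chosen only to mirror the two tabs described above.

```python
from dataclasses import dataclass


@dataclass
class GalaxyInstanceRequest:
    """Hypothetical record of the two configuration tabs (names are assumptions)."""

    # "Virtual hardware" tab
    vcpus: int
    ram_gb: int
    volume_gb: int
    encrypt_volume: bool
    ssh_public_key: str
    # "Galaxy" tab
    galaxy_version: str
    description: str
    admin_email: str
    flavour: str
    reference_data_repo: str
    # Only meaningful for cluster deployments; 0 means a single server.
    worker_count: int = 0

    def validate(self):
        """Return a list of human-readable problems; an empty list means OK."""
        errors = []
        if min(self.vcpus, self.ram_gb, self.volume_gb) <= 0:
            errors.append("vCPUs, RAM and volume size must be positive")
        if not self.ssh_public_key.strip():
            errors.append("a public SSH key is required")
        if "@" not in self.admin_email:
            errors.append("administrator e-mail looks invalid")
        if self.worker_count < 0:
            errors.append("worker count cannot be negative")
        return errors
```

All values passed to such a record (version strings, flavour names, repository names) would come from whatever the service offers; none are specified here.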
Figure 4:
Laniakea Dashboard information and management interface. It reports the name, current status, creation time, initial Galaxy flavour, virtual hardware setup (virtual machine flavour), and the URL (endpoint) of each Galaxy instance generated by the user. Galaxy instances that are no longer needed can be deleted using the “Delete” button.
Figure 5:
The GDC Somatic Variant Calling analysis pipeline implemented as a Galaxy workflow for the corresponding flavour. The workflow design interface of Galaxy is a powerful instrument for building complex workflows by chaining together the outputs and inputs of different tools in an intuitive fashion.
Figure 6:
The relationship between Galaxy, the filesystem, and dm-crypt. Data are encrypted and decrypted on-the-fly when writing and reading through dm-crypt. The underlying disk encryption layer is entirely transparent for Galaxy.
Figure 7:
Storage encryption workflow. (1) The user logs into the Laniakea Dashboard and enables data encryption when configuring a new Galaxy instance. (2–3) The Dashboard contacts HashiCorp Vault, using an Identity and Access Management token, to retrieve a one-time Vault token. The one-time token is used to avoid transmitting user credentials over the infrastructure and to limit any potential damage from a malicious attacker intercepting it. (4) The one-time token is passed to the encryption script on the virtual machine through the INDIGO Orchestrator service. (5) A random passphrase is generated, and the data volume is encrypted by Linux Unified Key Setup (LUKS), unlocked, and formatted. It is now ready to be attached to the new Galaxy instance. (6) The encryption script logs into Vault using the one-time token and stores the encryption passphrase, which will be accessible only by the Laniakea user who requested the encrypted volume. (7–8) The user can retrieve the passphrase at any time using the Dashboard. For example, if the encrypted volume needs to be remounted (usually after a reboot of the Galaxy virtual machine), the user can retrieve the passphrase from Vault and unlock the volume using the Dashboard.
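
The single-use token idea in steps 2 to 8 can be illustrated with a small simulation. This is a sketch under stated assumptions: `MockVault` is an in-memory stand-in written for this example and does not reproduce the HashiCorp Vault API, the function and method names are hypothetical, and the actual LUKS encryption call is omitted.

```python
import secrets


class MockVault:
    """In-memory stand-in for a secrets store; NOT the HashiCorp Vault API."""

    def __init__(self):
        self._one_time_tokens = {}  # token -> user on whose behalf it was issued
        self._secrets = {}          # user -> stored passphrase

    def issue_one_time_token(self, user):
        # Steps 2-3: the Dashboard obtains a single-use token for the user.
        token = secrets.token_urlsafe(32)
        self._one_time_tokens[token] = user
        return token

    def store_passphrase(self, token, passphrase):
        # Step 6: the token is consumed on first use, so a replay is rejected.
        user = self._one_time_tokens.pop(token, None)
        if user is None:
            raise PermissionError("token unknown or already used")
        self._secrets[user] = passphrase

    def read_passphrase(self, user):
        # Steps 7-8: only the requesting user's own entry is looked up.
        return self._secrets[user]


def encrypt_volume_on_vm(vault, token):
    # Step 5: a random passphrase is generated on the VM.
    # The LUKS formatting itself is deliberately left out of this sketch.
    passphrase = secrets.token_urlsafe(32)
    vault.store_passphrase(token, passphrase)
    return passphrase
```

Because the token is removed from the store the first time it is used, an attacker who intercepts it after step 6 cannot use it to overwrite or read the passphrase, which is the damage-limiting property the caption describes.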
Figure 8:
Galaxy elastic cluster architecture. Initially, only the master node, which hosts Galaxy, SLURM, and CLUES, is deployed. The SLURM queue is monitored by CLUES, and new worker nodes are deployed to process pending jobs up to the maximum number set during the cluster configuration, thus adapting resource availability to the current workload. The user home directory and persistent storage are shared among master and worker nodes through the Network File System (NFS), enabling the sharing of CONDA tool dependencies. The CVMFS shared volume is also mounted on each worker node to ensure that tools have access to reference data.
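
The scaling rule described in this caption, adding workers for pending jobs up to a configured maximum, can be sketched as a small function. This is an illustration only and not the CLUES implementation: the function name, the `jobs_per_worker` parameter, and the assumption that all running workers are already busy are hypothetical simplifications.

```python
def workers_to_deploy(pending_jobs, running_workers, max_workers, jobs_per_worker=1):
    """Return how many new worker nodes to start for the pending queue.

    Assumes every running worker is already busy, so pending jobs can only
    be served by new nodes, and never exceeds the configured maximum.
    """
    if pending_jobs <= 0:
        return 0
    # Ceiling division: workers needed to cover all pending jobs.
    needed = -(-pending_jobs // jobs_per_worker)
    # Remaining headroom below the maximum set at cluster configuration.
    free_slots = max(max_workers - running_workers, 0)
    return min(needed, free_slots)
```

For instance, with five pending jobs, no running workers, and a maximum of three, the function returns three; the remaining jobs wait in the queue until a worker becomes free.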

Cited by

  • Training Infrastructure as a Service.
    Rasche H, Hyde C, Davis J, Gladman S, Coraor N, Bretaudeau A, Cuccuru G, Bacon W, Serrano-Solano B, Hillman-Jackson J, Hiltemann S, Zhou M, Grüning B, Stubbs A. Gigascience. 2022 Dec 28;12:giad048. doi: 10.1093/gigascience/giad048. Epub 2023 Jul 3. PMID: 37395629. Free PMC article.
  • Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service.
    Tangaro MA, Mandreoli P, Chiara M, Donvito G, Antonacci M, Parisi A, Bianco A, Romano A, Bianchi DM, Cangelosi D, Uva P, Molineris I, Nosi V, Calogero RA, Alessandri L, Pedrini E, Mordenti M, Bonetti E, Sangiorgi L, Pesole G, Zambelli F. BMC Bioinformatics. 2021 Nov 8;22(Suppl 15):544. doi: 10.1186/s12859-021-04401-3. PMID: 34749633. Free PMC article.
  • Cloud Computing Enabled Big Multi-Omics Data Analytics.
    Koppad S, B A, Gkoutos GV, Acharjee A. Bioinform Biol Insights. 2021 Jul 28;15:11779322211035921. doi: 10.1177/11779322211035921. eCollection 2021. PMID: 34376975. Free PMC article. Review.
  • GalaxyCloudRunner: enhancing scalable computing for Galaxy.
    Goonasekera N, Mahmoud A, Chilton J, Afgan E. Bioinformatics. 2021 Jul 19;37(12):1763-1765. doi: 10.1093/bioinformatics/btaa860. PMID: 33104194. Free PMC article.
  • PIPE-T: a new Galaxy tool for the analysis of RT-qPCR expression data.
    Zanardi N, Morini M, Tangaro MA, Zambelli F, Bosco MC, Varesio L, Eva A, Cangelosi D. Sci Rep. 2019 Nov 26;9(1):17550. doi: 10.1038/s41598-019-53155-9. PMID: 31772190. Free PMC article.

References

    1. Attwood TK, Blackford S, Brazas MD, et al. A global perspective on evolving bioinformatics and data science training needs. Brief Bioinform. 2019;20:398–404.
    2. Via A, Attwood TK, Fernandes PL, et al. A new pan-European Train-the-Trainer programme for bioinformatics: Pilot results on feasibility, utility and sustainability of learning. Brief Bioinform. 2019;20:405–15.
    3. McGrath A, Champ K, Shang CA, et al. From trainees to trainers to instructors: Sustainably building a national capacity in bioinformatics training. PLoS Comput Biol. 2019;15:1–12.
    4. Piccolo SR, Frampton MB. Tools and techniques for computational reproducibility. Gigascience. 2016;5:1–13.
    5. Kumar S, Dudley J. Bioinformatics software for biologists in the genomics era. Bioinformatics. 2007;23:1713–7.
