For Bio-OS Developer
Architecture
Features:
Highly scalable, including APIs and the storage layer;
Deep integration with widely used open-source tools in bioinformatics, such as Cromwell and JupyterHub.
The diagram below illustrates the network and storage dependencies, as well as the interactions between internal components:
File storage: stores inputs and outputs of bioinformatics analysis jobs, along with Notebooks.
Database (DB): stores control information for various entities, including workflow scripts.

Domain Design
The project applies Domain-Driven Design (DDD) extensively. The diagram illustrates only the core domain design represented by the API Server, while Cromwell and JupyterHub belong to the supporting domain and are not depicted. The bounded contexts are defined according to the principle of separating static and dynamic concerns. For the sake of lightweight implementation, all bounded contexts are consolidated within a single service.

Bounded contexts and their functions:
Workspace: Used to archive and present bioinformatics analysis processes and results. It also serves as the fundamental unit for delineating responsibilities and roles.
Notebook Server: Used to manage the Notebook editing environment, including environment configuration and lifecycle management.
Submission: Used to manage workflow executions and submission history.
Coding
Framework
Hertz: The open-source HTTP service framework developed by ByteDance's CloudWeGo team.
gRPC: Used primarily for CLI access and for interface calls between internal bounded contexts.
Casbin: Provides access control through its RBAC capabilities.
GORM: A widely used ORM framework in the Go ecosystem, used for connecting to all relational databases.
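To make the access-control idea concrete, the following is a minimal sketch of a role-based check scoped to a workspace. It deliberately does not use Casbin's API; it is a plain-Go illustration of the kind of role-to-permission lookup an RBAC model expresses. The role names, the `Permission` fields, and the `Enforcer` type are all hypothetical and are not taken from the Bio-OS code base.

```go
package main

import "fmt"

// Role is a hypothetical role name; the real project may define others.
type Role string

const (
	Admin  Role = "admin"
	Viewer Role = "viewer"
)

// Permission pairs a resource kind with an action (both illustrative).
type Permission struct {
	Object string
	Action string
}

// binding identifies a user within one workspace.
type binding struct {
	User      string
	Workspace string
}

// Enforcer holds role grants and per-workspace role assignments.
type Enforcer struct {
	grants map[Role]map[Permission]bool
	roles  map[binding]Role
}

// NewEnforcer returns an empty enforcer with no grants or assignments.
func NewEnforcer() *Enforcer {
	return &Enforcer{
		grants: map[Role]map[Permission]bool{},
		roles:  map[binding]Role{},
	}
}

// Grant allows a role to perform an action on an object kind.
func (e *Enforcer) Grant(r Role, p Permission) {
	if e.grants[r] == nil {
		e.grants[r] = map[Permission]bool{}
	}
	e.grants[r][p] = true
}

// Assign gives a user a role inside a single workspace.
func (e *Enforcer) Assign(user, workspace string, r Role) {
	e.roles[binding{User: user, Workspace: workspace}] = r
}

// Enforce reports whether the user may perform the action in the workspace.
// Users without a role in that workspace are always denied.
func (e *Enforcer) Enforce(user, workspace string, p Permission) bool {
	r, ok := e.roles[binding{User: user, Workspace: workspace}]
	if !ok {
		return false
	}
	return e.grants[r][p]
}

func main() {
	e := NewEnforcer()
	e.Grant(Admin, Permission{Object: "submission", Action: "create"})
	e.Grant(Admin, Permission{Object: "submission", Action: "read"})
	e.Grant(Viewer, Permission{Object: "submission", Action: "read"})
	e.Assign("alice", "ws-1", Admin)
	e.Assign("bob", "ws-1", Viewer)

	fmt.Println(e.Enforce("alice", "ws-1", Permission{Object: "submission", Action: "create"}))
	fmt.Println(e.Enforce("bob", "ws-1", Permission{Object: "submission", Action: "create"}))
}
```

Keying the assignment on both user and workspace reflects the Workspace context's role as the unit for delineating responsibilities: the same user can hold different roles in different workspaces.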
Project Layout
Others
Domain-Driven Design in Practice
The project adopts the classic four-layer architecture:
Interface: This layer leverages various server frameworks to implement different types of interfaces, including integrations with Hertz and gRPC. Within this layer, View Objects (VOs) are defined for each protocol type, and transformations from View Objects to Data Transfer Objects (DTOs) are implemented.
Application: According to the CQRS principle, logic is divided into Command and Query. The Query part contains the abstraction of the Read Model, which is implemented in the Infrastructure layer. The Data Transfer Objects (DTOs) used for communication with the Interface layer are also defined here.
Domain: This is the core part of the system, where external dependencies are abstractly defined, and their implementations are provided externally. A typical example is the Repository pattern, which embodies the principle of dependency inversion.
Infrastructure: Responsible for integrating with various external facilities, including storage, by providing concrete implementations of the defined abstractions.

Development environment
Required tools (excluding frontend):
Git
GNU Make
Go 1.19 or later
Docker
The remaining tools (e.g., Swagger, Womtool, Protoc) can be installed with a single command by running make tools.
Debug
For cluster deployment, it is recommended to use the official Helm charts. The original documentation demonstrates Minikube as the runtime environment, but a managed Kubernetes service on a public cloud platform is preferred because it enables one-click installation of networking and storage components. In addition, public clouds typically provide NFS and MySQL services.
Appendix: Running Cromwell in standalone mode
Prepare a working directory, create an application.conf file in it, and paste the following content.
Download the program file cromwell.jar
Download cromwell-85.jar from the release page: https://github.com/broadinstitute/cromwell/releases/tag/85
Run
bash ./run.sh