🖥️For Bio-OS Developer

Architecture

Features:

Highly scalable at both the API layer and the storage layer;
Deeply integrated with widely used open-source bioinformatics tools, such as Cromwell and JupyterHub.

The diagram below illustrates the network and storage dependencies, as well as the interactions between internal components:

File storage: stores inputs and outputs of bioinformatics analysis jobs, along with Notebooks.
Database (DB): stores control information for various entities, including workflow scripts.

Domain Design

The project applies Domain-Driven Design (DDD) extensively. The diagram illustrates only the core domain design represented by the API Server, while Cromwell and JupyterHub belong to the supporting domain and are not depicted. The bounded contexts are defined according to the principle of separating static and dynamic concerns. For the sake of lightweight implementation, all bounded contexts are consolidated within a single service.

Each bounded context and its function:

Workspace: Used to archive and present bioinformatics analysis processes and results, while also serving as the fundamental unit for delineating responsibilities and roles.
Notebook Server: Used to manage the Notebook editing environment, including environment configuration and lifecycle management.
Submission: Used to manage workflow executions and submission history.
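As a rough illustration of consolidating all bounded contexts within a single service, the sketch below mounts one handler per context on a single standard-library HTTP mux. It is not the project's code: the project uses Hertz, and the URL prefixes and helper names here are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// contextRoutes maps a hypothetical URL prefix to the bounded context that owns it.
// The prefixes are illustrative assumptions, not the project's real routes.
var contextRoutes = map[string]string{
	"/workspace/":      "workspace",
	"/notebookserver/": "notebookserver",
	"/submission/":     "submission",
}

// newServiceMux registers every bounded context on one mux,
// so a single process serves all of them.
func newServiceMux() *http.ServeMux {
	mux := http.NewServeMux()
	for prefix, name := range contextRoutes {
		name := name // capture the per-iteration value for the closure
		mux.HandleFunc(prefix, func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintf(w, "handled by %s context", name)
		})
	}
	return mux
}

// owner sends an in-memory request and returns the response body,
// which names the bounded context that answered.
func owner(path string) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	newServiceMux().ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(owner("/submission/history"))
}
```

Keeping each context behind its own prefix means a context could later be split into a separate service without changing how callers address it.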

Coding

Framework

Hertz: The open-source HTTP service framework developed by ByteDance's CloudWeGo team.
gRPC: Primarily used for CLI access and for interface calls between internal bounded contexts.
Casbin: Provides access control through its RBAC capabilities.
GORM: A widely used ORM framework in the Go ecosystem, used for connecting to all relational databases.
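To make the RBAC idea behind the access-control entry concrete, here is a minimal standard-library sketch of a role-based permission check. It deliberately does not use Casbin's API; the users, roles, objects, and actions are hypothetical and only show how a user's roles decide whether a request is allowed.

```go
package main

import "fmt"

// permission is an (object, action) pair such as ("workspace", "write").
type permission struct{ object, action string }

// rbac stores user-to-role and role-to-permission assignments.
type rbac struct {
	userRoles map[string][]string
	rolePerms map[string]map[permission]bool
}

// allowed reports whether any role held by user grants action on object.
// Unknown users or roles simply yield false.
func (r rbac) allowed(user, object, action string) bool {
	for _, role := range r.userRoles[user] {
		if r.rolePerms[role][permission{object, action}] {
			return true
		}
	}
	return false
}

// examplePolicy builds a hypothetical policy with two roles and two users.
func examplePolicy() rbac {
	return rbac{
		userRoles: map[string][]string{
			"alice": {"admin"},
			"bob":   {"viewer"},
		},
		rolePerms: map[string]map[permission]bool{
			"admin": {
				permission{"workspace", "read"}:  true,
				permission{"workspace", "write"}: true,
			},
			"viewer": {
				permission{"workspace", "read"}: true,
			},
		},
	}
}

func main() {
	policy := examplePolicy()
	fmt.Println("alice may write:", policy.allowed("alice", "workspace", "write"))
	fmt.Println("bob may write:", policy.allowed("bob", "workspace", "write"))
}
```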

Project Layout

.
├── cmd # The main entry points of the components run in the form of Cobra commands.
│   ├── apiserver
│   └── bioctl
├── internal # Internal component logic
│   ├── apiserver
│   ├── bioctl
│   ├── context # Implementation of bounded contexts
│   │   ├── notebookserver
│   │   ├── submission
│   │   └── workspace
│   │       ├── application
│   │       ├── domain
│   │       ├── infrastructure
│   │       └── interface
├── pkg # The backend common libraries, referenced by cmd and internal.
│   ├── auth # Authentication and authorization configuration
│   ├── client
│   ├── consts
│   ├── db # Database configuration
│   ├── errors
│   ├── eventbus # Event bus configuration
│   ├── jupyterhub
│   ├── log
│   ├── middlewares # Handles authentication and authorization for API access.
│   ├── notebook # Notebook configuration
│   ├── schema
│   ├── server # Server framework configuration
│   ├── storage # File storage configuration
│   ├── utils
│   ├── validator # API input validation method
│   └── version
└── web # Frontend section
    ├── public # Static resources related to the page entry
    ├── src # TypeScript code
    └── swagger-gen.js # Generate API client code from Swagger

Others

.
├── build # Dockerfiles for each component
│   ├── apiserver
│   ├── bioctl
│   └── web
├── conf # Startup configuration examples for apiserver and bioctl
├── docs # API documentation (Swagger format)
├── hack
│   ├── boilerplate # LICENSE
│   └── make-rules # Implementation of Makefile targets
└── githooks # Git hook

Domain-Driven Design in Practice

The project adopts the classic four-layer architecture:

Interface: This layer leverages various server frameworks to implement different types of interfaces, including integrations with Hertz and gRPC. Within this layer, View Objects (VOs) are defined for each protocol type, and transformations from View Objects to Data Transfer Objects (DTOs) are implemented.
Application: According to the CQRS principle, logic is divided into Command and Query. The Query part contains the abstraction of the Read Model, which is implemented in the Infrastructure layer. The Data Transfer Objects (DTOs) used for communication with the Interface layer are also defined here.
Domain: This is the core part of the system, where external dependencies are abstractly defined, and their implementations are provided externally. A typical example is the Repository pattern, which embodies the principle of dependency inversion.
Infrastructure: Responsible for integrating with various external facilities, including storage, by providing concrete implementations of the defined abstractions.
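The four layers above can be sketched in a single Go file. This is a simplified illustration under stated assumptions, not the project's actual code: the Workspace entity, the type and function names, and the in-memory store (standing in for a database) are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ----- Domain layer: entity plus the abstraction it depends on -----

// Workspace is a simplified, hypothetical domain entity.
type Workspace struct{ ID, Name string }

// WorkspaceRepository is declared in the domain but implemented in
// infrastructure, which is the dependency inversion the text describes.
type WorkspaceRepository interface {
	Save(w Workspace) error
}

// ----- Application layer: CQRS split and DTOs -----

// WorkspaceDTO carries data between the interface and application layers.
type WorkspaceDTO struct{ ID, Name string }

// CreateWorkspace is the command side: it changes state via the repository.
func CreateWorkspace(repo WorkspaceRepository, dto WorkspaceDTO) error {
	if dto.Name == "" {
		return errors.New("workspace name must not be empty")
	}
	return repo.Save(Workspace{ID: dto.ID, Name: dto.Name})
}

// WorkspaceReadModel is the query-side abstraction; it returns DTOs directly.
type WorkspaceReadModel interface {
	GetByID(id string) (WorkspaceDTO, error)
}

// ----- Infrastructure layer: concrete implementations -----

// memoryStore stands in for a real database and satisfies both abstractions.
type memoryStore struct{ rows map[string]Workspace }

func newMemoryStore() *memoryStore {
	return &memoryStore{rows: map[string]Workspace{}}
}

func (s *memoryStore) Save(w Workspace) error {
	s.rows[w.ID] = w
	return nil
}

func (s *memoryStore) GetByID(id string) (WorkspaceDTO, error) {
	w, ok := s.rows[id]
	if !ok {
		return WorkspaceDTO{}, errors.New("workspace not found: " + id)
	}
	return WorkspaceDTO{ID: w.ID, Name: w.Name}, nil
}

// ----- Interface layer: protocol-specific view object -----

// WorkspaceVO is what an HTTP or gRPC handler would decode from a request.
type WorkspaceVO struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// toDTO converts the view object into the application-layer DTO.
func toDTO(vo WorkspaceVO) WorkspaceDTO {
	return WorkspaceDTO{ID: vo.ID, Name: vo.Name}
}

func main() {
	store := newMemoryStore()
	if err := CreateWorkspace(store, toDTO(WorkspaceVO{ID: "ws-1", Name: "demo"})); err != nil {
		fmt.Println("create failed:", err)
		return
	}
	var query WorkspaceReadModel = store // query side sees only the read model
	dto, err := query.GetByID("ws-1")
	fmt.Println(dto.Name, err)
}
```

Because the command and query paths depend only on interfaces, the in-memory store could be swapped for a database-backed implementation without touching the Domain or Application code.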

Development environment

Required tools (excluding frontend):

Git
GNU Make
Go 1.19 or later
Docker

The remaining tools (e.g., Swagger, Womtool, Protoc) can be installed with a single command by running make tools.

Debug

For cluster deployment, it is recommended to use the official Helm charts. While the original documentation demonstrates Minikube as the runtime environment, a managed Kubernetes service on a public cloud platform is preferred, as it enables one-click installation of networking and storage components. In addition, public clouds typically provide NFS and MySQL services.

Appendix: Running Cromwell in standalone mode

Prepare a working directory, create an application.conf file in it, and paste the following content.

include required(classpath("application"))
webservice {
  port = 8000
}
workflow-options {
  workflow-log-dir = /nfs/bioos-storage/cromwell-workflow-logs
  workflow-log-temporary = false
}
call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}
database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://${db_endpoint}:3306/${db_name}?rewriteBatchedStatements=true&useSSL=false"
    port = 3306
    user = "${db_username}"
    password = "${db_password}"
    connectionTimeout = 5000
  }
}
backend {
  default = "Local"
  providers {
    Local {
      config {
        root = "/nfs/bioos-storage/cromwell-executions"
        filesystem {
          local {
            localization: [
              "hard-link", "soft-link", "copy"
            ]

            caching {
              duplication-strategy: [
                "hard-link", "soft-link", "copy"
              ]
              hashing-strategy: "md5"
              check-sibling-md5: false
            }
          }
        }
      }
    }
  }
}
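The ${db_endpoint}, ${db_name}, ${db_username}, and ${db_password} entries in the configuration above sit inside quoted strings, so they read as placeholders to be replaced with real values before Cromwell starts. One way to fill them in is sketched below; the helper name and the example values are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// fillPlaceholders replaces every ${name} placeholder in conf with the
// matching entry from values. Placeholders without an entry are left as-is.
func fillPlaceholders(conf string, values map[string]string) string {
	pairs := make([]string, 0, len(values)*2)
	for name, value := range values {
		pairs = append(pairs, "${"+name+"}", value)
	}
	return strings.NewReplacer(pairs...).Replace(conf)
}

func main() {
	// One line taken from the configuration above.
	line := `url = "jdbc:mysql://${db_endpoint}:3306/${db_name}?rewriteBatchedStatements=true&useSSL=false"`

	fmt.Println(fillPlaceholders(line, map[string]string{
		"db_endpoint": "127.0.0.1", // example value, an assumption
		"db_name":     "cromwell",  // example value, an assumption
	}))
}
```

The same helper can be applied to the whole file content after reading it with os.ReadFile, then written back before starting Cromwell.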

Download the program file cromwell.jar

Download cromwell-85.jar from https://github.com/broadinstitute/cromwell/releases/tag/85

Run bash ./run.sh, with the following content in run.sh:

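#!/bin/bash
set -e
# Assumed defaults: adjust both paths to where application.conf and the downloaded jar actually live.
config_file="${config_file:-./application.conf}"
cromwell_path="${cromwell_path:-./cromwell-85.jar}"
echo 'starting cromwell'
# JVM options are placed before -jar (the conventional form); stdout goes to log1 and stderr to logerr.
nohup java -Dconfig.file="${config_file}" -DLOG_LEVEL=INFO -DLOG_MODE=standard -jar "${cromwell_path}" server >log1 2>logerr &
echo 'started cromwell'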

Last updated 5 months ago
