This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
GORpipe is a genomic analysis tool based on a Genomic Ordered Relational (GOR) architecture. It uses a declarative query language combining ideas from SQL and Unix shell pipe syntax to analyze large sets of genomic and phenotypic tabular data in a parallel execution engine.
# Build local installation
./gradlew installDist
# or:
make build
# Run GOR after building
./gortools/build/install/gor-scripts/bin/gorpipe "gor ..."
# Clean
./gradlew clean
# Compile with all warnings (useful for catching issues)
make compile-all-with-warnings
# Run standard unit tests
./gradlew test
# Run slow tests
./gradlew slowTest
# Run integration tests
./gradlew integrationTest
# Run all tests
make all-test
# Run a single test class
./gradlew test --tests "org.gorpipe.gor.TestClassName"
# Run a single test method
./gradlew test --tests "org.gorpipe.gor.TestClassName.testMethodName"
# Run tests in a specific module
./gradlew :gortools:test --tests "gorsat.Script.UTestSignature"
# Run ScalaTest tests in gortools (not auto-discovered by Gradle)
./gradlew :gortools:testScala
Tests are categorized with JUnit @Category annotations:
- SlowTests: run with the slowTest task
- IntegrationTests: run with the integrationTest task
- DbTests: run with the dbTest task
Test data lives in tests/data/ and is loaded as a git submodule (from gor-test-data repo). Initialize it with:
git submodule update --init --recursive
This is a multi-module Gradle project with a layered dependency structure:
auth
↓
base → util
↓ ↓
└→ model (Scala+Java, genomic data structures)
↓
drivers (S3, GCS, Azure, OCI storage drivers)
↓
gortools (main query engine — ANTLR4 grammar, gorsat package)
↓
gorscripts (CLI and command-line tools)
The test module provides shared test infrastructure and depends on all main modules. The external module contains vendored/third-party code.
Key modules:
- model: Core genomic data abstractions; mixed Java/Scala, uses Parquet, SQLite, PostgreSQL, Caffeine caching. Defines the Row, GenomicIterator, Analysis, CommandInfo, and SourceProvider interfaces.
- gortools: Query engine entry point; contains the ANTLR4 grammar in src/main/antlr, and the gorsat package with all GOR commands/functions written primarily in Scala. Command/macro registries live here.
- drivers: Pluggable storage drivers (auto-discovered via @AutoService); each cloud provider (S3, GCS, Azure, OCI) is a separate driver.
- gorscripts: CLI entry points using picocli; main class is GorCLI (gorscripts/src/main/java/org/gorpipe/gor/cli/GorCLI.java).
- Java 17, Scala 2.13: mixed codebase; most query engine logic is Scala, infrastructure is Java
- Gradle with Groovy DSL plugins in buildSrc/
- ANTLR4: query language grammar in gortools/src/main/antlr/
- JUnit 4 with ScalaTest/ScalaCheck for Scala modules
- Build configuration shared via buildSrc/src/main/groovy/:
  - gor.java-common.gradle: common Java/Scala settings applied to all modules
  - gor.java-library.gradle: publishing configuration for library modules
  - gor.scala-common.gradle: Scala 2.13 compilation config
  - gor.java-application.gradle: CLI/application distribution config
- ANTLR generates sources into gortools/build/generated-src/antlr/main (visitor pattern enabled)
Understanding how a GOR query executes end-to-end:
- Parsing: GorScript.g4 (ANTLR4) defines the grammar. Scripts go through alias expansion → include injection → macro preprocessing in ScriptExecutionEngine.scala.
- Command lookup: All pipe commands are registered in GorPipeCommands.scala via commandMap. Each entry is a CommandInfo instance.
- Analysis chain: Each pipe step produces an Analysis (Scala abstract class in model). Analysis instances are chained via pipeTo, forming a processing pipeline. Key methods: setRowHeader() (called once with incoming schema), process(r: Row) (called per row), finish() (cleanup).
- Row iteration: Source data is read via GenomicIterator (implements Iterator<Row>), which supports seek(chr, pos) for genomic range queries.
- Output: A GorRunner (created by GorExecutionEngine) drives the iterator and collects results.
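The analysis chain can be pictured with a small, self-contained Java sketch. Everything in it is hypothetical and simplified for illustration: SketchAnalysis, ChromFilter, and RowCollector are stand-ins that use plain tab-separated strings instead of Row, and the real Analysis class in model may differ in signatures and in what pipeTo returns. It only shows the lifecycle named above: setRowHeader once, process per row, finish at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the Analysis base class (not the real API).
abstract class SketchAnalysis {
    private SketchAnalysis next; // downstream step, set by pipeTo

    // Chain this step to the next one; returns the downstream step (an assumption of this sketch).
    SketchAnalysis pipeTo(SketchAnalysis downstream) {
        this.next = downstream;
        return downstream;
    }

    // Called once with the incoming header; by default it is passed through unchanged.
    void setRowHeader(String header) {
        if (next != null) next.setRowHeader(header);
    }

    // Called once per row.
    abstract void process(String row);

    // Called once after the last row; propagates cleanup downstream.
    void finish() {
        if (next != null) next.finish();
    }

    // Forward a row to the next step, if there is one.
    protected void emit(String row) {
        if (next != null) next.process(row);
    }
}

// Keeps only rows whose first tab-separated column equals the given chromosome.
class ChromFilter extends SketchAnalysis {
    private final String chrom;
    ChromFilter(String chrom) { this.chrom = chrom; }
    @Override void process(String row) {
        if (row.split("\t", 2)[0].equals(chrom)) emit(row);
    }
}

// Terminal step that records what it receives.
class RowCollector extends SketchAnalysis {
    final List<String> rows = new ArrayList<>();
    String header;
    boolean finished;
    @Override void setRowHeader(String h) { header = h; super.setRowHeader(h); }
    @Override void process(String row) { rows.add(row); }
    @Override void finish() { finished = true; super.finish(); }
}

class AnalysisChainDemo {
    // Drives the chain the way a runner would: header first, then rows, then finish.
    static RowCollector run(List<String> input, String header, String chrom) {
        SketchAnalysis head = new ChromFilter(chrom);
        RowCollector sink = new RowCollector();
        head.pipeTo(sink);
        head.setRowHeader(header);
        for (String row : input) head.process(row);
        head.finish();
        return sink;
    }

    public static void main(String[] args) {
        List<String> input = List.of("chr1\t100\tA", "chr2\t200\tC", "chr1\t300\tG");
        System.out.println(run(input, "Chrom\tPos\tRef", "chr1").rows);
    }
}
```

When writing a real Analysis subclass, check Analysis.scala for the actual method signatures before copying this shape.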
Key files:
- gortools/src/main/scala/gorsat/process/GorPipeCommands.scala: command registry
- gortools/src/main/scala/gorsat/process/GorPipeMacros.scala: macro registry (PGOR, PARTGOR, etc.)
- gortools/src/main/scala/gorsat/Script/ScriptExecutionEngine.scala: script preprocessing
- model/src/main/scala/gorsat/Commands/Analysis.scala: base analysis class
- model/src/main/java/org/gorpipe/gor/model/Row.java: row interface
- model/src/main/java/org/gorpipe/gor/model/GenomicIterator.java: iterator interface
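To make the seek(chr, pos) idea behind the iterator interface concrete, here is a minimal in-memory sketch. Locus and InMemoryGenomicIterator are hypothetical names, the boolean return value of seek is an assumption, and chromosomes are compared as plain strings (so "chr10" sorts before "chr2"); the real GenomicIterator.java is the authority on the actual contract.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical row type for this sketch: only chromosome and position.
record Locus(String chr, int pos) {}

// Hypothetical iterator over rows that are already sorted by (chr, pos).
class InMemoryGenomicIterator implements Iterator<Locus> {
    private final List<Locus> sorted;
    private int cursor = 0;

    InMemoryGenomicIterator(List<Locus> sorted) {
        this.sorted = sorted;
    }

    // Move to the first row at or after (chr, pos) using binary search.
    // Returns true if at least one row remains after the move.
    boolean seek(String chr, int pos) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(sorted.get(mid), chr, pos) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        cursor = lo;
        return hasNext();
    }

    // Order by chromosome name (string order), then by position.
    private static int compare(Locus row, String chr, int pos) {
        int byChr = row.chr().compareTo(chr);
        return byChr != 0 ? byChr : Integer.compare(row.pos(), pos);
    }

    @Override
    public boolean hasNext() {
        return cursor < sorted.size();
    }

    @Override
    public Locus next() {
        if (!hasNext()) throw new NoSuchElementException();
        return sorted.get(cursor++);
    }
}

class SeekDemo {
    static InMemoryGenomicIterator sample() {
        return new InMemoryGenomicIterator(List.of(
            new Locus("chr1", 100), new Locus("chr1", 500),
            new Locus("chr2", 50), new Locus("chr2", 900)));
    }

    public static void main(String[] args) {
        InMemoryGenomicIterator it = sample();
        it.seek("chr2", 60);
        System.out.println(it.next());
    }
}
```

Seeking lets a range query skip straight to the region of interest instead of reading every row from the start of the source.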
Adding a new GOR pipe command:
- Create a Scala class in gortools/src/main/scala/gorsat/Commands/ extending CommandInfo
- Implement processArguments(): parse args and return a CommandParsingResult containing an Analysis instance
- Create a corresponding Analysis subclass in gortools/src/main/scala/gorsat/Analysis/ implementing process(), setRowHeader(), and finish()
- Register in GorPipeCommands.register() in GorPipeCommands.scala
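The register-then-look-up pattern behind these steps can be sketched in plain Java. This is an illustration only: SketchCommand, CommandRegistrySketch, and the PREFIX command are hypothetical, and the real CommandInfo.processArguments() returns a CommandParsingResult wrapping an Analysis rather than the simple row function used here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for CommandInfo: a named command that turns arguments into a row transform.
interface SketchCommand {
    String name();
    Function<String, String> processArguments(String[] args);
}

// Hypothetical registry keyed by upper-cased command name.
class CommandRegistrySketch {
    private final Map<String, SketchCommand> commandMap = new HashMap<>();

    void register(SketchCommand command) {
        commandMap.put(command.name().toUpperCase(Locale.ROOT), command);
    }

    // Split a pipe step such as "PREFIX chr" into a name and arguments,
    // look the name up, and let the command parse its own arguments.
    Function<String, String> parseStep(String step) {
        String[] tokens = step.trim().split("\\s+");
        SketchCommand command = commandMap.get(tokens[0].toUpperCase(Locale.ROOT));
        if (command == null) {
            throw new IllegalArgumentException("Unknown command: " + tokens[0]);
        }
        return command.processArguments(Arrays.copyOfRange(tokens, 1, tokens.length));
    }
}

// Example command for the sketch: prepends a fixed prefix to every row.
class PrefixCommand implements SketchCommand {
    @Override
    public String name() {
        return "PREFIX";
    }

    @Override
    public Function<String, String> processArguments(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("PREFIX takes exactly one argument");
        }
        String prefix = args[0];
        return row -> prefix + row;
    }
}

class CommandRegistryDemo {
    static CommandRegistrySketch newRegistry() {
        CommandRegistrySketch registry = new CommandRegistrySketch();
        registry.register(new PrefixCommand());
        return registry;
    }

    public static void main(String[] args) {
        Function<String, String> step = newRegistry().parseStep("prefix chr");
        System.out.println(step.apply("1\t100"));
    }
}
```

A command that is never registered is simply not found at lookup time, which is why the registration step is easy to forget and worth checking first when a new command is reported as unknown.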
Adding a new storage driver:
- Create a class in drivers/src/main/java/org/gorpipe/<provider>/ implementing SourceProvider
- Annotate with @AutoService(SourceProvider.class): drivers are auto-discovered at runtime
- Add the provider entry under META-INF/services/
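Runtime discovery through META-INF/services/ is the standard java.util.ServiceLoader mechanism, which reads a file named after the service interface and instantiates the classes listed in it. The sketch below assumes a hypothetical SketchProvider interface with a single scheme() method; the real SourceProvider interface is richer and is not reproduced here. Indexing accepts any Iterable, so the same code works with ServiceLoader at runtime and with a plain list in tests.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical provider interface for this sketch (not the real SourceProvider).
interface SketchProvider {
    String scheme(); // for example "s3" for URLs starting with s3://
}

// Example provider used only for the demo.
class FakeS3Provider implements SketchProvider {
    @Override
    public String scheme() {
        return "s3";
    }
}

class ProviderIndex {
    // Index providers by scheme, rejecting two providers that claim the same scheme.
    static Map<String, SketchProvider> index(Iterable<? extends SketchProvider> providers) {
        Map<String, SketchProvider> byScheme = new LinkedHashMap<>();
        for (SketchProvider provider : providers) {
            if (byScheme.putIfAbsent(provider.scheme(), provider) != null) {
                throw new IllegalStateException("Duplicate provider for scheme " + provider.scheme());
            }
        }
        return byScheme;
    }

    // Runtime discovery: ServiceLoader looks for META-INF/services/<interface name> on the classpath.
    static Map<String, SketchProvider> discover() {
        return index(ServiceLoader.load(SketchProvider.class));
    }

    // Pick the provider whose scheme matches the URL prefix, or null if there is none.
    static SketchProvider forUrl(Map<String, SketchProvider> byScheme, String url) {
        int separator = url.indexOf("://");
        return separator < 0 ? null : byScheme.get(url.substring(0, separator));
    }

    public static void main(String[] args) {
        Map<String, SketchProvider> providers = index(List.of(new FakeS3Provider()));
        System.out.println(forUrl(providers, "s3://bucket/file.gorz").scheme());
    }
}
```

If a new driver is not picked up, a missing or misnamed service entry on the classpath is the first thing to check.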
Adding a new macro:
- Create in gortools/src/main/scala/gorsat/Macros/ extending MacroInfo
- Register in GorPipeMacros.register()
Tests that exercise the query engine need to initialize the registries:
GorPipeCommands.register();
GorInputSources.register();
Use TestUtils.runGorPipe("gor ...") for integration-style query tests.
To test changes in a dependent project:
# Publish to Maven Local (~/.m2)
make publish-local
# Then in dependent project: ./gradlew ... -PuseMavenLocal
- Version stored in the VERSION file at repo root
- Semantic versioning: <major>.<minor>.<patch>
- Development versions use the -SNAPSHOT suffix
- Releases: make release-milestone-from-master MILESTONE=X.Y.Z
- Dependency versions managed in versions.properties (refreshVersions plugin); update with ./gradlew refreshVersions
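As an illustration of the version format, the sketch below parses a <major>.<minor>.<patch> string with an optional -SNAPSHOT suffix. SketchVersion is a hypothetical helper, not something that exists in the repository, and it assumes the VERSION file contains nothing beyond that single string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading <major>.<minor>.<patch>[-SNAPSHOT] version strings.
record SketchVersion(int major, int minor, int patch, boolean snapshot) {
    private static final Pattern FORMAT =
        Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(-SNAPSHOT)?");

    // Parse a version string, ignoring surrounding whitespace such as a trailing newline.
    static SketchVersion parse(String text) {
        Matcher m = FORMAT.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a <major>.<minor>.<patch>[-SNAPSHOT] version: " + text);
        }
        return new SketchVersion(
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)),
            m.group(4) != null);
    }

    // The release version a development snapshot leads up to.
    SketchVersion release() {
        return new SketchVersion(major, minor, patch, false);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch + (snapshot ? "-SNAPSHOT" : "");
    }
}

class VersionDemo {
    public static void main(String[] args) {
        SketchVersion dev = SketchVersion.parse("1.2.3-SNAPSHOT\n");
        System.out.println(dev + " -> " + dev.release());
    }
}
```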