TAFFISH wrapper for lh3/bioawk, a BWK
awk-derived command-line tool with built-in parsing support for common
biological text formats.
- name: bioawk
- command: taf-bioawk
- upstream: lh3/bioawk
- packaged upstream version: v1.0
- TAFFISH package version: 1.0-r1
- platforms: linux/amd64, linux/arm64
- wrapper license: Apache-2.0
- upstream license: Lucent permissive notice in the bundled BWK awk source; the upstream repository does not provide a separate SPDX-style LICENSE file
This app builds bioawk from the upstream v1.0 source archive and installs the
single bioawk executable together with upstream README/FIXES/awk manual
materials and the Lucent notice extracted from the bundled BWK awk source.
Bioawk extends BWK awk with:
- -c fastx for FASTA/FASTQ records with $name, $seq, $qual, and $comment;
- -c sam, -c vcf, -c bed, and -c gff with named biological fields;
- -c hdr / -c header for table columns named by the first input line;
- -t, equivalent to tab input/output delimiters;
- biological helper functions such as revcomp();
- gzip input support when -c mode is used.
There is no external database, reference bundle, model, service, or interpreter required at runtime.
Show TAFFISH wrapper help:
taf-bioawk --help
Show the packaged upstream tag through TAFFISH package metadata:
taf-bioawk --version
Run the upstream executable explicitly:
taf-bioawk bioawk --version
taf-bioawk bioawk -c help
For awk programs, use an awk program file with upstream -f. This preserves
the program text exactly through the current TAFFISH generated-shell boundary:
printf '{print $name "\t" length($seq)}\n' > lengths.awk
taf-bioawk bioawk -c fastx -f lengths.awk reads.fa > lengths.tsv
Because this tool app enables TAFFISH command mode, use the explicit upstream command for ordinary bioawk invocations:
taf-bioawk bioawk -c fastx -f lengths.awk reads.fq
Option-leading calls also work with the wrapper -- separator:
taf-bioawk -- -c fastx -f lengths.awk reads.fq
Important wrapper boundary: current TAFFISH joins raw wrapper arguments into
generated shell. Inline awk one-liners such as '{print $name, length($seq)}'
are therefore not reliable through taf-bioawk, because spaces, $, braces,
parentheses, and quotes may be reinterpreted by the generated shell before
bioawk sees them. This is a wrapper argument-boundary issue, not an upstream
bioawk limitation. Put non-trivial awk programs in a file and pass that file
with -f.
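The effect described above can be reproduced with a small stand-in. The following is only a sketch: naive_wrapper is a hypothetical function written for illustration, not the actual TAFFISH code, and it assumes the wrapper behaves roughly like "join the arguments with spaces, then let a second shell parse the result".

```shell
# Hypothetical stand-in for a wrapper that joins its arguments into one
# string and lets a second shell re-parse that string. NOT real TAFFISH code.
naive_wrapper() {
    joined="$*"
    # The child shell re-splits and re-expands the joined text, then prints
    # each word it ended up with, one per line, in brackets.
    sh -c "unset name; set -- $joined; printf '[%s]\n' \"\$@\""
}

# The caller passes three arguments, but the child shell sees four words:
# the awk program is split at the space and $name is expanded to nothing.
naive_wrapper -c fastx '{print $name}'

# A program file name contains nothing for the shell to reinterpret,
# so these four arguments arrive unchanged.
naive_wrapper -c fastx -f lengths.awk
```

In this sketch the first call prints [-c], [fastx], [{print], and [}], so the awk program no longer exists as a single argument. A program containing parentheses, such as length($seq), would additionally make the child shell fail with a syntax error.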
The upstream bioawk --version output reports the underlying BWK awk core
version (awk version 20110810) rather than the bioawk release tag. The
TAFFISH package version and /opt/bioawk/share/doc/bioawk/source.txt bind this
container to upstream tag v1.0 and source checksum
5cbef3f39b085daba45510ff450afcf943cfdfdd483a546c8a509d3075ff51b5.
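To re-check a downloaded source archive against this value, a helper along the following lines could be used. This is a sketch under two assumptions that the text above does not state: that the 64-hex-digit checksum is a SHA-256 digest, and that GNU sha256sum is available. The function name verify_sha256 and the archive file name in the usage comment are hypothetical.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HEX.
verify_sha256() {
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Hypothetical usage (the archive file name is an assumption):
# verify_sha256 bioawk-v1.0.tar.gz \
#     5cbef3f39b085daba45510ff450afcf943cfdfdd483a546c8a509d3075ff51b5 \
#     && echo "checksum matches"
```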
FASTA sequence lengths:
printf '{print $name "\t" length($seq)}\n' > lengths.awk
taf-bioawk bioawk -c fastx -f lengths.awk genome.fa > lengths.tsv
Reverse complement FASTA:
printf '{print ">"$name; print revcomp($seq)}\n' > revcomp.awk
taf-bioawk bioawk -c fastx -f revcomp.awk input.fa > rc.fa
Read compressed FASTA/Q directly:
taf-bioawk bioawk -c fastx -f lengths.awk reads.fq.gz
Extract unmapped SAM records:
printf 'and($flag, 4)\n' > unmapped.awk
taf-bioawk bioawk -c sam -f unmapped.awk alignments.sam > unmapped.sam
Use VCF header-derived sample fields:
printf '{print $_CHROM, $POS, $sampleA}\n' > sample-column.awk
grep -v '^##' variants.vcf | taf-bioawk bioawk -tc hdr -f sample-column.awk
This package is intentionally a thin bioawk runtime. It does not bundle samtools, bcftools, bedtools, reference genomes, annotation databases, workflow scripts, or large example datasets. If a pipeline needs to transform BAM to SAM or pre-filter VCF/BED/GFF inputs, use the relevant TAFFISH tool app upstream of bioawk.
Bioawk can read gzip-compressed biological inputs only when -c mode is active,
matching the upstream README. Plain awk mode falls back to the original BWK awk
line reader.
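When a compressed input has to be processed in plain awk mode, one option is to decompress it before it reaches awk. The sketch below illustrates the pattern with the system awk standing in for taf-bioawk bioawk; the table contents and file names are hypothetical, and reading from a pipe is assumed to work through the wrapper as in the VCF example.

```shell
# Scratch directory for the demonstration files.
workdir=$(mktemp -d)

# Hypothetical two-column table, gzip-compressed.
printf 'geneA\t10\ngeneB\t20\n' | gzip -c > "$workdir/table.tsv.gz"

# Program file, as recommended for wrapper invocations.
printf '{print $1}\n' > "$workdir/first-col.awk"

# Decompress first, then pipe plain text into awk. The system awk here
# stands in for "taf-bioawk bioawk -f first-col.awk" in plain awk mode.
gzip -dc "$workdir/table.tsv.gz" | awk -f "$workdir/first-col.awk"
```

The pipeline prints the first column of the decompressed table, one value per line.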
The container build and TAFFISH smoke checks verify:
- source archive checksum and upstream tag binding;
- bioawk --version and bioawk -c help;
- dynamic library completeness with ldd;
- FASTA/FASTQ parsing and revcomp();
- gzip FASTA input in -c fastx mode;
- SAM named fields;
- VCF named fields and header-derived table fields;
- wrapper-level use of upstream -f program files, which is the recommended TAFFISH invocation style for awk programs.
These smoke checks validate packaging and basic runtime behavior. They do not replace scientific validation of a full downstream workflow.
- homepage: https://github.com/lh3/bioawk
- release: https://github.com/lh3/bioawk/releases/tag/v1.0
- README/manual: https://github.com/lh3/bioawk#readme
The TAFFISH app packaging files are licensed under Apache-2.0.
The packaged upstream bioawk source is derived from BWK awk and carries the
Lucent Technologies permissive notice in the source tree. The upstream
repository does not provide a separate SPDX-style LICENSE file; the extracted
notice is preserved in the image under
/opt/bioawk/share/licenses/bioawk/LICENSE.Lucent.