This guide covers how to adjust what goes into the Word report — editing AI-generated text, controlling which taxa and images appear, exporting corrections, and modifying the bundled configuration files that control long-term defaults.
Before using this guide you should have completed a full cruise workflow as described in Standard Cruise Workflow.
1. AI-generated report text
When an OpenAI or Google Gemini API key is configured, AlgAware-IFCB generates Swedish and English summaries and individual station descriptions automatically when Make Report is clicked. The text is written directly into the Word document — there is no preview or editing step inside the app.
After downloading the report, open the .docx file in
Word and edit the text there. Species names are already italicised and
HAB species are marked with a red asterisk — you do not need to add
formatting manually.
If no API key is set, placeholder text is inserted in the relevant sections of the Word document so you know where to fill in the summaries manually.
Style guide: The LLM system prompt is stored in
inst/extdata/report_writing_guide.md. A package maintainer can edit this file to change tone, required content, species name conventions, or report structure — see Editing the LLM style guide below.
2. Controlling image mosaics
The image mosaics (one for Baltic Sea, one for West Coast) are designed in the Images tab before generating the report. See Design image mosaics in the workflow guide.
If an image looks bad (blurry, double-exposure, edge artefact), click it in the mosaic preview to re-roll a new random image for that taxon — repeat until satisfied, then return to the Report panel.
3. Excluding samples from the report
Samples can be excluded in the Samples tab (see Review and exclude samples in the workflow guide). Excluded samples are removed from summaries, maps, mosaics, and the report. They remain in memory and can be re-included at any time using Include Selected or Include All.
4. Exporting the corrections log
All manual relabellings and invalidations made during the session are held in memory. Export them before closing the app:
- In the Report panel, click Download Corrections.
- Choose a save location. The file is saved as a .csv.
The corrections log serves two purposes. First, it is an archive record of all changes made to the classifier output for a given cruise. Second, it can be re-imported in a future session using Import corrections in the Validate panel — all relabellings and invalidations will be replayed automatically on the freshly loaded data, so you can continue where you left off without repeating the validation work.
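The replay idea can be illustrated with a minimal standalone sketch. It is not the app's own import code: the column names (image_id, action, new_class), the action labels, and the image identifiers are all hypothetical, so check them against a real exported log before relying on anything like this.

```python
import csv
import io

# Hypothetical log layout; the real exported .csv may use other columns.
LOG = """image_id,action,new_class
img_001,relabel,Dinophysis_acuminata
img_002,invalidate,
img_999,invalidate,
"""


def replay(classifications, log_text):
    """Apply logged corrections to a fresh {image_id: class_name} mapping.

    Returns the corrected copy and the list of log entries whose image
    was not found in the freshly loaded data.
    """
    updated = dict(classifications)  # leave the caller's data untouched
    skipped = []
    for row in csv.DictReader(io.StringIO(log_text)):
        image_id = row["image_id"]
        if image_id not in updated:
            skipped.append(image_id)
        elif row["action"] == "relabel":
            updated[image_id] = row["new_class"]
        elif row["action"] == "invalidate":
            updated[image_id] = None  # None marks an invalidated image
    return updated, skipped


fresh = {"img_001": "Unclassified", "img_002": "Skeletonema_marinoi"}
corrected, skipped = replay(fresh, LOG)
```

Collecting unmatched entries rather than failing makes it easy to see when a log is replayed against data from a different cruise.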
5. Re-running the report with updated data
If you add more corrections after generating the first report, or if new classification files become available, simply click Make Report again. The existing file will be overwritten.
There is no need to re-download data — all IFCB files are cached locally.
Configuration files for maintainers
The following files in the package control default behaviour. After
editing them, rebuild and reinstall the package with
devtools::install() for changes to take effect.
Phytoplankton group definitions
inst/config/phyto_groups.yaml controls how taxa are
assigned to the groups shown in the pie charts and passed to the LLM
prompts. Each top-level key is a group name (e.g. Diatoms,
Mesodinium spp.). Groups marked with a role
field map to the three built-in SHARK4R parameters; all others are
passed as custom groups and are evaluated in order — place more-specific
rules (genus-level) before broader ones (class-level).
Supported matching criteria are class,
phylum, order, family, and
genus. Any combination can be used within one group
entry.
Built-in (core) groups use the role
key:
| Role | SHARK4R parameter |
|---|---|
| diatoms | diatom_class |
| dinoflagellates | dinoflagellate_class |
| cyanobacteria | cyanobacteria_class / cyanobacteria_phylum |
Example: when adding a new custom group, such as one for haptophytes, place its entry before any broader group (e.g. Other) that would otherwise capture the haptophyte classes first.
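Why the order of entries matters can be seen in a small standalone sketch of first-match-wins evaluation. The group list, the Ciliates group, and the in-memory layout are illustrative assumptions rather than the actual schema of phyto_groups.yaml, and the sketch assumes that a taxon matches a group as soon as any one of its criteria matches.

```python
# Matching criteria named in the text above.
RANKS = ("class", "phylum", "order", "family", "genus")

# Hypothetical custom groups in file order: the genus-level rule comes
# before the broader phylum-level rule.
CUSTOM_GROUPS = [
    ("Mesodinium spp.", {"genus": ["Mesodinium"]}),
    ("Ciliates", {"phylum": ["Ciliophora"]}),
]


def assign_group(taxon, groups=CUSTOM_GROUPS, fallback="Other"):
    """Return the name of the first group with a criterion matching the taxon."""
    for group_name, criteria in groups:
        for rank in RANKS:
            if taxon.get(rank) in criteria.get(rank, ()):
                return group_name
    return fallback


mesodinium = {"phylum": "Ciliophora", "genus": "Mesodinium"}
specific_first = assign_group(mesodinium)
broad_first = assign_group(mesodinium, groups=list(reversed(CUSTOM_GROUPS)))
```

With the genus rule first, the taxon lands in its own group; with the order reversed, the phylum rule captures it and the specific group never sees it.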
After adding a group, also add the corresponding colour entry to
create_group_map() in R/plots.R so the new
group gets a distinct colour in the pie chart.
Editing the LLM style guide
inst/extdata/report_writing_guide.md is the system
prompt sent to the LLM before every text generation request. It
defines:
- The expected structure of summaries and station descriptions
- Species name conventions and HAB annotation rules
- Stylistic tone (formal scientific Swedish/English)
- What information must be included vs. omitted
Edit this file to reflect updated SMHI publication guidelines, new HAB reporting requirements, or a change in the preferred language model’s behaviour.
Adding taxa to the lookup table
inst/extdata/taxa_lookup.csv controls how classifier
class names are displayed, which taxa are treated as potentially
harmful, and which taxa carry a recommended warning level. Each row
has:
| Column | Description |
|---|---|
| clean_names | Classifier class name; must match the .h5 file exactly |
| name | Scientific name shown in plots and reports |
| sflag | Optional qualifier appended to the name (e.g. spp., cf.) |
| AphiaID | WoRMS AphiaID (for reference) |
| HAB | TRUE if the taxon is potentially harmful or toxic |
| warning_level | Recommended alert threshold in cells/L (leave empty if not applicable) |
| italic | TRUE if the name should be italicised (species/genus) |
After adding a new taxon row and rebuilding, the taxon will appear in the gallery dropdown, be included in biovolume calculations, and be correctly formatted in reports.
Warning levels
The warning_level column sets a recommended abundance
threshold (cells/L, i.e. images/L from the IFCB) above which the taxon
should be flagged in the report. When a station’s measured abundance for
that taxon meets or exceeds the threshold:
- The taxon is flagged [WARNING] in the LLM prompt data for that station.
- The LLM is explicitly instructed to state the actual abundance and the threshold value in the station description and in both summaries (Swedish and English).

Warning levels are independent of the HAB flag: a taxon can have a warning level without being flagged as HAB, and vice versa.
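The flagging rule described above reduces to a single comparison. The sketch below is a standalone illustration, not the package's code. The record layout and the abundances are made up, and Taxon A to C are placeholders.

```python
from typing import Optional


def needs_warning(abundance: float, warning_level: Optional[float]) -> bool:
    """True when a threshold exists and the abundance meets or exceeds it."""
    if warning_level is None:  # empty warning_level: no threshold applies
        return False
    return abundance >= warning_level


# Made-up abundances (cells/L) for one station.
station = [
    {"name": "Dinophysis acuminata", "hab": True, "warning_level": 1500.0, "abundance": 1500.0},
    {"name": "Taxon A", "hab": True, "warning_level": None, "abundance": 90000.0},
    {"name": "Taxon B", "hab": False, "warning_level": 10000.0, "abundance": 12000.0},
    {"name": "Taxon C", "hab": True, "warning_level": 1000.0, "abundance": 999.0},
]

flagged = [
    taxon["name"]
    for taxon in station
    if needs_warning(taxon["abundance"], taxon["warning_level"])
]
```

Taxon A is HAB but has no threshold, and Taxon B has a threshold without being HAB, which shows that the hab field plays no part in the decision.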
Example entry for Dinophysis acuminata (warning at 1 500 cells/L):
clean_names,name,sflag,AphiaID,HAB,warning_level,italic
Dinophysis_acuminata,Dinophysis acuminata,,109684,TRUE,1500,TRUE
Leave warning_level empty (not 0) for taxa
where no threshold applies.
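Before rebuilding, a maintainer may want to sanity-check edited rows. The following standalone sketch checks the header and the value conventions described above. It assumes that HAB and italic hold TRUE, FALSE, or nothing, which the text does not state explicitly, and the second row used to exercise it is a made-up placeholder.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "clean_names", "name", "sflag", "AphiaID", "HAB", "warning_level", "italic",
]

SAMPLE = """clean_names,name,sflag,AphiaID,HAB,warning_level,italic
Dinophysis_acuminata,Dinophysis acuminata,,109684,TRUE,1500,TRUE
"""


def check_lookup(text):
    """Return a list of human-readable problems found in lookup-table text."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in ("HAB", "italic"):
            # Assumption: only TRUE, FALSE, or an empty cell are valid.
            if (row[column] or "") not in ("TRUE", "FALSE", ""):
                problems.append(f"line {line_no}: {column} is not TRUE/FALSE")
        level = (row["warning_level"] or "").strip()
        if level:
            try:
                valid = float(level) > 0
            except ValueError:
                valid = False
            if not valid:
                problems.append(
                    f"line {line_no}: warning_level must be empty or a positive number"
                )
    return problems


ok_problems = check_lookup(SAMPLE)
# A placeholder row that uses 0 instead of an empty warning_level.
bad_problems = check_lookup(SAMPLE + "Some_taxon,Some taxon,,,FALSE,0,TRUE\n")
```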
Adding stations permanently to the package
Stations fetched from SHARK at runtime are only available on the computer where they were fetched. To bundle a station so it is available on all installations:
For AlgAware spatial bin-matching and the chlorophyll map:
Add a row to inst/stations/algaware_stations.tsv:
"NEW STATION" "EAST" "New Stn"
- STATION_NAME must match the canonical name used in SHARK.
- COAST is EAST for Baltic Sea or WEST for West Coast.
- STATION_NAME_SHORT is the label used in plots (keep it short).
For CTD profile and Chl-a time-series panels:
- Add the station name to the station_list array in inst/extdata/standard_stations.yaml.
- Add a StationName: "Region name" line in the same file. Use one of the existing region names or add a new region (a new region will create a new panel group in the CTD section of the report).
For raw station name synonym resolution:
If the station name as it appears in CNV file headers or LIMS exports
differs from the canonical name, add a row to
inst/extdata/station_mapper.txt:
NEW STATION RAW SYNONYM NEW STATION
The synonym column is matched case-insensitively. Prefix matching is
also applied, so NEW STATION (EXTRA TEXT) will resolve
correctly without an explicit entry.
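The two matching rules can be sketched as follows. This is a standalone illustration, not the package's resolver. It assumes that canonical names are themselves valid lookup keys and that, among several prefix candidates, the longest one wins. Neither detail is confirmed by the text above.

```python
from typing import Optional

# Hypothetical synonym table: raw synonym -> canonical station name.
SYNONYMS = {
    "NEW STATION RAW SYNONYM": "NEW STATION",
    "NEW STATION": "NEW STATION",  # assumption: canonical names match too
}


def resolve_station(raw: str, synonyms=SYNONYMS) -> Optional[str]:
    """Resolve a raw station name, case-insensitively, with prefix fallback."""
    key = raw.strip().casefold()
    table = {syn.casefold(): canonical for syn, canonical in synonyms.items()}
    if key in table:  # exact, case-insensitive match
        return table[key]
    # Prefix match: try longer synonyms first so the most specific one wins.
    for syn in sorted(table, key=len, reverse=True):
        if key.startswith(syn):
            return table[syn]
    return None


exact = resolve_station("new station raw synonym")
prefixed = resolve_station("NEW STATION (EXTRA TEXT)")
unknown = resolve_station("SOMEWHERE ELSE")
```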
Updating the historical Chl-a climatology
inst/extdata/annual_1991-2020_statistics_chl20m.txt
contains monthly mean and standard deviation Chl-a values for each
standard station, used as the grey ribbon in the CTD time-series panels.
The file is tab-delimited with columns STATN,
STNCODE, MONTH, CHLA:mean,
CHLA:std, CHLA:number_of_values.
When SMHI adopts a new climatological period (e.g. 1991–2020 →
2001–2030), replace this file with the updated statistics and update the
filename and legend labels in R/ctd_plots.R.
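When preparing a replacement file, it can help to confirm that it parses with the documented column names. The standalone sketch below reads tab-delimited text in that layout and derives a band per month. The rows are made-up placeholders, not real SMHI statistics, and treating the ribbon as mean plus or minus one standard deviation is an assumption about how the plot uses these columns.

```python
import csv
import io

# Placeholder rows in the documented column layout (values are invented).
SAMPLE = (
    "STATN\tSTNCODE\tMONTH\tCHLA:mean\tCHLA:std\tCHLA:number_of_values\n"
    "NEW STATION\tNS1\t1\t0.50\t0.20\t30\n"
    "NEW STATION\tNS1\t2\t0.80\t0.30\t28\n"
    "OTHER STATION\tOS1\t1\t1.10\t0.40\t25\n"
)


def monthly_band(text, station):
    """Return {month: (lower, upper)} for one station as mean -/+ one std."""
    bands = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if row["STATN"] != station:
            continue
        mean = float(row["CHLA:mean"])
        std = float(row["CHLA:std"])
        bands[int(row["MONTH"])] = (mean - std, mean + std)
    return bands


bands = monthly_band(SAMPLE, "NEW STATION")
```

A missing or renamed column raises a KeyError here, which is a quick signal that the new file does not follow the expected header.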