
This guide covers how to adjust what goes into the Word report — editing AI-generated text, controlling which taxa and images appear, exporting corrections, and modifying the bundled configuration files that control long-term defaults.

Before using this guide you should have completed a full cruise workflow as described in Standard Cruise Workflow.


1. AI-generated report text

When an OpenAI or Google Gemini API key is configured, AlgAware-IFCB generates Swedish and English summaries and individual station descriptions automatically when Make Report is clicked. The text is written directly into the Word document — there is no preview or editing step inside the app.

After downloading the report, open the .docx file in Word and edit the text there. Species names are already italicised and HAB species are marked with a red asterisk — you do not need to add formatting manually.

If no API key is set, placeholder text is inserted in the relevant sections of the Word document so you know where to fill in the summaries manually.

Style guide: The LLM system prompt is stored in inst/extdata/report_writing_guide.md. A package maintainer can edit this file to change tone, required content, species name conventions, or report structure — see Editing the LLM style guide below.


2. Controlling image mosaics

The image mosaics (one for Baltic Sea, one for West Coast) are designed in the Images tab before generating the report. See Design image mosaics in the workflow guide.

If an image looks bad (blurry, double exposure, edge artefact), click it in the mosaic preview to replace it with a new random image for that taxon. Repeat until satisfied, then return to the Report panel.


3. Excluding samples from the report

Samples can be excluded in the Samples tab (see Review and exclude samples in the workflow guide). Excluded samples are removed from summaries, maps, mosaics, and the report. They remain in memory and can be re-included at any time using Include Selected or Include All.


4. Exporting the corrections log

All manual relabellings and invalidations made during the session are held in memory. Export them before closing the app:

  1. In the Report panel, click Download Corrections.
  2. Choose a save location. The file is saved as a .csv.

The corrections log serves two purposes. First, it is an archive record of all changes made to the classifier output for a given cruise. Second, it can be re-imported in a future session using Import corrections in the Validate panel — all relabellings and invalidations will be replayed automatically on the freshly loaded data, so you can continue where you left off without repeating the validation work.
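
The replay step can be pictured with a minimal base R sketch. The column names (roi_id, action, new_class, class_name, valid) and the function replay_corrections() are hypothetical and chosen only for illustration; the layout of the exported .csv and the app's internal implementation may differ.

```r
# Hypothetical corrections log; the real column names in the exported .csv may differ.
corrections <- data.frame(
  roi_id    = c("img_002", "img_003"),
  action    = c("relabel", "invalidate"),
  new_class = c("Dinophysis_acuminata", NA),
  stringsAsFactors = FALSE
)

# Freshly loaded classifier output (illustrative values).
classified <- data.frame(
  roi_id     = c("img_001", "img_002", "img_003"),
  class_name = c("Skeletonema_marinoi", "Dinophysis_norvegica", "Mesodinium_rubrum"),
  valid      = TRUE,
  stringsAsFactors = FALSE
)

# Replay each correction in the order it was logged.
replay_corrections <- function(classified, corrections) {
  for (i in seq_len(nrow(corrections))) {
    hit <- classified$roi_id == corrections$roi_id[i]
    if (corrections$action[i] == "relabel") {
      classified$class_name[hit] <- corrections$new_class[i]
    } else if (corrections$action[i] == "invalidate") {
      classified$valid[hit] <- FALSE
    }
  }
  classified
}

corrected <- replay_corrections(classified, corrections)
```

Replaying in logged order matters if the same image was corrected more than once, because the last logged action then wins.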


5. Re-running the report with updated data

If you add more corrections after generating the first report, or if new classification files become available, simply click Make Report again. The existing file will be overwritten.

There is no need to re-download data — all IFCB files are cached locally.


Configuration files for maintainers

The following files in the package control default behaviour. After editing them, rebuild and reinstall the package with devtools::install() for changes to take effect.

Phytoplankton group definitions

inst/config/phyto_groups.yaml controls how taxa are assigned to the groups shown in the pie charts and passed to the LLM prompts. Each top-level key is a group name (e.g. Diatoms, Mesodinium spp.). Groups with a role field map to the three built-in SHARK4R parameters. All other groups are passed as custom groups and are evaluated in order, so place more specific rules (genus-level) before broader ones (class-level).

Supported matching criteria are class, phylum, order, family, and genus. Any combination can be used within one group entry.

Built-in (core) groups use the role key:

  Role             SHARK4R parameter
  diatoms          diatom_class
  dinoflagellates  dinoflagellate_class
  cyanobacteria    cyanobacteria_class / cyanobacteria_phylum

Example — adding a new custom group:

Haptophytes:
  class:
    - Coccolithophyceae
    - Pavlovophyceae

Place this entry before any broader group (e.g. Other) that would otherwise capture haptophyte classes first.
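
Why ordering matters can be shown with a small base R sketch of first-match-wins assignment. This is not the package's actual matching code: assign_group() is a hypothetical helper, the Other rule is invented for the illustration, and the sketch assumes that a group matches when any one of its criteria matches.

```r
# Ordered rules, mirroring the top-level keys of a group file.
# "Other" is a hypothetical catch-all used only for this illustration.
groups <- list(
  "Mesodinium spp." = list(genus = "Mesodinium"),
  Haptophytes       = list(class = c("Coccolithophyceae", "Pavlovophyceae")),
  Other             = list(phylum = c("Haptophyta", "Ciliophora"))
)

# Return the name of the first group with a criterion matching the taxon.
# Assumption: criteria within a group are alternatives (OR), not requirements (AND).
assign_group <- function(taxon, groups) {
  for (group_name in names(groups)) {
    rule <- groups[[group_name]]
    for (rank in names(rule)) {
      if (!is.null(taxon[[rank]]) && taxon[[rank]] %in% rule[[rank]]) {
        return(group_name)
      }
    }
  }
  NA_character_
}

taxon <- list(genus = "Emiliania", class = "Coccolithophyceae", phylum = "Haptophyta")
assign_group(taxon, groups)       # "Haptophytes"
assign_group(taxon, rev(groups))  # "Other": the broad rule now captures the taxon first
```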

After adding a group, also add the corresponding colour entry to create_group_map() in R/plots.R so the new group gets a distinct colour in the pie chart.

Editing the LLM style guide

inst/extdata/report_writing_guide.md is the system prompt sent to the LLM before every text generation request. It defines:

  • The expected structure of summaries and station descriptions
  • Species name conventions and HAB annotation rules
  • Stylistic tone (formal scientific Swedish/English)
  • What information must be included vs. omitted

Edit this file to reflect updated SMHI publication guidelines, new HAB reporting requirements, or a change in the preferred language model’s behaviour.

Adding taxa to the lookup table

inst/extdata/taxa_lookup.csv controls how classifier class names are displayed, which taxa are treated as potentially harmful, and which taxa carry a recommended warning level. Each row has:

  Column         Description
  clean_names    Classifier class name; must match the .h5 file exactly
  name           Scientific name shown in plots and reports
  sflag          Optional qualifier appended to the name (e.g. spp., cf.)
  AphiaID        WoRMS AphiaID (for reference)
  HAB            TRUE if the taxon is potentially harmful or toxic
  warning_level  Recommended alert threshold in cells/L (leave empty if not applicable)
  italic         TRUE if the name should be italicised (species/genus)

After adding a new taxon row and rebuilding, the taxon will appear in the gallery dropdown, be included in biovolume calculations, and be correctly formatted in reports.
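
Before rebuilding, a new row can be sanity-checked in an R session. The sketch below parses two rows in the documented column layout and builds a display label from name and sflag. The Chaetoceros row is hypothetical (its AphiaID is deliberately left empty), and the label column is an illustration only, not necessarily how the package assembles names.

```r
# Two rows in the documented layout; the second row is a hypothetical genus-level entry.
lookup <- read.csv(text = paste(
  "clean_names,name,sflag,AphiaID,HAB,warning_level,italic",
  "Dinophysis_acuminata,Dinophysis acuminata,,109684,TRUE,1500,TRUE",
  "Chaetoceros_spp,Chaetoceros,spp.,,FALSE,,TRUE",
  sep = "\n"
), stringsAsFactors = FALSE)

# Append the qualifier only when one is present.
lookup$label <- trimws(paste(lookup$name, lookup$sflag))

# Classifier names must be unique, otherwise the lookup is ambiguous.
stopifnot(!anyDuplicated(lookup$clean_names))

# Empty numeric cells are read as NA, which is how "no threshold" appears in R.
is.na(lookup$warning_level)
```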

Warning levels

The warning_level column sets a recommended abundance threshold (cells/L, i.e. images/L from the IFCB) above which the taxon should be flagged in the report. When a station’s measured abundance for that taxon meets or exceeds the threshold:

  • The taxon is flagged [WARNING] in the LLM prompt data for that station.
  • The LLM is explicitly instructed to state the actual abundance and the threshold value in the station description and in both summaries (Swedish and English).

Warning levels are independent of the HAB flag: a taxon can have a warning level without being flagged as HAB, and vice versa.

Example entry for Dinophysis acuminata (warning at 1 500 cells/L):

clean_names,name,sflag,AphiaID,HAB,warning_level,italic
Dinophysis_acuminata,Dinophysis acuminata,,109684,TRUE,1500,TRUE

Leave warning_level empty (not 0) for taxa where no threshold applies.
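
The difference between an empty cell and 0 is easy to see in a short base R sketch of the "meets or exceeds" rule. The abundances and the Zero_threshold_taxon row are made up for the illustration, and the code is an assumption about how such a check could be written rather than the package's own implementation.

```r
lookup <- data.frame(
  clean_names   = c("Dinophysis_acuminata", "Skeletonema_marinoi", "Zero_threshold_taxon"),
  warning_level = c(1500, NA, 0),  # NA = empty cell; 0 shows what NOT to enter
  stringsAsFactors = FALSE
)

# Illustrative abundances for one station, in cells/L.
abundance <- c(
  Dinophysis_acuminata = 1500,
  Skeletonema_marinoi  = 250000,
  Zero_threshold_taxon = 40
)

# Flag when the abundance meets or exceeds the threshold; NA thresholds never flag.
threshold <- lookup$warning_level
observed  <- unname(abundance[lookup$clean_names])
lookup$warning <- !is.na(threshold) & observed >= threshold

lookup$warning  # TRUE FALSE TRUE
```

With a threshold of 0, any detection at all triggers a warning, whereas an empty cell disables the check for that taxon.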

Adding stations permanently to the package

Stations fetched from SHARK at runtime are only available on the computer where they were fetched. To bundle a station so it is available on all installations:

For AlgAware spatial bin-matching and the chlorophyll map:

Add a row to inst/stations/algaware_stations.tsv:

"NEW STATION"   "EAST"  "New Stn"

The three fields are, in order, STATION_NAME, COAST, and STATION_NAME_SHORT:

  • STATION_NAME must match the canonical name used in SHARK.
  • COAST is EAST for Baltic Sea or WEST for West Coast.
  • STATION_NAME_SHORT is the label used in plots (keep it short).
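
A new row can be checked before rebuilding with a few lines of base R. The sketch parses the example row as quoted, tab-separated text and assigns the column names itself; whether the real file carries a header row is an assumption to verify against the bundled file.

```r
# The example row, parsed as tab-separated text with quoted fields.
# Column names are assigned here; the real file may already have a header row.
row <- read.delim(
  text = '"NEW STATION"\t"EAST"\t"New Stn"',
  header = FALSE,
  col.names = c("STATION_NAME", "COAST", "STATION_NAME_SHORT"),
  stringsAsFactors = FALSE
)

# COAST accepts only the two documented values, and no field may be empty.
stopifnot(
  row$COAST %in% c("EAST", "WEST"),
  nzchar(row$STATION_NAME),
  nzchar(row$STATION_NAME_SHORT)
)
```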

For CTD profile and Chl-a time-series panels:

  1. Add the station name to the station_list array in inst/extdata/standard_stations.yaml.
  2. Add a StationName: "Region name" line in the same file. Use one of the existing region names or add a new region (a new region will create a new panel group in the CTD section of the report).

For raw station name synonym resolution:

If the station name as it appears in CNV file headers or LIMS exports differs from the canonical name, add a row to inst/extdata/station_mapper.txt:

NEW STATION RAW SYNONYM    NEW STATION

The synonym column is matched case-insensitively. Prefix matching is also applied, so NEW STATION (EXTRA TEXT) will resolve correctly without an explicit entry.
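
The two matching rules can be sketched in base R as follows. resolve_station() and the column names synonym and canonical are hypothetical, and the sketch assumes that the canonical name is itself available as a prefix candidate. The package's resolver may break ties differently.

```r
# Synonym table: raw spelling -> canonical name (illustrative rows).
mapper <- data.frame(
  synonym   = c("NEW STATION RAW SYNONYM", "NEW STATION"),
  canonical = c("NEW STATION", "NEW STATION"),
  stringsAsFactors = FALSE
)

resolve_station <- function(raw, mapper) {
  key <- toupper(trimws(raw))
  syn <- toupper(mapper$synonym)

  # 1. Exact match, ignoring case.
  exact <- match(key, syn)
  if (!is.na(exact)) return(mapper$canonical[exact])

  # 2. Prefix match: the raw name starts with a known synonym.
  #    Prefer the longest synonym so the most specific entry wins.
  hits <- which(startsWith(key, syn))
  if (length(hits) == 0) return(NA_character_)
  mapper$canonical[hits[which.max(nchar(syn[hits]))]]
}

resolve_station("new station raw synonym", mapper)   # "NEW STATION"
resolve_station("New Station (extra text)", mapper)  # "NEW STATION"
resolve_station("UNKNOWN", mapper)                   # NA
```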

Updating the historical Chl-a climatology

inst/extdata/annual_1991-2020_statistics_chl20m.txt contains monthly mean and standard deviation Chl-a values for each standard station, used as the grey ribbon in the CTD time-series panels. The file is tab-delimited with columns STATN, STNCODE, MONTH, CHLA:mean, CHLA:std, CHLA:number_of_values.
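
When preparing a replacement file, the column layout can be checked by reading it the way a plotting routine might. The sketch below uses two made-up rows (station name, code, and values are not real data) and assumes the ribbon spans the mean plus or minus one standard deviation, clamped at zero. The actual ribbon definition lives in R/ctd_plots.R and may differ.

```r
# Two illustrative rows in the documented column layout (values are made up).
clim <- read.delim(
  text = paste(
    "STATN\tSTNCODE\tMONTH\tCHLA:mean\tCHLA:std\tCHLA:number_of_values",
    "EXAMPLE STN\tX1\t1\t0.50\t0.20\t30",
    "EXAMPLE STN\tX1\t2\t1.20\t1.50\t28",
    sep = "\n"
  ),
  check.names = FALSE,  # keep "CHLA:mean" instead of "CHLA.mean"
  stringsAsFactors = FALSE
)

# Assumed ribbon limits: mean +/- 1 SD, with the lower limit clamped at zero
# because a concentration cannot be negative.
clim$lower <- pmax(clim[["CHLA:mean"]] - clim[["CHLA:std"]], 0)
clim$upper <- clim[["CHLA:mean"]] + clim[["CHLA:std"]]
```

Setting check.names = FALSE matters here because the colon in the column names would otherwise be rewritten to a dot on import.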

When SMHI adopts a new climatological period (e.g. 1991–2020 → 2001–2030), replace this file with the updated statistics and update the filename and legend labels in R/ctd_plots.R.