Chaos Engineering release notes
The release notes describe recent changes to Harness Chaos Engineering.
- Progressive deployment: Harness deploys changes to Harness SaaS clusters on a progressive basis. This means the features described in these release notes may not be immediately available in your cluster. To identify the cluster that hosts your account, go to your Account Overview page in Harness. In the new UI, go to Account Settings, Account Details, General, Account Details, and then Platform Service Versions.
- Security advisories: Harness publishes security advisories for every release. Go to the Harness Trust Center to request access to the security advisories.
- More release notes: Go to Harness Release Notes to explore all Harness release notes, including module, delegate, Self-Managed Enterprise Edition, and FirstGen release notes.
July 2024
Version 1.43.3
New features and enhancements
- The crictl binary is upgraded from 1.29.0 to 1.31.0 to fix 3 vulnerabilities. (CHAOS-6357)
- Updated the `experiment-stats` page to return status code 403 instead of 401 due to the changes around support groups. A 401 status code indicates that the user is logged out, whereas a 403 status code is used to display an error. (CHAOS-6322)
- Adds support for live logs for Linux and Windows. (CHAOS-6137)
- Adds a Probe Properties tab on the UI in ChaosHub to show details about the selected probe. (CHAOS-6132)
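The difference between the two status codes in CHAOS-6322 can be sketched from a client's point of view. This is a minimal illustration only: the function name and the returned action strings are hypothetical and are not part of the Harness UI or API. Only the meaning of 401 and 403 is taken from the note above.

```python
def action_for_status(status_code: int) -> str:
    """Map an HTTP status code to a hypothetical client-side action."""
    if status_code == 401:
        # 401: the user is logged out, so ask them to sign in again.
        return "redirect_to_login"
    if status_code == 403:
        # 403: the user is signed in but not permitted, so display an error.
        return "show_error"
    if 200 <= status_code < 300:
        # Any success code: render the page normally.
        return "render_page"
    # Everything else falls back to a generic failure message.
    return "show_generic_failure"
```

With this mapping, a 403 response keeps the user's session and shows an error instead of sending the user back to the login screen.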
Fixed issues
- Fixed an issue where GameDay was not available at the project level to users with administrator access, although it was available to them at the account and organization levels. (CHAOS-6349)
- Fixed the Windows memory hog experiment when installed using the offline installer. (CHAOS-6363)
- Fixed an issue where the Resilience Probes page showed an internal system error in prod1. (CHAOS-6360)
- Fixed an issue where the Resilience Probe index was out of bounds for GameDay experiments that did not have any probes. (CHAOS-6330)
- Fixed an issue where the Cloud Foundry app JVM CPU stress fault didn't have YAML validation in Linux. (CHAOS-6312)
- Fixed an issue where the documents were being updated even though no changes were needed. (CHAOS-6296)
- Fixed incorrect syntax in the `kubectl watch` command in the UI. (CHAOS-5968)
Version 1.41.1
Fixed issues
- Fixed the error associated with upgrading a chaos infrastructure by providing relevant permissions for the upgrade agent in the execution plane (user host/cluster). (CHAOS-5980)
Version 1.40.1
New features and enhancements
- Adds a new Kubernetes pod fault, pod IO mistake, that causes files to read or write an incorrect value. (CHAOS-5916)
- Adds proxy support for Windows chaos infrastructure. (CHAOS-5859)
- Adds support to install Windows chaos infrastructure offline. (CHAOS-5833)
- Unifies chaos injection by introducing a dumb agent to invoke user action and pass the results of the chaos experiment to the control plane. (CHAOS-5610)
- Implements an AWS FIS generic experiment that helps users execute and monitor any AWS FIS template. (CHAOS-5418)
- Converts the default health check probes from `type:source` to `type:inline` for Kubernetes infrastructure to improve the execution speed of chaos experiments. (CHAOS-4348)
Fixed issues
- Fixed an issue where an experiment in the `Error` state would not finish and had an infinite run timestamp. (CHAOS-5577)
Version 1.39.11
Fixed issues
- Fixed an issue where adding a pre-defined experiment in Windows infrastructure was unsuccessful. (CHAOS-5863)
- Fixed an issue where the Edit ChaosHub action was not working with non-account type connectors. (CHAOS-5820)
- Fixed an issue where the Linux restart chaos fault could not parse string values. (CHAOS-5616)
May 2024
Version 1.38.7
New features and enhancements
- This release improves the advanced filter support for "headers", "methods", "queryParams", "destination_IPS", and "destination_Hosts" in the API faults. (CHAOS-5381)
- Adds unit support (milliseconds, seconds, minutes, and hours) for latency parameters in the pod API latency faults. (CHAOS-5378)
- Adds backend to GameDay V2. (CHAOS-5138)
- Adds JVM chaos faults for Linux that inject faults into the JVM of a given Java process running on a Linux machine.
- Video tutorial to upgrade your chaos infrastructure to 1.38.x or higher
- Video tutorial to execute an experiment after infrastructure upgrade to 1.38.x or higher
- The existing APIs continue to work as usual on both old and new chaos infrastructure, whereas new experiments work only on the updated infrastructure (infrastructure version >= 1.38.0).
- Go to frequently asked questions on optimization to know more.
- This release optimizes the experiment flow by:
  - Reading environment variables from the chaos engine.
  - Eliminating the experiment's custom resources and the corresponding steps for new experiments.
  - Eliminating the install experiment step.
  - Reducing the length of the YAML manifest.
  - Increasing the speed of execution of the experiment.
  - Adding all the overrides to the chaos engine.
  - Enhancing the list filter, compatible only with the new experiment template. (CHAOS-5122)
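The unit support for latency parameters (CHAOS-5378) can be pictured with a small conversion sketch. The release notes do not state the exact syntax the fault accepts, so the suffixes `ms`, `s`, `m`, and `h` and the function name `latency_to_ms` are assumptions made for illustration only.

```python
import re

# Assumed suffixes for milliseconds, seconds, minutes, and hours.
_UNIT_TO_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
_PATTERN = re.compile(r"(\d+)(ms|s|m|h)")


def latency_to_ms(value: str) -> int:
    """Convert a string such as '500ms' or '2s' to whole milliseconds."""
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported latency value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_TO_MS[unit]
```

Normalizing every value to one unit up front keeps later comparisons and arithmetic free of unit handling.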
Fixed issues
- Fixed an issue where the compatibility check was enabled for other infrastructure types too. The overview form now preserves the state while switching between different infrastructures. (CHAOS-5614)
- Fixed an issue where the ChaosGuard list APIs were not returning the updated_by and created_by fields. (CHAOS-5596)
- Fixed an issue where a user could not connect to a ChaosHub if its secret had a '-' symbol (after the deployment of ng-manager 1.33). (CHAOS-5112)
- Fixed the rendering of the View Onboarding Progress page. (CHAOS-5583)
- Fixed an issue where the user could not set up or create a Datadog probe. (CHAOS-5440)
- Fixed an issue where the pod IO stress experiment incorrectly applied stress on the helper pod instead of the target container. (CHAOS-5416)
Version 1.37.0
New features and enhancements
- This release introduces the DynamoDB replication pause experiments powered by AWS FIS. These experiments improve the configuration, execution, and monitoring capabilities of the application. (CHAOS-5002)
Fixed issues
- Fixed an issue where multiple source-mode command probes were overridden. (CHAOS-5308)
Version 1.36.5
Fixed issues
- Fixed an issue where accounts that started with an underscore could not execute a Linux chaos experiment. (CHAOS-5185)
- Fixed an issue where a chaos experiment failed when two chaos faults had the same probe (legacy) name. (CHAOS-5064)
- Fixed an issue where editing the SLO probe evaluation window resulted in an `Internal server error`. (CHAOS-5022)
- Fixed an issue in the UI where the toggle option to enable (or disable) the cloud secret in a chaos experiment was enabled automatically after saving the experiment. (CHAOS-4987)
April 2024
Version 1.35.1
New features and enhancements
- The node drain chaos experiment now supports selecting multiple target nodes, with the sequence set to serial or parallel. (CHAOS-2187)
Fixed issues
- Linux command probes in "source" mode were failing due to a module mismatch. This is fixed now. (CHAOS-4952)
- Fixed an issue where a user received duplicate notifications after sending event data. (CHAOS-4942)
- Resilience probe runs were being filtered on incorrect runs. This is fixed now. (CHAOS-4912)
- If syntax errors were identified in a manifest after uploading it, the user had to refresh the page and re-upload the YAML. This is fixed now, and users can edit the YAML without refreshing the page. (CHAOS-4905)
Version 1.34.5
New features and enhancements
- Adds 32-bit Windows support for Windows chaos infrastructure. (CHAOS-4792)
- Speeds up Windows chaos infrastructure installation with the help of a compressed Windows service binary. (CHAOS-4790)
- Improves the error handling mechanism of HTTP probes when sending requests to blocked or unreachable hosts, thereby making monitoring (during chaos experiments) reliable and accurate. (CHAOS-4665)
- Improves system stability and reliability during chaos testing by facilitating graceful abort for edge cases in the Windows memory hog experiment. (CHAOS-4664)
- Provides post-hook recovery support for Windows chaos experiments, which adds system stability and automatic recovery if a chaos service terminates abruptly during an experiment. (CHAOS-4663)
- Introduces global blackhole chaos support in the blackhole chaos experiments, which allows blocking all hosts from a VM, effectively isolating it from network communication. (CHAOS-4661)
- Ensures smooth operation of the pod API chaos and pod HTTP chaos faults when the target pod restarts. (CHAOS-4187)
Fixed issues
- Resilience probes were not available for Windows experiments. This is fixed. (CHAOS-4786)
- The ChaosGuard condition blocked the chaos experiments when the application specification did not match. This is fixed. Moving forward, experiments will be blocked only if the application specification matches. (CHAOS-4772)
- While configuring the Datadog resilience probe, the UI displayed the comparator even when the user did not provide the metrics associated with the comparator, that is, the conditional rendering was not in place. This is fixed. (CHAOS-4770)
- The "Select Probe" UI overflowed on pagination when it was at full capacity. This is fixed. (CHAOS-4725)
- When a source port was provided for the Linux network loss experiment, all the ports on the VM were targeted. This is fixed. (CHAOS-4591)
March 2024
Version 1.33.1
New features and enhancements
- The Windows blackhole chaos experiment supports graceful abort functionality, thereby providing better control and flexibility while performing the experiment. (CHAOS-4582)
Version 1.32.1
New features and enhancements
- Adds the `listInfrasWithExperimentStats` API to fetch the experiment statistics for the requested chaos infrastructure. The API takes a list of infrastructure IDs (infraIDs) and returns the associated experiment and experiment run count. The `listInfras` API is deprecated. (CHAOS-4417)
- Renames the `getHelmInfra` API to `getHelmInfraCommand`. The updated API gives the command necessary to install and upgrade the chaos infrastructure using Helm. (CHAOS-4296)
- Adds the following conditions to the experiment name: (CHAOS-3749)
  - The number of characters is not more than 47.
  - Names can contain only lowercase letters, numbers, and dashes.
  - Names should not start or end with a dash.
- Adds Helm support to install chaos infrastructure. (CHAOS-3327)
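The experiment name conditions listed under CHAOS-3749 can be checked locally before submitting an experiment. The following is a minimal sketch of those three rules. The function name and error messages are hypothetical, and the sketch is not the validation code that Harness itself runs.

```python
import re

MAX_NAME_LENGTH = 47
# Only lowercase letters, digits, and dashes are allowed.
_ALLOWED_CHARS = re.compile(r"[a-z0-9-]+")


def experiment_name_errors(name: str) -> list[str]:
    """Return the list of rules that the given experiment name violates."""
    errors = []
    if len(name) > MAX_NAME_LENGTH:
        errors.append("name is longer than 47 characters")
    if not _ALLOWED_CHARS.fullmatch(name):
        errors.append("name may contain only lowercase letters, numbers, and dashes")
    if name.startswith("-") or name.endswith("-"):
        errors.append("name must not start or end with a dash")
    return errors
```

An empty list means that the name satisfies all three conditions. For example, `experiment_name_errors("pod-delete-1")` returns no errors, whereas `experiment_name_errors("-Pod_Delete")` reports both the character rule and the leading dash.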
Fixed issues
- When a chaos experiment was cloned and the probe configuration of the cloned experiment was modified, the changes to the probe configuration were not reflected in the experiment. This issue is resolved. (CHAOS-4249)
February 2024
Version 1.31.2
New features and enhancements
- This release adds API support to install and upgrade chaos infrastructure using Helm. (CHAOS-2998)
Fixed issues
- Disabling a Linux resilience probe removed all chaos faults associated with the chaos experiment. It has been fixed. Now, you can bulk enable and disable a Kubernetes and a Linux infrastructure's resilience probe. (CHAOS-3849)
January 2024
Version 1.30.0
New features and enhancements
- Appropriate environment variables are added at relevant places to ensure that the self-managed platform (SMP) can be used with feature flags (FF). (CHAOS-3865)
- The SSH chaos experiment now supports an extended termination grace period, allowing for longer execution of abort scripts. (CHAOS-3748)
- This release adds wildcard support for all entities in the ChaosGuard conditions. (CHAOS-3254)
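Wildcard matching of condition entities (CHAOS-3254) can be illustrated with glob-style patterns. The release notes do not specify the wildcard syntax or the entity names, so the `*` semantics of Python's `fnmatch` module and the field names `namespace` and `fault` used below are assumptions for illustration only.

```python
from fnmatch import fnmatchcase


def condition_matches(condition: dict[str, str], target: dict[str, str]) -> bool:
    """Return True when every pattern in the condition matches the target.

    Each condition value is a glob pattern, so '*' matches any value.
    A field that is missing from the target is treated as an empty string.
    """
    return all(
        fnmatchcase(target.get(field, ""), pattern)
        for field, pattern in condition.items()
    )
```

For example, the condition `{"namespace": "prod-*", "fault": "*"}` matches a target with `namespace` set to `prod-eu` and any fault name, but not a target in the `staging` namespace. Note that `fnmatch` also gives special meaning to `?` and `[...]`, which a real implementation may or may not share.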
Fixed issues
- ChaosHub icons were not visible when the hub name contained the '/' character. This is fixed by preventing users from creating a hub whose name contains the '/' character. (CHAOS-3753)
Version 1.29.0
New features and enhancements
- Improves the error messages and logs returned to the client in the API to save chaos experiments. (CHAOS-3607)
Fixed issues
- The Linux chaos infrastructure (LCI) installer wasn't executing the script with sudo privileges, which resulted in the `Failed to install linux-chaos-infrastructure` error. This issue is now resolved. (CHAOS-3724)
- Deselecting the Show active infra option displayed the inactive infrastructures only, whereas it should display all the infrastructures. This issue is now resolved. (CHAOS-3717)
- The LCI process would get killed due to a lack of memory (OOM) when a high amount of memory was specified during a memory stress fault. This issue is now resolved so that the likelihood of OOM kills during limited memory availability is reduced. (CHAOS-3469)
Version 1.28.1
New features and enhancements
- Adds optimizations to use memory efficiently, reduce latency, and enhance server performance. (CHAOS-3581)
- Linux infrastructure is automatically versioned with the help of the API. Previously, the versions were hardcoded for every release. (CHAOS-3580)
- Adds a condition to the experiment such that a resilience probe can't be added more than once in a single fault within an experiment. The same resilience probe can be used in another fault within the same experiment, though. (CHAOS-3520)
- Adds a generic audit function that is used to generate all audit trails, thereby reducing redundancy. This generic function is customized based on the type of audit (chaos experiment, GameDay, chaos infrastructure, and so on). (CHAOS-3484)
- With this release, the Linux chaos infrastructure binary uses static linking instead of dynamic linking. This removes any dependency on the OS built-in programs, including `glibc`. (CHAOS-3334)
- Enhanced the performance of the API (GetExperiment) that was used to fetch details of Kubernetes and Linux experiments. An optional field is added that fetches the average resilience score. (CHAOS-3218)
- Adds support for bulk-disable (disable enabled CRON schedules selected by the user) and bulk-enable (enable disabled CRON schedules selected by the user) of CRON-scheduled experiments, with a limit of 20 experiments for every operation. (CHAOS-3174)
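Because each bulk-enable or bulk-disable operation is limited to 20 experiments (CHAOS-3174), a caller with a longer selection needs to split it into groups. The following is a minimal sketch of that batching. The function name and the experiment IDs are hypothetical, and the sketch does not call any Harness API.

```python
from typing import Iterator, Sequence

BULK_LIMIT = 20  # Limit per operation, as stated in the release note.


def batches(experiment_ids: Sequence[str], limit: int = BULK_LIMIT) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `limit` experiment IDs."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(experiment_ids), limit):
        yield list(experiment_ids[start:start + limit])
```

A selection of 45 experiments, for instance, is split into groups of 20, 20, and 5, each of which fits within a single bulk operation.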
Fixed issues
- After selecting an experiment, when a user tried to select an active infrastructure for the experiment, the page would throw an error. This is fixed. (CHAOS-3585)
- Editing a Linux experiment to change the infrastructure would not update the infrastructure. This is fixed. (CHAOS-3536)
- When multiple faults were executed in parallel, faults that transitioned into an "errored" state were not reflected in the logs, whereas faults in the success state were reflected in the logs with an "errored" status. This is fixed. (CHAOS-3363)