
feat: improve OTA #30566

Merged
Koenkk merged 15 commits into dev from ota-refactor on Jan 24, 2026
Conversation

@Nerivec
Collaborator

@Nerivec Nerivec commented Jan 8, 2026

Design changes

  • [ZH/ZHC/Z2M/WindFront] OTA refactor zigbee-herdsman#1608
  • requires feat!: OTA refactor zigbee-herdsman#1612
  • Update API to match new features
  • Merge Zigbee/MQTT triggered update logic into single function for consistency
  • Rework OTA support check (from definition) to allow custom file/URL to always go through (check/update/schedule)
  • Add from/to file versions to update response payload to avoid weirdness with swBuildId/dateCode often unavailable
  • Add latest_source (URL/filesystem), latest_release_notes to OTA state payload
  • Allow passing data settings (timings/sizes) per request to override settings
  • Tweak the published OTA state based on return of update (clear available if update returned "no image")
  • Add ability to schedule with custom URL/filesystem
  • Add ability to send hex-formatted image to Z2M which will be automatically written to data dir (for use with update or schedule)
    • Should allow using plain file over MQTT/WS assuming properly formatted upstream (e.g. frontend)
    • With limitation that MQTT/WS must allow payloads of sufficient size (though most cases should be <1MB)
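
A minimal sketch of the hex-image idea from the last bullet, assuming the image arrives as a hex string and ends up in an `ota` subdirectory of the data dir (as mentioned later in this thread). The names `decodeHexImage` and `writeOtaImage` and the file naming are hypothetical and are not taken from the PR's code.

```typescript
import {mkdirSync, writeFileSync} from "node:fs";
import {basename, join} from "node:path";

/** Decode a hex-formatted image into bytes, rejecting malformed input (hypothetical helper). */
function decodeHexImage(hex: string): Buffer {
    // Buffer.from(..., "hex") silently stops at the first invalid pair, so validate up front.
    if (hex.length === 0 || hex.length % 2 !== 0 || !/^[0-9a-fA-F]+$/.test(hex)) {
        throw new Error("Invalid hex-formatted image");
    }

    return Buffer.from(hex, "hex");
}

/** Write the decoded image to `<dataDir>/ota/<fileName>` and return the path (hypothetical helper). */
function writeOtaImage(dataDir: string, fileName: string, hex: string): string {
    const otaDir = join(dataDir, "ota");

    mkdirSync(otaDir, {recursive: true});

    // basename() drops any directory components so the file stays inside the ota dir.
    const filePath = join(otaDir, basename(fileName));

    writeFileSync(filePath, decodeHexImage(hex));

    return filePath;
}
```

Note that a hex string is twice the size of the binary image it encodes, which matters for the MQTT/WS payload size limitation above.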

Due to MQTT request payloads still allowing simple string for OTA, it duplicates the code a bit for now (marked deprecated for 3.0).
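
To illustrate the kind of duplication this refers to, here is a sketch of accepting both payload forms and normalizing them early. It assumes the plain-string form carries only the device ID; the `OtaRequest` shape and `normalizeOtaRequest` are hypothetical and do not reflect the actual Z2M API.

```typescript
// Hypothetical request shape; field names are illustrative only.
interface OtaRequest {
    id: string;
    url?: string;
}

type OtaRequestPayload = string | OtaRequest;

/** Convert the deprecated plain-string form into the object form. */
function normalizeOtaRequest(payload: OtaRequestPayload): OtaRequest {
    if (typeof payload === "string") {
        // Assumption: a bare string is the device ID, with no extra options.
        return {id: payload};
    }

    return payload;
}
```

Once the string form is dropped, a helper like this could simply go away.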

@Koenkk

  • I'm thinking with this one, we might want to bump the settings version so we can trigger a migration to remove all cached OTA states?
  • How do you prefer to deal with null source/release_notes in #getEntityPublishPayload? I'm hesitating between null or undefined so it's taken out on stringify.

TODO:

@Nerivec Nerivec force-pushed the ota-refactor branch 2 times, most recently from f87f504 to e2af075 Compare January 8, 2026 22:23
@Koenkk
Owner

Koenkk commented Jan 10, 2026

I'm thinking with this one, we might want to bump the settings version so we can trigger a migration to remove all cached OTA states?

Agree

How do you prefer to deal with null source/release_notes in #getEntityPublishPayload? I'm hesitating between null or undefined so it's taken out on stringify.

I would propose to go for null (such that the attribute is always in the published payload)
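
For reference, the difference between the two options when the payload is stringified can be shown with the `latest_source` and `latest_release_notes` fields from the description (the surrounding object is a simplified stand-in for the real payload):

```typescript
// With null, the keys are kept in the JSON output.
const withNull = JSON.stringify({latest_source: null, latest_release_notes: null});

// With undefined, JSON.stringify omits the keys entirely.
const withUndefined = JSON.stringify({latest_source: undefined, latest_release_notes: undefined});

console.log(withNull); // {"latest_source":null,"latest_release_notes":null}
console.log(withUndefined); // {}
```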

@Nerivec
Collaborator Author

Nerivec commented Jan 16, 2026

@Koenkk can you take a closer look now that it's fully tested? In particular:

  1. Left a few TODOs, not all quite related to this specifically.
  2. I modified reInterview (previously only used if triggered by bridge request) to behave more as if it had been triggered from the ZH Controller. Should provide better feedback and align the logic?
  3. Image files as hex written to ota subdir of data dir, should be a bit cleaner. Files are auto-removed on unschedule, but not sure if we should also auto-remove on update end (tracing?)?

Note: we'll have to rebuild the all-settings page in docs for this one.


await this.configure(data.device, "zigbee_event");
});
// TODO: this is triggering for any `data.status`, but should only for `successful`? (relies on `device.definition?` early-return?)
Owner

I think that even when a device interview fails, it's worth configuring the device, as the failure might be due to a non-crucial error, and the configure then makes the device at least partially working.

this.eventBus.onGroupMembersChanged(this, this.onGroupMembersChanged);
this.eventBus.onDeviceAnnounce(this, this.onZigbeeEvent);
this.eventBus.onDeviceJoined(this, this.onZigbeeEvent);
// TODO: this is triggering for any `data.status`?
Owner

I think it makes sense to do this always. Similar to https://github.com/Koenkk/zigbee2mqtt/pull/30566/changes#r2722613265, the error might not be critical and maybe all the info needed to identify the device has been received, so it can be discovered.

});
this.eventBus.onDeviceInterview(this, async (data) => {
// TODO: this is triggering for any `data.status`, any use outside `successful`?
// ZHC definition would skip from `device.definition?` but OnEvent from ZHC index wouldn't => triple triggering?
Owner

Interview failed doesn't (always) mean there is no definition, so I think this is fine.

Collaborator Author

@Nerivec Nerivec Jan 23, 2026

The problem I was thinking of for this and all of the above is the fact that status is: "started" | "successful" | "failed". So it always triggers twice?
We can take a closer look in a separate PR though, just thought it was worth a TODO comment when I passed by these 😁

Owner

It depends a bit on whether we also want to trigger the onEvent on started. I guess not, so only on failed or successful makes sense.

Collaborator Author

@Nerivec Nerivec Jan 24, 2026

Yes, I think it's worth a look in a follow-up PR, same for the other 2 in HA and configure extensions, should de-dupe calls a bit.
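
A rough sketch of what such a follow-up guard could look like, assuming the handler should run only for the terminal statuses. `InterviewEvent` is a simplified stand-in for the real event data, and `shouldHandleInterview` is a hypothetical helper.

```typescript
type InterviewStatus = "started" | "successful" | "failed";

// Simplified stand-in for the interview event data.
interface InterviewEvent {
    status: InterviewStatus;
}

/** Skip "started" so the handler runs once per interview, on its final status. */
function shouldHandleInterview(event: InterviewEvent): boolean {
    return event.status !== "started";
}

// A typical interview emits "started" and then one terminal status.
const lifecycle: InterviewEvent[] = [{status: "started"}, {status: "successful"}];
const handled = lifecycle.filter(shouldHandleInterview);

console.log(handled.length); // 1
```

Handling "failed" as well as "successful" keeps the behaviour argued for above: a failed interview may still leave the device usable.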

@Koenkk
Owner

Koenkk commented Jan 23, 2026

  1. Left a few TODOs, not all quite related to this specifically.

Replied!

  2. I modified reInterview (previously only used if triggered by bridge request) to behave more as if it had been triggered from the ZH Controller. Should provide better feedback and align the logic?

Looks good

  3. Image files as hex written to ota subdir of data dir, should be a bit cleaner. Files are auto-removed on unschedule, but not sure if we should also auto-remove on update end (tracing?)?

Let's keep them for now, these OTAs are fairly small so shouldn't be a problem.

device.zh.save();
}

await device.reInterview(this.eventBus);
Collaborator Author

@Nerivec Nerivec Jan 24, 2026

I'm thinking we might want to decouple this from the update process? Currently it would count as an OTA failure (response/logs) if the interview fails. Although, we probably should avoid triggering this immediately in the background, because the early-return would also trigger the read swBuildId/dateCode (lots of requests at once).

@Koenkk what do you think?

dc155b4
I think that's about the best we can do?

Owner

It should definitely not count towards an OTA failure indeed. I'm not sure if 5 sec is too long; maybe battery-powered devices will fall asleep?

Collaborator Author

We might need to adjust this, yes. I figure the call to read sw/date might keep it awake long enough. This sort of thing is always tricky with ZEDs anyway; probably, if it has to fail due to quick-sleep, it will fail the sw/date read... 😅
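
The decoupling discussed in this thread could be sketched roughly as follows. This is not the code from dc155b4: it assumes the 5 seconds refers to a pause before re-interviewing, and `updateThenReInterview` and its callback parameters are hypothetical.

```typescript
const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

interface UpdateOutcome {
    updated: true;
    /** Set when the follow-up interview failed; the OTA itself still succeeded. */
    reInterviewError?: Error;
}

async function updateThenReInterview(
    update: () => Promise<void>,
    reInterview: () => Promise<void>,
    delayMs = 5000, // assumed delay; may be too long for sleepy battery-powered devices
): Promise<UpdateOutcome> {
    // A throw here is a genuine OTA failure and propagates to the caller.
    await update();
    await sleep(delayMs);

    try {
        await reInterview();
    } catch (error) {
        // Report the interview problem separately instead of failing the OTA.
        return {updated: true, reInterviewError: error instanceof Error ? error : new Error(String(error))};
    }

    return {updated: true};
}
```

With this shape, the caller can log a re-interview failure as a warning while still reporting the update itself as successful.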

@Nerivec Nerivec marked this pull request as ready for review January 24, 2026 20:22
@Koenkk Koenkk merged commit dd1c449 into dev Jan 24, 2026
14 checks passed
@Koenkk Koenkk deleted the ota-refactor branch January 24, 2026 20:27
@Bjk8kds

Bjk8kds commented Feb 1, 2026

Thank you so much!
Now it is easier to update custom firmware such as ZigbeeTlc @pvvx, etc.

Koenkk added a commit that referenced this pull request Feb 3, 2026
Co-authored-by: Koen Kanters <koenkanters94@gmail.com>