Fix NPE on external/unmanaged instance import using custom offerings #12884

Merged: abh1sar merged 4 commits into apache:4.20 from winterhazel:fix-regression-vm-import on Mar 27, 2026

@winterhazel (Member) commented Mar 25, 2026

Description

This PR addresses a regression in the import of unmanaged/external VMs using a custom compute offering (reported in https://lists.apache.org/thread/1bvxjc197zhj61mtjxpm3tz1o27znjmv).

Both serviceOffering.getCpu() and serviceOffering.getRamSize() return null when the offering is custom (constrained or unconstrained). The import therefore has to take the CPU count and memory from elsewhere: from the values reported by the hypervisor when an unmanaged instance is being imported, or from the cpuNumber and memory details when the instance belongs to a remote host or is being imported from its disk.
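
The fallback described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `ImportSizeResolver`, `Size`, `resolveForUnmanaged` and `resolveForExternal` are hypothetical names, and preferring the offering's values whenever they are non-null is an assumption. Only the null-returning getters and the `cpuNumber`/`memory` detail keys come from the description.

```java
import java.util.Map;

// Hypothetical stand-in; none of these type or method names are taken from CloudStack.
final class ImportSizeResolver {

    /** CPU count and memory (MB) that the resource limits would be checked against. */
    static final class Size {
        final int cpu;
        final int memoryMb;

        Size(int cpu, int memoryMb) {
            this.cpu = cpu;
            this.memoryMb = memoryMb;
        }
    }

    /**
     * Unmanaged instance: use the offering's values when they are set (fixed offering),
     * otherwise fall back to what the hypervisor reports for the instance.
     */
    static Size resolveForUnmanaged(Integer offeringCpu, Integer offeringRamMb,
                                    int hypervisorCpu, int hypervisorRamMb) {
        int cpu = offeringCpu != null ? offeringCpu : hypervisorCpu;
        int ram = offeringRamMb != null ? offeringRamMb : hypervisorRamMb;
        return new Size(cpu, ram);
    }

    /**
     * Remote-host or disk import: a custom offering carries no CPU/RAM of its own,
     * so the values must come from the "cpuNumber" and "memory" details.
     */
    static Size resolveForExternal(Integer offeringCpu, Integer offeringRamMb,
                                   Map<String, String> details) {
        int cpu = offeringCpu != null ? offeringCpu : parseRequired(details, "cpuNumber");
        int ram = offeringRamMb != null ? offeringRamMb : parseRequired(details, "memory");
        return new Size(cpu, ram);
    }

    // Fails with a clear message instead of letting a null propagate into arithmetic.
    private static int parseRequired(Map<String, String> details, String key) {
        String value = details == null ? null : details.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing detail '" + key + "' for custom offering");
        }
        return Integer.parseInt(value);
    }
}
```

The point of the sketch is that every path yields concrete integers before any limit check runs, so a custom offering can no longer cause a null to be unboxed.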

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

The following operations were validated for both fixed and custom offerings (see #12884 (comment) and #12884 (comment)):

  • Unmanaged KVM/VMware VM import.
  • KVM VM import from a remote host.
  • KVM VM import from an existing disk.
  • VMware to KVM instance conversion.

winterhazel added this to the 4.20.3 milestone Mar 25, 2026
winterhazel requested a review from abh1sar March 25, 2026 01:42
codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 86.15385% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.26%. Comparing base (4b7370a) to head (5099423).
⚠️ Report is 2 commits behind head on 4.20.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../apache/cloudstack/vm/UnmanagedVMsManagerImpl.java | 86.15% | 6 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #12884   +/-   ##
=========================================
  Coverage     16.25%   16.26%           
- Complexity    13419    13428    +9     
=========================================
  Files          5664     5664           
  Lines        500467   500509   +42     
  Branches      60780    60785    +5     
=========================================
+ Hits          81354    81399   +45     
  Misses       410018   410018           
+ Partials       9095     9092    -3     
| Flag | Coverage Δ |
|---|---|
| uitests | 4.15% <ø> (ø) |
| unittests | 17.11% <86.15%> (+<0.01%) ⬆️ |

Copilot AI (Contributor) left a comment

Pull request overview

Fixes a regression where importing unmanaged/external instances with custom (dynamic) compute offerings can hit NPEs when CPU/RAM are not populated on the offering, by deriving CPU/RAM from the hypervisor (unmanaged) or from provided details (external/disk-based imports) and moving resource-limit reservations closer to the specific import paths.

Changes:

  • Move VM (cpu/memory/vm count) resource-limit reservations out of top-level import methods and into specific import flows.
  • Add pre-checks to determine CPU/RAM for unmanaged-instance imports (hypervisor vs offering) and for external KVM imports (offering vs details).
  • Ensure reservations are closed via ReservationHelper.closeAll(...) around import/conversion flows.
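
The reservation handling summarized above follows a reserve, run, always-release shape. A minimal sketch of that shape, assuming hypothetical `Reservation` and `importWithReservations` types, with a local `closeAll` standing in for the helper named in the last bullet (its real signature is not shown in this conversation):

```java
import java.util.List;

// Hypothetical sketch; the types and signatures here are not CloudStack's.
final class ReservationSketch {

    /** Holds an amount of one resource type until it is closed. */
    static final class Reservation implements AutoCloseable {
        final String resourceType;
        final long amount;
        boolean closed;

        Reservation(String resourceType, long amount) {
            this.resourceType = resourceType;
            this.amount = amount;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Stand-in for a close-all helper: closes every reservation, skipping nulls. */
    static void closeAll(List<Reservation> reservations) {
        for (Reservation reservation : reservations) {
            if (reservation != null) {
                reservation.close();
            }
        }
    }

    /**
     * Reserves VM count, CPU and memory inside the specific import flow, runs it,
     * and releases the reservations whether the flow succeeds or throws.
     */
    static void importWithReservations(int cpu, int memoryMb, Runnable importFlow,
                                       List<Reservation> reservations) {
        try {
            reservations.add(new Reservation("user_vm", 1));
            reservations.add(new Reservation("cpu", cpu));
            reservations.add(new Reservation("memory", memoryMb));
            importFlow.run();
        } finally {
            closeAll(reservations);
        }
    }
}
```

Placing the reservation next to the flow that knows the final CPU and memory values means the amounts reserved are the resolved ones, and the `finally` block keeps a failed import from leaving reservations open.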

@abh1sar (Contributor) commented Mar 25, 2026

@blueorangutan package

@blueorangutan

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17237

@abh1sar (Contributor) commented Mar 25, 2026

Tested the following cases with custom and fixed service offerings and verified that the resource counts were incremented accordingly:

  1. Import Running VM from VMware to KVM
  2. Import Stopped VM from VMware to KVM
  3. Import Unmanaged VM in KVM
  4. Import KVM Instance from disk

@abh1sar (Contributor) commented Mar 25, 2026

@blueorangutan test

@blueorangutan

@abh1sar a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-15736)

@abh1sar (Contributor) commented Mar 25, 2026

@blueorangutan test

@blueorangutan

@abh1sar a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@abh1sar (Contributor) left a comment

LGTM

@winterhazel (Member, Author)

As @abh1sar already tested most of the affected workflows (#12884 (comment)), I focused on testing the remaining one (import of VMs belonging to remote hosts). My tests consisted of attempting to import an instance from a remote host using both fixed and custom offerings, validating that the process works as intended while ensuring that it is not possible to exceed the configured limits.
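
The `errortext` values in the evidence below each report four numbers: the account limit, the current amount, the current reservation, and the requested amount. A rough sketch of a check that is consistent with those messages, assuming the request is rejected when the three amounts together exceed the limit and that a negative limit means unlimited (both are assumptions for illustration; `LimitCheckSketch` and `exceedsLimit` are hypothetical names):

```java
// Hypothetical sketch of the arithmetic behind the reported limit errors.
final class LimitCheckSketch {

    /**
     * Returns true when granting the request would push usage past the limit.
     * Assumption: a negative limit is treated as "unlimited".
     */
    static boolean exceedsLimit(long limit, long currentAmount,
                                long currentReservation, long requested) {
        if (limit < 0) {
            return false;
        }
        return currentAmount + currentReservation + requested > limit;
    }
}
```

Under this reading, the CPU case from the evidence (limit 1, current 1, reservation 0, requested 1) and the memory case for the fixed offering (limit 1600, current 1536, reservation 0, requested 512) are both rejected, which matches the errors shown.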

Evidence

Import using a custom offering

fabricio@fabricio-XPS-13-9310 ~/g/s/d/debbuild (fix-regression-vm-import)> cmk -p admin import vm name=test-1 clusterid=c3e550b6-8ba7-4a83-a9b4-a8b8c237c962 displayname=aaaaaa zoneid=5624e7c9-f374-468a-9379-6237fa56ccc7 importsource=external hypervisor=kvm host=192.168.32.11 username=root password=password diskpath= temppath= serviceofferingid=77204376-3820-45a7-9229-26968c4a7750 details[0].cpuNumber=1 details[0].cpuSpeed=1000 details[0].memory=512 domainid=530517b5-1ef5-4c1d-8e61-93c740cd72e9 account=d1 nicnetworklist[0].nic=0 nicnetworklist[0].network=ecf4cb79-2d74-4fdb-be24-519fa67e6630
{
  "account": "admin",
  "accountid": "f3405c8c-17c4-11f1-9235-32e0826870ba",
  "cmd": "org.apache.cloudstack.api.command.admin.vm.ImportVmCmd",
  "completed": "2026-03-26T08:58:41-0300",
  "created": "2026-03-26T08:58:40-0300",
  "domainid": "d1dca945-17c4-11f1-9235-32e0826870ba",
  "domainpath": "ROOT",
  "jobid": "9125a0fb-d775-4cdd-b0db-a56e19dcec3b",
  "jobprocstatus": 0,
  "jobresult": {
    "errorcode": 530,
    "errortext": "VM resource allocation error for account: f3887e64-5dd4-46ca-9cd5-bad68f245b2b. Maximum amount of resources of Type = 'cpu', tag = 'null' for Account Name = d1 in Domain Id = 2 is exceeded: Account Resource Limit = 1, Current Account Resource Amount = 1, Current Account Resource Reservation = 0, Requested Resource Amount = 1."
  },
  "jobresultcode": 530,
  "jobresulttype": "object",
  "jobstatus": 2,
  "userid": "f340dd7f-17c4-11f1-9235-32e0826870ba"
}
🙈 Error: async API failed for job 9125a0fb-d775-4cdd-b0db-a56e19dcec3b
fabricio@fabricio-XPS-13-9310 ~/g/s/d/debbuild (fix-regression-vm-import) [1]> cmk -p admin import vm name=test-1 clusterid=c3e550b6-8ba7-4a83-a9b4-a8b8c237c962 displayname=aaaaaa zoneid=5624e7c9-f374-468a-9379-6237fa56ccc7 importsource=external hypervisor=kvm host=192.168.32.11 username=root password=password diskpath= temppath= serviceofferingid=77204376-3820-45a7-9229-26968c4a7750 details[0].cpuNumber=1 details[0].cpuSpeed=1000 details[0].memory=512 domainid=530517b5-1ef5-4c1d-8e61-93c740cd72e9 account=d1 nicnetworklist[0].nic=0 nicnetworklist[0].network=ecf4cb79-2d74-4fdb-be24-519fa67e6630
{
  "account": "admin",
  "accountid": "f3405c8c-17c4-11f1-9235-32e0826870ba",
  "cmd": "org.apache.cloudstack.api.command.admin.vm.ImportVmCmd",
  "completed": "2026-03-26T08:58:56-0300",
  "created": "2026-03-26T08:58:55-0300",
  "domainid": "d1dca945-17c4-11f1-9235-32e0826870ba",
  "domainpath": "ROOT",
  "jobid": "575eca23-fcca-45fd-89d7-1095b06dec94",
  "jobprocstatus": 0,
  "jobresult": {
    "errorcode": 530,
    "errortext": "VM resource allocation error for account: f3887e64-5dd4-46ca-9cd5-bad68f245b2b. Maximum amount of resources of Type = 'memory', tag = 'null' for Account Name = d1 in Domain Id = 2 is exceeded: Account Resource Limit = 600, Current Account Resource Amount = 600, Current Account Resource Reservation = 0, Requested Resource Amount = 512."
  },
  "jobresultcode": 530,
  "jobresulttype": "object",
  "jobstatus": 2,
  "userid": "f340dd7f-17c4-11f1-9235-32e0826870ba"
}
🙈 Error: async API failed for job 575eca23-fcca-45fd-89d7-1095b06dec94
fabricio@fabricio-XPS-13-9310 ~/g/s/d/debbuild (fix-regression-vm-import) [1]> cmk -p admin import vm name=test-1 clusterid=c3e550b6-8ba7-4a83-a9b4-a8b8c237c962 displayname=aaaaaa zoneid=5624e7c9-f374-468a-9379-6237fa56ccc7 importsource=external hypervisor=kvm host=192.168.32.11 username=root password=password diskpath= temppath= serviceofferingid=77204376-3820-45a7-9229-26968c4a7750 details[0].cpuNumber=1 details[0].cpuSpeed=1000 details[0].memory=512 domainid=530517b5-1ef5-4c1d-8e61-93c740cd72e9 account=d1 nicnetworklist[0].nic=0 nicnetworklist[0].network=ecf4cb79-2d74-4fdb-be24-519fa67e6630
{
  "virtualmachine": {
    "account": "d1",
    "affinitygroup": [],
    "arch": "x86_64",
    "cpunumber": 1,
    "cpuspeed": 1000,
    "created": "2026-03-26T08:59:10-0300",
    "deleteprotection": false,
    "details": {
      "cpuNumber": "1",
      "cpuSpeed": "1000",
      "deployvm": "true",
      "memory": "512",
      "nicAdapter": "virtio",
      "rootDiskController": "virtio"
    },
    "displayname": "aaaaaa",
    "displayvm": true,
    "domain": "d1",
    "domainid": "530517b5-1ef5-4c1d-8e61-93c740cd72e9",
    "domainpath": "/d1/",
    "guestosid": "d1e13ee9-17c4-11f1-9235-32e0826870ba",
    "haenable": false,
    "hasannotations": false,
    "hypervisor": "KVM",
    "id": "8fefeb4a-098c-43a8-abeb-11e370af2e9a",
    "instancename": "i-4-25-VM",
    "isdynamicallyscalable": false,
    "memory": 512,
    "name": "test-1",
    "nic": [
      {
        "broadcasturi": "vlan://125",
        "deviceid": "0",
        "extradhcpoption": [],
        "id": "f383aa48-ac50-4871-b183-7c94456ac240",
        "isdefault": true,
        "isolationuri": "vlan://125",
        "macaddress": "52:54:00:5d:1d:00",
        "networkid": "ecf4cb79-2d74-4fdb-be24-519fa67e6630",
        "networkname": "q",
        "secondaryip": [],
        "traffictype": "Guest",
        "type": "L2"
      }
    ],
    "osdisplayname": "CentOS 4.5 (32-bit)",
    "ostypeid": "d1e13ee9-17c4-11f1-9235-32e0826870ba",
    "passwordenabled": false,
    "pooltype": "Filesystem",
    "receivedbytes": 0,
    "rootdeviceid": 0,
    "rootdevicetype": "ROOT",
    "securitygroup": [],
    "sentbytes": 0,
    "serviceofferingid": "77204376-3820-45a7-9229-26968c4a7750",
    "serviceofferingname": "custom constrained",
    "state": "Stopped",
    "tags": [],
    "templatedisplaytext": "VM Import Default Template",
    "templateformat": "ISO",
    "templateid": "3e2084ff-9a45-4a64-b9bb-5f56b49ba249",
    "templatename": "kvm-default-vm-import-dummy-template",
    "templatetype": "SYSTEM",
    "userid": "28b1c54f-0636-4352-ad04-ee9117135aac",
    "username": "d1",
    "zoneid": "5624e7c9-f374-468a-9379-6237fa56ccc7",
    "zonename": "zn"
  }
}

Import using a fixed offering

fabricio@fabricio-XPS-13-9310 ~/g/s/d/debbuild (fix-regression-vm-import) [1]> cmk -p admin import vm name=test-1 clusterid=c3e550b6-8ba7-4a83-a9b4-a8b8c237c962 displayname=aaaaaa zoneid=5624e7c9-f374-468a-9379-6237fa56ccc7 importsource=external hypervisor=kvm host=192.168.32.11 username=root password=password diskpath= temppath= serviceofferingid=de951628-d8b4-41c3-a4f4-0b19b7f1b447 domainid=530517b5-1ef5-4c1d-8e61-93c740cd72e9 account=d1 nicnetworklist[0].nic=0 nicnetworklist[0].network=ecf4cb79-2d74-4fdb-be24-519fa67e6630
{
  "account": "admin",
  "accountid": "f3405c8c-17c4-11f1-9235-32e0826870ba",
  "cmd": "org.apache.cloudstack.api.command.admin.vm.ImportVmCmd",
  "completed": "2026-03-26T09:01:57-0300",
  "created": "2026-03-26T09:01:57-0300",
  "domainid": "d1dca945-17c4-11f1-9235-32e0826870ba",
  "domainpath": "ROOT",
  "jobid": "6d7f1277-8877-4f19-8cca-0ac5e4c2cb9c",
  "jobprocstatus": 0,
  "jobresult": {
    "errorcode": 530,
    "errortext": "VM resource allocation error for account: f3887e64-5dd4-46ca-9cd5-bad68f245b2b. Maximum amount of resources of Type = 'cpu', tag = 'null' for Account Name = d1 in Domain Id = 2 is exceeded: Account Resource Limit = 1, Current Account Resource Amount = 1, Current Account Resource Reservation = 0, Requested Resource Amount = 1."
  },
  "jobresultcode": 530,
  "jobresulttype": "object",
  "jobstatus": 2,
  "userid": "f340dd7f-17c4-11f1-9235-32e0826870ba"
}
🙈 Error: async API failed for job 6d7f1277-8877-4f19-8cca-0ac5e4c2cb9c
fabricio@fabricio-XPS-13-9310 ~/g/s/d/debbuild (fix-regression-vm-import) [1]> cmk -p admin import vm name=test-3 clusterid=c3e550b6-8ba7-4a83-a9b4-a8b8c237c962 displayname=aaaaaa zoneid=5624e7c9-f374-468a-9379-6237fa56ccc7 importsource=external hypervisor=kvm host=192.168.32.11 username=root password=password diskpath= temppath= serviceofferingid=de951628-d8b4-41c3-a4f4-0b19b7f1b447 domainid=530517b5-1ef5-4c1d-8e61-93c740cd72e9 account=d1 nicnetworklist[0].nic=0 nicnetworklist[0].network=ecf4cb79-2d74-4fdb-be24-519fa67e6630
{
  "account": "admin",
  "accountid": "f3405c8c-17c4-11f1-9235-32e0826870ba",
  "cmd": "org.apache.cloudstack.api.command.admin.vm.ImportVmCmd",
  "completed": "2026-03-26T09:03:44-0300",
  "created": "2026-03-26T09:03:44-0300",
  "domainid": "d1dca945-17c4-11f1-9235-32e0826870ba",
  "domainpath": "ROOT",
  "jobid": "35447424-533b-4654-bf29-b7a63d01c4c7",
  "jobprocstatus": 0,
  "jobresult": {
    "errorcode": 530,
    "errortext": "VM resource allocation error for account: f3887e64-5dd4-46ca-9cd5-bad68f245b2b. Maximum amount of resources of Type = 'memory', tag = 'null' for Account Name = d1 in Domain Id = 2 is exceeded: Account Resource Limit = 1600, Current Account Resource Amount = 1536, Current Account Resource Reservation = 0, Requested Resource Amount = 512."
  },
  "jobresultcode": 530,
  "jobresulttype": "object",
  "jobstatus": 2,
  "userid": "f340dd7f-17c4-11f1-9235-32e0826870ba"
}
🙈 Error: async API failed for job 35447424-533b-4654-bf29-b7a63d01c4c7
fabricio@fabricio-XPS-13-9310 ~/g/s/d/debbuild (fix-regression-vm-import) [1]> cmk -p admin import vm name=test-1 clusterid=c3e550b6-8ba7-4a83-a9b4-a8b8c237c962 displayname=aaaaaa zoneid=5624e7c9-f374-468a-9379-6237fa56ccc7 importsource=external hypervisor=kvm host=192.168.32.11 username=root password=password diskpath= temppath= serviceofferingid=de951628-d8b4-41c3-a4f4-0b19b7f1b447 domainid=530517b5-1ef5-4c1d-8e61-93c740cd72e9 account=d1 nicnetworklist[0].nic=0 nicnetworklist[0].network=ecf4cb79-2d74-4fdb-be24-519fa67e6630
{
  "virtualmachine": {
    "account": "d1",
    "affinitygroup": [],
    "arch": "x86_64",
    "cpunumber": 1,
    "cpuspeed": 500,
    "created": "2026-03-26T09:02:16-0300",
    "deleteprotection": false,
    "details": {
      "deployvm": "true",
      "nicAdapter": "virtio",
      "rootDiskController": "virtio"
    },
    "displayname": "aaaaaa",
    "displayvm": true,
    "domain": "d1",
    "domainid": "530517b5-1ef5-4c1d-8e61-93c740cd72e9",
    "domainpath": "/d1/",
    "guestosid": "d1e13ee9-17c4-11f1-9235-32e0826870ba",
    "haenable": false,
    "hasannotations": false,
    "hypervisor": "KVM",
    "id": "7e005112-014f-4cbe-86cb-21380e0a13b9",
    "instancename": "i-4-27-VM",
    "isdynamicallyscalable": false,
    "memory": 512,
    "name": "test-1",
    "nic": [
      {
        "broadcasturi": "vlan://125",
        "deviceid": "0",
        "extradhcpoption": [],
        "id": "d3978211-380d-4293-a899-3cb4186757ac",
        "isdefault": true,
        "isolationuri": "vlan://125",
        "macaddress": "02:01:00:ce:00:08",
        "networkid": "ecf4cb79-2d74-4fdb-be24-519fa67e6630",
        "networkname": "q",
        "secondaryip": [],
        "traffictype": "Guest",
        "type": "L2"
      }
    ],
    "osdisplayname": "CentOS 4.5 (32-bit)",
    "ostypeid": "d1e13ee9-17c4-11f1-9235-32e0826870ba",
    "passwordenabled": false,
    "pooltype": "NetworkFilesystem",
    "receivedbytes": 0,
    "rootdeviceid": 0,
    "rootdevicetype": "ROOT",
    "securitygroup": [],
    "sentbytes": 0,
    "serviceofferingid": "de951628-d8b4-41c3-a4f4-0b19b7f1b447",
    "serviceofferingname": "Small Instance",
    "state": "Stopped",
    "tags": [],
    "templatedisplaytext": "VM Import Default Template",
    "templateformat": "ISO",
    "templateid": "3e2084ff-9a45-4a64-b9bb-5f56b49ba249",
    "templatename": "kvm-default-vm-import-dummy-template",
    "templatetype": "SYSTEM",
    "userid": "28b1c54f-0636-4352-ad04-ee9117135aac",
    "username": "d1",
    "zoneid": "5624e7c9-f374-468a-9379-6237fa56ccc7",
    "zonename": "zn"
  }
}

Only the unit tests suggested at #12884 (comment) are pending now.

@blueorangutan

[SF] Trillian test result (tid-15742)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 59092 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12884-t15742-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| ContextSuite context=TestClusterDRS>:setup | Error | 0.00 | test_cluster_drs.py |

@winterhazel (Member, Author)

@blueorangutan package

@blueorangutan

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

winterhazel marked this pull request as ready for review March 26, 2026 21:56

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17257

abh1sar merged commit 2416db2 into apache:4.20 on Mar 27, 2026
25 of 26 checks passed