Skip to content

Enable creation of vms with solidfire root volumes on vmware67#4977

Closed
skattoju4 wants to merge 3 commits intoapache:mainfrom
skattoju4:solidfire_root_volumes_vmware_67
Closed

Enable creation of vms with solidfire root volumes on vmware67#4977
skattoju4 wants to merge 3 commits intoapache:mainfrom
skattoju4:solidfire_root_volumes_vmware_67

Conversation

@skattoju4
Copy link
Copy Markdown
Contributor

@skattoju4 skattoju4 commented Apr 30, 2021

Description

This PR is an attempt to fix the issue where vms cannot be created with solidfire root disks on vmware 6.5 and above #3598

Previously the approximate steps would be:

  1. Create snapshot of template volume
  2. Resignature the resulting vmfs datastore
  3. Expand the vmfs datastore if needed
  4. Rename the vmfs datastore (iscsi name with / replaced by -)
  5. Unmount the vmfs datastore
  6. Remove iscsi target (remove volume from host)

Seems like previously vmware would rescan the storage and remount the datastore but newer versions do not seem to do this.

When attempting to manually mount the datastore the following error occurs:

Operation failed, diagnostics report: Unable to find volume uuid[523486b0-32e55252-da6b-0cc47a30b11e] lvm [snap-4756ea21-52348590-76c4f8d5-a176-0cc47a30b11e] devices.

Note: Newer versions of vmware in addition to updating the uuid also add a label snap-snapID-oldLabel during the resignature process.

Its possible the devices are being filtered for some reason and therefore not getting re-added after a rescan. https://kb.vmware.com/s/article/1016222

  1. Start vm fails --> java.lang.RuntimeException: Datastore '-iqn.2010-01.com.solidfire:t5eb.root-150.150-0' is not accessible. No connected and accessible host is attached to this datastore.

This change modifies the steps as followings

  1. Create snapshot of template volume
  2. Resignature the resulting vmfs datastore (update the uuid since it would be identical to the template volume at creation)
  3. Expand the vmfs datastore if needed
  4. Rename the vmfs datastore (iscsi name with '/' replaced by '-')
  5. SKIP the step that unmounts the vmfs datastore
  6. SKIP the step that removes the iscsi target

Some checks expect a fcd folder to be present, ADD check to ensure folder exists before testing for existence of files inside it.

  1. Start the vm --> this now works.

However, this is not a final solution.

Steps to reproduce:
Create a vm with solidfire root disks on vmware 6.5 and above.
Expected: Sucessful vm creation
Actual: fails with java.lang.RuntimeException: Datastore '-iqn.2010-01.com.solidfire:t5eb.root-150.150-0' is not accessible. No connected and accessible host is attached to this datastore.

Attempts to Fixe: #3598

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

This has been tested on vmware 6.7 using cloudstack 4.15 and solidfire 12

@skattoju4 skattoju4 marked this pull request as draft April 30, 2021 01:10
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Apr 30, 2021

@skattoju4 can you change PR branch to 4.15 and rebase it to 4.15 if this is a bugfix?

@skattoju4 skattoju4 changed the base branch from master to 4.15 April 30, 2021 05:23
@skattoju4 skattoju4 changed the title enable creation of vms with solidfire root volumes on vmware67 Enable creation of vms with solidfire root volumes on vmware67 Apr 30, 2021
@skattoju4
Copy link
Copy Markdown
Contributor Author

Ok done. This is currently somewhat of a hack and may not be finalized in time for 4.15 but we can adjust later.

@yadvr yadvr added this to the 4.15.1.0 milestone Apr 30, 2021
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Apr 30, 2021

@skattoju4 alright I've tentatively moved to 4.15.1

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 10, 2021

Ping @skattoju4 any update on this, would this make it into 4.15.1, or should we move the milestone?

Comment on lines +1396 to +1398
//if (HypervisorType.VMware.equals(templateInfo.getHypervisorType())) {
// disconnectHostFromVolume(hostVO, volumeInfo.getPoolId(), volumeInfo.get_iScsiName());
//}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As git will keep the history, we can remove the code and move the comment from code to commit message.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +283 to +284
//vmware 6.7 does not automatically mount datastores after rescanning once removed
//removeVmfsDatastore(cmd, hyperHost, datastoreName, storageHost, storagePortNumber, trimIqn(iScsiName), lstHosts);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As git will keep the history, we can remove the code and move the comment from code to commit message.

s_logger.info("File " + fileFullPath + " exists on datastore");
return true;
Boolean folderExists = true;
if(file.getDir() != ""){
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import org.apache.commons.lang3.StringUtils;
        if (StringUtils.isNotEmpty(file.getDir())){

HostDatastoreBrowserSearchResults results = browserMo.searchDatastore(dirFile.getPath(), file.getFileName(), true);
if (results != null) {
List<FileInfo> info = results.getFile();
if (info != null && info.size() > 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import org.apache.commons.collections.CollectionUtils;
            if (CollectionUtils.isNotEmpty(info)) {

@skattoju4
Copy link
Copy Markdown
Contributor Author

Ping @skattoju4 any update on this, would this make it into 4.15.1, or should we move the milestone?

No update yet. Not sure how to approach this. @Sjnas and @pdion891 are looking into it.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 11, 2021

Okay @skattoju4 @Sjnas @pdion891 - pl advise by end of this week, otherwise we can revisit this for next dot-release. I've proposed to cut RC1 b/w 24-31 this month on dev ML.

…toreMO.java

Co-authored-by: Daniel Augusto Veronezi Salvador <38945620+GutoVeronezi@users.noreply.github.com>
@pdion891
Copy link
Copy Markdown
Contributor

@skattoju4 are you able to update the PR? if not, can you tell me what I can do to help. @rhtyd I'd apreciate if we could merge this fix in the next release. will this also go into master branch ?

@skattoju4
Copy link
Copy Markdown
Contributor Author

@pdion891 It needs more investigation and testing in a solidfire + vmware env. I have essentially removed some steps from the deployment process to make it work. There might be a better approach. Not sure if this breaks compatibility with older versions of vmware or if there are any side effects.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 12, 2021

I'm happy to help @pdion891 as long as this PR can manage to get it reviewed/tested/merged by end of next week; as we've proposed a timeline for 4.15.1 RC1; otherwise we can target it for 4.15.2/4.16 in future.

ds.makeDirectory(String.format("[%s] %s", ds.getName(), vmName), dcMo.getMor());
}

if (!ds.folderExists(String.format("[%s]", ds.getName()), "fcd")) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the constant "HypervisorHostHelper.VSPHERE_DATASTORE_BASE_FOLDER" for "fcd"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the constant "HypervisorHostHelper.VSPHERE_DATASTORE_BASE_FOLDER" for "fcd"

@skattoju4 any update on the changes suggested.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delay.. will update later this evening.


if (!ds.folderExists(String.format("[%s]", ds.getName()), "fcd")) {
s_logger.info("fcd folder does not exist on target datastore, we will create one. vm: " + vmName + ", datastore: " + ds.getName());
ds.makeDirectory(String.format("[%s] %s", ds.getName(), "fcd"), dcMo.getMor());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"fcd" => HypervisorHostHelper.VSPHERE_DATASTORE_BASE_FOLDER

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 14, 2021

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 25

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 14, 2021

@blueorangutan test centos7 vmware-67u3

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-686)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 57110 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4977-t686-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 84 look OK, 3 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_03_vpc_privategw_restart_vpc_cleanup Failure 529.35 test_privategw_acl.py
test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failure 352.64 test_routers_network_ops.py
test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Failure 353.61 test_routers_network_ops.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 593.11 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 528.50 test_vpc_redundant.py
test_05_rvpc_multi_tiers Failure 533.91 test_vpc_redundant.py
test_05_rvpc_multi_tiers Error 566.18 test_vpc_redundant.py

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 17, 2021

Ping @skattoju4 @pdion891 any update ? Has this passed your testing/validation and is this ready for merging?

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 19, 2021

cc @andrijapanicsb

@andrijapanicsb
Copy link
Copy Markdown
Contributor

cc @andrijapanicsb

Can't really help myself, as we don't use SolidFire - but I would be happy (for others) to see this merged, if @skattoju3 or @skattoju4 is happy with testing

@yadvr
Copy link
Copy Markdown
Member

yadvr commented May 20, 2021

Agree @andrijapanicsb. @skattoju4 @pdion891 @swill - can you advise when/if this is ready for merging and share your tests. We're looking to tentatively cut RC1 by 31st May 2021. Thanks.

// Seems like vmware 6.5 and above does not automatically remount the datastore after rescanning the storage.
// Not sure if this is needed.
// If using VMware, have the host rescan its software HBA if dynamic discovery is in use.
if (HypervisorType.VMware.equals(templateInfo.getHypervisorType())) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @harikrishna-patnala @nvazquez could this cause any regression?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skattoju4 There are two more methods in which disconnectHostFromVolume() method is being called. Any specific reason why you want to remove this method call only here. Can you check the other two methods as well handleCreateManagedVolumeFromManagedSnapshot() and handleCopyAsyncToSecondaryStorage(). These two are also related to managed storage.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Aug 31, 2021

Ping @skattoju4

@skattoju skattoju force-pushed the solidfire_root_volumes_vmware_67 branch from fb1189a to a6e9386 Compare August 31, 2021 13:31
@skattoju4
Copy link
Copy Markdown
Contributor Author

have a bunch of conflicts to resolve and no vmware env to test currently.. don't think it will make 4.15.2 :(

@nvazquez
Copy link
Copy Markdown
Contributor

@skattoju thanks, moving it into the 4.16 milestone

@nvazquez nvazquez modified the milestones: 4.15.2.0, 4.16.0.0 Aug 31, 2021
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Sep 14, 2021

Ping @skattoju4 any update on this for 4.16, would it say be ready in two weeks (which is tentative date for cutting RC)

@yadvr yadvr changed the base branch from 4.15 to main September 20, 2021 07:23
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Sep 21, 2021

Ping @skattoju4

@yadvr yadvr modified the milestones: 4.16.0.0, 4.16.1.0 Sep 22, 2021
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Sep 22, 2021

@skattoju4 moved to next minor 4.16.1, pl advise if this PR is ready and can make into 4.16.0 by end of this/next week. Thanks.

@yadvr yadvr modified the milestones: 4.16.1.0, 4.17.0.0 Nov 25, 2021
@nvazquez
Copy link
Copy Markdown
Contributor

nvazquez commented Feb 7, 2022

Hi @skattoju4 is this PR ready?

@skattoju
Copy link
Copy Markdown
Contributor

skattoju commented Feb 7, 2022

Hey @nvazquez have not worked on this in a while not sure if the problem still exists or if there is interest in resolving it..

@tsinik-dw
Copy link
Copy Markdown

Hi @skattoju4, I think this is a critical issue that deserves to be fixed if possible. The same problem exists since VMware 6.5 (#3598) and it became a no go factor for the ACS+Solidfire+VMware combination. In our last testing with ACS 4.16.0, the issue still exists. Thus, there is an active interest in us as our managed storage is based on Solidfire and are willing to help with any testing.
Furthermore, the End of Technical Guidance date for VMware 6.7 is on November 15, 2023 and it will be in production environments for some time more.

@svenvogel
Copy link
Copy Markdown
Contributor

@skattoju4 i think this should be fixed. so for production its important to fix this.

@nvazquez
Copy link
Copy Markdown
Contributor

nvazquez commented Mar 2, 2022

Hi @skattoju4 please advise if you are able to continue this work to include it on 4.17

@skattoju
Copy link
Copy Markdown
Contributor

skattoju commented Mar 2, 2022 via email

@nvazquez
Copy link
Copy Markdown
Contributor

nvazquez commented Mar 4, 2022

Ok @skattoju thanks for letting us know, I'll tentatively move it to the 4.18 milestone

@shwstppr
Copy link
Copy Markdown
Contributor

Closing this in favour of #6598

@shwstppr shwstppr closed this Jul 12, 2022
@pdion891
Copy link
Copy Markdown
Contributor

#6548

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.