Problems discovering Infiniband Switches in Cloud Control 13c

When you buy an Exadata machine, you will probably use Oracle Cloud Control to manage the system and all the databases running on it.

In order to use it, you need to discover the Exadata Rack in Cloud Control.

The best approach is to validate the prerequisites first with the exadataDiscoveryPreCheck.pl script.

I had a customer trying to discover an Exadata Rack in Cloud Control 13c. The process was failing because of an issue validating the InfiniBand switches.

When we ran the prerequisite validation with the exadataDiscoveryPreCheck.pl script, we hit this issue:

Verifying Infiniband Switch version...
--------------------------------------
  Verifying version for xfpnhiddb001-iba1.uk.tsb infiniband switch...
  Could not invoke command version using SSH ===> Not ok
   * Please check the password and host status.
     Additionally please check that SSH is not blocked by a firewall.
  Verifying version for xfpnhiddb001-ibb1.uk.tsb infiniband switch...
   Could not invoke command version using SSH ===> Not ok
   * Please check the password and host status.
     Additionally please check that SSH is not blocked by a firewall.

When we ran the version command manually from this host, it worked correctly:

[oracle@xfpnhiddb001dbadm01:aci02pro ~]$ ssh nm2user@xfpnhiddb001-ibb1.uk.tsb version
Password:
SUN DCS 36p version: 2.2.4-3
Build time: Dec  6 2016 13:08:04
SP board info:
Manufacturing Date: 2015.03.26
Serial Number: "NCDIW0345"
Hardware Revision: 0x0200
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010

[oracle@xfpnhiddb001dbadm01:aci02pro ~]$ ssh nm2user@xfpnhiddb001-iba1.uk.tsb version
Password:
SUN DCS 36p version: 2.2.4-3
Build time: Dec  6 2016 13:08:04
SP board info:
Manufacturing Date: 2015.03.27
Serial Number: "NCDIW0176"
Hardware Revision: 0x0200
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010

We were running the validation from the same subnet, and I put the password in a parameter file to rule out typing mistakes, but the issue was still there.

Reviewing exadataDiscoveryPreCheck.pl, I saw that it invokes ssh like this:

/usr/bin/ssh -o StrictHostKeyChecking=no -o ConnectTimeout=30  -o PreferredAuthentications=password -o NumberOfPasswordPrompts=1 nm2user@xfpnhiddb001-ibb1.uk.tsb version

 

Debugging this command, I found that password, the only method requested through PreferredAuthentications=password, was not an authentication method the InfiniBand switch accepted over ssh:

[oracle@xfpnhiddb001dbadm01:aci02pro ~]$ /usr/bin/ssh -v -o StrictHostKeyChecking=no -o ConnectTimeout=30  -o PreferredAuthentications=password -o NumberOfPasswordPrompts=1 nm2user@xfpnhiddb001-ibb1.uk.tsb version
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Connecting to xfpnhiddb001-ibb1.uk.tsb [10.184.196.19] port 22.
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug1: identity file /home/oracle/.ssh/identity type -1
debug1: identity file /home/oracle/.ssh/identity-cert type -1
debug1: identity file /home/oracle/.ssh/id_rsa type 1
debug1: identity file /home/oracle/.ssh/id_rsa-cert type -1
debug1: identity file /home/oracle/.ssh/id_dsa type -1
debug1: identity file /home/oracle/.ssh/id_dsa-cert type -1
debug1: identity file /home/oracle/.ssh/id_ecdsa type -1
debug1: identity file /home/oracle/.ssh/id_ecdsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-sha1 none
debug1: kex: client->server aes128-ctr hmac-sha1 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<2048<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'xfpnhiddb001-ibb1.uk.tsb' is known and matches the RSA host key.
debug1: Found key in /home/oracle/.ssh/known_hosts:19
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive
debug1: No more authentication methods to try.
Permission denied (publickey,keyboard-interactive).
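The line to look for in that output is "Authentications that can continue". A small shell helper can pull it out of a saved verbose log and report whether password is on offer. This is only a minimal sketch: the function names offered_auth_methods and offers_password are hypothetical and are not part of exadataDiscoveryPreCheck.pl.

```shell
# Hypothetical helpers: read `ssh -v` output on stdin and report
# the authentication methods the server advertises.

offered_auth_methods() {
    # Print the comma-separated method list from the first matching line.
    sed -n 's/^debug1: Authentications that can continue: //p' | head -n 1
}

offers_password() {
    # Succeed only if "password" appears as a whole entry in the list,
    # so that e.g. "keyboard-interactive" alone does not count.
    methods=$(offered_auth_methods)
    case ",$methods," in
        *,password,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

ssh writes its debug messages to stderr, so the log can be captured by appending 2> ssh_debug.log to the command above and then running offers_password < ssh_debug.log. For the switches in this post the check fails, because only publickey and keyboard-interactive are advertised.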

 

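The switch only offers publickey and keyboard-interactive. That explains why the manual ssh test succeeded (the "Password:" prompt was presumably served through keyboard-interactive) while the script, which restricts itself to the password method, was rejected. The cause is the setting PasswordAuthentication no in the IB switch's /etc/ssh/sshd_config file, which disables tunneled clear-text password authentication: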

 

#cat /etc/ssh/sshd_config

# To disable tunneled clear text passwords, change to no here!
 #PasswordAuthentication yes
 # NM2 change - set PermitEmptyPasswords to no
 PermitEmptyPasswords no
 # NM2 change - set PasswordAuthentication to no
 PasswordAuthentication no
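To confirm which value is actually in force without reading the file by eye, the active setting can be extracted with a short script. The sketch below assumes the simple "keyword value" layout shown above; effective_password_auth is a hypothetical helper name. It relies on two OpenSSH behaviours: the first occurrence of a keyword wins, and the default for PasswordAuthentication is yes.

```shell
# Hypothetical helper: print the global PasswordAuthentication value
# from an sshd_config file ($1), or the OpenSSH default if it is unset.
effective_password_auth() {
    awk '
        # Settings after a Match line are conditional, so stop there.
        tolower($1) == "match" { exit }
        # Keywords are case-insensitive; commented lines start with "#"
        # and therefore never match. The first active occurrence wins.
        tolower($1) == "passwordauthentication" {
            print tolower($2); found = 1; exit
        }
        # No active line found: fall back to the OpenSSH default.
        END { if (!found) print "yes" }
    ' "$1"
}
```

Run as effective_password_auth /etc/ssh/sshd_config on the switch. Where the sshd binary supports it, sshd -T run as root prints the effective configuration directly and is the more authoritative check.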

 

After changing this value to PasswordAuthentication yes, the precheck worked correctly and the Exadata could be discovered.
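The change itself can be scripted so that a backup is kept and only the active line is touched. This is a sketch under stated assumptions, not the exact procedure used here: set_password_auth is a hypothetical helper, it expects the keyword spelled exactly as in the file above, and it does nothing if no active PasswordAuthentication line exists.

```shell
# Hypothetical helper: set PasswordAuthentication in the sshd_config
# file given as $1 to the value given as $2 (yes or no).
set_password_auth() {
    file=$1
    value=$2
    # Keep a copy of the original next to it as <file>.bak.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite only active lines: optional leading whitespace followed
    # directly by the keyword. Commented lines start with "#" and are
    # left untouched.
    sed "s/^\([[:space:]]*\)PasswordAuthentication[[:space:]].*/\1PasswordAuthentication $value/" \
        "$file.bak" > "$file"
}

# Assumption: the running sshd only picks up the new value after it
# re-reads its configuration; how to restart it depends on the switch
# firmware and is not shown in the original post.
```

Called as set_password_auth /etc/ssh/sshd_config yes with root privileges on the switch, this leaves the previous file in /etc/ssh/sshd_config.bak so the setting can be reverted once discovery has completed.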