Hi,
Our Service Fabric cluster is configured to use certificate common names instead of thumbprints.
After preparing and running the infrastructure scripts for SSL certificate rotation, one orchestrator node did not get the certificate installed.
The scripts did not report any failure, and the test script showed no errors or warnings either, yet the certificate was not installed on this one node.
As a result, this node is down and cannot connect to the Service Fabric cluster.
Error:
Get applications failed. Code: FABRIC_E_INVALID_ADDRESS Message: The supplied address was invalid.
I have tried everything I can think of: I installed the certificate manually, then removed it and ran the scripts again, which succeeded this time, but the node status is still Down.
Restarts do not help, and the Windows Administrative Events log shows these warnings:
Failed SecurityUtil::VerifyCertificate, error -2147017538, SecuritySetting: {provider=SSL protection=EncryptAndSign certType = 'cluster' store='LocalMachine/My' findValue='FindByCommonName:*.abcd.ee' remoteX509Names=('*.abcd.ee',issuer=(alg = 1.2.840.113549.1.1.1, param = ptr=0x161b12add00, size=2, key = ptr=0x161b0fb9fa0, size=20, bytes=3082010a0282010100c14bb3654770bcdd4f58db)) RemoteCertIssuers=('DigiCert TLS RSA SHA256 2020 CA1', Store = Root) certChainFlags=40000000 isClientRoleInEffect=false claimBasedClientAuthEnabled=false }
client-10.30.0.20:19000/10.30.0.20:19000: error = S_OK, failureCount=257. This is conclusive that there is no listener. Connection failure is expected if listener was never started, or listener / its process was stopped before / during connecting. Filter by (type~Transport.St && ~"(?i)10.30.0.20:19000") on the other node to get listener lifecycle, or (type~Transport.GlobalTransportData) for all listeners.
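In case it helps whoever looks at this: the "error -2147017538" in the VerifyCertificate warning is a signed 32-bit value. Below is a small Python sketch (my own helper, not part of any Service Fabric tooling) that reinterprets it as a Windows HRESULT, assuming it follows the standard HRESULT bit layout, so it can be searched for in hex form:

```python
def decode_hresult(value: int) -> dict:
    """Interpret a signed 32-bit error value as a Windows HRESULT (assumed layout)."""
    hr = value & 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    return {
        "hex": f"0x{hr:08X}",                # form usually used in docs and searches
        "failure": bool(hr & 0x80000000),    # severity bit: set means failure
        "facility": (hr >> 16) & 0x7FF,      # 11-bit facility field
        "code": hr & 0xFFFF,                 # low 16 bits: facility-specific code
    }


if __name__ == "__main__":
    info = decode_hresult(-2147017538)
    print(info["hex"], "facility", info["facility"], "code", hex(info["code"]))
```

For -2147017538 this gives 0x80071CBE with facility 7. Facility 7 is FACILITY_WIN32, so the low 16 bits (0x1CBE) would be a Win32 error code, again assuming the value really is an HRESULT.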
How can I get the node back online? Any help is appreciated!