Moveworks Agent Troubleshooting Guide
Common Procedures
Configuring LDAP over SSL aka LDAPS (636)
Prerequisites
First, determine whether you need a certificate file to connect to the LDAP server over SSL. If so, you will need to supply a Base64-encoded ASCII certificate file, typically a .pem server certificate file. This certificate is used to verify that the LDAP server we are communicating with is who it says it is. The file should contain a certificate chain like the one below, and it should live in the ./certs directory. Wildcard certs are not accepted, per Microsoft: https://docs.microsoft.com/en-US/troubleshoot/windows-server/identity/enable-ldap-over-ssl-3rd-certification-authority
-----BEGIN CERTIFICATE-----
content of your domain certificate
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
content of any intermediate CA certificate
-----END CERTIFICATE-----
If the exported certificate is a .p7b file, use the following command to convert it from .p7b to .pem instead of simply renaming the file:
openssl pkcs7 -print_certs -in certnew.p7b -out ca_chain.pem
More details here: https://knowledge.digicert.com/solution/SO21448.html
Verifying the cert is valid:
Run the following command to inspect the certificate and verify that it is signed by a known certificate authority. A certificate signed only by the client itself will not work.
openssl x509 -in <certificate_name> -text -noout
The Active Directory fully qualified domain name of the domain controller must appear in one of the following locations:
- The common name (CN) in the Subject field
- The Subject Alternative Name (SAN) extension in the DNS entry
Pulling the Cert on Your Own
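You may be able to pull the cert by connecting to the LDAPS server. Note that openssl x509 keeps only the first certificate it reads, so this saves the server certificate but not any intermediate CA certificates: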
openssl s_client -connect <domain.host.com>:<port> -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM > cert.pem
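If you also need the intermediate certificates (for example, to build the full chain described in the prerequisites), one option is to keep every PEM block from the -showcerts output instead of piping it into openssl x509. This is a minimal sketch assuming a POSIX shell and awk; extract_pem_blocks is a hypothetical helper name, not part of the agent. Servers often do not send the root CA, so you may still need to append it from the customer.

```bash
#!/bin/sh
# Sketch: keep every certificate from "openssl s_client -showcerts" output.
# extract_pem_blocks is a hypothetical helper, not part of the Moveworks agent.

# Read text on stdin and print only the lines from each
# BEGIN CERTIFICATE marker through its matching END CERTIFICATE marker.
extract_pem_blocks() {
  awk '/-----BEGIN CERTIFICATE-----/ { inblock = 1 }
       inblock { print }
       /-----END CERTIFICATE-----/ { inblock = 0 }'
}

# Usage (replace host and port with your LDAP server):
#   openssl s_client -connect domain.host.com:636 -showcerts </dev/null 2>/dev/null \
#     | extract_pem_blocks > chain.pem
```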
Configuration Changes/Updates:
RECOMMENDED: Run the setup script again and choose the SSL option to supply the LDAP certificate path.
You can also make the following changes manually (after shutting down the agent).
I. Modify /conf/agent_config.yml
- You must use the LDAP hostname, not an IP address.
- Use
port: 636
- Add
use_ssl: true
- If using a cert that is not from a well-known CA, you must set the container path of the certificate. The directory path is fixed because it refers to the path within the Docker image; only the certificate file name changes to match what you've named it locally.
path_to_cert: /home/moveworks/agent/certs/mycert.pem
II. Modify ./start_agent.sh
- Add this additional argument to the docker run command in the script; the first directory should be the absolute directory path on the local VM.
-v "/home/svcmoveworks/certs:/home/moveworks/agent/certs" \
Example agent_config.yml with LDAPS enabled
Example config when using a cert:
ldap_config:
  enabled: true
  host: ldaps.customer.com
  port: 636
  service_password:
    encrypted_value: <password>
  service_user: <username>
  use_ssl: true
  path_to_cert: /home/moveworks/agent/certs/mycert.pem
Example config when the CA is well-known. NOTE: the port is set to 636 instead of 389 and use_ssl is set to true, but no cert is supplied in the config:
ldap_config:
  enabled: true
  host: ldaps.customer.com
  port: 636
  service_password:
    encrypted_value: <password>
  service_user: <username>
  use_ssl: true
Example config when switching to LDAP
ldap_config:
  enabled: true
  host: ldap.customer.com
  port: 389
  service_password:
    encrypted_value: <password>
  service_user: <username>
TLS Skip Verify
Use for self-signed certificates:
ldap_config:
  enabled: true
  host: ldaps.customer.com
  port: 636
  service_password:
    encrypted_value: <password>
  service_user: <username>
  use_ssl: true
  path_to_cert: /home/moveworks/agent/certs/mycert.pem
  tls_skip_verify: true
Validating LDAP Service Account
Verifying Access via AD Explorer
To test the access level of the LDAP service account, we can use AD Explorer (a Microsoft Sysinternals tool) to validate whether the service account has access to the LDAP server.
- First, grab the host, service account username, and password.
- Open AD Explorer, and connect to the LDAP host. Note that you may need to connect to VPN in order to connect successfully.
- Once logged in successfully, you can navigate to groups and validate as needed.
- Select one of the distribution groups. On the right side, you can see the full details.
Note: we've run into issues where a service password with a special character might cause problems. If all else fails, you can try resetting the password to something without special characters.
Verifying Access via Bash
- To find the LDAP host name or IP address, run:
nslookup -q=SRV _ldap._tcp.<DOMAIN_NAME>.com
- For Active Directory, the LDAP service user is formatted as the NetBIOS domain name, a backslash, and then the service account name (e.g. WHS\svc_moveworks).
- To test and validate the service account credentials, run these commands:
  - Request the user's authzid:
ldapwhoami -x -H ldap://<LDAP_SERVER_HOSTNAME_OR_IP> -w <SVC_ACCOUNT_PASSWORD> -D "<NETBIOS_DOMAIN>\<SVC_ACCOUNT_NAME>"
e.g. u:WHS\svc_moveworks
  - Search for an object:
ldapsearch -x -H ldaps://<LDAP_SERVER_HOSTNAME_OR_IP>:636 -W -D "<NETBIOS_DOMAIN>\<SVC_ACCOUNT_NAME>" -b "<BASE_DN>"
Note: On Red Hat, you will need to install the OpenLDAP client tools to run the above commands. To install, run:
yum install openldap-clients
  - Another LDAPS search example (returns users):
ldapsearch -x -LLL -H ldaps://ldap.customer.com:636 -D "username_here" -w "password_here" -b "dc=customer,dc=com" -s sub "(objectClass=user)" givenName
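The ldapwhoami check above can be wrapped so the password is prompted for (-W) instead of typed on the command line with -w, where it can end up in shell history. This is a sketch assuming the OpenLDAP client tools are installed; check_ldap_bind and ad_bind_name are hypothetical helper names, and choosing ldap:// for port 389 and ldaps:// otherwise is an assumption that matches the ports used in this guide.

```bash
#!/bin/sh
# Sketch: validate the AD service account bind, prompting for the password.
# check_ldap_bind and ad_bind_name are hypothetical helpers (not agent code).

# Build the Active Directory bind name NETBIOS_DOMAIN\account.
ad_bind_name() {
  printf '%s\\%s\n' "$1" "$2"
}

# check_ldap_bind HOST NETBIOS_DOMAIN ACCOUNT [PORT]
# Defaults to LDAPS on 636; pass 389 to test plain LDAP.
check_ldap_bind() {
  local host=$1 domain=$2 account=$3 port=${4:-636} scheme
  if [ "$port" = "389" ]; then scheme=ldap; else scheme=ldaps; fi
  # -W prompts for the password instead of exposing it with -w
  ldapwhoami -x -H "$scheme://$host:$port" -W -D "$(ad_bind_name "$domain" "$account")"
}

# Usage:
#   check_ldap_bind dc01.customer.com WHS svc_moveworks
```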
- After completing the configuration process, an agent_config.yml file will be generated in ./config. Verify that the contents of ./config/agent_config.yml are as expected.
- Get the NAT gateway IP that needs to be whitelisted on our end; it is best to also confirm it with the customer. If the VM is internet accessible, you can run the following to get the IP address:
wget -qO- http://ipecho.net/plain ; echo
Whitelist IP
- Enter the IP in this file:
infra/terraform/agent/prod/variables.t
Confirming NET_BIOS Domain for AD Service Account
- Sometimes the NETBIOS_DOMAIN must appear in the service account username, in the format NETBIOS_DOMAIN\username, e.g.:
MOVEWORKS\svc_ad_moveworks
- To check, open ‘Active Directory Users and Computers’.
- Click the ‘Find Objects in Active Directory Domain Services’ icon on the toolbar (it looks like a folder with a magnifying glass).
- Search for the service account.
- Double-click the account.
- Select ‘Account’ in the pop-up. If there is something in the ‘User logon name’ field, use it as the NETBIOS domain.
- Update the service account username in the config to NETBIOS_DOMAIN\username; for the example above, it would be MOVEWORKS\svc_ad_moveworks.
Known Issues and their Fixes
🔴 config: Moveworks access secret is invalid
This means exactly what it says: the access secret is the problem.
Sometimes, this means that the account is locked out. Look for the data error code in the log line:
52e: Invalid credentials. Password expiry is one of the main causes. Reset the service account password during the call, lock and unlock the account, and try again after updating the new credentials in the agent_config.yml file.
775: Account is locked out.
🔴 Permission Denied when using start agent script
open /home/moveworks/agent/conf/agent_config.yml: permission denied
This happens when you try to start the agent configuration script as a user whose group ID does not match what our script expects. Run through the steps again and make sure you use the correct group ID listed in our setup guide; the agent looks for that specific ID, so no other ID will work.
🔴 Permission Denied when accessing certificate file
Often, when installing updated certificate files, we SCP the new cert into the machine’s cert folder. However, the new cert does not have the same file permissions as the previous cert the agent was using. To quickly copy file permissions, use the chmod --reference=reference_file file command.
# navigate to your certs directory
cd /home/moveworks/certs
# view file permissions for each file in the directory
ls -ltr
# locate the file that you want to copy file permissions from (likely the old certificate)
# apply the file permissions from your old cert to your new
sudo chmod --reference=oldCERT.pem newCERT.pem
ls -l newCERT.pem
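To confirm the copy worked, you can compare the mode and owner of both files. Note that chmod --reference copies only the permission bits; if the new file also has a different owner or group, GNU chown accepts --reference as well. This is a minimal sketch assuming GNU coreutils (stat -c); perm_signature and same_perms are hypothetical helper names.

```bash
#!/bin/sh
# Sketch: compare mode and numeric owner/group of two files.
# Assumes GNU coreutils; helper names are hypothetical.

# Print "mode uid:gid" for a file, e.g. "640 17540:17540".
perm_signature() {
  stat -c '%a %u:%g' "$1"
}

# Succeeds (exit 0) when both files share mode and owner/group.
same_perms() {
  [ "$(perm_signature "$1")" = "$(perm_signature "$2")" ]
}

# Usage, after: sudo chmod --reference=oldCERT.pem newCERT.pem
#   same_perms oldCERT.pem newCERT.pem && echo "permissions match"
# If only the owner differs: sudo chown --reference=oldCERT.pem newCERT.pem
```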
🔴 Agent cannot communicate with Moveworks
bond server reply: rest_pool: Get \"https://{{customer_itsm}}/rest/servicedeskapi/servicedesk/1\": Proxy Authentication Required",
This happens when you have a proxy for one system (e.g. AD), but your internal ITSM system or knowledge system needs the agent to bypass the proxy. In such cases, add the following line to the config of the system that needs to bypass the proxy (at the same level as enabled: true):
do_not_use_rest_proxy: true
🔴 Agent cannot communicate with customer’s system (e.g. Confluence)
bond server reply: rest_pool: Get \"https://<base_url>/rest/api/content?limit=25&spaceKey=AHLP\": Forbidden
This happens when you have set up a proxy for outbound connections, but the internal system does not require a proxy. In such cases, add the following line to the config of the system that needs to bypass the proxy (at the same level as enabled: true):
do_not_use_rest_proxy: true
🔴 Failure when authenticating to Moveworks
retry: Received status code 401 while trying to authenticate (moveworks_user=xxxx)
- Invalid Access Secret (the token expires if not used): if the token is not used within 30 days of being generated, it expires. Generate a new token on the MW Setup Agent page.
To validate whether a token works, you can use the following command:
curl --header "Content-Type: application/json" --request POST \
--data '{"access_secret":"{ADD YOUR TOKEN HERE}", "access_key": "mykey"}' \
https://agent.moveworks.com/api/v1/auth
If the config request fails, test it directly:
curl --header "Content-Type: application/json" --header "Authorization: " --request POST https://agent.moveworks.com/api/v1/config
Firewall / Whitelisting tests:
- Run ping google.com to see if outgoing traffic is allowed.
- Try curl https://agent.moveworks.com/api/v1/web to see if our agent service is reachable.
🔴 Agent fails to start with “Operation Not Permitted” Error
Issue: This error is caused by the user you are running the container as lacking executable permissions. Work with your sysadmin to resolve this at the VM level. If that is not possible, removing this line from start_agent.sh will allow the container to start:
--security-opt=no-new-privileges
🔴 I/O Timeout in Agent Logs
[ERROR] [2020-09-11T00:43:20Z] [moveworks/golang/utils/retry/retry.go:177] retry: Post "https://agent.moveworks.com/***********": dial tcp: lookup agent.moveworks.com on 10.255.255.6:53: read udp 172.17.0.2:47416->10.255.255.6:53: i/o timeout (moveworks_user=equinix)
Issue: This error shows that the agent inside Docker cannot make an outbound connection to our agent system. It can be caused by several things, such as a firewall on your side, whitelisting on our side, or a Docker networking issue.
Try running the agent in host network mode (see Docker Networking issue - ‘No Route to Host’ Error below).
🔴 CA Bundle Certificate missing
Error: retry: Post " *******": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (moveworks_user=chubb)
Issue: CA Bundle certs or agent certs are not being recognized
Potential Fix:
- Find where the CA bundle is hosted on the machine by running:
openssl version -d
Sample output: OPENSSLDIR: "/etc/pki/tls"
- Go to that path and find the CA bundle cert. Sample path:
/etc/pki/tls/certs/ca-bundle.crt
- Mount the volume in start_agent.sh. Sample parameter:
-v /etc/pki/:/etc/pki/
- Map the path in conf/agent_config.yml. Sample config:
moveworks_config:
  path_to_cert: /etc/pki/tls/certs/ca-bundle.crt
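The first two steps can be scripted. This is a minimal sketch assuming the common bundle file names ca-bundle.crt (RHEL/CentOS) and ca-certificates.crt (Debian/Ubuntu); those candidate names and the helpers openssl_dir and find_ca_bundle are assumptions, so confirm the path on your VM before mounting it.

```bash
#!/bin/sh
# Sketch: locate the system CA bundle to mount into the agent container.
# Candidate file names are common defaults, not guaranteed on every distro.

# Turn 'OPENSSLDIR: "/etc/pki/tls"' (read from stdin) into /etc/pki/tls.
openssl_dir() {
  sed -n 's/^OPENSSLDIR: "\(.*\)"$/\1/p'
}

# find_ca_bundle DIR: print the first candidate bundle that exists under DIR.
find_ca_bundle() {
  local f
  for f in "$1/certs/ca-bundle.crt" "$1/certs/ca-certificates.crt" "$1/cert.pem"; do
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Usage:
#   dir=$(openssl version -d | openssl_dir)
#   find_ca_bundle "$dir"
```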
🔴 Docker Networking issue - ‘No Route to Host’ Error
- Restart the docker daemon on the machine by running
sudo systemctl restart docker
and then bring up a new agent
If you run into a no route to host error, use telnet on the server to check whether the server can reach the LDAP domain controller at all.
If telnet cannot connect, there is a networking issue on the customer’s side.
If it can connect, try the following steps as a workaround to use the host machine’s networking configuration:
- Edit start_agent.sh and add the --net=host flag to enable host network mode.
- More information on that setting here: https://stackoverflow.com/questions/43316376/what-does-net-host-option-in-docker-command-really-do
- Restart the agent.
- Note: The logs under the logs folder will no longer have the image ID in the title and will instead use the machine name.
🔴 Docker container is shut down upon server restart
- Run
sudo systemctl enable docker
after installing the agent to ensure docker is restarted whenever the server is.
🔴 Docker complains about the log size constraints
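When running the config script, you get the following error:
docker: Error response from daemon: unknown log opt ‘max-size’/‘max-file’/etc for journald log driver
Solution: Edit the config script (e.g. in vim) and add the following option to the docker run command, so that the max-size/max-file log options apply to the json-file driver instead of journald:
--log-driver json-file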
🔴 Docker ps says image is in ‘Restarting’ state, logs might not be generating
This is likely a permission issue on either the config file or the certs file. When copying or creating files, the strictest permissions are applied. Run chmod 777 on the config file and on the certificate file, if you have one, then restart the agent. If it is still an issue, run it on all of the agent-related files.
🔴 Certificate Issues
If root CA certs don’t work, cat them to check that they are Base64 encoded. They should look something like:
-----BEGIN CERTIFICATE-----
MIIFbTCCA1WgAwIBAgIJAN338vEmMtLsMA0GCSqGSIb3DQEBCwUAME0xCzAJBgNV
BAYTAlVLMRMwEQYDVQQIDApUZXN0LVN0YXRlMRUwEwYDVQQKDAxHb2xhbmcgVGVz
dHMxEjAQBgNVBAMMCXRlc3QtZmlsZTAeFw0xNzAyMDEyMzUyMDhaFw0yNzAxMzAy
MzUyMDhaME0xCzAJBgNVBAYTAlVLMRMwEQYDVQQIDApUZXN0LVN0YXRlMRUwEwYD
VQQKDAxHb2xhbmcgVGVzdHMxEjAQBgNVBAMMCXRlc3QtZmlsZTCCAiIwDQYJKoZI
GKj0lGpnLfGqwhs2/s3jpY7+pcvVQxEpvVTId5byDxu1ujP4HjO/VTQ2P72rE8Ft
r05pE3PdHn9JrCl4iWdVlgtiI9BoPtQyDfa/OEFaScE8KYR8LxaAgdgp3zYncWls
BpwQ6Y/A2wIkhlD9eEp5Ib2hz7isXOs9UwjdriKqrBXqcIAE5M+YIk3+KAQKxAtd
4YsK3CSJ010uphr12YKqlScj4vuKFjuOtd5RyyMIxUG3lrrhAu2AzCeKCLdVgA8+
75FrYMApUdvcjp4uzbBoED4XRQlx9kdFHVbYgmE/+yddBYJM8u4YlgAL0hW2/D8p
z9JWIfxVmjJnBnXaKGBuiUyZ864A3PJndP6EMMo7TzS2CDnfCYuJjvI0KvDjFNmc
rQA04+qfMSEz3nmKhbbZu4eYLzlADhfH8tT4GMtXf71WLA5AUHGf2Y4+HIHTsmHG
vQ==
-----END CERTIFICATE-----
🔴 x509: certificate signed by unknown authority
There are a few different places where this can occur:
Error when connecting to the Moveworks agent API
If you see this error in the agent log when attempting to connect to agent.moveworks.com, it is likely caused by the machine not having a local cert signed by a known authority.
Note: This error is not related to the certificate provided to connect to the customer’s on-premise servers (.pem file); it’s the system cert used to connect to our API.
[ERROR] [2021-12-23T14:40:10Z] [moveworks/golang/utils/retry/retry.go:177] retry: Post "https://agent.moveworks.com/***********": x509: certificate signed by unknown authority (moveworks_user={org})
You can verify this with the following curl command (if you run it on your local machine, you should see the expected output):
curl -vs https://agent.moveworks.com/api/v1/web
The output will show you the handshake process step-by-step:
* Trying 54.149.4.60:443...
* TCP_NODELAY set
* Connected to agent.moveworks.com (54.149.4.60) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
Things to note:
- CApath: /etc/ssl/certs
Shows you where the certs are stored locally.
- CAfile: /etc/ssl/certs/ca-certificates.crt
Shows you the specific cert being used in this call.
- TLSv1.3 (OUT), TLS alert, unknown CA (560):
Means the cert is signed by an unknown authority.
- SSL certificate problem: unable to get local issuer certificate
We were unable to find an acceptable cert.
Solution: Ask the customer’s admins to install a cert from a known trusted certificate authority and reference it
Last resort: Copy the combined cert file (sometimes at /etc/ssl/certs/ca-certificates.crt) to the agent’s certs folder, mount it in ./start_agent.sh, and then refer to it under moveworks_config:
#!/bin/bash
docker run \
-d \
--read-only \
--security-opt=no-new-privileges \
--restart=unless-stopped \
--log-driver=json-file \
--log-opt max-size=10m \
--log-opt max-file=5 \
-v "$(pwd)/conf":/home/moveworks/agent/conf \
-v "$(pwd)/logs":/var/log/moveworks \
-v "$(pwd)/certs":/home/moveworks/agent/certs \
moveworks_agent
ldap_config:
  enabled: true
  host: # hostname here
  port: 636
  service_password: # password here
  service_user: # username here
  use_ssl: true
  path_to_cert: /home/moveworks/agent/certs/ca-certificates.crt # This is the cert used to connect to LDAP
moveworks_config:
  access_key: # org name here
  access_secret: # org secret here
  auth_url: https://agent.moveworks.com/api/v1/auth
  config_url: https://agent.moveworks.com/api/v1/config
  proxy_url_enc: # proxy secret here
  path_to_cert: /home/moveworks/agent/certs/ca-certificates.crt # <-------- This is the cert used to connect to the Moveworks API
You can also test specific certs using curl to see if they work:
curl -vs --cacert {cert_path} https://agent.moveworks.com/api/v1/web
A successful connection looks like this:
* Trying 54.149.4.60:443...
* TCP_NODELAY set
* Connected to agent.moveworks.com (54.149.4.60) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=*.moveworks.com
* start date: Jan 1 07:40:57 2022 GMT
* expire date: Jan 15 07:40:57 2022 GMT
* subjectAltName: host "agent.moveworks.com" matched cert's "*.moveworks.com"
* issuer: C=US; ST=California; O=Zscaler Inc.; OU=Zscaler Inc.; CN=Zscaler Intermediate Root CA (zscalertwo.net) (t)
* SSL certificate verify ok.
> GET /api/v1/web HTTP/1.1
> Host: agent.moveworks.com
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 401 Unauthorized
< Date: Tue, 04 Jan 2022 18:25:12 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 59
< Connection: keep-alive
< Set-Cookie: AWSALB=A/1VlVGMWS42786wbIPJjuulX0B4VWnBkO+lOo4QOhwlfQnUfwh57m86me/84Lavi909lCF2nKW9FezGKypi/vF+ns6a2fckfhAx8Z54Z1vwt6a8SHl8QCEDPXea; Expires=Tue, 11 Jan 2022 18:25:12 GMT; Path=/
< Set-Cookie: AWSALBCORS=A/1VlVGMWS42786wbIPJjuulX0B4VWnBkO+lOo4QOhwlfQnUfwh57m86me/84Lavi909lCF2nKW9FezGKypi/vF+ns6a2fckfhAx8Z54Z1vwt6a8SHl8QCEDPXea; Expires=Tue, 11 Jan 2022 18:25:12 GMT; Path=/; SameSite=None; Secure
< X-Content-Type-Options: nosniff
<
auth: token is not valid: you must provide a jwt to verify
* Connection #0 to host agent.moveworks.com left intact
🔴 Error when connecting to the LDAP service
You will see an error in the agent log referring to an LDAP error code. This means the cert used to connect to LDAP is signed by an unknown authority. You may be using a custom CA, in which case you need the full cert chain for this to work. Double-check that the .pem file being used has the whole chain.
[ERROR] [2022-01-04T18:18:14Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate signed by unknown authority (moveworks_user=ORGNAME)
Try pulling the cert on your own, then add the cert to the ldap_config:
ldap_config:
  enabled: true
  host: # host here
  port: 636
  service_password: # password
  service_user: # user
  use_ssl: true
  path_to_cert: /home/moveworks/agent/certs/cert.pem # <----- this is the ldap cert
🔴 Now getting No such File or Directory: TLS config: TLS cert path: open /home/moveworks/agent/certs/cert.pem
No such File or Directory: TLS config: TLS cert path: open /home/moveworks/agent/certs/cert.pem
The cert existed only on the local machine, not inside the container. Adding a volume that maps the location on the local machine to the location in the container fixes this.
Add the /home/moveworks/agent/certs volume to start_agent.sh:
docker run \
-d \
--read-only \
--security-opt=no-new-privileges \
--restart=unless-stopped \
--log-driver=json-file \
--log-opt max-size=10m \
--log-opt max-file=5 \
-v "$(pwd)/conf":/home/moveworks/agent/conf \
-v "$(pwd)/agent/certs":/home/moveworks/agent/certs \
-v "$(pwd)/logs":/var/log/moveworks \
moveworks_agent
🔴 x509 Certificate Error when connecting to an On-Prem REST service (Confluence, Jira, BMC Remedy, etc.)
Self-signed certificate that is untrusted:
bond server reply: rest_pool: Get "https://{{instance_name}}/rest/api/latest/issue/{{ticket_id}}\": x509: certificate signed by unknown authority
Certificate on the on-prem Jira instance is expired:
bond server reply: rest_pool: Post "https://{{instance_name}}/rest/servicedeskapi/request": x509: certificate has expired or is not yet valid
Any of the above x509: certificate errors could be caused by the on-prem REST instance we are connecting to having invalid certs for HTTPS. You can allow the agent to ignore certificate verification errors (including self-signed certificates) by adding the following config to the agent_config.yml file under rest_configs:
tls_skip_verify: true
If you see the following error: Unknown Certificate Authority
it could be because the certificate chain the customer exported is in a format other than .pem.
If it’s a .p7b file, convert it from .p7b to .pem instead of simply renaming the file:
openssl pkcs7 -print_certs -in certnew.p7b -out ca_chain.pem
Source: https://knowledge.digicert.com/solution/SO21448.html
If it’s a .cer file, convert it from .cer to .pem instead of simply renaming the file (this command assumes the .cer file is DER-encoded):
openssl x509 -inform der -in MTCAD-root.cer -out MTCAD-root.pem
Source: https://www.sslshopper.com/article-most-common-openssl-commands.html
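Because a .cer file can be either DER (binary) or already PEM (text), it helps to check which one you have before converting. This is a heuristic sketch that looks for the PEM header lines; cert_format is a hypothetical helper, and anything without a PEM header is only assumed to be DER.

```bash
#!/bin/sh
# Sketch: guess a certificate file's encoding from its header lines.
# cert_format is a hypothetical helper; "der-or-unknown" is only a guess.

cert_format() {
  if grep -q -- '-----BEGIN CERTIFICATE-----' "$1"; then
    echo pem              # already PEM: renaming to .pem is enough
  elif grep -q -- '-----BEGIN PKCS7-----' "$1"; then
    echo p7b-pem          # PEM-wrapped PKCS#7 bundle
  else
    echo der-or-unknown   # no PEM header: probably binary DER
  fi
}

# Usage:
#   case "$(cert_format MTCAD-root.cer)" in
#     pem)     cp MTCAD-root.cer MTCAD-root.pem ;;
#     p7b-pem) openssl pkcs7 -print_certs -in MTCAD-root.cer -out MTCAD-root.pem ;;
#     *)       openssl x509 -inform der -in MTCAD-root.cer -out MTCAD-root.pem ;;
#   esac
```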
🔴 Certificate Error: Certificate Relies on Legacy Common Name Field
x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
This means that the certificate includes only a CN and no SAN. This type of cert has been deprecated. As a workaround, enable tls_skip_verify: true in the agent config.
🔴 404 Not Found for config request
[ERROR] [2021-12-07T02:09:03Z] [moveworks/golang/utils/retry/retry.go:177] retry: received non-200 response status status [404 Not Found] for config request (moveworks_user=ORGNAME)
This error can occur if the config_url is incorrect.
🔴 Unable to read LDAP Response Packet: connection reset by peer
Solution: This indicates that the LDAP server is denying the connection, likely because of a bad cert or port issues. In this case, try opening port 389 and connecting via LDAP as a workaround. This rules out network issues; from there, you can work with the relevant teams to pull an appropriate cert for the LDAPS connection.
🔴 TLS handshake error: first record does not look like a TLS handshake
Solution: This typically happens when you are using an HTTP_PROXY or HTTPS_PROXY whose scheme does not match what the proxy supports: either the proxy supports HTTPS and you are using HTTP, or the proxy supports only HTTP and the configured URL uses https.
🔴 After starting the agent, the container does not show up when we run podman ps or docker ps
Solution: Check the agent_config.yml file for typos.
🔴 404 Not Found for authentication request
[ERROR] [2021-1-18T02:09:01Z] [moveworks/golang/utils/retry/retry.go:177] retry: Received status code 404 while trying to authenticate (moveworks_user=ORGNAME)
This error occurs when the auth_url in the config file on your server is incorrect. In one case, the auth_url value contained a space and single quotes, which caused the error.
SOLUTION:
Fix auth_url by replacing it with https://agent.moveworks.com/api/v1/auth without quotes.
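Typos like a quoted auth_url can be caught before starting the agent. This is a minimal sketch that flags tab characters (YAML does not allow tabs for indentation) and quotes or spaces inside auth_url or config_url values; lint_agent_config is a hypothetical helper and is not a full YAML validator.

```bash
#!/bin/sh
# Sketch: flag two common agent_config.yml typos. Exits non-zero if any found.
# lint_agent_config is a hypothetical helper, not a YAML parser.

lint_agent_config() {
  awk -v sq="'" '
    /\t/ { printf "line %d: tab character\n", NR; bad = 1 }
    /^[ \t]*(auth_url|config_url):/ {
      val = $0
      sub(/^[^:]*:[ \t]*/, "", val)   # drop the key and the blanks after it
      sub(/[ \t]+#.*$/, "", val)      # drop a trailing YAML comment
      sub(/[ \t]+$/, "", val)         # drop trailing blanks
      if (index(val, sq) || index(val, "\"") || index(val, " ")) {
        printf "line %d: quote or space in URL value: %s\n", NR, val
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}

# Usage:
#   lint_agent_config conf/agent_config.yml && echo "no obvious typos"
```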
🔴 LDAP Result Code 200 Certificate is Valid Error
[ERROR] [2022-1-18T21:00:44Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate is valid for NMH-DC01.ADNMHC.NMMC.com
[ERROR] [2022-1-18T21:00:55Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate is valid for NMH-DC02.ADNMHC.NMMC.com
[ERROR] [2022-1-18T21:00:60Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate is valid for NMH-DC03.ADNMHC.NMMC.com
This error occurs when we point to a domain such as abc.xyz.com that is actually served by 3 LDAP servers, any of which can handle requests for abc.xyz.com, and the certificate was issued for the individual server instances rather than the domain. In this case, there were 3 servers associated with 1 domain.
SOLUTION:
In the agent config on the customer’s server, change host from the domain (which could resolve to any of the 3 servers) to 1 specific server:
ldap_config:
  host: abc.xyz.com
to:
ldap_config:
  host: nmh-dc02.abc.xyz.com
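To find the individual server names to choose from, you can pull them out of the agent log. This is a sketch assuming the log format shown above; cert_valid_hosts is a hypothetical helper, and it captures only the first name listed in each error line.

```bash
#!/bin/sh
# Sketch: list unique hostnames from "x509: certificate is valid for ..." errors.
# cert_valid_hosts is a hypothetical helper; reads files, or stdin if none given.

cert_valid_hosts() {
  sed -n 's/.*x509: certificate is valid for \([^ ,]*\).*/\1/p' "$@" | sort -u
}

# Usage (point it at the agent's logs directory):
#   cert_valid_hosts logs/*.log
```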
🔴 LDAP Result Code 200 Network Error “server misbehaving”
LDAP Result Code 200 "Network Error": dial tcp: lookup d1.d2.mw.com on 172.22.160.252:53: server misbehaving
Where d1.d2.mw.com is the LDAP host address configured in the agent_config.yml file.
SOLUTION:
- Check /etc/resolv.conf on the host machine as well as in the Docker container to ensure the DNS resolver configurations are the same:
cat /etc/resolv.conf
- Next, stop and remove all containers (in all states):
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
- Restart the Docker service and spin up a new agent container:
sudo systemctl restart docker
./start_agent.sh
docker update --restart unless-stopped $(docker ps -q)
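The resolv.conf comparison in the first step can be scripted. This is a sketch assuming the agent container is running and has cat available; nameservers and same_nameservers are hypothetical helpers. If the host uses a local stub resolver such as 127.0.0.53, Docker may substitute the upstream servers inside the container, so that particular difference can be expected.

```bash
#!/bin/sh
# Sketch: compare "nameserver" entries between two resolv.conf files.
# Helper names are hypothetical.

# Print the nameserver addresses in order.
nameservers() {
  awk '$1 == "nameserver" { print $2 }' "$1"
}

# Succeeds when both files list the same nameservers in the same order.
same_nameservers() {
  [ "$(nameservers "$1")" = "$(nameservers "$2")" ]
}

# Usage (get the container name from docker ps):
#   docker exec <container_name> cat /etc/resolv.conf > /tmp/container_resolv.conf
#   same_nameservers /etc/resolv.conf /tmp/container_resolv.conf || echo "DNS differs"
```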
ALTERNATIVE SOLUTION:
Enable host networking mode in the start agent script as indicated earlier.
🔴 Podman User Namespaces not enabled error
Podman run error in non-root mode: "user namespaces are not enabled in /proc/sys/user/max_user_namespaces"
SOLUTION:
Documented here: https://github.com/containers/podman/issues/7704
CentOS 7 requires running the following as root:
echo "user.max_user_namespaces=10000" > /etc/sysctl.d/42-rootless.conf
sysctl --system
🔴 LDAP Result Code 200 Cert is Expired
[ERROR] [2022-01-20T02:56:06Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate has expired or is not yet valid: current time 2022-01-20T02:56:06Z is after 2022-01-04T23:54:14Z (moveworks_user=ORGNAME)
This error can mean two different things so both need to be checked:
- CERT ON THE VM IS EXPIRED:
Use the following command to verify whether the cert is expired:
openssl x509 -enddate -noout -in cert.pem
If the cert is a cert chain, you may need to split it into 2 or 3 .pem files and run the command for each individual certificate, because openssl x509 reads only the first certificate in a file. This will help you narrow down which cert is expired.
- YOUR LDAP HOST CERT IS EXPIRED:
Get your cert and check the expiration date:
👇 Replace domain.host.com with the host domain configured in agent_config.yml
# reachout to the LDAP host and download the certificate
openssl s_client -connect domain.host.com:636 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM > cert.pem
# use keytool to read the contents of the certificate
keytool -printcert -v -file cert.pem
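Splitting a chain by hand is error-prone. This is a sketch that writes each certificate of a chain to its own file (cert-1.pem, cert-2.pem, ...) and prints each expiry date; split_pem_chain and chain_enddates are hypothetical helpers, and chain_enddates requires openssl on the PATH.

```bash
#!/bin/sh
# Sketch: split a PEM chain into one file per certificate and show expiry dates.
# Helper names are hypothetical.

# split_pem_chain CHAIN_FILE OUT_DIR
split_pem_chain() {
  awk -v dir="$2" '
    /-----BEGIN CERTIFICATE-----/ { n++; out = dir "/cert-" n ".pem" }
    out != "" { print > out }                               # copy lines of the current cert
    /-----END CERTIFICATE-----/ { close(out); out = "" }    # finish this file
  ' "$1"
}

# chain_enddates CHAIN_FILE: print "cert-N.pem: notAfter=..." for each cert.
chain_enddates() {
  local tmp f
  tmp=$(mktemp -d)
  split_pem_chain "$1" "$tmp"
  for f in "$tmp"/cert-*.pem; do
    printf '%s: ' "$(basename "$f")"
    openssl x509 -enddate -noout -in "$f"
  done
  rm -rf "$tmp"
}

# Usage:
#   chain_enddates cert.pem
```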
🔴 LDAP connection reset by peer
This implies the port is closed. If you are connecting to port 389, you will likely need to use port 636 with a cert and configure LDAPS.
🔴 Common LDAP Error Codes
Some common LDAP errors to help with troubleshooting when running the ldapwhoami or ldapsearch command.
Error Code | Error | Description |
---|---|---|
525 | User not found | Returned when an invalid username is supplied. |
52e | Invalid credentials | Returned when a valid username is supplied but an invalid password/credential is supplied. If this error is received, it will prevent most other errors from being displayed. |
530 | Not permitted to logon at this time | Returned when a valid username and password/credential are supplied during times when login is restricted. |
531 | Not permitted to logon from this workstation | Returned when a valid username and password/credential are supplied, but the user is restricted from using the workstation where the login was attempted. |
532 | Password expired | Returned when a valid username is supplied, and the supplied password is valid but expired. |
533 | Account disabled | Returned when a valid username and password/credential are supplied but the account has been disabled. |
701 | Account expired | Returned when a valid username and password/credential are supplied but the account has expired. |
773 | User must reset password | Returned when a valid username and password/credential are supplied, but the user must change their password immediately (before logging in for the first time, or after the password was reset by an administrator). |
775 | Account locked out | Returned when a valid username is supplied, but the account is locked out. Note that this error will be returned regardless of whether or not the password is invalid. |
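The data code that appears in LDAP error lines (the same code mentioned under the access secret issue) can be looked up against this table automatically. This is a sketch assuming the Active Directory message format "... AcceptSecurityContext error, data 52e, ..."; ad_error_meaning is a hypothetical helper.

```bash
#!/bin/sh
# Sketch: map the AD "data <code>" value in an error line to the table above.
# ad_error_meaning is a hypothetical helper.

ad_error_meaning() {
  local code
  # Extract the 3-character hex code after "data " and lowercase it.
  code=$(printf '%s\n' "$1" | sed -n 's/.*data \([0-9a-fA-F]\{3\}\).*/\1/p' | tr 'A-F' 'a-f')
  case "$code" in
    525) echo "User not found" ;;
    52e) echo "Invalid credentials" ;;
    530) echo "Not permitted to logon at this time" ;;
    531) echo "Not permitted to logon from this workstation" ;;
    532) echo "Password expired" ;;
    533) echo "Account disabled" ;;
    701) echo "Account expired" ;;
    773) echo "User must reset password" ;;
    775) echo "Account locked out" ;;
    *) echo "Unknown code: ${code:-none}" ;;
  esac
}

# Usage:
#   ad_error_meaning "$(grep 'data ' logs/*.log | tail -n 1)"
```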
🔴 Permission Denied for config file
If you run into the following error when starting the config script:
open /home/moveworks/agent/conf/agent_config.yml: permission denied
check the agent’s directory permissions as set up by the setup guide, and make sure the group ID is 17540.
🔴 Failing to connect to agent.moveworks.com TLS timing out on client hello
This is possibly related to an MTU (maximum transmission unit) mismatch on the network path.
https://linuxhint.com/how-to-change-mtu-size-in-linux/
ifconfig | grep mtu
ifconfig <Interface_name> mtu <mtu_size> up
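Before changing the MTU, you can estimate the path MTU by sending pings with fragmentation prohibited and searching for the largest payload that gets through; the MTU is that payload plus 28 bytes of IPv4 and ICMP headers. This is a sketch assuming Linux iputils ping (-M do), a standard 1500-byte Ethernet MTU as the upper bound, and a target that answers ICMP echo (agent.moveworks.com may not, so use a host you know replies); probe_path_mtu is a hypothetical helper.

```bash
#!/bin/sh
# Sketch: binary-search the path MTU with don't-fragment pings.
# probe_path_mtu is a hypothetical helper; assumes Linux iputils ping.

probe_path_mtu() {
  local host=$1 lo=0 hi=1472 mid
  # lo = largest payload known to pass (0 = none yet); search (lo, hi].
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always makes progress
    if ping -c 1 -W 2 -M do -s "$mid" "$host" >/dev/null 2>&1; then
      lo=$mid                      # fits without fragmenting
    else
      hi=$((mid - 1))              # too big (or no reply): shrink the range
    fi
  done
  if [ "$lo" -eq 0 ]; then
    echo "no ICMP reply from $host" >&2
    return 1
  fi
  echo $((lo + 28))                # payload + 20-byte IPv4 + 8-byte ICMP header
}

# Usage:
#   probe_path_mtu <host_that_answers_ping>
```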
🔴 No Such File or Directory for start_agent.sh
If you see the following error:
open /home/moveworks/agent/scripts/start_agent.sh: no such file or directory
the command used to run the configuration script may be incorrect (check for newline characters).
🔴 [401 Forbidden] when trying to pull Agent image onto the machine
When using wget to pull the agent image from the S3 URL, you may get this error. This means the VM may not have outbound network connectivity enabled.
🔴 LDAP Result Code 200 “Network Error”: dial tcp:
The error will reference the name of the server that is erroring out. Verify that the FQDN of the LDAP server is correct as specified in the agent_config.yml file. It’s possible the server was renamed or decommissioned.
🔴 LDAP Result Code 8 “Strong Auth Required”
Typically, this means you are attempting to use port 389 with LDAP, but the server expects LDAPS with port 636 and a certificate.
You will need to reconfigure the agent with a cert using LDAPS.
See this Stack Overflow post for more info: https://stackoverflow.com/questions/24385929/stronger-authentication-required
🔴 moveworks/agent/certs/agent_key.pem permission denied
moveworks/agent/certs/agent_key.pem permission denied
The key doesn’t exist in this case because it couldn’t be written there, so you need to update the permissions of the directory. Try running the following in the root of the agent directory:
sudo chown -R 17540:17540 .
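To see which files the chown above would change, you can list everything under the agent directory whose UID or GID is not 17540. This is a sketch assuming GNU find (for -uid/-gid); wrong_owner_files is a hypothetical helper.

```bash
#!/bin/sh
# Sketch: list files not owned by UID/GID 17540 (the ID used in this guide).
# wrong_owner_files is a hypothetical helper.

wrong_owner_files() {
  find "${1:-.}" \( ! -uid 17540 -o ! -gid 17540 \) -print
}

# Usage, from the root of the agent directory:
#   wrong_owner_files .
```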
🔴 ITSM_DISCONNECTED error message
If your configuration is leveraging an on-prem agent for ticketing (ServiceNow, Jira, etc.) and you are seeing consistent timeouts within the bot when opening/closing tickets, then please check the following:
- There are no WAFs that are bottlenecking the connection from the agent to your ITSM system
- Typically, lower environment ITSM systems have lower CPU/storage due to low traffic. Please verify that the timeout is not caused by a throttling of the ITSM system. You can increase the CPU/storage to resolve this.
Handy Docker Commands when Debugging
- Tail the docker logs and ensure there are no startup errors. Follow the log output (existing lines first, then new ones as they arrive):
docker logs -f containerName
- Tail the INFO logs output by the agent: navigate to the logs directory, then run
tail -f *.INFO.log
- Access the command line inside the docker container (this assumes the agent is the only running container):
docker exec -it $(docker ps --format '{{.Names}}') /bin/bash
- Kill all running docker containers:
docker kill $(docker ps -q)
- Remove all stopped docker containers:
docker rm $(docker ps -aq)
- Remove all docker images (note: requires you to reload the agent image):
docker rmi $(docker images -q)
  - Warning: This removes all docker images. Don’t do this if the customer has other docker images loaded on this machine.
  - Instead, run
docker rmi {docker image id here}
  - Get the image ID from
docker images
  - Verify the agent build image is from the expected date.
Other Tips
- For terminal commands on the host machine, you may need to run
sudo /bin/bash
before performing any commands.
- If scripts don't work on the host machine, try
cat <script>.sh
and run the actual docker commands.
- You can use telnet to verify network connections/whitelisting, e.g.: telnet kafka-elb.moveworks.io 19092 or telnet <LDAP_DOMAIN> 636
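If telnet is not installed on the VM, bash can open a TCP connection itself through its /dev/tcp redirection. This is a sketch assuming bash and GNU coreutils timeout are available; port_open is a hypothetical helper. A failure means the port is closed, filtered, or the name does not resolve.

```bash
#!/bin/sh
# Sketch: telnet-free TCP reachability check using bash's /dev/tcp.
# port_open is a hypothetical helper.

# port_open HOST PORT [SECONDS]: exit 0 if a TCP connection succeeds in time.
port_open() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   port_open kafka-elb.moveworks.io 19092 && echo open || echo "closed/filtered"
#   port_open <LDAP_DOMAIN> 636 && echo open || echo "closed/filtered"
```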