Moveworks Agent Troubleshooting Guide

Common Procedures

Configuring LDAP over SSL aka LDAPS (636)

Prerequisites

First, determine whether you need a certificate file to connect to the LDAP server over SSL. If so, you will need to supply a base64-encoded ASCII certificate file, which is typically a .pem server cert file. This certificate is used to verify that the LDAP server we are communicating with is who it says it is. The cert should contain the full “cert chain” and should live in the ./certs directory. Wildcard certs are not accepted per Microsoft: https://docs.microsoft.com/en-US/troubleshoot/windows-server/identity/enable-ldap-over-ssl-3rd-certification-authority

The file should look like below and be a .pem server cert file.

-----BEGIN CERTIFICATE-----
content of your domain certificate
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
content of any intermediate CA certificate
-----END CERTIFICATE-----
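
To confirm that a file really holds a chain rather than a single cert, counting the PEM blocks is a quick check. This is a minimal sketch; count_certs is a hypothetical helper name, not part of the agent tooling.

```shell
# count_certs FILE: print how many PEM certificate blocks FILE contains.
# (Hypothetical helper, shown for illustration only.)
count_certs() {
  # grep -c prints the number of matching lines. It exits non-zero when the
  # count is 0, so "|| true" keeps the function usable under "set -e".
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true
}

# Example (not executed here): count_certs ./certs/mycert.pem
```

A count of 1 usually means only the domain certificate is present and the intermediate CA certificate still needs to be appended.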

If the exported certificate is a .p7b file, you will need to use the following command to convert it from .p7b to .pem instead of simply renaming the file:

openssl pkcs7 -print_certs -in certnew.p7b -out ca_chain.pem

More details here: https://knowledge.digicert.com/solution/SO21448.html

Verifying the cert is valid:

Run the following command to print the certificate details, then check the Issuer field to confirm that the certificate is signed by a known certificate authority. A certificate that is only signed by the client will not work.

openssl x509 -in <certificate_name> -text -noout

The Active Directory fully qualified domain name of the domain controller must appear in one of the following locations:

  • The common name (CN) in the Subject field
  • The DNS entry in the Subject Alternative Name (SAN) extension

Pulling the Cert on Your Own

You may be able to pull the cert by connecting to the LDAPS server:

openssl s_client -connect <domain.host.com>:<port> -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM > cert.pem
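
Note that openssl x509 reads a single certificate, so the one-liner above keeps only the first certificate the server presents. When the full chain is needed, one option is to keep every PEM block from the s_client output instead. A minimal sketch, assuming a hypothetical helper named pem_blocks:

```shell
# pem_blocks: read text on stdin and print only the PEM certificate blocks,
# dropping the handshake details that s_client prints around them.
# (Hypothetical helper, shown for illustration only.)
pem_blocks() {
  sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p'
}

# Example (not executed here; same placeholders as the command above):
#   openssl s_client -connect <domain.host.com>:<port> -showcerts </dev/null 2>/dev/null | pem_blocks > chain.pem
```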

Configuration Changes/ Updates:

RECOMMENDED: Run the setup script again, and choose the SSL option to supply the LDAP certificate path.

You can also do the following changes manually (after shutting down the agent).

I. Modify /conf/agent_config.yml

  1. Must use LDAP hostname, not IP address.
  2. Use port: 636
  3. Add use_ssl: true
  4. If using a cert that is not well-known, you must set the container path of the certificate. The directory path is static as it refers to the path within the docker image. The certificate file name will change with what you’ve named it locally.
    • path_to_cert: /home/moveworks/agent/certs/mycert.pem

II. Modify ./start_agent.sh

  • Add this additional argument to the script. The first directory should be the absolute directory path on the local VM.
    • -v "/home/svcmoveworks/certs:/home/moveworks/agent/certs" \

Example agent_config.yml with LDAPS enabled

Example config when using a cert:

ldap_config:
  enabled: true
  host: ldaps.customer.com
  port: 636
  service_password:
    encrypted_value: <password>
  service_user: <username>
  use_ssl: true
  path_to_cert: /home/moveworks/agent/certs/mycert.pem

Example config when the CA is well known. NOTE: port # is set to 636 instead of 389 and use_ssl is set to true, but no cert is supplied in the config

ldap_config:
  enabled: true
  host: ldaps.customer.com
  port: 636
  service_password:
    encrypted_value: <password>
  service_user: <username>
  use_ssl: true

Example config when switching to LDAP

ldap_config:
  enabled: true
  host: ldap.customer.com
  port: 389
  service_password:
    encrypted_value: <password>
  service_user: <username>

TLS Skip Verify

Use for self-signed certificates.

ldap_config:
  enabled: true
  host: ldaps.customer.com
  port: 636
  service_password:
    encrypted_value: <password>
  service_user: <username>
  use_ssl: true
  path_to_cert: /home/moveworks/agent/certs/mycert.pem
  tls_skip_verify: true
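
The LDAPS examples above pair port 636 with use_ssl: true. A quick line-based check along these lines could catch a mismatch before restarting the agent. This is a rough sketch: ldap_value and check_ldaps are hypothetical helper names, and the parsing assumes the simple two-space layout shown in the examples rather than full YAML.

```shell
# ldap_value FILE KEY: print the value of KEY inside the top-level
# ldap_config block of FILE (line-based parsing, not a real YAML parser).
ldap_value() {
  awk -v key="$2" '
    /^ldap_config:/ { in_block = 1; next }  # entering the ldap_config block
    /^[A-Za-z_]/    { in_block = 0 }        # any other top-level key ends it
    in_block && $1 == key ":" { print $2; exit }
  ' "$1"
}

# check_ldaps FILE: print "ok", or explain the mismatch and return 1.
check_ldaps() {
  local port ssl
  port="$(ldap_value "$1" port)"
  ssl="$(ldap_value "$1" use_ssl)"
  if [ "$port" = "636" ] && [ "$ssl" != "true" ]; then
    echo "port 636 is set but use_ssl is not true"
    return 1
  fi
  echo "ok"
}

# Example (not executed here): check_ldaps ./conf/agent_config.yml
```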

Validating LDAP Service Account

Verifying Access via AD Explorer

To test the access level of the LDAP service account, we can use AD Explorer (a Microsoft Sysinternals tool) to validate whether the service account has access to the LDAP server.

  1. First, grab the host, service account username, and password.
  2. Open AD Explorer, and connect to the LDAP host. Note that you may need to connect to VPN in order to connect successfully.
  3. Once logged in successfully, you can navigate to groups and validate as needed.
  4. Select one of the distribution groups. On the right side, you can see the full details.

Note: we've run into issues where a service password with a special character might cause problems. If all else fails, you can try resetting the password to something without special characters.

Verifying Access via Bash

  1. To find the LDAP host name or IP address, you can run the command:

    nslookup -q=SRV _ldap._tcp.<DOMAIN_NAME>.com
    
  2. For Active Directory, the LDAP Service User is formatted as the NetBIOS domain name, then a backslash, and then the service account name (e.g. WHS\svc_moveworks)

  3. To test and validate the service account credentials, you can run these commands:

    1. Operation to request user's authzid

      ldapwhoami -x -H ldap://<LDAP_SERVER_HOSTNAME_OR_IP> -w <SVC_ACCOUNT_PASSWORD> -D "<NETBIOS_DOMAIN>\<SVC_ACCOUNT_NAME>"
      

      e.g. - u:WHS/svc_moveworks

    2. Operation to search for object

      ldapsearch -x -H ldaps://<LDAP_SERVER_HOSTNAME_OR_IP>:636 -W -D "<NETBIOS_DOMAIN>\<SVC_ACCOUNT_NAME>" -b "<BASE_DN>"
      

      Note: For RedHat OS, you will need to install openldap to run the above commands

      1. To install run:

        yum install openldap*
        
        
    3. Another LDAPs search example (returns users):

      ldapsearch -x -LLL -H ldaps://ldap.customer.com:636 -D "username_here" -w "password_here" -b "dc=customer,dc=com" -s sub "(objectClass=user)" givenName
      
  4. After completing the configuration process, an agent_config.yml file will be generated in ./config. Please verify the contents of the ./config/agent_config.yml are as expected.

  5. Run the following command to get the NAT Gateway IP that needs to be whitelisted on our end. When doing this, it is best to also confirm with the customer.

    1. If VM is internet accessible, you can run the following to get the IP address:

      wget -qO- http://ipecho.net/plain ; echo
      
  6. Whitelist IP

    1. Enter the IP in this file infra/terraform/agent/prod/variables.t

Confirming NetBIOS Domain for AD Service Account

  1. Sometimes a NETBIOS_DOMAIN must appear in the username of the service account in the format of NETBIOS_DOMAIN\username

    1. e.g: MOVEWORKS\svc_ad_moveworks
  2. To check, open ‘Active Directory Users and Computers’

  3. Click on the ‘Find Objects in Active Directory Domain Services’ icon on the toolbar. (Looks like folder with magnifying glass)

  4. Search for service account


  5. Double-click on the account

  6. Select ‘Account’ in the pop-up. If there is something in the ‘User logon name’ field, that should be used as the NETBIOS_DOMAIN

  7. Update the Service Account Username in the config to NETBIOS_DOMAIN\username. For the above, it would be MOVEWORKS\svc_ad_moveworks

Known Issues and their Fixes

🔴 config: Moveworks access secret is invalid

This is exactly what it sounds like: your secret is the problem.

Sometimes, this means that the account is locked out. Look for the data error code in the log line.

52e: Invalid credentials. Password expiry is one of the main causes. Please reset the service account password on call, lock and unlock the account, and try again after updating the new credentials in the agent_config.yml file.

775: account is locked out
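
To pull the data error code out of a log line quickly, a small sed filter can help. This is a sketch that assumes the typical Active Directory bind failure text, in which the code follows the word "data" (for example "AcceptSecurityContext error, data 52e, v1db1"); ad_data_code is a hypothetical helper name.

```shell
# ad_data_code: read log text on stdin and print the first hex sub-code
# that follows "data " (for example 52e or 775).
# (Hypothetical helper; assumes the usual AD "data <code>," wording.)
ad_data_code() {
  sed -n 's/.*data \([0-9a-fA-F]\{3,\}\),.*/\1/p' | head -n 1
}

# Example (not executed here): grep 'LdapErr' <agent_log_file> | ad_data_code
```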

🔴 Permission Denied when using start agent script

open /home/moveworks/agent/conf/agent_config.yml: permission denied

This happens when you try to start the agent configuration script with a user whose group ID does not match what is expected in our script. Run through the steps again and ensure you use the correct group ID listed in our setup guide - we cannot use any other ID since the agent is looking for a specific one.

🔴 Permission Denied when accessing certificate file

Often when installing updated certificate files, we SCP the new cert into the desired machine’s cert folder. However, this new cert does not have the same file permissions as the previous cert our agent was leveraging. In order to quickly copy file permissions we can use the chmod --reference=reference_file file command.

# navigate to your certs directory
cd /home/moveworks/certs

# view file permissions for each file in the directory
ls -ltr

# locate the file that you want to copy file permissions from (likely the old certificate)
# apply the file permissions from your old cert to your new cert
sudo chmod --reference=oldCERT.pem newCERT.pem
ls -l newCERT.pem

🔴 Agent cannot communicate with Moveworks

bond server reply: rest_pool: Get \"https://{{customer_itsm}}/rest/servicedeskapi/servicedesk/1\": Proxy Authentication Required",

This happens when you have a Proxy for a system (AD for e.g.), but your internal ITSM system or Knowledge System needs the Agent to bypass the proxy. In such cases, please add the following line to the system's config line that needs to bypass the proxy (same level as enabled: true):

do_not_use_rest_proxy: true

🔴 Agent cannot communicate with customer’s system (e.g. Confluence)

bond server reply: rest_pool: Get \"https://<base_url>/rest/api/content?limit=25&spaceKey=AHLP\": Forbidden

This happens when you have set up a proxy for outbound connections but don’t require a proxy for your internal system. In such cases, please add the following line to the system's config line that needs to bypass the proxy (same level as enabled: true):

do_not_use_rest_proxy: true

🔴 Failure when authenticating to Moveworks

retry: Received status code 401 while trying to authenticate (moveworks_user=xxxx)
  1. Invalid Access Secret (token expires if not used) - if the token is not used within 30 days of generating, it will expire. Please generate a new token in the MW Setup Agent Page.

To validate if a token works, you can use the following command:

curl --header "Content-Type: application/json" --request POST \
  --data '{"access_secret":"{ADD YOUR TOKEN HERE}", "access_key": "mykey"}' https://agent.moveworks.com/api/v1/auth

Config fail:

curl --header "Content-Type: application/json" --header "Authorization: " --request POST https://agent.moveworks.com/api/v1/config

Firewall / Whitelisting tests:

  1. Run ping google.com to see if outgoing traffic is allowed
  2. Try to curl https://agent.moveworks.com/api/v1/web to see if our agent service can be hit

🔴 Agent fails to start with “Operation Not Permitted” Error

Issue: This error is caused by a lack of executable permissions for the user that you are trying to run this container as. Work with your sys admin to get this resolved at the VM level. If that is not possible, removing this line from start_agent.sh will allow the container to start.

  • --security-opt=no-new-privileges

🔴 I/O Timeout in Agent Logs

[ERROR] [2020-09-11T00:43:20Z] [moveworks/golang/utils/retry/retry.go:177] retry: Post "https://agent.moveworks.com/***********": dial tcp: lookup agent.moveworks.com on 10.255.255.6:53: read udp 172.17.0.2:47416->10.255.255.6:53: i/o timeout (moveworks_user=equinix)

Issue: This error shows that the agent residing inside docker is unable to make an outbound connection to our agent system. This can have several causes, such as a firewall on your side, whitelisting on our side, or potentially a docker networking issue.

Try running the agent in host network mode (see below Docker Networking issue - ‘No Route to Host’ Error)


🔴 CA Bundle Certificate missing

Error: retry: Post " *******": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (moveworks_user=chubb)

Issue: CA Bundle certs or agent certs are not being recognized

Potential Fix:

  1. Find where the CA bundle is hosted on the machine by running openssl version -d. Sample output: OPENSSLDIR: "/etc/pki/tls"

  2. Go to that path and find the CA bundle cert. Sample path: /etc/pki/tls/certs/ca-bundle.crt

  3. Mount the volume in the start_agent.sh

    1. Sample parameter: -v /etc/pki/:/etc/pki/
  4. Map the path in conf/agent_config.yml. Sample config:

    moveworks_config:
      path_to_cert: /etc/pki/tls/certs/ca-bundle.crt
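
The first step above can be scripted by stripping the label and quotes from the openssl version -d output. A minimal sketch; openssl_dir is a hypothetical helper name, and the bundle location under that directory still varies by distribution.

```shell
# openssl_dir: read the output of "openssl version -d" on stdin and print
# just the directory, without the OPENSSLDIR label or the quotes.
# (Hypothetical helper, shown for illustration only.)
openssl_dir() {
  sed -n 's/^OPENSSLDIR: "\(.*\)"$/\1/p'
}

# Example (not executed here):
#   ls "$(openssl version -d | openssl_dir)/certs"
```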
    

🔴 Docker Networking issue - ‘No Route to Host’ Error

  1. Restart the docker daemon on the machine by running sudo systemctl restart docker and then bring up a new agent

If you run into a no route to host error, try using telnet on the server (for example, telnet <LDAP_DOMAIN> 636) to see if the server can even access the LDAP Domain Controller.

If telnet also cannot connect, then there is a networking issue on the customer’s side.

If not, then you can try the following steps as a workaround to use the host machine’s networking configuration.

  1. Edit the start_agent.sh and add the --net=host flag to enable host network mode
    1. More information on that setting here: https://stackoverflow.com/questions/43316376/what-does-net-host-option-in-docker-command-really-do
  2. Restart the agent
  3. Note: The logs under logs folder will no longer have the image ID in the title and will instead use the machine name.

🔴 Docker container is shut down upon server restart

  • Run sudo systemctl enable docker after installing the agent to ensure docker is restarted whenever the server is.

🔴 Docker complains about the log size constraints

When running the config script you get the following error:

docker: Error response from daemon: unknown log opt ‘max-size’/‘max-file’/etc for journald log driver

Solution: Edit the config script in vim and add the following line: --log-driver json-file


🔴 Docker ps says image is in ‘Restarting’ state, logs might not be generating

This is likely a permission issue on either the config file or the certs file. When copying or creating files, the strictest permissions are applied. Run chmod 777 on the config file and the certificate file if you have it, then restart the Agent. If it is still an issue, run it on all of the agent-related files.


🔴 Certificate Issues

If root CA certs don’t work, try cat-ing them to see if they’re in base64. It should look something like:

-----BEGIN CERTIFICATE-----
MIIFbTCCA1WgAwIBAgIJAN338vEmMtLsMA0GCSqGSIb3DQEBCwUAME0xCzAJBgNV
BAYTAlVLMRMwEQYDVQQIDApUZXN0LVN0YXRlMRUwEwYDVQQKDAxHb2xhbmcgVGVz
dHMxEjAQBgNVBAMMCXRlc3QtZmlsZTAeFw0xNzAyMDEyMzUyMDhaFw0yNzAxMzAy
MzUyMDhaME0xCzAJBgNVBAYTAlVLMRMwEQYDVQQIDApUZXN0LVN0YXRlMRUwEwYD
VQQKDAxHb2xhbmcgVGVzdHMxEjAQBgNVBAMMCXRlc3QtZmlsZTCCAiIwDQYJKoZI

GKj0lGpnLfGqwhs2/s3jpY7+pcvVQxEpvVTId5byDxu1ujP4HjO/VTQ2P72rE8Ft
r05pE3PdHn9JrCl4iWdVlgtiI9BoPtQyDfa/OEFaScE8KYR8LxaAgdgp3zYncWls
BpwQ6Y/A2wIkhlD9eEp5Ib2hz7isXOs9UwjdriKqrBXqcIAE5M+YIk3+KAQKxAtd
4YsK3CSJ010uphr12YKqlScj4vuKFjuOtd5RyyMIxUG3lrrhAu2AzCeKCLdVgA8+
75FrYMApUdvcjp4uzbBoED4XRQlx9kdFHVbYgmE/+yddBYJM8u4YlgAL0hW2/D8p
z9JWIfxVmjJnBnXaKGBuiUyZ864A3PJndP6EMMo7TzS2CDnfCYuJjvI0KvDjFNmc
rQA04+qfMSEz3nmKhbbZu4eYLzlADhfH8tT4GMtXf71WLA5AUHGf2Y4+HIHTsmHG
vQ==
-----END CERTIFICATE-----

🔴 x509: certificate signed by unknown authority

There are a few different places where this can occur:

  • Error when connecting to the Moveworks agent API
    If you see this error in the agent log when attempting to connect to agent.moveworks.com it is likely a problem of not having a local cert signed by a Known Authority
    Note: This error is not related to the certificate provided to connect to the customer’s on-premise servers (.pem file), it’s the system cert used to connect to our API

    [ERROR] [2021-12-23T14:40:10Z] [moveworks/golang/utils/retry/retry.go:177] retry: Post "https://agent.moveworks.com/***********": x509: certificate signed by unknown authority (moveworks_user={org})
    

    You can verify this using this curl command. (if you run this command on your local machine you should see the expected output)

    curl -vs https://agent.moveworks.com/api/v1/web
    

    The output will show you the handshake process step-by-step:

    * Trying 54.149.4.60:443...
    * TCP_NODELAY set
    * Connected to agent.moveworks.com (54.149.4.60) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
      CApath: /etc/ssl/certs
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
    * TLSv1.3 (IN), TLS handshake, Certificate (11):
    * TLSv1.3 (OUT), TLS alert, unknown CA (560):
    * SSL certificate problem: unable to get local issuer certificate
    * Closing connection 0
    

    Things to note:

    • CApath: /etc/ssl/certs Shows you where the certs are stored locally.
    • CAfile: /etc/ssl/certs/ca-certificates.crt Shows you the specific cert being used in this call.
    • TLSv1.3 (OUT), TLS alert, unknown CA (560): Means the cert is signed by an unknown authority
    • SSL certificate problem: unable to get local issuer certificate We were unable to find an acceptable cert
      Solution: Ask the customer’s admins to install a cert from a known trusted certificate authority and reference it
      Last resort: Copy the combined cert file (sometimes in /etc/ssl/certs/ca-certificates.crt) to the agent’s certs folder, mount it in ./start_agent.sh, and then refer to it under moveworks_config
    #!/bin/bash
    
    docker run \
        -d \
        --read-only \
        --security-opt=no-new-privileges \
        --restart=unless-stopped \
        --log-driver=json-file \
        --log-opt max-size=10m \
        --log-opt max-file=5 \
        -v "$(pwd)/conf":/home/moveworks/agent/conf \
        -v "$(pwd)/logs":/var/log/moveworks \
        -v "$(pwd)/certs":/home/moveworks/agent/certs \
        moveworks_agent
    
    ldap_config:
      enabled: true
      host: #hostname here
      port: 636
      service_password: #password here
      service_user: #username here
      use_ssl: true
      path_to_cert: /home/moveworks/agent/certs/ca-certificates.crt # This is the cert used to connect to LDAP
    moveworks_config:
      access_key: #orgname here
      access_secret: #org secret here
      auth_url: https://agent.moveworks.com/api/v1/auth
      config_url: https://agent.moveworks.com/api/v1/config
      proxy_url_enc: #proxy secret here
      path_to_cert: /home/moveworks/agent/certs/ca-certificates.crt # < -------- This is the cert used to connect to moveworks API
    

    You can also test specific certs using curl to see if they work:

    curl -vs --cacert {cert_path} https://agent.moveworks.com/api/v1/web
    

    A successful connection looks like this:

    *   Trying 54.149.4.60:443...
    * TCP_NODELAY set
    * Connected to agent.moveworks.com (54.149.4.60) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
      CApath: /etc/ssl/certs
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
    * TLSv1.3 (IN), TLS handshake, Certificate (11):
    * TLSv1.3 (IN), TLS handshake, CERT verify (15):
    * TLSv1.3 (IN), TLS handshake, Finished (20):
    * TLSv1.3 (OUT), TLS handshake, Finished (20):
    * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
    * ALPN, server did not agree to a protocol
    * Server certificate:
    *  subject: CN=*.moveworks.com
    *  start date: Jan  1 07:40:57 2022 GMT
    *  expire date: Jan 15 07:40:57 2022 GMT
    *  subjectAltName: host "agent.moveworks.com" matched cert's "*.moveworks.com"
    *  issuer: C=US; ST=California; O=Zscaler Inc.; OU=Zscaler Inc.; CN=Zscaler Intermediate Root CA (zscalertwo.net) (t) 
    *  SSL certificate verify ok.
    > GET /api/v1/web HTTP/1.1
    > Host: agent.moveworks.com
    > User-Agent: curl/7.68.0
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 401 Unauthorized
    < Date: Tue, 04 Jan 2022 18:25:12 GMT
    < Content-Type: text/plain; charset=utf-8
    < Content-Length: 59
    < Connection: keep-alive
    < Set-Cookie: AWSALB=A/1VlVGMWS42786wbIPJjuulX0B4VWnBkO+lOo4QOhwlfQnUfwh57m86me/84Lavi909lCF2nKW9FezGKypi/vF+ns6a2fckfhAx8Z54Z1vwt6a8SHl8QCEDPXea; Expires=Tue, 11 Jan 2022 18:25:12 GMT; Path=/
    < Set-Cookie: AWSALBCORS=A/1VlVGMWS42786wbIPJjuulX0B4VWnBkO+lOo4QOhwlfQnUfwh57m86me/84Lavi909lCF2nKW9FezGKypi/vF+ns6a2fckfhAx8Z54Z1vwt6a8SHl8QCEDPXea; Expires=Tue, 11 Jan 2022 18:25:12 GMT; Path=/; SameSite=None; Secure
    < X-Content-Type-Options: nosniff
    < 
    auth: token is not valid: you must provide a jwt to verify
    * Connection #0 to host agent.moveworks.com left intact
    

🔴 Error when connecting to the LDAP service

You will see an error in the agent log referring to an LDAP error code. This means the cert used to connect to LDAP is signed by an unknown authority. This may mean that you are using a custom CA, and you need the full cert chain for this to work. Double check the cert .pem file being used has the whole chain.

[ERROR] [2022-01-04T18:18:14Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate signed by unknown authority (moveworks_user=ORGNAME)

Try pulling the cert on your own, then adding the cert to the ldap_config:

ldap_config:
  enabled: true
  host: # host here
  port: 636
  service_password: # password
  service_user: # user
  use_ssl: true
  path_to_cert: /home/moveworks/agent/certs/cert.pem # <----- this is the ldap cert

🔴 Now getting No such File or Directory: TLS config: TLS cert path: open /home/moveworks/agent/certs/cert.pem?

The cert exists only on the local machine and is not visible inside the container. Adding a volume that maps the location on the local machine to the location in the container fixes this.

To do so, add the volume for /home/moveworks/agent/certs to start_agent.sh:

docker run \
  -d \
  --read-only \
  --security-opt=no-new-privileges \
  --restart=unless-stopped \
  --log-driver=json-file \
  --log-opt max-size=10m \
  --log-opt max-file=5 \
  -v "$(pwd)/conf":/home/moveworks/agent/conf \
  -v "$(pwd)/agent/certs":/home/moveworks/agent/certs \
  -v "$(pwd)/logs":/var/log/moveworks \
  moveworks_agent

🔴 x509 Certificate Error when connecting to an On-Prem REST service (Confluence, Jira, BMC Remedy, etc)

Self-signed certificate which is untrusted

bond server reply: rest_pool: Get "https://{{instance_name}}/rest/api/latest/issue/{{ticket_id}}\": x509: certificate signed by unknown authority

Certificate on the on-prem Jira instance is expired.

bond server reply: rest_pool: Post "https://{{instance_name}}/rest/servicedeskapi/request": x509: certificate has expired or is not yet valid

Any of the above x509: certificate errors could be due to the fact that your on-prem REST instance we are connecting to has invalid certs for HTTPS. You can allow the Agent to ignore self-signed certificate errors by adding the following config to the agent_config.yml file under rest_configs:

tls_skip_verify: true

If you see the following error: Unknown Certificate Authority

it could be because the certificate chain the customer exported is in a format other than .pem.

If it is a .p7b file, convert it from .p7b to .pem instead of simply renaming the file:

openssl pkcs7 -print_certs -in certnew.p7b -out ca_chain.pem

Source: https://knowledge.digicert.com/solution/SO21448.html

If it is a .cer file, convert it from .cer to .pem instead of simply renaming the file:

openssl x509 -inform der -in MTCAD-root.cer -out MTCAD-root.pem

Source: https://www.sslshopper.com/article-most-common-openssl-commands.html


🔴 Certificate Error: Certificate Relies on Legacy Common Name Field

x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0

This means that the certificate only includes a CN and has no SAN. This type of cert has been deprecated.

As a workaround, enable tls_skip_verify: true in the agent config.


🔴 404 Not Found for config request

[ERROR] [2021-12-07T02:09:03Z] [moveworks/golang/utils/retry/retry.go:177] retry: received non-200 response status status [404 Not Found] for config request (moveworks_user=ORGNAME)

This error can occur if the config url is incorrect.


🔴 Unable to read LDAP Response Packet: connection reset by peer

Solution: This indicates that the LDAP server is denying the connection, which is likely due to a bad cert or port issues. In this case, try opening port 389 and connecting via LDAP as a workaround. If that works, it rules out network issues; from there, you can work with the relevant teams to pull an appropriate cert for the LDAPS connection.


🔴 TLS handshake error: first record does not look like a TLS handshake

Solution: This typically happens when you are using an HTTP_PROXY or HTTPS_PROXY and the configured scheme does not match what the proxy supports: either the proxy supports HTTPS and http is configured, or it supports HTTP only and https is configured.


🔴 After starting the agent, the container does not show up when we run podman ps or docker ps

Solution: Check agent_config.yml file as there might be typos.


🔴 404 Not Found for authentication request

[ERROR] [2021-1-18T02:09:01Z] [moveworks/golang/utils/retry/retry.go:177] retry: Received status code 404 while trying to authenticate (moveworks_user=ORGNAME)

This error occurs when the auth_url is incorrect in the config file on your server.

In the case that prompted this entry, the auth_url value contained a space and single quotes, which caused the error.

SOLUTION:

Fix auth_url, by replacing it with https://agent.moveworks.com/api/v1/auth without quotes.
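
A grep-based check can flag this kind of typo before the agent is restarted. This is a minimal sketch that follows the advice above (a bare https:// value with no quotes and no embedded spaces); url_line_ok is a hypothetical helper name.

```shell
# url_line_ok FILE KEY: succeed when FILE has a "KEY: https://..." line whose
# value contains no quotes and no whitespace.
# (Hypothetical helper, shown for illustration only.)
url_line_ok() {
  grep -Eq "^[[:space:]]*$2:[[:space:]]+https://[^[:space:]'\"]+[[:space:]]*\$" "$1"
}

# Example (not executed here):
#   url_line_ok ./conf/agent_config.yml auth_url || echo "check auth_url for quotes or spaces"
```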


🔴 LDAP Result Code 200 Certificate is Valid Error

[ERROR] [2022-1-18T21:00:44Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate is valid for NMH-DC01.ADNMHC.NMMC.com
[ERROR] [2022-1-18T21:00:55Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate is valid for NMH-DC02.ADNMHC.NMMC.com
[ERROR] [2022-1-18T21:00:60Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate is valid for NMH-DC03.ADNMHC.NMMC.com

This error occurs when the configured host is a domain such as abc.xyz.com that is actually served by several LDAP servers, any of which can handle requests for that domain, while the certificate was issued for the individual server instances. In this case, there were 3 servers associated with 1 domain.

SOLUTION:

In the agent config on the customer's server, change the host from the domain (which could resolve to any of the 3 servers) to 1 specific server:

ldap_config:
  host: abc.xyz.com

to:

ldap_config:
  host: nmh-dc02.abc.xyz.com
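
To see which server names the certificates actually cover, the hostnames can be pulled straight from the error lines. A minimal sketch; valid_hosts is a hypothetical helper name, and it keeps only the first name listed after "certificate is valid for" on each line.

```shell
# valid_hosts: read agent log text on stdin and print each distinct hostname
# that follows "certificate is valid for".
# (Hypothetical helper, shown for illustration only.)
valid_hosts() {
  sed -n 's/.*certificate is valid for \([^ ,]*\).*/\1/p' | sort -u
}

# Example (not executed here): valid_hosts < <agent_log_file>
```

Any one of the printed names is a candidate for the host value.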

🔴 LDAP Result Code 200 Network Error “server misbehaving”

LDAP Result Code 200 "Network Error": dial tcp: lookup d1.d2.mw.com on 172.22.160.252:53: server misbehaving

Where d1.d2.mw.com is the LDAP host address configured in the agent_config.yml file.

SOLUTION:

  1. Check /etc/resolv.conf on the host machine as well as in the Docker container to ensure the DNS resolver configurations are the same

    cat /etc/resolv.conf
    
  2. Next, stop and remove all containers (in all states)

    docker stop $(docker ps -a -q)
    docker rm $(docker ps -a -q)
    
  3. Restart docker service and spin up a new agent container

    sudo systemctl restart docker
    ./start_agent.sh
    docker update --restart unless-stopped $(docker ps -q)
    

ALTERNATIVE SOLUTION:
Enable host networking mode in the start agent script as indicated earlier.


🔴 Podman User Namespaces not enabled error

Podman run error in non-root mode: "user namespaces are not enabled in /proc/sys/user/max_user_namespaces"

SOLUTION:

Documented here: https://github.com/containers/podman/issues/7704

CentOS 7 requires running the following as root:

echo "user.max_user_namespaces=10000" > /etc/sysctl.d/42-rootless.conf
sysctl --system

🔴 LDAP Result Code 200 Cert is Expired

[ERROR] [2022-01-20T02:56:06Z] [moveworks/golang/utils/retry/retry.go:177] retry: DialURL: LDAP Result Code 200 "Network Error": x509: certificate has expired or is not yet valid: current time 2022-01-20T02:56:06Z is after 2022-01-04T23:54:14Z (moveworks_user=ORGNAME)

This error can mean two different things so both need to be checked:

  1. CERT ON THE VM IS EXPIRED:

Use the following command to verify if the cert is expired: openssl x509 -enddate -noout -in cert.pem
If the cert is a cert chain, you may need to split the cert into 2 or 3 .pem files and run the command for each individual part of the cert. This will help you narrow down the cert that is expired.

  2. YOUR LDAP HOST CERT IS EXPIRED:

Get your cert and check the expiration date:
👇 Replace domain.host.com with the host domain configured in agent_config.yml

# reach out to the LDAP host and download the certificate
openssl s_client -connect domain.host.com:636 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM > cert.pem

# use keytool to read the contents of the certificate
keytool -printcert -v -file cert.pem
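
For the first case above, splitting a chain by hand is error-prone, so a short awk script can write each certificate to its own file. This is a minimal sketch; split_chain and the part-N.pem naming are hypothetical choices, not something the agent expects.

```shell
# split_chain FILE PREFIX: write each certificate found in FILE to
# PREFIX-1.pem, PREFIX-2.pem, and so on, in the order they appear.
# (Hypothetical helper, shown for illustration only.)
split_chain() {
  awk -v prefix="$2" '
    /-----BEGIN CERTIFICATE-----/ { n++; out = prefix "-" n ".pem" }
    out != ""                     { print > out }
    /-----END CERTIFICATE-----/   { close(out); out = "" }
  ' "$1"
}

# Example (not executed here): check the expiry of every part of the chain.
#   split_chain cert.pem part
#   for f in part-*.pem; do echo "$f"; openssl x509 -enddate -noout -in "$f"; done
```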

🔴 LDAP connection reset by peer

This implies the port is closed. If you are connecting to port 389, you will likely need to use port 636 with a cert and configure LDAPS.


🔴 Common LDAP Error Codes

Some common LDAP errors to help with troubleshooting when running the ldapwhoami or ldapsearch command.

| Error Code | Error | Description |
| --- | --- | --- |
| 525 | User not found | Returned when an invalid username is supplied. |
| 52e | Invalid credentials | Returned when a valid username is supplied but an invalid password/credential is supplied. If this error is received, it will prevent most other errors from being displayed. |
| 530 | Not permitted to logon at this time | Returned when a valid username and password/credential are supplied during times when login is restricted. |
| 531 | Not permitted to logon from this workstation | Returned when a valid username and password/credential are supplied, but the user is restricted from using the workstation where the login was attempted. |
| 532 | Password expired | Returned when a valid username is supplied, and the supplied password is valid but expired. |
| 533 | Account disabled | Returned when a valid username and password/credential are supplied but the account has been disabled. |
| 701 | Account expired | Returned when a valid username and password/credential are supplied but the account has expired. |
| 773 | User must reset password | Returned when a valid username and password/credential are supplied, but the user must change their password immediately (before logging in for the first time, or after the password was reset by an administrator). |
| 775 | Account locked out | Returned when a valid username is supplied, but the account is locked out. Note that this error will be returned regardless of whether or not the password is invalid. |
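
When scripting around ldapwhoami or ldapsearch, the codes above can be turned into a simple lookup. A minimal sketch; ad_code_meaning is a hypothetical helper name and covers only the codes listed in this guide.

```shell
# ad_code_meaning CODE: print the short name for an AD data error code.
# (Hypothetical helper; codes and names are taken from the table above.)
ad_code_meaning() {
  case "$1" in
    525) echo "User not found" ;;
    52e) echo "Invalid credentials" ;;
    530) echo "Not permitted to logon at this time" ;;
    531) echo "Not permitted to logon from this workstation" ;;
    532) echo "Password expired" ;;
    533) echo "Account disabled" ;;
    701) echo "Account expired" ;;
    773) echo "User must reset password" ;;
    775) echo "Account locked out" ;;
    *)   echo "Unknown code: $1" ;;
  esac
}

# Example (not executed here): ad_code_meaning 52e
```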

🔴 Permission Denied for config file

When starting the config script, you may run into:

open /home/moveworks/agent/conf/agent_config.yml: permission denied

The directory permissions for the agent should match what the setup guide produces. Make sure the group ID is 17540.

If SELinux is enabled on the server, see https://prefetch.net/blog/2017/09/30/using-docker-volumes-on-selinux-enabled-servers/ (to allow a Docker container to access the volume, add :z or :Z to the volume mount).
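
To confirm the group ID mentioned above without reading ls output by eye, the numeric group can be compared directly. A minimal sketch, assuming GNU stat (as found on most Linux VMs); has_gid is a hypothetical helper name.

```shell
# has_gid PATH GID: succeed when the numeric group ID of PATH equals GID.
# (Hypothetical helper; relies on the GNU coreutils form "stat -c".)
has_gid() {
  [ "$(stat -c '%g' "$1")" = "$2" ]
}

# Example (not executed here):
#   has_gid ./conf/agent_config.yml 17540 || echo "group ID is not 17540"
```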


🔴 Failing to connect to agent.moveworks.com TLS timing out on client hello

This is possibly related to the MTU (maximum transmission unit) being too small.

https://linuxhint.com/how-to-change-mtu-size-in-linux/

ifconfig | grep mtu
ifconfig <Interface_name> mtu <mtu_size> up

🔴 No Such File or Directory for start_agent.sh

If you see the following error:

open /home/moveworks/agent/scripts/start_agent.sh: no such file or directory

This might mean that the command to run the configuration script may be incorrect (check for new line characters).


🔴 [401 Forbidden] when trying to pull Agent image onto the machine

When using wget to pull the agent image from the s3 URL, you may get this error. This means the VM may not have outbound network connectivity enabled.


🔴 LDAP Result Code 200 “Network Error”: dial tcp:

The error will reference the name of the server that is erroring out. Verify that the FQDN of the LDAP server is correct as specified in the agent_config.yml file. It’s possible the server was renamed or decommissioned.


🔴 LDAP Result Code 8 “Strong Auth Required”

Typically, this means you are attempting to use port 389 with LDAP, but the server expects LDAPS with port 636 and a certificate.

You will need to reconfigure the agent with a cert using LDAPS.

See this Stack Overflow post for more info: https://stackoverflow.com/questions/24385929/stronger-authentication-required


🔴 moveworks/agent/certs/agent_key.pem permission denied

The key doesn’t exist in this case because it couldn’t be written there. We need to update the permissions of the directory. Try running sudo chown -R 17540:17540 . in the root of the agent directory.


Handy Docker Commands when Debugging

  1. Tail docker logs and ensure no startup errors
    1. Show only new logs: docker logs -f containerName
  2. Tail INFO logs outputted by agent
    1. navigate to logs directory
    2. tail -f *.INFO.log
  3. Access the command line inside the docker container: docker exec -it $(docker ps --format '{{.Names}}') /bin/bash
  4. Kill all running docker containers: docker kill $(docker ps -q)
  5. Remove all stopped docker containers: docker rm $(docker ps -aq)
  6. Remove all docker images (note: requires you to reload the agent image): docker rmi $(docker images -q)
    1. Warning: This removes all docker images, don’t do this if the customer has other docker images loaded on this machine.
    2. Instead do docker rmi {docker image id here}
      1. Get the image Id from docker images
      2. Verify the agent build image is the date expected

Other Tips

  • For terminal commands on the host machine, you may need to run sudo /bin/bash before performing any commands.
  • If scripts don't work on the host machine, try cat <script>.sh and run the actual docker commands.
  • You can use telnet to verify network connections/whitelisting e.g: telnet kafka-elb.moveworks.io 19092 or telnet <LDAP_DOMAIN> 636