Sunday, October 7, 2018

Managing Multi-Tenant Environments

Managing multi-tenancy on systems is a balancing act. Administrators must prevent one tenant's actions from adversely affecting the others, while still providing users the resources to do their jobs. If done correctly, cluster management should be seamless, which greatly reduces fire-fighting and allows time to be spent on improving the architecture.

I've had to tackle all of the following issues in production on various clusters with hundreds of users. Automation through Python scripting, scheduling, and orchestration is key to making your life easier.

  1. User Space Management
    This is critical in large environments, which may have hundreds of users. 
    1. Onboarding/Offboarding
      Create home directories automatically when new users appear in AD groups (see the onboarding sketch after this outline).
    2. Directory Quotas
      Explicitly set quotas to a maximum on all file systems, so users can't effectively DDoS the system by filling storage (see the quota sketch after this outline).
    3. Directory Rights
      Prevent users from writing to directories they shouldn't, and allow access to shared resources (see the directory-rights sketch after this outline).
    4. Symlinks and System Defaults
      Alter /etc/profile.d to re-alias system commands to the preferred ones (see the aliases sketch after this outline).
  2. Resource Self-Service
    This is key: all accounts and resources should be provisioned automatically, so you don't have to hand them out to every user individually.
    1. ETL Mechanism
      Developers will need some way to populate the system with data (NiFi/StreamSets/Logstash).
    2. Scalable Buffer
      Kafka is your friend! Set it up with Kerberos and auto-topic creation in a dev environment (see the kerberized producer sketch after this outline).
    3. Processing Resources
      YARN queues for Hadoop, or cluster limits for Databricks.
      https://docs.databricks.com/administration-guide/cloud-configurations/aws/cmbp.html
    4. Scheduling Mechanism
      Set up multi-tenancy for users (Oozie/Airflow/Azkaban); see the Airflow DAG sketch after this outline.
  3. User Examples
    Write some examples so users can become comfortable running their first job.
    1. Job - Sample Spark job/Elasticsearch watch/etc. (see the PySpark sketch after this outline).
    2. Scheduling - An example of how to run the job periodically. 
    3. Advanced Tasks - Examples showing how to perform geolocation, write a custom record reader, read from and write to databases, etc.
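
For the onboarding item above, here is a minimal sketch of the kind of script I mean: look up the members of a cluster AD group and make sure each has an HDFS home directory. It assumes the ldap3 package, a service account for the bind, and that the placeholder host, DNs, and credentials are replaced with your own; the HDFS commands are standard, but the rest is illustrative.

# Onboarding sketch: create HDFS home directories for members of an AD group.
import subprocess
from ldap3 import Server, Connection, ALL

AD_SERVER = 'ldaps://ad.example.com'                              # hypothetical AD host
BIND_USER = 'svc_cluster@EXAMPLE.COM'                             # hypothetical service account
BIND_PASS = 'changeme'
SEARCH_BASE = 'OU=Users,DC=example,DC=com'
CLUSTER_GROUP_DN = 'CN=cluster-users,OU=Groups,DC=example,DC=com' # hypothetical group

def ad_group_members():
    """Return the sAMAccountName of every user in the cluster AD group."""
    conn = Connection(Server(AD_SERVER, get_info=ALL), BIND_USER, BIND_PASS, auto_bind=True)
    conn.search(SEARCH_BASE,
                '(&(objectClass=user)(memberOf={}))'.format(CLUSTER_GROUP_DN),
                attributes=['sAMAccountName'])
    return [entry.sAMAccountName.value for entry in conn.entries]

def ensure_home_dir(user):
    """Create /user/<name> on HDFS if it does not exist and hand it to the user."""
    home = '/user/{}'.format(user)
    subprocess.run(['hdfs', 'dfs', '-mkdir', '-p', home], check=True)
    subprocess.run(['hdfs', 'dfs', '-chown', '{}:{}'.format(user, user), home], check=True)

if __name__ == '__main__':
    for user in ad_group_members():
        ensure_home_dir(user)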
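
For the quota item, a short sketch of capping each home directory. The 500g space limit and one-million-file name quota are placeholder values, not recommendations; pick numbers that fit your storage.

# Quota sketch: cap bytes (replication included) and file count per home directory.
import subprocess

def set_quotas(home_dir, space_quota='500g', name_quota='1000000'):
    """Apply HDFS space and name quotas to a single directory."""
    subprocess.run(['hdfs', 'dfsadmin', '-setSpaceQuota', space_quota, home_dir], check=True)
    subprocess.run(['hdfs', 'dfsadmin', '-setQuota', name_quota, home_dir], check=True)

set_quotas('/user/alice')   # hypothetical user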
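
For the directory-rights item, a sketch of the two sides of the problem: lock home directories down, and grant groups read access to shared data. The group name and shared path are placeholders, and the ACL call assumes ACLs are enabled on the NameNode.

# Rights sketch: owner-only writes on home dirs, group read access on shared data.
import subprocess

def lock_down(home_dir):
    """Owner may write, group may read, everyone else is kept out."""
    subprocess.run(['hdfs', 'dfs', '-chmod', '750', home_dir], check=True)

def grant_shared_read(shared_dir, group):
    """Add a read/execute ACL entry so the group can browse shared resources."""
    subprocess.run(['hdfs', 'dfs', '-setfacl', '-m',
                    'group:{}:r-x'.format(group), shared_dir], check=True)

lock_down('/user/alice')                      # hypothetical user
grant_shared_read('/data/shared', 'analysts') # hypothetical path and group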
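
For the system-defaults item, a small sketch of how a provisioning script might drop an aliases file into /etc/profile.d so every login shell picks up the preferred commands. It has to run as root, and the aliases themselves are just examples of the idea.

# System-defaults sketch: write a managed aliases file into /etc/profile.d.
ALIASES = {
    'python': 'python3',   # assumption: the cluster standard is Python 3
    'vi': 'vim',
}

with open('/etc/profile.d/cluster-aliases.sh', 'w') as f:
    f.write('# Managed by the cluster admin team -- do not edit by hand.\n')
    for alias, target in ALIASES.items():
        f.write('alias {}={}\n'.format(alias, target))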
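
For the Kafka item, a sketch of a producer authenticating over Kerberos with kafka-python. It assumes the client host has a valid ticket (kinit) and the gssapi package installed, and that the broker has auto.create.topics.enable=true so producing to a new topic creates it; the broker hostname and topic name are placeholders.

# Kafka sketch: a kerberized (GSSAPI) producer against a dev cluster.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['kafka01.example.com:9092'],  # hypothetical broker
    security_protocol='SASL_PLAINTEXT',              # or SASL_SSL with certificates
    sasl_mechanism='GSSAPI',
    sasl_kerberos_service_name='kafka',
)

# With auto-topic creation enabled on the broker, this first send creates the topic.
producer.send('tenant-dev-events', b'hello from a kerberized client')
producer.flush()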
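
For the scheduling item, a minimal Airflow DAG (1.x-style imports) that submits a Spark job once a day. The owner, YARN queue, and job script path are placeholders for whatever you hand each tenant.

# Scheduling sketch: one daily spark-submit task per tenant.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'alice',                    # hypothetical tenant
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'sample_spark_job',
    default_args=default_args,
    start_date=datetime(2018, 10, 1),
    schedule_interval='@daily',
)

submit = BashOperator(
    task_id='spark_submit',
    bash_command='spark-submit --master yarn --queue {{ params.queue }} /opt/jobs/sample_job.py',
    params={'queue': 'tenant_a'},        # hypothetical YARN queue
    dag=dag,
)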
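
Finally, for the user-examples item, the kind of first PySpark job I stage for new users: read a shared sample dataset, run a trivial aggregation, and write the result into their own home directory. The paths and the event_type column are placeholders for whatever sample data you provide.

# User-example sketch: a minimal PySpark job users can copy for their first submission.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('first-job-example').getOrCreate()

# Read staged sample data, aggregate, and write the result back as Parquet.
events = spark.read.csv('/data/shared/sample_events.csv', header=True, inferSchema=True)
counts = events.groupBy('event_type').count()
counts.write.mode('overwrite').parquet('/user/alice/first_job_output')

spark.stop()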

