I've had to tackle all of the following issues in production on various clusters, with hundreds of users. Automation using Python scripting, scheduling and orchestration is key to making your life easy.
- User Space Management
  This is critical in large environments, which may have hundreds of users.
  - Onboarding/Offboarding - Create home directories automatically when new users appear in AD groups.
  - Directory Quotas - Explicitly set a maximum quota on all file systems, so that no single user can effectively DoS the system by filling it up.
  - Directory Rights - Prevent users from writing to directories they shouldn't, while still allowing access to universal resources.
  - Symlinks and System Defaults - Alter /etc/profile.d to re-alias system commands to the preferred ones.
- Resource Self-Service
  This is key: all accounts and resources should be provisioned automatically, so you don't have to hand them out to every user yourself.
  - ETL Mechanism - Developers will need some way to populate the system (NiFi/StreamSets/Logstash).
  - Scalable Buffer - Kafka is your friend! Set this up with Kerberos, and enable auto-topic creation in a dev environment.
  - Processing Resources - YARN queues for Hadoop, or cluster limits for Databricks.
    https://docs.databricks.com/administration-guide/cloud-configurations/aws/cmbp.html
  - Scheduling Mechanism - Set up multi-tenancy for users (Oozie/Airflow/Azkaban).
- User Examples
  Write some examples, so users can become comfortable running their first job.
  - Job - Sample Spark job, Elasticsearch Watch, etc.
  - Scheduling - An example of how to run the job periodically.
  - Advanced Tasks - Examples that show how to perform geolocation, write a custom record reader, read from and write to databases, etc.
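
As a rough illustration of the Onboarding/Offboarding item above, here is a minimal Python sketch that creates a private home directory for every group member who doesn't have one yet. The function names (`users_missing_home`, `provision_homes`) and the `owner_lookup` parameter are my own, made up for this example, and how you obtain the member list depends entirely on your setup.

```python
import os
from pathlib import Path


def users_missing_home(members, home_root):
    """Return the members that have no directory under home_root yet."""
    root = Path(home_root)
    return [name for name in members if not (root / name).is_dir()]


def provision_homes(members, home_root, mode=0o700, owner_lookup=None):
    """Create a private home directory for each member that lacks one.

    owner_lookup is a hypothetical optional callable, name -> (uid, gid).
    When it is given, ownership is handed to the user, which needs root.
    Returns the list of user names a directory was created for.
    """
    created = []
    for name in users_missing_home(members, home_root):
        home = Path(home_root) / name
        home.mkdir(parents=True, exist_ok=True)
        # mkdir's mode argument is filtered by the umask, so set it explicitly.
        os.chmod(home, mode)
        if owner_lookup is not None:
            uid, gid = owner_lookup(name)
            os.chown(home, uid, gid)
        created.append(name)
    return created
```

If your AD groups are exposed to the host through NSS (for example via SSSD, which is an assumption about your environment), the member list could come from `grp.getgrnam(group_name).gr_mem` and the owner lookup from `pwd.getpwnam(name)`, using its `pw_uid` and `pw_gid` fields. Run it from cron or your scheduler of choice; it is safe to re-run because existing directories are skipped.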