Job Description
Experience: 4+ years
Location: Hyderabad (Work from Office)
Notice Period: Immediate
About NStarX
NStarX is an AI-first, Cloud-first engineering services provider built and led by practitioners.
We specialize in transforming businesses through cutting-edge technology solutions.
With years of expertise, we deliver scalable, data-driven systems that empower our clients to make smarter, faster decisions.
For more information, visit https://nstarxinc.com/.
Role Summary
We are seeking a Software Engineer (Data Engineering) who can seamlessly combine the roles of Data Engineer and Data Scientist.
The ideal candidate will design robust data pipelines, build AI/ML models, and deliver data-driven insights that address complex business challenges.
This is a client-facing role requiring close collaboration with US-based stakeholders; candidates must be willing to work in alignment with US time zones when needed.
Key Responsibilities
Data Engineering
- Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing.
- Develop and optimize data architectures supporting analytics and ML workflows.
- Ensure data integrity, security, and compliance with organizational and industry standards.
- Collaborate with DevOps teams to deploy and monitor data pipelines in production environments.
Data Science & AI / ML
- Build predictive and prescriptive models leveraging AI and ML techniques.
- Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or scikit-learn (see the sketch after this list).
- Perform feature engineering, statistical analysis, and data preprocessing.
- Continuously monitor and optimize models for accuracy and scalability.
- Integrate AI-driven insights into business processes and strategies.
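To illustrate the model-building work involved, here is a minimal scikit-learn sketch; the dataset path, target column, and model choice are hypothetical, and features are assumed to be all numeric.

```python
# Minimal sketch: train and evaluate a predictive model with scikit-learn.
# The CSV path and the "churned" target column are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customer_events.csv")  # hypothetical dataset
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Keep preprocessing and the model in one pipeline so the same
# transformations are applied at training and inference time.
model = Pipeline([
    ("scale", StandardScaler()),  # assumes all-numeric features
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```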
Client Interaction
- Serve as the technical liaison between NStarX and client teams.
- Participate in client discussions, requirement gathering, and design reviews.
- Provide status updates, insights, and recommendations directly to client stakeholders.
- Work flexibly with customers based on US time zones for real-time collaboration.
Required Qualifications
- Experience: 4+ years in Data Engineering and AI/ML roles.
- Education: Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
Technical Skills (Necessary)
- Languages & Libraries: Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas.
- Compute: Apache Spark on EMR (driver/executor model, sizing, dynamic allocation).
- Storage: Amazon S3 (Parquet) with lifecycle management to Glacier.
- Catalog: AWS Glue Catalog and Crawlers.
- Orchestration & Serverless: AWS Step Functions, AWS Lambda, Amazon EventBridge.
- Ingestion: CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK).
- Warehouse: Amazon Redshift and Redshift Spectrum.
- Security & Access: IAM (least privilege), Secrets Manager, SSM Parameter Store.
- Operations & Collaboration: Git with CI pipelines (Jenkins, GitHub Actions, GitLab CI), CloudWatch monitoring.
Nice to Have
- Scala, Docker, Kubernetes (Spark-on-Kubernetes), k9s.
- Fast data stores such as DynamoDB, MongoDB, or Redis.
- Databricks and Jupyter notebooks.
- FinOps exposure including cost baselines and dashboards.
Core Skills (Hands-on Responsibilities)
Data Lake to Data Mart Design
- Design layered data lake to data mart models (raw → processed → merged → aggregated).
- Implement Hive-style partitioning (year/month/day) with retention and archival strategies (see the sketch after this list).
- Define schema contracts, decision logic, and state machine handoffs.
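A minimal PySpark sketch of the layered, Hive-style partitioned writes described above; the bucket, paths, and timestamp column are placeholders.

```python
# Minimal sketch: write a processed-layer table with Hive-style
# year/month/day partitioning. Bucket and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-layering").getOrCreate()
raw = spark.read.parquet("s3://example-lake/raw/events/")  # hypothetical path

processed = (
    raw.withColumn("year", F.year("event_ts"))
       .withColumn("month", F.month("event_ts"))
       .withColumn("day", F.dayofmonth("event_ts"))
)

# partitionBy() produces .../year=YYYY/month=MM/day=DD/ directories, which
# Glue crawlers and Spark partition pruning both understand.
(processed.write
    .mode("overwrite")
    .partitionBy("year", "month", "day")
    .parquet("s3://example-lake/processed/events/"))
```

Retention then becomes a matter of S3 lifecycle rules on each layer's prefix (see the Ingestion & Storage sketch further below).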
Spark ETL Development
- Author robust PySpark or Scala jobs for parsing, flattening, merging, and aggregation.
- Tune performance using broadcast joins, partition pruning, and shuffle control.
- Implement atomic, overwrite-by-partition writes and idempotent operations (illustrated in the sketch after this list).
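A hedged sketch of the broadcast-join and overwrite-by-partition patterns above, reusing the placeholder paths from the previous example:

```python
# Minimal sketch: broadcast-join a small dimension table, then overwrite
# only the partitions this run actually touches. Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("etl-tuning").getOrCreate()
# "dynamic" replaces only the partitions present in the output, which
# makes re-runs idempotent instead of wiping the whole table.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

facts = spark.read.parquet("s3://example-lake/processed/events/")
dims = spark.read.parquet("s3://example-lake/reference/devices/")

# broadcast() ships the small table to every executor and avoids a shuffle.
joined = facts.join(broadcast(dims), on="device_id", how="left")

(joined.write
    .mode("overwrite")  # per-partition, thanks to the dynamic mode above
    .partitionBy("year", "month", "day")
    .parquet("s3://example-lake/merged/events/"))
```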
Warehouse Synchronization
- Perform idempotent DELETE+INSERT or MERGE operations into Redshift (see the sketch after this list).
- Maintain audit-friendly SQL with deterministic predicates and row-level metrics.
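As one sketch of the idempotent reload pattern, using the Redshift Data API via boto3; the cluster, schema, and table names are placeholders, and the Data API runs a batch of statements serially as a single transaction.

```python
# Minimal sketch: idempotent partition reload into Redshift using a
# deterministic predicate. Identifiers are placeholders; parameterize
# the SQL properly in production rather than formatting strings.
import boto3

rsd = boto3.client("redshift-data")

def reload_day(day: str) -> None:
    # Delete exactly the rows this run will replace, then re-insert them
    # from a staging (Spectrum) table; re-running yields the same state.
    rsd.batch_execute_statement(
        ClusterIdentifier="example-cluster",  # hypothetical
        Database="analytics",
        DbUser="etl_user",
        Sqls=[
            f"DELETE FROM analytics.events WHERE event_date = '{day}'",
            f"INSERT INTO analytics.events "
            f"SELECT * FROM spectrum.events_staged WHERE event_date = '{day}'",
        ],
    )

reload_day("2024-01-15")
```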
Data Quality, Reliability & Observability
- Build scalable, automated ETL pipelines with idempotency and cost efficiency.
- Implement schema drift checks, duplicate prevention, and partition reconciliation (a drift-check sketch follows this list).
- Monitor EMR or Kubernetes cluster lifecycle, right-size clusters, and track costs.
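A minimal sketch of a schema drift check against an expected-schema contract; the contract and input path are illustrative:

```python
# Minimal sketch: fail fast when an incoming batch drifts from the
# agreed schema contract. The expected schema below is illustrative.
from pyspark.sql import SparkSession

EXPECTED = {"event_id": "string", "event_ts": "timestamp", "payload": "string"}

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
batch = spark.read.parquet("s3://example-lake/raw/events/")  # placeholder

actual = {f.name: f.dataType.simpleString() for f in batch.schema.fields}
missing = EXPECTED.keys() - actual.keys()
drifted = {c for c in EXPECTED.keys() & actual.keys() if actual[c] != EXPECTED[c]}

if missing or drifted:
    raise ValueError(f"schema drift: missing={missing}, type_changes={drifted}")
```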
Ingestion & Storage
- Build log and event pipelines into S3 using CloudWatch, Kinesis, or Firehose.
- Manage bucket layouts, lifecycle rules, and data catalog consistency (see the lifecycle sketch after this list).
- Understand compression formats and Hive-style directory structures.
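For instance, the lifecycle management to Glacier listed under Technical Skills might be configured as follows; the bucket name and retention windows are placeholders:

```python
# Minimal sketch: transition raw-layer objects to Glacier after 90 days
# and expire them after a year. Bucket and windows are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-lake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "raw-to-glacier",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```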
Orchestration & Automation
- Implement AWS Step Functions with Choice, Map, and Parallel states, retries, and backoff (see the sketch after this list).
- Automate scheduling with EventBridge and deploy guardrail Lambdas.
- Parameterize pipelines for multiple environments and selective recomputation.
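A sketch of a state machine with exponential-backoff retries and a Choice gate; all ARNs, state names, and the $.rowCount field are hypothetical:

```python
# Minimal sketch: Step Functions definition (ASL as a Python dict) with
# retry/backoff on the ETL task and a Choice state gating warehouse sync.
import json
import boto3

definition = {
    "StartAt": "RunSparkEtl",
    "States": {
        "RunSparkEtl": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:run-etl",
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,  # exponential backoff between attempts
            }],
            "Next": "RowsFound",
        },
        "RowsFound": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.rowCount",
                "NumericGreaterThan": 0,
                "Next": "SyncWarehouse",
            }],
            "Default": "Done",
        },
        "SyncWarehouse": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sync-redshift",
            "End": True,
        },
        "Done": {"Type": "Succeed"},
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="example-etl-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/example-sfn-role",  # placeholder
)
```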
Soft Skills
- Strong analytical and problem-solving capabilities.
- Excellent communication for client engagement and stakeholder presentations.
- Ability to work flexibly with global and US-based teams.
- Team-oriented, proactive, and adaptable in fast-paced environments.
Preferred Qualifications
- Experience with MLOps and end-to-end AI/ML deployment pipelines.
- Knowledge of NLP and Computer Vision.
- Certifications in AI/ML, AWS, Azure, or GCP.
Benefits
- Competitive salary and performance-based incentives.
- Opportunity to work on cutting-edge AI and ML projects.
- Exposure to global clients and international project delivery.
- Continuous learning and professional development opportunities.
Why NStarX
You will be joining a fast-growing, AI-native company backed by:
- Strategic investment from SHI International
- An advisory board of Silicon Valley CXOs and AI leaders
Compensation includes:
- Competitive base + commission
- Fast growth into leadership roles
To apply for this job, email your details to recruiting-ind@nstarxinc.com.