
Job Description

Experience: 4+ years

Location: Hyderabad (Work from Office)

Notice Period: Immediate

About NStarX

NStarX is an AI-first, Cloud-first engineering services provider built and led by practitioners.
We specialize in transforming businesses through cutting-edge technology solutions.
With years of expertise, we deliver scalable, data-driven systems that empower our clients to make smarter, faster decisions.

For more information, visit:
https://nstarxinc.com/

Role Summary

We are seeking a Software Engineer (Data Engineering) who can seamlessly combine the roles of a Data Engineer and a Data Scientist.

The ideal candidate will design robust data pipelines, build AI/ML models, and deliver data-driven insights that address complex business challenges.

This is a client-facing role requiring close collaboration with US-based stakeholders, and the candidate must be flexible to work in alignment with US time zones when needed.

Key Responsibilities

Data Engineering
  • Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing.
  • Develop and optimize data architectures supporting analytics and ML workflows.
  • Ensure data integrity, security, and compliance with organizational and industry standards.
  • Collaborate with DevOps teams to deploy and monitor data pipelines in production environments.
Data Science & AI/ML
  • Build predictive and prescriptive models leveraging AI and ML techniques.
  • Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or Scikit-learn.
  • Perform feature engineering, statistical analysis, and data preprocessing.
  • Continuously monitor and optimize models for accuracy and scalability.
  • Integrate AI-driven insights into business processes and strategies.
Client Interaction
  • Serve as the technical liaison between NStarX and client teams.
  • Participate in client discussions, requirement gathering, and design reviews.
  • Provide status updates, insights, and recommendations directly to client stakeholders.
  • Work flexibly with customers based on US time zones for real-time collaboration.
Required Qualifications
  • Experience: 4+ years in Data Engineering and AI/ML roles.
  • Education: Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
Technical Skills (Necessary)
  • Languages & Libraries: Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas.
  • Compute: Apache Spark on EMR (driver/executor model, sizing, dynamic allocation).
  • Storage: Amazon S3 (Parquet) with lifecycle management to Glacier.
  • Catalog: AWS Glue Catalog and Crawlers.
  • Orchestration & Serverless: AWS Step Functions, AWS Lambda, Amazon EventBridge.
  • Ingestion: CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK).
  • Warehouse: Amazon Redshift and Redshift Spectrum.
  • Security & Access: IAM (least privilege), Secrets Manager, AWS Systems Manager (SSM).
  • Operations & Collaboration: Git with CI pipelines (Jenkins, GitHub, GitLab), CloudWatch monitoring.
Nice to Have
  • Scala, Docker, Kubernetes (Spark-on-Kubernetes), k9s.
  • Fast data stores such as DynamoDB, MongoDB, or Redis.
  • Databricks and Jupyter notebooks.
  • FinOps exposure including cost baselines and dashboards.

Core Skills (Hands-on Responsibilities)

Data Lake to Data Mart Design
  • Design layered data-lake-to-data-mart models (raw → processed → merged → aggregated).
  • Implement Hive-style partitioning (year/month/day) with retention and archival strategies, as in the sketch following this list.
  • Define schema contracts, decision logic, and state machine handoffs.
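
To ground the layered design above, here is a minimal PySpark sketch (an illustration only, not a prescribed implementation) of writing an aggregated layer to S3 with Hive-style year/month/day partitions; the bucket, paths, and column names such as event_ts and customer_id are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("processed-to-aggregated").getOrCreate()

    # Read the "processed" layer (hypothetical path) and roll it up into a daily aggregate.
    processed = spark.read.parquet("s3://example-datalake/processed/events/")
    aggregated = (processed
                  .withColumn("year", F.year("event_ts"))
                  .withColumn("month", F.month("event_ts"))
                  .withColumn("day", F.dayofmonth("event_ts"))
                  .groupBy("year", "month", "day", "customer_id")
                  .agg(F.count("*").alias("event_count")))

    # partitionBy lays the data out as .../year=YYYY/month=MM/day=DD/, the Hive-style
    # layout that enables partition pruning and per-day retention or archival rules.
    (aggregated.write
     .mode("overwrite")              # full rebuild; the next sketch shows per-partition overwrite
     .partitionBy("year", "month", "day")
     .parquet("s3://example-datalake/aggregated/daily_customer_events/"))
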
Spark ETL Development
  • Author robust PySpark or Scala jobs for parsing, flattening, merging, and aggregation.
  • Tune performance using broadcast joins, partition pruning, and shuffle control.
  • Implement atomic, overwrite-by-partition writes and idempotent operations, as in the sketch below.
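
The following PySpark sketch illustrates these tuning levers under assumed table and bucket names (it is not a prescribed job): a daily run prunes to a single day's partitions, broadcasts a small dimension table, and overwrites only the partition it recomputes.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("merge-orders-daily")
             # Overwrite only the partitions written in this run, so re-runs stay idempotent.
             .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
             # Explicit shuffle control; the right value depends on data volume.
             .config("spark.sql.shuffle.partitions", "400")
             .getOrCreate())

    year, month, day = "2024", "06", "01"   # hypothetical run parameters

    # Filtering on the partition columns lets Spark prune to one day's directories.
    orders = (spark.read.parquet("s3://example-datalake/raw/orders/")
              .filter((F.col("year") == year) & (F.col("month") == month) & (F.col("day") == day)))

    # The customer dimension is small, so broadcasting it avoids shuffling the large side.
    customers = spark.read.parquet("s3://example-datalake/merged/customers/")
    enriched = orders.join(F.broadcast(customers), "customer_id", "left")

    (enriched.write
     .mode("overwrite")
     .partitionBy("year", "month", "day")
     .parquet("s3://example-datalake/merged/orders_enriched/"))
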
Warehouse Synchronization
  • Perform idempotent DELETE, INSERT, or MERGE operations into Redshift (see the example after this list).
  • Maintain audit-friendly SQL with deterministic predicates and row-level metrics.
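
As a hedged example of the delete-then-insert flavour of this pattern, the sketch below drives Redshift through the Data API in boto3; the cluster, schema, and table names are placeholders, and a MERGE statement could be used instead where appropriate.

    import boto3

    client = boto3.client("redshift-data")
    load_date = "2024-06-01"   # hypothetical load key for this run

    # Delete-then-insert with a deterministic predicate: re-running the same day
    # converges to the same state, keeping the load idempotent and easy to audit.
    response = client.batch_execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sqls=[
            f"DELETE FROM marts.orders_enriched WHERE load_date = '{load_date}'",
            f"INSERT INTO marts.orders_enriched "
            f"SELECT * FROM spectrum.orders_enriched WHERE load_date = '{load_date}'",
        ],
    )
    print(response["Id"])   # statement id, useful for polling status and row-count metrics
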
Data Quality, Reliability & Observability
  • Build scalable, automated ETL pipelines with idempotency and cost efficiency.
  • Implement schema drift checks, duplicate prevention, and partition reconciliation (a drift-check sketch follows this list).
  • Monitor EMR or Kubernetes lifecycle, cluster right-sizing, and cost tracking.
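
One way a schema drift check might look, sketched here with a hypothetical Glue database and table, is to compare an incoming partition's columns against the Glue Catalog definition before loading it any further.

    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    glue = boto3.client("glue")

    # Columns actually present in today's drop (hypothetical path).
    incoming = spark.read.parquet("s3://example-datalake/raw/orders/year=2024/month=06/day=01/")
    incoming_cols = {field.name.lower() for field in incoming.schema.fields}

    # Columns the catalog expects (partition keys are tracked separately in Glue).
    table = glue.get_table(DatabaseName="datalake", Name="orders")
    catalog_cols = {col["Name"].lower() for col in table["Table"]["StorageDescriptor"]["Columns"]}

    missing = catalog_cols - incoming_cols
    unexpected = incoming_cols - catalog_cols
    if missing or unexpected:
        # Failing fast stops drifted data before it reaches the merged or aggregated layers.
        raise ValueError(f"Schema drift detected: missing={missing}, unexpected={unexpected}")
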
Ingestion & Storage
  • Build log and event pipelines into S3 using CloudWatch, Kinesis, or Firehose.
  • Manage bucket layouts, lifecycle rules, and data catalog consistency (see the lifecycle-rule example below).
  • Understand compression formats and Hive-style directory structures.
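
For instance, the archival-to-Glacier requirement can be expressed as an S3 lifecycle rule; the boto3 sketch below uses a placeholder bucket, prefix, and retention periods.

    import boto3

    s3 = boto3.client("s3")

    # Age raw log objects into Glacier after 90 days and expire them after two years.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-datalake",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "raw-logs-to-glacier",
                    "Filter": {"Prefix": "raw/logs/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 730},
                }
            ]
        },
    )
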
Orchestration & Automation
  • Implement AWS Step Functions with Choice, Map, Parallel states, retries, and backoff (sketched after this list).
  • Automate scheduling using EventBridge and deploy guardrail Lambdas.
  • Parameterize pipelines for multiple environments and selective recomputations.
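
A trimmed-down sketch of such a state machine, written as an Amazon States Language definition in Python (all ARNs, the cluster id, and the script path are placeholders): a Choice state acts as a guardrail, and the EMR step task retries with exponential backoff.

    import json
    import boto3

    definition = {
        "Comment": "Illustrative daily ETL with a guarded EMR Spark step",
        "StartAt": "ShouldRun",
        "States": {
            "ShouldRun": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.run_type", "StringEquals": "daily", "Next": "RunSparkJob"}
                ],
                "Default": "SkipRun",
            },
            "SkipRun": {"Type": "Succeed"},
            "RunSparkJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster_id",
                    "Step": {
                        "Name": "merge-orders-daily",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "s3://example-artifacts/jobs/merge_orders.py"],
                        },
                    },
                },
                # Retry the Spark step up to three times with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 60,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            },
        },
    }

    sfn = boto3.client("stepfunctions")
    sfn.create_state_machine(
        name="daily-etl-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/example-stepfunctions-role",
    )
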
Soft Skills
  • Strong analytical and problem-solving capabilities.
  • Excellent communication for client engagement and stakeholder presentations.
  • Ability to work flexibly with global and US-based teams.
  • Team-oriented, proactive, and adaptable in fast-paced environments.
Preferred Qualifications
  • Experience with MLOps and end-to-end AI/ML deployment pipelines.
  • Knowledge of NLP and Computer Vision.
  • Certifications in AI/ML, AWS, Azure, or GCP.
Benefits
  • Competitive salary and performance-based incentives.
  • Opportunity to work on cutting-edge AI and ML projects.
  • Exposure to global clients and international project delivery.
  • Continuous learning and professional development opportunities.

Why NStarX

  • You will be joining a fast-growing AI-native company backed by:
      • Strategic investment from SHI International
      • An advisory board including Silicon Valley CXOs & AI leaders
  • Compensation includes:
      • Competitive base + commission
      • Fast growth into leadership roles

To apply for this job, email your details to recruiting-ind@nstarxinc.com.
