Job Description
Experience: 4+ years
Location: Hyderabad (Work from Office)
Notice Period: Immediate
About NStarX
NStarX is an AI-first, Cloud-first engineering services provider built and led by practitioners.
We specialize in transforming businesses through cutting-edge technology solutions.
With years of expertise, we deliver scalable, data-driven systems that empower our clients to make smarter, faster decisions.
For more information, visit https://nstarxinc.com/.
Role Summary
We are seeking a Software Engineer (Data Engineering) who can seamlessly combine the roles of Data Engineer and Data Scientist.
The ideal candidate will design robust data pipelines, build AI/ML models, and deliver data-driven insights that address complex business challenges.
This is a client-facing role requiring close collaboration with US-based stakeholders; candidates must be willing to work in alignment with US time zones when needed.
Key Responsibilities
Data Engineering
- Design, build, and maintain scalable ETL and ELT pipelines for large-scale data processing.
- Develop and optimize data architectures supporting analytics and ML workflows.
- Ensure data integrity, security, and compliance with organizational and industry standards.
- Collaborate with DevOps teams to deploy and monitor data pipelines in production environments.
Data Science & AI / ML
- Build predictive and prescriptive models leveraging AI and ML techniques.
- Develop and deploy machine learning and deep learning models using TensorFlow, PyTorch, or scikit-learn (see the sketch after this list).
- Perform feature engineering, statistical analysis, and data preprocessing.
- Continuously monitor and optimize models for accuracy and scalability.
- Integrate AI-driven insights into business processes and strategies.
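To illustrate the model-building work involved, here is a minimal scikit-learn sketch; the dataset path, target column, and model choice are hypothetical, and features are assumed to be all numeric.

```python
# Minimal sketch: train and evaluate a predictive model with scikit-learn.
# The CSV path and the "churned" target column are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customer_events.csv")  # hypothetical dataset
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Keep preprocessing and the model in one pipeline so the same
# transformations are applied at training and inference time.
model = Pipeline([
    ("scale", StandardScaler()),  # assumes all-numeric features
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```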
Client Interaction
- Serve as the technical liaison between NStarX and client teams.
- Participate in client discussions, requirement gathering, and design reviews.
- Provide status updates, insights, and recommendations directly to client stakeholders.
- Work flexibly with customers based on US time zones for real-time collaboration.
Required Qualifications
- Experience: 4+ years in Data Engineering and AI/ML roles.
- Education: Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field.
Technical Skills (Necessary)
- Languages & Libraries: Python, SQL, Bash, PySpark, Spark SQL, boto3, pandas.
- Compute: Apache Spark on EMR (driver/executor model, sizing, dynamic allocation).
- Storage: Amazon S3 (Parquet) with lifecycle management to Glacier.
- Catalog: AWS Glue Catalog and Crawlers.
- Orchestration & Serverless: AWS Step Functions, AWS Lambda, Amazon EventBridge.
- Ingestion: CloudWatch Logs and Metrics, Kinesis Data Firehose (or Kafka/MSK).
- Warehouse: Amazon Redshift and Redshift Spectrum.
- Security & Access: IAM (least privilege), Secrets Manager, SSM Parameter Store.
- Operations & Collaboration: Git with CI pipelines (Jenkins, GitHub Actions, GitLab CI), CloudWatch monitoring.
Nice to Have
- Scala, Docker, Kubernetes (Spark-on-Kubernetes), k9s.
- Fast data stores such as DynamoDB, MongoDB, or Redis.
- Databricks and Jupyter notebooks.
- FinOps exposure including cost baselines and dashboards.
Core Skills (Hands-on Responsibilities)
Data Lake to Data Mart Design
- Design layered data lake to data mart models (raw → processed → merged → aggregated).
- Implement Hive-style partitioning (year/month/day) with retention and archival strategies (see the sketch after this list).
- Define schema contracts, decision logic, and state machine handoffs.
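A minimal PySpark sketch of the layered, Hive-style partitioned writes described above; the bucket, paths, and timestamp column are placeholders.

```python
# Minimal sketch: write a processed-layer table with Hive-style
# year/month/day partitioning. Bucket and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-layering").getOrCreate()
raw = spark.read.parquet("s3://example-lake/raw/events/")  # hypothetical path

processed = (
    raw.withColumn("year", F.year("event_ts"))
       .withColumn("month", F.month("event_ts"))
       .withColumn("day", F.dayofmonth("event_ts"))
)

# partitionBy() produces .../year=YYYY/month=MM/day=DD/ directories, which
# Glue crawlers and Spark partition pruning both understand.
(processed.write
    .mode("overwrite")
    .partitionBy("year", "month", "day")
    .parquet("s3://example-lake/processed/events/"))
```

Retention then becomes a matter of S3 lifecycle rules on each layer's prefix (see the Ingestion & Storage sketch further below).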
Spark ETL Development
- Author robust PySpark or Scala jobs for parsing, flattening, merging, and aggregation.
- Tune performance using broadcast joins, partition pruning, and shuffle control.
- Implement atomic, overwrite-by-partition writes and idempotent operations (illustrated in the sketch after this list).
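A hedged sketch of the broadcast-join and overwrite-by-partition patterns above, reusing the placeholder paths from the previous example:

```python
# Minimal sketch: broadcast-join a small dimension table, then overwrite
# only the partitions this run actually touches. Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("etl-tuning").getOrCreate()
# "dynamic" replaces only the partitions present in the output, which
# makes re-runs idempotent instead of wiping the whole table.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

facts = spark.read.parquet("s3://example-lake/processed/events/")
dims = spark.read.parquet("s3://example-lake/reference/devices/")

# broadcast() ships the small table to every executor and avoids a shuffle.
joined = facts.join(broadcast(dims), on="device_id", how="left")

(joined.write
    .mode("overwrite")  # per-partition, thanks to the dynamic mode above
    .partitionBy("year", "month", "day")
    .parquet("s3://example-lake/merged/events/"))
```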
Warehouse Synchronization
- Perform idempotent DELETE+INSERT or MERGE operations into Redshift (see the sketch after this list).
- Maintain audit-friendly SQL with deterministic predicates and row-level metrics.
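As one sketch of the idempotent reload pattern, using the Redshift Data API via boto3; the cluster, schema, and table names are placeholders, and the Data API runs a batch of statements serially as a single transaction.

```python
# Minimal sketch: idempotent partition reload into Redshift using a
# deterministic predicate. Identifiers are placeholders; parameterize
# the SQL properly in production rather than formatting strings.
import boto3

rsd = boto3.client("redshift-data")

def reload_day(day: str) -> None:
    # Delete exactly the rows this run will replace, then re-insert them
    # from a staging (Spectrum) table; re-running yields the same state.
    rsd.batch_execute_statement(
        ClusterIdentifier="example-cluster",  # hypothetical
        Database="analytics",
        DbUser="etl_user",
        Sqls=[
            f"DELETE FROM analytics.events WHERE event_date = '{day}'",
            f"INSERT INTO analytics.events "
            f"SELECT * FROM spectrum.events_staged WHERE event_date = '{day}'",
        ],
    )

reload_day("2024-01-15")
```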
Data Quality, Reliability & Observability
- Build scalable, automated ETL pipelines with idempotency and cost efficiency.
- Implement schema drift checks, duplicate prevention, and partition reconciliation (a drift-check sketch follows this list).
- Monitor EMR or Kubernetes cluster lifecycle, right-size clusters, and track costs.
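A minimal sketch of a schema drift check against an expected-schema contract; the contract and input path are illustrative:

```python
# Minimal sketch: fail fast when an incoming batch drifts from the
# agreed schema contract. The expected schema below is illustrative.
from pyspark.sql import SparkSession

EXPECTED = {"event_id": "string", "event_ts": "timestamp", "payload": "string"}

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
batch = spark.read.parquet("s3://example-lake/raw/events/")  # placeholder

actual = {f.name: f.dataType.simpleString() for f in batch.schema.fields}
missing = EXPECTED.keys() - actual.keys()
drifted = {c for c in EXPECTED.keys() & actual.keys() if actual[c] != EXPECTED[c]}

if missing or drifted:
    raise ValueError(f"schema drift: missing={missing}, type_changes={drifted}")
```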
Ingestion & Storage
- Build log and event pipelines into S3 using CloudWatch, Kinesis, or Firehose.
- Manage bucket layouts, lifecycle rules, and data catalog consistency (see the lifecycle sketch after this list).
- Understand compression formats and Hive-style directory structures.
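For instance, the lifecycle management to Glacier listed under Technical Skills might be configured as follows; the bucket name and retention windows are placeholders:

```python
# Minimal sketch: transition raw-layer objects to Glacier after 90 days
# and expire them after a year. Bucket and windows are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-lake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "raw-to-glacier",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```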
Orchestration & Automation
- Implement AWS Step Functions with Choice, Map, and Parallel states, retries, and backoff (see the sketch after this list).
- Automate scheduling with EventBridge and deploy guardrail Lambdas.
- Parameterize pipelines for multiple environments and selective recomputation.
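A sketch of a state machine with exponential-backoff retries and a Choice gate; all ARNs, state names, and the $.rowCount field are hypothetical:

```python
# Minimal sketch: Step Functions definition (ASL as a Python dict) with
# retry/backoff on the ETL task and a Choice state gating warehouse sync.
import json
import boto3

definition = {
    "StartAt": "RunSparkEtl",
    "States": {
        "RunSparkEtl": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:run-etl",
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,  # exponential backoff between attempts
            }],
            "Next": "RowsFound",
        },
        "RowsFound": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.rowCount",
                "NumericGreaterThan": 0,
                "Next": "SyncWarehouse",
            }],
            "Default": "Done",
        },
        "SyncWarehouse": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sync-redshift",
            "End": True,
        },
        "Done": {"Type": "Succeed"},
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="example-etl-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/example-sfn-role",  # placeholder
)
```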
Soft Skills
- Strong analytical and problem-solving capabilities.
- Excellent communication for client engagement and stakeholder presentations.
- Ability to work flexibly with global and US-based teams.
- Team-oriented, proactive, and adaptable in fast-paced environments.
Preferred Qualifications
- Experience with MLOps and end-to-end AI/ML deployment pipelines.
- Knowledge of NLP and Computer Vision.
- Certifications in AI/ML, AWS, Azure, or GCP.
Benefits
- Competitive salary and performance-based incentives.
- Opportunity to work on cutting-edge AI and ML projects.
- Exposure to global clients and international project delivery.
- Continuous learning and professional development opportunities.
Why NStarX
You will be joining a fast-growing, AI-native company backed by:
- Strategic investment from SHI International
- An advisory board of Silicon Valley CXOs and AI leaders
Compensation includes:
- Competitive base + commission
- Fast growth into leadership roles
To apply for this job, email your details to recruiting-ind@nstarxinc.com.