About the role
You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity, heavy traffic, and large volumes of data. You know how to achieve reliability and scale with minimal operational load.
Key responsibilities
- Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, and more
- Produce forward-looking designs for platform evolution as we scale to 100x current traffic and need to provide low latency worldwide
- Use AI extensively to automate the mundane parts of building complex yet reliable systems
- Profile and tune low-level CPU and memory performance
Requirements
- 5+ years of experience building distributed compute and orchestration platforms in Python or Rust
- Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
- Deep understanding of computational complexity and memory allocation
- Track record of designing systems that scale under real production load
- Experience building and using observability to drive performance and reliability decisions
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to have
- Experience with AI/ML inference or training infrastructure
- Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
- Background in building multi-tenant compute platforms
- Understanding of networking fundamentals and performance characteristics
- Familiarity with GPU workload characteristics and scheduling constraints
Location
- Turkey
What we offer at fal
- Interesting and challenging work
- A lot of learning and growth opportunities
- Regular team events and offsites