Technical Internships
RAW Labs is a rapidly expanding Swiss enterprise data technology company, spun out of École Polytechnique Fédérale de Lausanne (EPFL) by Prof. Anastasia Ailamaki and a team of highly successful engineers and scientists from, amongst others, CERN, Cisco and Salesforce.
At RAW Labs we have developed novel technologies to interrogate massive quantities of data in different formats, held in a variety of data stores across an enterprise infrastructure and in the Cloud. By leveraging this core technology, RAW Labs has built a Cloud-based Data Sharing platform for creating and maintaining APIs. The RAW platform enables our customers to exploit all forms of data to create curated Data Products via our DataOps infrastructure, and to securely share data in hours, not days. Enterprises use RAW Labs’ platform to drive ML/AI, business intelligence and data analytics applications without having to build and maintain complex data engineering infrastructures.
RAW Labs is funded by a group of highly sophisticated and experienced technology investors and is advised by technology luminaries including Prof. Martin Odersky (creator of Scala), Prof. Mike Franklin (co-creator of Spark) and Dr. Alon Halevy (from Facebook’s AI team).
Where you’ll be
Our R&D team is based in two development centers: one in Lausanne, Switzerland, and the other in Athens, Greece. Successful applicants will work in or near either office, with remote working available by agreement. Note that access to either office will be required for face-to-face meetings with your project supervisor.
About the Role
We are seeking a number of highly talented, innovation-driven Interns to help with our research and further differentiate the RAW platform.
Our Interns will each be assigned one of our several projects, based on preference, availability, suitability and business priorities (see below).
As an Intern you will be assigned a technical supervisor, who will typically be a senior engineer from our staff. Whilst you are working with us, you will be fully integrated into the core engineering team and hence will get to see how a commercial technology company develops its product. And, of course, if you excel during the Internship period, there will be opportunities to join us permanently as we grow.
First and foremost, we are looking for candidates with a passion for new technology, an inquisitive mind, a self-starting approach, and a can-do attitude. You may be working on problems that have not been solved yet, or have been only partially solved. We are looking for:
- A university degree in computer science or engineering; any post-graduate experience is a bonus.
- Commercial experience, which is a benefit but not a prerequisite.
- Evidence of an innovative project you have undertaken in the software and/or data space.
- Great oral and written communication skills, preferably in English, but we have French, Greek and Portuguese speakers too.
For technical skills, we can tailor the project to the successful candidate’s experience. Here are some of the technologies we use currently:
- Scala and/or Java and/or Kotlin, and SQL.
- Development of distributed / big data systems, especially Spark
- Development of Visual Studio Code Extensions
- Cloud service provider stacks, e.g., AWS
- Benchmarking and profiling tools, e.g., JMH, Apache JMeter
- Container technologies such as Kubernetes and/or Docker
- CI/CD tools, e.g., Jenkins, Artifactory, and DevOps tooling, e.g., Terraform, Docker Compose, Ansible
- Security frameworks/libraries/providers, e.g., Auth0
Please indicate in your application which of the projects are of interest to you. You may select more than one:
1. GraalVM/Truffle code generator (Expected duration: 6 to 12 months)
For Whom: Students interested in gaining hands-on experience with compilers and database architecture. Duration: Flexible, but a minimum of 3 months. Ideally 6 to 12 months.
RAW’s query engine implements just-in-time code generation techniques, and additionally supports multiple experimental code emitters. The goal of this internship is to contribute to an experimental Truffle-based code generator for the RAW query engine.
This project requires knowledge of Java and basic knowledge of compiler design (e.g. a university course on compilers). Database engine design knowledge is desirable. Experience with, or a strong desire to learn, GraalVM and Truffle in particular is a big plus, so please mention it.
We provide learning material, so above all, this project requires willingness to learn, implement and experiment.
2. AI/ML for Schema Discovery
For Whom: Students interested in gaining hands-on experience in the development of practical AI/ML models. Duration: Flexible. Expected 2 to 6 months.
A major feature of RAW’s query engine is the ability to read “raw” data (CSV, XML, JSON, etc.) with complex structures, directly from source and without any previous processing. Among other features, this requires a complex schema detection framework. For this we have developed an inferrer.
The goal of this project is to advance the use of AI/ML techniques for the inferrer to detect the structure and schema of typical datasets in a performant manner, e.g. character encoding, headers, delimiters, data types, formats of dates, numbers, nulls, etc. It is an opportunity to develop novel techniques, which may lay the foundation for future research-level work.
This project requires previous experience in building AI/ML models. No specific toolchain is required. Above all, this project requires a deep interest in the development of and experimentation with new techniques, as well as their careful evaluation.
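To give a feel for the problem, here is a minimal rule-based sketch of delimiter and column-type detection in Python. It is not RAW's inferrer and involves no AI/ML; it is the kind of simple baseline a learned approach might be compared against. The sample data, the function names, and the assumptions that the first row is a header and that dates are in ISO `YYYY-MM-DD` format are all hypothetical and chosen only for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data, for illustration only.
SAMPLE = "id;name;joined\n1;Ada;2021-03-01\n2;Linus;2020-11-15\n"


def guess_delimiter(text, candidates=",;\t|"):
    """Return the candidate that splits every line into the same number of
    fields, preferring the one that yields the most fields."""
    lines = [line for line in text.splitlines() if line]
    best, best_cols = ",", 1  # fall back to a comma if nothing fits
    for delim in candidates:
        counts = {len(next(csv.reader([line], delimiter=delim))) for line in lines}
        if len(counts) == 1:  # consistent field count across all lines
            cols = counts.pop()
            if cols > best_cols:
                best, best_cols = delim, cols
    return best


def guess_type(values):
    """Try progressively looser parsers; fall back to 'string'."""
    parsers = (
        ("int", int),
        ("float", float),
        ("date", lambda v: datetime.strptime(v, "%Y-%m-%d")),
    )
    for name, parse in parsers:
        try:
            for value in values:
                parse(value)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(text):
    """Assume the first row is a header and infer one type per column."""
    delimiter = guess_delimiter(text)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = zip(*data)  # transpose rows into columns
    return {name: guess_type(col) for name, col in zip(header, columns)}


print(infer_schema(SAMPLE))
```

Fixed rules like these break down quickly on real data (mixed date formats, quoted delimiters, null markers, inconsistent encodings), which is the gap this project aims to address.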
3. Visualisation of query execution plans
For Whom: Students interested in gaining hands-on experience in building advanced UI prototypes. Duration: Flexible. Expected 2 to 4 months.
Database engines provide users with “query execution plans” to help them debug, understand and improve query performance. RAW’s query engine is based on a category-theoretical model that diverges from the relational algebra models commonly used in database engines.
This new algebra requires novel ideas to visualise the query execution plan. The goal of this internship is to implement visualisation tools for the query plans produced by RAW.
This project requires knowledge of web-based UI tools, as our goal is to provide a query plan viewer. No specific toolchain is required. Above all, this project requires creativity and a desire to excel in UX and UI design.
4. Visually exploring and querying nested data
For Whom: Students interested in gaining hands-on experience in building advanced UI prototypes and concepts (e.g. work on developer tooling for game engines or other complex UI projects). Duration: Flexible, ranging from a 2-month early prototype to a 12-month project.
RAW’s query engine allows users to explore and query complex data. Currently, the user develops code using RAW SQL, a SQL-based query language created at RAW Labs with multiple extensions for querying complex nested data structures.
The goal of this project is to explore ideas for visually exploring and querying complex nested data structures. This is an alternative to manually writing RAW SQL scripts, and would provide users with an easier “no code” approach to writing queries.
This project requires knowledge of UI development. No specific toolchain is required. Experience in using or developing game development tools (e.g. Unreal Blueprints) is strongly desired. Above all, this project requires creativity and a desire to excel in UX and UI design.
5. Data Lineage
For Whom: Students interested in data management, automated data governance, traceability, metadata interoperability, standards and data catalogs. Duration: Flexible, ranging from 2 to 4 months
RAW’s query engine allows users to build complex analytics that integrate multiple data sources. These data analytics libraries can be built and shared with other users.
The goal of this project is to develop tools to determine and expose the lineage of data transformations in RAW, in the form of a catalog and REST APIs that can be easily consumed by UIs as well as users. The APIs to expose data lineage will be an extension of the core metadata APIs being developed at RAW.
This project requires knowledge of Scala/Java, as well as SQL.
6. GraphQL API interface
For Whom: Students with an interest in API development and in GraphQL vs. REST comparisons. Experience with or knowledge of Apollo GraphQL, Node.js, etc. is useful. Duration: Flexible, ranging from 2 to 4 months.
Currently we support generation of REST APIs; however, we are interested in generating GraphQL interfaces, as these often work well for analytical workloads and user-defined questions.
The goal of the project will be to prototype and evaluate a GraphQL interface for the RAW platform. In addition to the development of the back-end, some consideration will be required for the UI and UX part, and metadata/API catalog.
7. Short-term Development Projects
For Whom: Students looking for short and well-defined implementation projects using Scala/Java, and who have no more than 2 months available. Duration: 2 months
We have a number of short-term development projects; these are well-defined and scoped projects that give you the opportunity to contribute to an advanced codebase and gain practical experience. These projects include, but are not limited to, the following:
- Support OpenData: The Open Data Protocol (OData) is used for consuming queryable REST APIs. The goal of this project is to implement the Open Data Protocol to expose RAW data.
- Support OpenAPI: The OpenAPI specification is used to describe, produce and consume web services. RAW also allows users to easily create “data-based Web Services”. The goal of this project is to implement the latest OpenAPI standards in RAW, so that our APIs conform to the standards.
- JSON Schema: JSON Schema provides a way to describe the structure of complex data. The goal of this project is to implement support for JSON Schema in RAW.
- Add API invocation commands to enhance our RAW user catalog application. RAW creates APIs, but to use them it is easier for the user to copy and paste a ready-made invocation (in Python, Java, Excel/PowerBI, Postman, or some other technology) directly from our App.
- Development of RAW SQL demos: At RAW Labs, we are constantly developing new “data products” using the RAW platform. The goal of this project is to help with the development of a variety of demos. Unlike other projects, this primarily involves the use of the RAW SQL language.