Data Engineer (NLP-Focused)

  • Company: Kyivstar
  • City: Kyiv
  • Salary:
  • Posted: 2025-08-14

Description

We are looking for a Data Engineer (NLP-Focused) to build and optimize the data pipelines that fuel our Ukrainian LLM and Kyivstar’s NLP initiatives. In this role, you will design robust ETL/ELT processes to collect, process, and manage large-scale text and metadata, enabling our data scientists and ML engineers to develop cutting-edge language models. You will work at the intersection of data engineering and machine learning, ensuring that our datasets and infrastructure are reliable, scalable, and tailored to the needs of training and evaluating NLP models in a Ukrainian language context. This is a unique opportunity to shape the data foundation of a pioneering AI project in Ukraine, working alongside NLP experts and leveraging modern big data technologies.

About us

[…] is a Ukrainian hybrid IT company and a resident of […]. We are a subsidiary of Kyivstar, one of Ukraine's largest telecom operators. Our mission is to change lives in Ukraine and around the world by creating technological solutions and products that unleash the potential of businesses and meet users' needs. More than 600 specialists work daily in various areas: mobile and web solutions, as well as design, development, support, and technical maintenance of high-performance systems and services. We believe in innovations that bring real qualitative change, and we constantly challenge conventional approaches and solutions. Each of us embraces an entrepreneurial culture, which allows us never to stop, to evolve, and to create something new.

What you will do

  • Design, develop, and maintain ETL/ELT pipelines for gathering, transforming, and storing large volumes of text data and related information. Ensure pipelines are efficient and can handle data from diverse sources (e.g., web crawls, public datasets, internal databases) while maintaining data integrity.
  • Implement web scraping and data collection services to automate the ingestion of text and linguistic data from the web and other external sources. This includes writing crawlers or using APIs to continuously collect data relevant to our language modeling efforts.
  • Implement NLP/LLM-specific data processing: text cleaning and normalization, including toxic-content filtering, de-duplication, de-noising, and detection and removal of personal data.
  • Build SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as teacher.
  • Set up and manage cloud-based data infrastructure for the project. Configure and maintain data storage solutions (data lakes, warehouses) and processing frameworks (e.g., distributed compute on AWS/GCP/Azure) that can scale with growing data needs.
  • Automate data processing workflows and ensure their scalability and reliability. Use workflow orchestration tools like Apache Airflow to schedule and monitor data pipelines, enabling continuous and repeatable model training and evaluation cycles.
  • Maintain and optimize analytical databases and data access layers for both ad-hoc analysis and model training needs. Work with relational databases (e.g., PostgreSQL) and other storage systems to ensure fast query performance and well-structured data schemas.
  • Collaborate with Data Scientists and NLP Engineers to build data features and datasets for machine learning models. Provide data subsets, aggregations, or preprocessing as needed for tasks such as language model training, embedding generation, and evaluation.
  • Implement data quality checks, monitoring, and alerting. Develop scripts or use tools to validate data completeness and correctness (e.g., ensuring no critical data gaps or anomalies in the text corpora), and promptly address any pipeline failures or data issues. Implement data version control.
  • Manage data security, access, and compliance. Control permissions to datasets and ensure adherence to data privacy policies and security standards, especially when dealing with user data or proprietary text sources.

Qualifications and experience needed

  • Education & Experience: 3+ years of experience as a Data Engineer or in a similar role, building data-intensive pipelines or platforms. A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is preferred. Experience supporting machine learning or analytics teams with data pipelines is a strong advantage.
  • NLP Domain Experience: Prior experience handling linguistic data or supporting NLP projects (e.g., text normalization, handling different encodings, tokenization strategies). Knowledge of Ukrainian text sources and data sets, or experience with multilingual data processing, can be an advantage given our project’s focus. Understanding of FineWeb2 or a similar processing pipeline approach.
  • Data Pipeline Expertise: Hands-on experience designing ETL/ELT processes, including extracting data from various sources, using transformation tools, and loading into storage systems. Proficiency with orchestration frameworks like Apache Airflow for scheduling workflows. Familiarity with building pipelines for unstructured data (text, logs) as well as structured data.
  • Programming & Scripting: Strong programming skills in Python for data manipulation and pipeline development. Experience with NLP packages (spaCy, NLTK, langdetect, fasttext, etc.). Experience with SQL for querying and transforming data in relational databases. Knowledge of Bash or other scripting for automation tasks. Writing clean, maintainable code and using version control (Git) for collaborative development.
  • Databases & Storage: Experience working with relational databases (e.g., PostgreSQL, MySQL), including schema design and query optimization. Familiarity with NoSQL or document stores (e.g., MongoDB) and big data technologies (HDFS, Hive, Spark) for large-scale data is a plus. Understanding of or experience with vector databases (e.g., Pinecone, FAISS) is beneficial, as our NLP applications may require embedding storage and fast similarity search.
  • Cloud Infrastructure: Practical experience with cloud platforms (AWS, GCP, or Azure) for data storage and processing. Ability to set up services such as S3/Cloud Storage, data warehouses (e.g., BigQuery, Redshift), and use cloud-based ETL tools or serverless functions. Understanding of infrastructure-as-code (Terraform, CloudFormation) to manage resources is a plus.
  • Data Quality & Monitoring: Knowledge of data quality assurance practices. Experience implementing monitoring for data pipelines (logs, alerts) and using CI/CD tools to automate pipeline deployment and testing. An analytical mindset to troubleshoot data discrepancies and optimize performance bottlenecks.
  • Collaboration & Domain Knowledge: Ability to work closely with data scientists and understand the requirements of machine learning projects. Basic understanding of NLP concepts and the data needs for training language models, so you can anticipate and accommodate the specific forms of text data and preprocessing they require. Good communication skills to document data workflows and to coordinate with team members across different functions.

A plus would be

  • Advanced Tools & Frameworks: Experience with distributed data processing frameworks (such as Apache Spark or Databricks) for large-scale data transformation, and with message streaming systems (Kafka, Pub/Sub) for real-time data pipelines. Familiarity with data serialization formats (JSON, Parquet) and handling of large text corpora.
  • Web Scraping Expertise: Deep experience in web scraping, using tools like Scrapy, Selenium, or Beautiful Soup, and handling anti-scraping challenges (rotating proxies, rate limiting). Ability to parse and clean raw text data from HTML, PDFs, or scanned documents.
  • CI/CD & DevOps: Knowledge of setting up CI/CD pipelines for data engineering (using GitHub Actions, Jenkins, or GitLab CI) to test and deploy changes to data workflows. Experience with containerization (Docker) to package data jobs and with Kubernetes for scaling them is a plus.
  • Big Data & Analytics: Experience with analytics platforms and BI tools (e.g., Tableau, Looker) used to examine the data prepared by the pipelines. Understanding of how to create and manage data warehouses or data marts for analytical consumption.
  • Problem-Solving: Demonstrated ability to work independently in solving complex data engineering problems, optimising existing pipelines, and implementing new ones under time constraints. A proactive attitude to explore new data tools or techniques that could improve our workflows.

What we offer

  • Office or remote: it’s up to you. You can work from anywhere, and we will arrange your workplace.
  • Remote onboarding.
  • Performance bonuses.
  • Employee training, with opportunities to learn through the company’s library, internal resources, and partner programs.
  • Health and life insurance.
  • Wellbeing program and corporate psychologist.
  • Reimbursement of expenses for Kyivstar mobile communication.
