Internship data science "Generative AI"

Alstom

  • Ridderkerk, Zuid-Holland
  • Training
  • Full-time
  • 1 day ago
Alstom is a global leader in the manufacturing, maintenance and service of rolling stock (trains, metros, trams and locomotives), infrastructure, signaling and digital mobility, with a presence in 70 countries, more than 85,000 employees and roughly €9.3 billion in yearly revenue.

The Alstom service organization in the Benelux consists of 17 locations and about 450 employees. From these locations, customers are provided with spare parts, technical support, maintenance, overhauls and modernization services for their fleets and installed signaling base.

The Fleet Support Center is a relatively new department within the Alstom Benelux organization. It provides customers with digital solutions. Currently, 7 fleets in the Benelux send data in real time to the Alstom data lake. Diagnostic and service order information is transmitted to this central data lake and translated into meaningful information such as condition-based maintenance (CBM), predictive maintenance (PDM) and real-time monitoring. The Fleet Support Center team combines data science, BI and engineering competences, and keeps growing both in size and in the products it offers to the market.

Internship assignment

As a general development, Alstom would like to invest in assessing the application of Large Language Models (LLMs). The objective is to optimize operational processes by creating impactful generative and conversational agents, mainly for semantic search. Within Alstom we have many data sources available, which are used for troubleshooting, for the engineering of products and services, and for product improvement. With an end-to-end implementation of an LLM in the organization, we aim to reduce the time spent searching for relevant documentation and information.

As a pilot, we use the troubleshooting process for trains that are newly introduced to the market.
In case of an unexpected failure, the product introduction (PI) team consults many sources of information to solve the problem, such as the failure code, proposed action, technical documentation, reference cases and reference environment data. By feeding this base information to an LLM and fine-tuning the model on it, we aim to provide the relevant information to the PI team within seconds, rather than after minutes or hours of searching.

The scope of the pilot is as follows:
  • Review and test the feasibility of leveraging available LLMs
  • Identify data sources, create a data model, and a data flow together with the end-users
  • Develop a minimum viable conversational web interface for end-users
  • Fine-tune the LLM (via in-context learning, prompt engineering or partial retraining)
  • Validate the results in practice via quantitative and qualitative methods
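To give a feel for the retrieval step behind such a semantic search pilot, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the troubleshooting records, failure codes and function names are hypothetical, simple word-overlap similarity stands in for a real embedding model, and the call to the LLM itself is left out. The retrieved records are placed in the prompt, which is the in-context learning approach named in the scope.

```python
import math
import re
from collections import Counter

# Hypothetical troubleshooting records (invented for illustration only).
DOCS = {
    "case-001": "Failure code F102 door control unit timeout. "
                "Proposed action: reset the door controller and check the connector.",
    "case-002": "Failure code B210 brake pressure sensor drift. "
                "Proposed action: recalibrate the sensor.",
    "case-003": "Failure code H330 HVAC compressor overcurrent. "
                "Proposed action: inspect the compressor relay.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the ids of up to k records that share vocabulary with the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, docs, k=2):
    """Assemble an in-context prompt from the retrieved records."""
    context = "\n".join(f"[{d}] {docs[d]}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("Door controller timeout, door does not close", DOCS))
```

In a real pilot the similarity function would likely be replaced by embeddings from a language model, and the prompt would be sent to the chosen LLM; which model and data sources to use is part of the assignment.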
At a later stage, we will determine which fleet and process to use as the sample, and whether to use the data set of a full train or of a single subsystem or component.

Organization

The project team consists of the following functions:
  • Data science: the student and the Alstom data scientist. This team is responsible for carrying out the data analysis and developing the intelligence
  • Fleet support center officer: this team is responsible for validating the model by integrating the intelligence into daily operations
  • Engineering: this team is responsible for the theoretical validation of the data model
  • Fleet / PI manager: this person is the business owner and the receiver of the final product, and specifies the requirements for the final solution
The student will be supervised by Alireza Khanshan, Alstom data scientist, and by Erik Sonneveld, digital business director, on the overall objective, progress and outcome. The student will be part of the fleet support center team and will be included in the daily, weekly and monthly department governance, including sprint reviews.

Competences required

The student requires the following competencies to complete the internship:
  • Proficiency in Python.
  • Understanding of statistical methods and hypothesis testing using libraries like SciPy and Statsmodels.
  • Familiarity with machine learning algorithms and libraries such as Scikit-learn and TensorFlow.
  • Familiarity with language processing and semantic search.
  • Experience with data cleaning, preprocessing, and transformation using Pandas and NumPy.
  • Ability to create meaningful visualizations using Matplotlib, Seaborn, and Plotly.
  • Knowledge of Git for collaborative work.
  • Ability to clearly present findings and insights using Jupyter Notebooks and other relevant tools.
Are you enthusiastic? We can totally imagine! Apply via the button. For more information, you are welcome to reach out to Erik Sonneveld on 06 53 47 53 77 (WhatsApp or phone call).

We're looking forward to meeting you!
