
Internship data science "pattern recognition"
- Ridderkerk, Zuid-Holland
- Training
- Full-time
85,000 employees and ~€9.3 billion in yearly revenue.

The Alstom service organization in the Benelux consists of 17 locations and ~450 employees. From these locations, customers are provided with spare parts, technical support, maintenance, overhauls and modernization services for their fleets and installed signaling base.

The Fleet Support Center is a relatively new department within the Alstom Benelux organization. It provides customers with digital solutions. Currently, 7 fleets in the Benelux send data in real time to the Alstom data lake. Diagnostic and service order information is transmitted to this central data lake and translated into meaningful information such as CBM, PDM and real-time monitoring. The Fleet Support Center team combines data science, BI and engineering competences, and is growing continuously in size and in its product offerings to the market.

Internship assignment

As a general development, Alstom would like to invest in event pattern recognition. The objective is to detect train failures from diagnostic train events at point P in the health degradation process (see picture 1), so that an intervention can be scheduled in time, before the train or subsystem actually fails.

Picture 1: health degradation process

The process is as follows:

1. A data set of interventions on trains is available. It contains details such as the date of intervention, the subsystem or component affected, and train failure details.
2. A data set of all events measured on the train is available. It contains details such as timestamp, subsystem affected, context information and location.
3. Interventions and the main train event need to be linked together, as the timestamp of the intervention by definition differs from the timestamp of the event.
4. After the main event has been identified, the relationship between events and operating data can be analyzed. The analysis should point out which data point, or combination of data points, is a reliable predictor of the failure of the train or subsystem at point P in the health degradation process.
5. The outcome of step 4 can be implemented in the real-time monitor of the related customer, to validate the intelligence in practice.
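
To give an impression of the linking step in the process above, the sketch below matches each intervention to the most recent preceding event on the same train and subsystem using pandas `merge_asof`. It is only an illustration: the column names (`train_id`, `subsystem`, `event_code`, `event_time`, `intervention_time`), the sample rows and the 3-day look-back window are assumptions, not the actual Alstom data model, and treating the nearest preceding event as the "main event" is a simplification that may not hold in practice.

```python
import pandas as pd

# Hypothetical sample data; real column names and values will differ.
interventions = pd.DataFrame({
    "train_id": ["T1", "T1", "T2"],
    "subsystem": ["doors", "hvac", "doors"],
    "intervention_time": pd.to_datetime(
        ["2024-01-10 08:00", "2024-01-20 09:30", "2024-01-15 14:00"]
    ),
})

events = pd.DataFrame({
    "train_id": ["T1", "T1", "T1", "T2", "T2"],
    "subsystem": ["doors", "doors", "hvac", "doors", "doors"],
    "event_code": ["E101", "E205", "E330", "E101", "E205"],
    "event_time": pd.to_datetime([
        "2024-01-08 17:45", "2024-01-09 06:10", "2024-01-01 12:00",
        "2024-01-15 07:20", "2024-01-16 10:00",
    ]),
})


def link_main_events(interventions, events, max_gap=pd.Timedelta(days=3)):
    """Attach to each intervention the latest earlier event on the same
    train and subsystem, if one occurred within `max_gap` (assumed window)."""
    # merge_asof requires both frames to be sorted on their time key.
    left = interventions.sort_values("intervention_time")
    right = events.sort_values("event_time")
    linked = pd.merge_asof(
        left,
        right,
        left_on="intervention_time",
        right_on="event_time",
        by=["train_id", "subsystem"],
        direction="backward",  # only events at or before the intervention
        tolerance=max_gap,     # events older than max_gap are not linked
    )
    # Time between the linked event and the intervention (NaT if unlinked).
    linked["lead_time"] = linked["intervention_time"] - linked["event_time"]
    return linked


linked = link_main_events(interventions, events)
print(linked[["train_id", "subsystem", "event_code", "lead_time"]])
```

Interventions without any event inside the window keep empty event columns, which makes unlinked cases easy to review with the engineering team. The linked table could then serve as a starting point for the predictor analysis in the next step.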
- Data science: the student and the Alstom data scientist. This team is responsible for executing the data analysis and developing the intelligence.
- Fleet Support Center officer: this team is responsible for validating the model by integrating the intelligence into daily operations.
- Engineering: this team is responsible for the theoretical validation of the data model.
- Fleet / PI manager: this person is the business owner and the recipient of the final product, and will specify the requirements for the final solution.
- Proficiency in Python.
- Understanding of statistical methods and hypothesis testing using libraries like SciPy and Statsmodels.
- Familiarity with machine learning algorithms for time series data using libraries such as Scikit-learn and TensorFlow/Keras.
- Experience with data cleaning, preprocessing, and transformation using Pandas and NumPy.
- Ability to create meaningful visualizations using Matplotlib, Seaborn, and Plotly.
- Basic understanding of electrical devices and their failure modes.
- Knowledge of Git for collaborative work.
- Ability to clearly present findings and insights using Jupyter Notebooks and other relevant tools.