———————–
The source data includes predicted speeds from the ITS National Transportation Information Center and observations from the Korea Meteorological Administration APIs (temperature and humidity, snowfall and precipitation, and wind direction and speed). Additionally, among the many regional traffic information centers, we used the Gyeonggi-do Traffic Information Center to obtain actual traffic speeds, and combined the sources into approximately 670,000 entries for preprocessing.
Based on data from 2021 to 2022, the combined training and test datasets total 4,800 entries, with 1,600 each for clear, snowy, and rainy conditions.
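As a rough illustration of this preprocessing step, the sketch below joins the ITS predicted speeds, the Gyeonggi-do actual speeds, and the KMA weather observations in pandas and balances the result by weather condition. The file names, column names, and labeling rules are placeholders for illustration only, not the project's actual schema.

```python
import pandas as pd

# Placeholder inputs; the actual ITS/KMA/Gyeonggi-do schemas are not shown in the article.
its = pd.read_csv("its_predicted_speed.csv", parse_dates=["collected_at"])          # link_id, collected_at, pred_speed
gyeonggi = pd.read_csv("gyeonggi_actual_speed.csv", parse_dates=["collected_at"])   # link_id, collected_at, actual_speed
kma = pd.read_csv("kma_weather.csv", parse_dates=["observed_at"])                   # observed_at, temp, humidity, rainfall, snowfall, wind_speed

# Join predicted and actual speeds per road link and timestamp,
# then attach the weather observation for the same hour.
df = its.merge(gyeonggi, on=["link_id", "collected_at"], how="inner")
df["obs_hour"] = df["collected_at"].dt.floor("h")
kma["obs_hour"] = kma["observed_at"].dt.floor("h")
df = df.merge(kma.drop(columns=["observed_at"]), on="obs_hour", how="left")

# Label the weather condition and balance the classes
# (the article uses 1,600 entries each for clear, snowy, and rainy conditions).
def label_weather(row):
    if row["snowfall"] > 0:
        return "snow"
    if row["rainfall"] > 0:
        return "rain"
    return "clear"

df["weather"] = df.apply(label_weather, axis=1)
balanced = df.groupby("weather", group_keys=False).apply(
    lambda g: g.sample(n=min(len(g), 1600), random_state=42)
)
balanced.to_parquet("balanced_speed_weather.parquet", index=False)
```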
In the data analysis and standardization layer, we worked through three steps: confirmatory data analysis (CDA), exploratory data analysis (EDA), and standardization. We then split the data into training and test sets for the Random Forest, LGBM, and K-NN models, built the models and tuned their parameters, and finally ran predictions and performance verification.
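A minimal sketch of this modeling step, assuming scikit-learn, LightGBM, and the balanced dataset from the preprocessing sketch above; the feature names and parameter grids are illustrative placeholders, not the tuned values used in the project.

```python
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Balanced dataset produced by the preprocessing sketch above (hypothetical file).
balanced = pd.read_parquet("balanced_speed_weather.parquet")

# Hypothetical feature/target column names.
FEATURES = ["pred_speed", "temp", "humidity", "rainfall", "snowfall", "wind_speed"]
TARGET = "actual_speed"

X_train, X_test, y_train, y_test = train_test_split(
    balanced[FEATURES], balanced[TARGET], test_size=0.2, random_state=42
)

# Small illustrative parameter grids; the article does not list the tuned values.
searches = {
    "RF": GridSearchCV(RandomForestRegressor(random_state=42),
                       {"n_estimators": [100, 300], "max_depth": [None, 10]}, cv=3),
    "LGBM": GridSearchCV(LGBMRegressor(random_state=42),
                         {"n_estimators": [100, 300], "num_leaves": [31, 63]}, cv=3),
    "K-NN": GridSearchCV(KNeighborsRegressor(),
                         {"n_neighbors": [5, 10, 20]}, cv=3),
}

for name, search in searches.items():
    search.fit(X_train, y_train)
    print(name, "best params:", search.best_params_)
```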
The validation metrics were MSE (Mean Squared Error) and MAPE (Mean Absolute Percentage Error), and the comparison focused on the error range against the predicted speeds provided by ITS for the RF & LGBM models versus the K-NN model. The output included the prediction time and prediction intervals, with the results visualized on a map.
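For reference, this is how the two validation metrics can be computed with scikit-learn; the speed values below are made up purely to show the calculation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

def report(name, y_true, y_pred):
    """Print the two validation metrics used in the comparison."""
    mse = mean_squared_error(y_true, y_pred)
    mape = mean_absolute_percentage_error(y_true, y_pred)  # returned as a fraction
    print(f"{name}: MSE={mse:.2f}, MAPE={mape * 100:.2f}%")

# Made-up speeds (km/h) just to show the calculation; a real run would use the
# held-out test split and each model's predictions from the training step.
y_true = np.array([62.0, 48.5, 55.0, 71.2])
y_pred = np.array([60.1, 50.0, 54.2, 69.8])
report("RF", y_true, y_pred)
```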
————————————
The source data, composed of ITS and Korea Meteorological Administration data, goes through a "collection and refinement process" of information gathering and extract, transform, load (ETL) management, and the prediction data is then ingested through Apache Flume. The data is subsequently stored in the distributed file system in Parquet format and processed through Hive within the Hadoop ecosystem for AI evaluation and prediction.
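As a hedged sketch of this storage and query step, the PySpark snippet below reads refined records from HDFS as Parquet and registers them as a Hive table for the downstream AI evaluation and prediction jobs. The HDFS path, database, and table names are assumptions, not taken from the platform itself.

```python
from pyspark.sql import SparkSession

# Read the refined records from HDFS as Parquet and register them with Hive so the
# downstream evaluation/prediction jobs can query them. Paths and names are
# placeholders; the `traffic` Hive database is assumed to exist.
spark = (
    SparkSession.builder
    .appName("traffic-parquet-to-hive")
    .enableHiveSupport()
    .getOrCreate()
)

refined = spark.read.parquet("hdfs:///data/traffic/refined/")
refined.write.mode("overwrite").saveAsTable("traffic.refined_speed_weather")

# Example of the kind of Hive query an AI evaluation step might run.
spark.sql("""
    SELECT link_id, collected_at, pred_speed, actual_speed
    FROM traffic.refined_speed_weather
    WHERE snowfall > 0
""").show(5)
```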
Meanwhile, the collected and refined data is also stored in PostgreSQL, which supports the web data refinement process for unexpected situations. Elasticsearch, indexed by coordinate-based addresses, passes the various prediction results on to the prediction and learning services. As this data and these services move toward final visualization, they converge on Tomcat and Nginx servers, which deliver the visualization service (traffic status and the location-based traffic condition refinement process) as the final output.
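To make the coordinate-based lookup concrete, here is a sketch of a geo query against Elasticsearch using the elasticsearch-py 8.x client; the index name, field names, and geo_point mapping are assumptions for illustration.

```python
from elasticsearch import Elasticsearch

# Find prediction results near a given coordinate; assumes an index whose `location`
# field is mapped as geo_point. Index and field names are illustrative.
es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="traffic_predictions",
    query={
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "1km",
                    "location": {"lat": 37.2636, "lon": 127.0286},
                }
            }
        }
    },
    size=10,
)

for hit in response["hits"]["hits"]:
    source = hit["_source"]
    print(source.get("road_name"), source.get("predicted_speed"))
```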
———————————
The service architecture for the AI Traffic Big Data Platform consists of three process services, described here from both software and hardware perspectives: the Hadoop ecosystem, the Data Warehouse (DW), and Kubernetes (k8s).
The Hadoop ecosystem uses Hive, HBase, Spark, and ZooKeeper, running on a Dell PowerEdge R740 with 32 CPU cores, 128GB of RAM, and 32TB of storage. Proxmox VE 7.3 provides OS virtualization, and the VMs run Hive, HDFS, YARN, and Java 11.
The DW tier is built on standard Hadoop-ecosystem ThinkSystem SR650 servers, each with 32 CPU cores, 64GB of RAM, and 9.6TB + 1.2TB of storage, with five units installed and likewise virtualized with Proxmox VE 7.3. The setup includes three servers for PostgreSQL (web data processing for unexpected situations) and Elasticsearch (coordinate and road-name address processing), one server for the Java ecosystem (front-end/back-end processing: Tomcat, Nginx, and eGovFrame services), one server for data loading and processing (Apache NiFi, Python agents, Kafka, and Apache Flume), and one server for Kibana/Prometheus monitoring.
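As one possible shape for the data loading and processing server, the sketch below shows a Python agent that consumes refined records from Kafka and appends them to PostgreSQL; the topic, broker, table, and connection details are placeholders rather than the platform's actual configuration.

```python
import json

import pandas as pd
from kafka import KafkaConsumer          # kafka-python
from sqlalchemy import create_engine

# Consume refined records from Kafka and append them to PostgreSQL in small batches.
# Topic, broker, credentials, and table name are placeholders.
consumer = KafkaConsumer(
    "traffic.refined",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)
engine = create_engine("postgresql+psycopg2://user:password@postgres:5432/traffic")

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 500:
        pd.DataFrame(batch).to_sql("refined_speed_weather", engine,
                                   if_exists="append", index=False)
        batch.clear()
```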
For Kubernetes processing, four servers were established in a clustered configuration for web/WAS, monitoring, collection/preprocessing, and GPU nodes. The three GPU nodes are equipped with CUDA, cuDNN, Anaconda, Airflow, and TensorFlow.
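On the GPU nodes, Airflow could schedule the TensorFlow retraining jobs along these lines (Airflow 2.4+ syntax); the DAG id, schedule, and script paths are assumptions for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Nightly retraining pipeline on the GPU nodes; DAG id, schedule, and script
# paths are assumptions, not the platform's actual jobs.
with DAG(
    dag_id="traffic_speed_retraining",
    start_date=datetime(2023, 1, 1),
    schedule="0 3 * * *",   # 03:00 every day (Airflow 2.4+ `schedule` argument)
    catchup=False,
) as dag:
    preprocess = BashOperator(
        task_id="preprocess",
        bash_command="python /opt/jobs/preprocess.py",
    )
    train = BashOperator(
        task_id="train_tensorflow_model",
        bash_command="python /opt/jobs/train.py --gpu 0",
    )
    preprocess >> train
```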
Curriculum
- 10 Sections
- 66 Lessons
- Lifetime
- Project Management (6 lessons)
- Infrastructure Construction Management (9 lessons)
- Service Planning Management (2 lessons)
- Data Management (11 lessons)
- Exploratory Analysis (1 lesson)
- AI Algorithms (0 lessons)
- System Development (4 lessons)
- System Testing (7 lessons)
- Training and Handover Management (3 lessons)
- Deliverables (23 lessons)
- 10.1 [Analysis] Function Chart
- 10.2 [Analysis] Requirements Definition Document
- 10.3 [Analysis] Issue and Risk Management
- 10.4 [Analysis] Process Definition Document
- 10.5 [Analysis] Interface Definition Document
- 10.6 [Design] ERD
- 10.7 [Design] Development Standards Definition Document
- 10.8 [Design] Unit Test Scenarios
- 10.9 [Design] System Transition Plan
- 10.10 [Design] Table List
- 10.11 [Design] Table Definition Document
- 10.12 [Design] Program (Software) List
- 10.13 [Design] Screen Design Document
- 10.14 [Development] Defect Report
- 10.15 [Development] Source Code (original development code)
- 10.16 [Development] Integration Test Results Report
- 10.17 [Implementation] Development Deliverables Inspection Checklist
- 10.18 [Implementation] Training Specification
- 10.19 [Implementation] User Manual
- 10.20 [Implementation] Release Deployment Results Report
- 10.21 [Implementation] Operator Manual
- 10.22 [Implementation] Project Completion Report
- 10.23 [Meeting Minutes] Project Implementation Meeting