Taxi Trip Estimator
Taxi Trajectory asked participants to predict where a taxi would drop off a passenger given partial information about the trip, while the goal of Taxi Trip Time was to predict the duration of a trip from the same dataset. A summary of the first-place solution in Taxi Trajectory can be found here.
In total, 418 competitors on 345 teams competed to predict how long a taxi ride would take. The BlueTaxi team finished third in Taxi Trip Time and seventh in Taxi Trajectory. This post describes how their team of researchers from five different countries came together and how they reached third place in the Taxi Trip Time contest.
BlueTaxi is a very multicultural team: we are Ernesto (El Salvador), Lam (Vietnam), Alessandra (Italy), Bei (China) and Yiannis (Greece).
Why did you choose to take part in this contest?
Ernesto: After a short chat with Lam, we decided to go for it and formed the BlueTaxi team to compete in both tracks of the competition.
I took the lead on trip time prediction and Lam took the lead on destination prediction. We asked Alessandra, Bei and Yiannis to join us to complete the five-person team.
Yiannis: The problem of predicting the destination and travel time of a taxi trip seemed very challenging to me, so I decided to join the BlueTaxi team.
Lam: I did my doctorate at the Technical University of Eindhoven (TU/e) in the field of pattern mining for data streams and joined the IBM Research Lab in Ireland about a year and a half ago. I am interested in mining large, fast data streams, with applications in telecommunications and transportation in the context of the Smart City project.
Bei: I am a statistician specializing in time series analysis, forecasting, resampling/subsampling methods for dependent data, and financial econometrics.
Yiannis: I have a Master's in Computer Science from Athens University of Economics and Business. Recently I have been working with spatio-temporal data in various projects that focus on maintaining, storing and analyzing such data.
In addition, I have experience building visualizations of this kind of data that help researchers understand it.
Ernesto: I did my doctorate in Computer Science at the L3S Research Center of the University of Hannover. I am a machine learning researcher working on applications in Web Science, Social Media Analytics and Recommender Systems.
Did you have any prior experience or domain knowledge that helped you succeed in this contest?
At the IBM research lab I worked on several related research projects, for example using GPS traces of buses to predict bus arrival times at stops; see our paper on this subject (Flexible Sliding Windows for Kernel Regression Based Bus Arrival Time Prediction) in the industry track at ECML/PKDD 2015.
I had no particular expertise in transportation systems, but my experience in machine learning, computer science and analytics was invaluable in the competition.
Yiannis: My experience with spatio-temporal data proved to be very helpful in the competition.
Bei: I had done some previous analysis of transportation data, which I found helpful in this challenging task.
However, to be frank, I had not placed well in a contest until now.
What preprocessing and supervised learning methods did you use?
10-NN: for each test trip, we find the 10 nearest neighbors with respect to Euclidean distance and use the durations of these trips as predictors. Kernel regression: as with 10-NN, kernel regression was used to predict the trip duration.
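The two predictors described above can be sketched roughly as follows. This is a minimal illustration, not the team's code: the feature vectors, the Gaussian kernel, the `bandwidth` parameter and all function names are assumptions, since the interview does not say how trips were represented or which kernel was used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, train, k=10):
    """Return the k (distance, duration) pairs closest to the query.

    `train` is a list of (feature_vector, duration_seconds) tuples
    (a hypothetical layout chosen for this sketch).
    """
    scored = [(euclidean(query, feats), dur) for feats, dur in train]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]

def knn_duration(query, train, k=10):
    """Plain k-NN estimate: mean duration of the k nearest trips."""
    neighbors = nearest_neighbors(query, train, k)
    return sum(dur for _, dur in neighbors) / len(neighbors)

def kernel_duration(query, train, k=10, bandwidth=1.0):
    """Kernel regression over the k nearest trips.

    Each neighbor is weighted by a Gaussian kernel of its distance, so
    closer trips contribute more. `bandwidth` is an assumed tuning knob.
    """
    neighbors = nearest_neighbors(query, train, k)
    weights = [math.exp(-((d / bandwidth) ** 2) / 2) for d, _ in neighbors]
    total = sum(weights)
    if total == 0:  # every weight underflowed: fall back to the plain mean
        return sum(dur for _, dur in neighbors) / len(neighbors)
    return sum(w * dur for w, (_, dur) in zip(weights, neighbors)) / total
```

Compared with the plain average, the kernel-weighted version lets a very close match dominate the estimate, which is one plausible reason to use both as separate features.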
When matching a test trip against the training trips, we consider only the last 100, 200, 300, 400, 500 and 1000 meters of each trip, as well as the full trips. The later part of a trip is more important in some cases; the last 500 meters of the trip are very important.
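To make the suffix matching concrete, here is a small sketch that cuts a GPS trajectory down to roughly its last N meters. It assumes trajectories are lists of `(lat, lon)` tuples in degrees and measures path length with the haversine formula; the function names and the choice to keep whole GPS points are illustrative assumptions, not details from the interview.

```python
import math

EARTH_RADIUS_M = 6371000.0
SUFFIX_LENGTHS_M = (100, 200, 300, 400, 500, 1000)  # lengths named in the text

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def last_meters(points, meters):
    """Return the trailing points covering at least the last `meters` of path.

    Walks backwards from the final point and keeps whole GPS points until
    the accumulated path length reaches `meters` (or the trip runs out).
    """
    if not points:
        return []
    kept = [points[-1]]
    travelled = 0.0
    for i in range(len(points) - 1, 0, -1):
        if travelled >= meters:
            break
        travelled += haversine_m(points[i - 1], points[i])
        kept.append(points[i - 1])
    kept.reverse()
    return kept

def trip_suffixes(points):
    """Build every suffix used for matching, plus the full trip."""
    suffixes = {m: last_meters(points, m) for m in SUFFIX_LENGTHS_M}
    suffixes["full"] = list(points)
    return suffixes
```

Each suffix could then be compared against the same-length suffix of the training trips, giving one set of matching features per suffix length.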
Consider two trips with different starting points but the same destination (Porto airport). Trip matching lets us predict the destination of one trip from the known destination of a similar trip. We also considered contextual matching (only trips with the same taxi ID, the same day of the week, the same phone number, and so on), because we found different destination distributions in these contexts.
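Contextual matching as described above amounts to filtering the training trips before any trajectory comparison. The sketch below assumes each trip is a dictionary with hypothetical keys `taxi_id`, `weekday` and `call_id`; the actual field names and the way the team combined contexts are not given in the interview.

```python
def same_context(test_trip, train_trip, keys):
    """True when both trips share a known, equal value for every key."""
    return all(
        test_trip.get(key) is not None and test_trip.get(key) == train_trip.get(key)
        for key in keys
    )

def contextual_candidates(test_trip, train_trips, keys=("taxi_id",)):
    """Keep only the training trips recorded in the same context as the test trip."""
    return [trip for trip in train_trips if same_context(test_trip, trip, keys)]
```

Calling `contextual_candidates(test_trip, train_trips, keys=("taxi_id", "weekday"))` would restrict matching to trips by the same taxi on the same day of the week; a missing value (`None`) never counts as a match.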
In the taxi-ID context, the best results were achieved with kernel regression. In modeling, we did not predict the duration of the entire trip but the additional travel time beyond the cut-off timestamp. Because the evaluation metric was RMSLE, we log-transformed the target label, that is, we modeled the log of the additional travel time.
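The target transformation described above can be sketched as follows, assuming the common `log(1 + x)` form of RMSLE. The function names and the `elapsed` argument (seconds between the trip start and the cut-off timestamp) are hypothetical names introduced for this illustration.

```python
import math

def rmsle(predicted, actual):
    """Root mean squared logarithmic error between two equal-length lists."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def to_log_target(total_duration, elapsed):
    """Training label: log(1 + remaining travel time after the cut-off)."""
    return math.log1p(max(total_duration - elapsed, 0.0))

def to_total_duration(log_prediction, elapsed):
    """Invert the transform: elapsed time plus the predicted remaining time."""
    return elapsed + math.expm1(log_prediction)
```

Training a squared-error model on a log-scale label penalizes relative rather than absolute errors, which is the same kind of error the metric rewards. Note that in this sketch the log is applied to the remaining time, as the text describes, so it is not identical to optimizing RMSLE on the total duration.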
Handling outliers: we found that trips with missing GPS readings (identified using speed thresholds of 160, 140 and 100 km/h) are harder to predict. We tried to recover this information for the test set by looking at the gap between the cut-off timestamp and the start timestamp. The test set for this contest was very small (320 instances), which made the results very sensitive to such outliers.
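One way to flag trips with missing readings by speed is sketched below. It assumes GPS points are `(lat, lon)` tuples sampled at a fixed interval of 15 seconds, so that a dropped reading shows up as an implausibly fast jump between consecutive points; the interval, the function names and the exact rule are assumptions, as the interview gives only the thresholds.

```python
import math

EARTH_RADIUS_M = 6371000.0
SAMPLE_INTERVAL_S = 15.0  # assumed fixed GPS sampling interval

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def max_speed_kmh(points, interval_s=SAMPLE_INTERVAL_S):
    """Largest implied speed between consecutive points, in km/h."""
    speeds = [
        haversine_m(a, b) / interval_s * 3.6
        for a, b in zip(points, points[1:])
    ]
    return max(speeds, default=0.0)

def has_missing_readings(points, threshold_kmh=160.0, interval_s=SAMPLE_INTERVAL_S):
    """Flag a trip whose implied speed exceeds the threshold at any step."""
    return max_speed_kmh(points, interval_s) > threshold_kmh
```

Running the check with each of the three thresholds (160, 140 and 100 km/h) would give three flags per trip, from strict to lenient.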
What was your most important insight into the data?
For example, some trips booked from the same phone number recur very often, and trips to the airport are also very easy to predict. We tried some open-source trip planners to predict trip times from inputs such as the destination predictions from the other track of the competition, but we noticed that the results were not very good for taxi rides.
This tells us that trip planners should incorporate real-time traffic information to better predict travel times. Call ID information can be used to narrow down the possible destinations of a given taxi.
How did you spend your time on this contest?
A rough estimate would be 80% on feature engineering and the remaining 20% on modeling.
How long did training and prediction take for your successful solution?
Most of the features relied on nearest-neighbor search queries, which are generally very time-consuming, so we introduced a novel index-based search method to address the efficiency problem, and the index shortened the run time significantly. On the modeling side, the most time-consuming stage is always choosing the model parameters, which we did using cross-validation.
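The interview does not describe the team's index, so the sketch below uses a plain uniform grid as a stand-in to show why an index helps: a neighbor query inspects only the trips filed in nearby cells instead of scanning every training trip. The class name, the `cell_deg` cell size and the choice to index one `(lat, lon)` point per trip are all assumptions.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over (lat, lon); a hypothetical stand-in for the team's index."""

    def __init__(self, cell_deg=0.005):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        """Map a coordinate to its integer (row, col) grid cell."""
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def add(self, trip_id, lat, lon):
        """File a trip under the cell containing the given point."""
        self.cells[self._cell(lat, lon)].append(trip_id)

    def candidates(self, lat, lon):
        """Trip ids in the query cell and its 8 surrounding cells."""
        row, col = self._cell(lat, lon)
        found = []
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                found.extend(self.cells.get((row + d_row, col + d_col), []))
        return found
```

An exact distance computation would then run only on the returned candidates, which is typically a small fraction of the training set.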
What did you take away from this contest?
Ultimately, our data-driven machine learning solutions produced significantly better results than other solutions that relied on transport data alone. We learned that data exploration, feature extraction and feature engineering are very important. Modeling is now much simpler thanks to the many open-source tools for machine learning.
Do you have any advice for those just getting started in data science?
Always be curious and keep an open mind about what the data has to tell you. Make data science your hobby!
How did competing on a team contribute to your overall success?
In our opinion, team spirit was the key ingredient in our success.
In the end, BlueTaxi took third place in trip time prediction and seventh place in destination prediction. Without working together as a team, we could not have reached such high rankings.
Ernesto loves web science, big data analytics and the power of machine learning to support decision-making.
His research focuses on developing novel and intelligent filtering techniques that exploit social interactions, multi-dimensional relationships and other ubiquitous information on the Web to reduce people's information overload. Ernesto received his doctorate from the L3S Research Center of the University of Hannover and is currently a Research Scientist in Machine Learning and Data Science at IBM Research - Ireland.
Lam received his doctorate from the Technical University of Eindhoven in December 2012, in the field of pattern mining in data streams. Immediately afterwards he joined the IBM research lab in Dublin, Ireland, where he has worked on various research projects proposing new machine learning and statistical modeling techniques to address real-world problems in improving city life.
Alessandra is currently working as a research assistant at the IBM Smarter Cities Technology Centre in Dublin, Ireland. Her research interests are in signal processing, in particular statistical estimation and prediction of vehicular traffic, distributed/cooperative estimation and urban control systems.
Bei: Bei Chen received her BMath in Statistics and Actuarial Science (with Dean's Honours List and President's Award), and her MMath and PhD in Statistics (with Outstanding Achievement in Graduate Studies Award), from the University of Waterloo, Canada.
Yiannis: Yiannis Gkoufas has a Master's degree in Computer Science from Athens University of Economics and Business.