I recently worked on a big data analytics project in which I collected live streaming data on around 50–60 trending topics from Twitter over long periods of time. The project relied heavily on the Twitter API for data collection, and one of that API's major limitations is that the client machine always has to keep up with the data stream; if it cannot, the stream simply disconnects, and trust me, this happens a lot, especially when you're processing a lot of data in real time! In this article, I will show you how I overcame that limitation by using Kafka as a message queue.
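To make the idea concrete, here is a minimal sketch of that pattern: a lightweight producer pushes each raw tweet onto a Kafka topic the moment it arrives, while a separate consumer does the heavy processing at its own pace. This assumes the kafka-python package and a local broker at localhost:9092; the topic name "tweets" and the `get_tweet_stream()` placeholder are illustrative, not part of the original project.

```python
# Sketch: decouple Twitter ingestion from processing with Kafka.
# Assumes a Kafka broker at localhost:9092 and the kafka-python package.
import json
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "tweets"  # illustrative topic name


def produce(tweet_stream):
    """Fast path: forward each raw tweet to Kafka as soon as it arrives,
    so the Twitter connection never waits on downstream processing."""
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda d: json.dumps(d).encode("utf-8"),
    )
    for tweet in tweet_stream:  # tweet_stream yields tweet JSON as dicts
        producer.send(TOPIC, value=tweet)
    producer.flush()


def consume():
    """Slow path: read tweets from the queue and process them at leisure.
    Falling behind only grows Kafka's backlog; it never disconnects the
    upstream Twitter stream."""
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        tweet = message.value
        print(tweet.get("text", ""))  # replace with the real analytics step
```

The key design point is that the producer does almost no work per tweet, so it can keep pace with the live stream, while Kafka buffers everything the consumer has not yet gotten to.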