AWS IoT (https://aws.amazon.com/iot/) can collect and handle large quantities of data from a variety of sources, and makes it easy to use AWS services such as AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Machine Learning, and Amazon DynamoDB.
AWS DataSync (https://aws.amazon.com/datasync/) is a data transfer service that makes it easy for you to automate moving data between on-premises storage and Amazon S3 or Amazon Elastic File System (Amazon EFS).
Amazon FSx for Lustre (https://aws.amazon.com/fsx/lustre/) provides a high-performance file system optimized for fast processing of workloads such as machine learning, high performance computing (HPC), video processing, financial modeling, and electronic design automation (EDA).
Apache Spark Streaming (http://spark.apache.org/streaming/) enables high-throughput, fault-tolerant, and scalable processing of live data streams. It divides the incoming data streams into batches before sending them to the Spark engine for processing.
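The batching idea can be illustrated with a minimal plain-Python sketch. This is not the Spark API: the `micro_batches` function, the `(timestamp, payload)` event shape, and the `batch_interval` parameter are hypothetical names chosen for illustration, and the sketch assumes events arrive in time order.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], batch_interval: float) -> Iterator[List[str]]:
    """Group time-ordered events into consecutive fixed-length time windows.

    Each yielded list holds the payloads that arrived within one interval.
    Intervals with no events yield an empty list.
    """
    current: List[str] = []
    window_end = None
    for timestamp, payload in events:
        if window_end is None:
            # Align the first window to a multiple of the batch interval.
            window_end = (timestamp // batch_interval + 1) * batch_interval
        # Close every window that ended before this event arrived.
        while timestamp >= window_end:
            yield current
            current = []
            window_end += batch_interval
        current.append(payload)
    if current:
        # Flush the final, partially filled window.
        yield current


if __name__ == "__main__":
    stream = [(0.1, "a"), (0.5, "b"), (1.2, "c"), (3.0, "d")]
    for batch in micro_batches(stream, batch_interval=1.0):
        # A real engine would hand each batch to a processing job here.
        print(len(batch), batch)
```

Each yielded batch stands in for the unit of work that a processing engine would receive. A shorter interval lowers latency at the cost of more, smaller batches.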
Amazon Managed Streaming for Apache Kafka (Amazon MSK) (https://aws.amazon.com/msk/) is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data.
Data Lake Concepts and Building a Serverless Data Lake