Big Data Programming Quiz
1. What is InfiniteGraph?
- InfiniteGraph is a type of relational database management system.
- InfiniteGraph is a distributed graph database developed in Java and C++.
- InfiniteGraph is a programming language for web development.
- InfiniteGraph is a cloud storage service for documents and files.
2. What are the characteristics of OLTP?
- OLTP processes large volumes of historical data through batch jobs.
- OLTP focuses on large-scale data analysis for business intelligence.
- OLTP uses a fully denormalized schema to increase complexity.
- OLTP involves transaction-oriented applications and ensures the database is up-to-date.
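The transaction-oriented behavior in the correct option can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for a production OLTP system (the `accounts` table and its values are illustrative, not from any real system):

```python
import sqlite3

# In-memory database standing in for an OLTP backend (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# A short, atomic transaction: transfer 30 from account 1 to account 2.
# Either both updates commit or neither does, which is how OLTP keeps
# the database up-to-date and consistent.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {1: 70, 2: 80}
```

If either `UPDATE` raised an error, the `with conn:` block would roll the whole transfer back, leaving both balances untouched.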
3. What are the components of Hadoop?
- SQL Server, PostgreSQL, and MongoDB
- Oracle, SQLite, and Elasticsearch
- Apache Spark, Apache Cassandra, and Apache HBase
- Hadoop HDFS, Hadoop MapReduce, and Hadoop YARN
4. What is the role of HDFS in Hadoop?
- HDFS is the processing unit of Hadoop, handling computations on data.
- HDFS is the resource management unit of Hadoop, scheduling job executions.
- HDFS is the storage unit of Hadoop, responsible for storing large amounts of data in a distributed manner.
- HDFS is the analysis framework of Hadoop, enabling data visualization.
5. What is MapReduce in Hadoop?
- MapReduce is a machine learning algorithm for data classification.
- MapReduce is a database system designed to manage unstructured data.
- MapReduce is the processing unit of Hadoop, responsible for processing large datasets in parallel across a cluster of nodes.
- MapReduce is a distributed file system used for storing data.
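The map/shuffle/reduce pattern behind the correct option can be sketched in plain Python. This is a single-process word-count simulation of the model, not actual Hadoop code:

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit a (word, 1) pair for every word in an input record.
    return [(word, 1) for word in record.split()]

def reduce_phase(word, counts):
    # Reduce: combine all counts emitted for one key.
    return word, sum(counts)

records = ["big data big models", "big data"]

# Shuffle: group intermediate pairs by key, as Hadoop does between phases.
groups = defaultdict(list)
for record in records:
    for word, count in map_phase(record):
        groups[word].append(count)

result = dict(reduce_phase(w, c) for w, c in groups.items())
print(result)  # {'big': 3, 'data': 2, 'models': 1}
```

In real Hadoop, the map and reduce calls run in parallel on many nodes and the shuffle moves data across the cluster; the logic per record is the same.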
6. What is YARN in Hadoop?
- YARN is a programming language used specifically for Hadoop development and scripting.
- YARN is a data storage layer for Hadoop, focusing on storing large datasets effectively.
- YARN is the resource management unit of Hadoop, responsible for managing resources and scheduling jobs across the cluster.
- YARN is the user interface for Hadoop, providing visual tools for data analysis.
7. What is the significance of the `data storage to price ratio` in big data?
- The `data storage to price ratio` signifies the quality of data generated from big data analysis.
- The `data storage to price ratio` matters because companies can only afford to own, maintain, and power large data storage when the cost per unit of storage is low enough; as that ratio improves, larger storage becomes accessible to more users.
- The `data storage to price ratio` is important because it helps maximize the number of users accessing data simultaneously.
- The `data storage to price ratio` is significant because it determines the speed of data processing in big data environments.
8. What is an example of big data utilized in action today?
- Spreadsheet analysis
- Social media
- Manual data entry
- Email marketing
9. What is personalized marketing enabled by big data?
- Random promotional offers sent to everyone
- Generic advertising accessible to all
- Mass email distribution focusing on sales
- Targeted marketing campaigns tailored to individual preferences
10. What is the workflow for working with big data?
- The workflow for working with big data involves collecting data, storing music, and creating images.
- The workflow for working with big data consists of manual data entry, spreadsheet analysis, and presentation.
- The workflow for working with big data includes data cleaning, visualization, and reporting.
- The workflow for working with big data involves big data, better models, and higher precision.
11. What is the most compelling reason why mobile advertising is related to big data?
- Mobile advertising relies solely on text messaging, which isn't related to big data.
- Mobile advertising uses traditional print methods for promotion, ignoring big data.
- Mobile advertising benefits from data integration with location, which requires big data.
- Mobile advertising's main focus is on basic TV ads without big data involvement.
12. What are the three types of diverse data sources?
- Customer data, physical data, and products.
- Machine data, organizational data, and people.
- Mobile data, virtual data, and documents.
- Cloud data, network data, and animals.
13. What is an example of machine data?
- Weather station sensor output
- Excel spreadsheet document
- HTML webpage source
- JPEG image file
14. What is an example of organizational data?
- Weather data from satellites
- Stock prices from the market
- Personal emails from users
- Disease data from the Centers for Disease Control and Prevention
15. Which of the following summarizes the process of using data streams?
16. Where does the real value of big data often come from?
- Simply collecting large volumes of data.
- Creating multiple copies of data for safety.
- Storing data in a single location.
- Combining streams of data and analyzing them for new insights.
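The "combining streams" idea in the correct option can be shown with a small sketch. The two streams and their keys here are hypothetical, made up purely for illustration:

```python
# Two hypothetical data streams keyed by user id (values are made up):
# a location stream and a purchase stream.
locations = {"u1": "store_a", "u2": "store_b"}
purchases = {"u1": 30.0, "u2": 12.5, "u3": 7.0}

# Joining the streams yields an insight neither stream holds alone:
# total spend per store location.
spend_by_store = {}
for user, store in locations.items():
    if user in purchases:
        spend_by_store[store] = spend_by_store.get(store, 0.0) + purchases[user]

print(spend_by_store)  # {'store_a': 30.0, 'store_b': 12.5}
```

Note that neither input by itself says anything about stores and spending together; the value appears only once the streams are integrated.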
17. What does it mean for a device to be `smart`?
- A device is `smart` if it connects with other devices and has knowledge of the environment.
- A device is `smart` if it has a touchscreen interface.
- A device is `smart` if it uses only batteries for power.
- A device is `smart` if it runs faster than traditional devices.
18. What does the term `in situ` mean in the context of big data?
- Analyzing data remotely from different locations.
- Storing the data in a cloud environment.
- Bringing the computation to the location of the data.
- Transferring data to a central server for processing.
19. What are the essential programming concepts for data science?
- SQL, NoSQL, XML, and JSON
- HTML, CSS, JavaScript, and PHP
- Tables, charts, graphs, and images
- Variables, data types, functions, and modules
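The four concepts in the correct option can be seen together in a few lines of Python (the sensor readings and names are invented for the example):

```python
import statistics  # a module: a collection of related functions and values

# Variables holding different data types.
readings = [2.0, 3.0, 4.0]   # list of floats
sensor_id = "ws-17"          # string
n_samples = len(readings)    # integer

def summarize(values):
    # A function: a named, reusable piece of computation.
    return {"n": len(values), "mean": statistics.mean(values)}

print(sensor_id, summarize(readings))  # ws-17 {'n': 3, 'mean': 3.0}
```

Each piece maps onto one concept: `readings`, `sensor_id`, and `n_samples` are variables of different data types, `summarize` is a function, and `statistics` is an imported module.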
20. What is the purpose of using functions in data science programming?
- To store data in a compressed format for efficiency.
- To ensure data security against unauthorized access.
- To create graphic interfaces for better visualization.
- To encapsulate complex operations into reusable code.
21. What are the characteristics of variables in programming?
- Variables are named spaces in a computer’s memory that can hold specific values, and data types define the kind of data that can be stored in a variable.
- Variables are temporary files stored on a hard drive that do not influence execution.
- Variables are hidden components that only the compiler can access without any data types.
- Variables are fixed storage areas in memory that cannot change once set by the programmer.
22. What is object-oriented programming (OOP) in data science?
- OOP is a database management technique focused on SQL queries.
- OOP is a method of data compression for large datasets.
- OOP is a functional programming method with no data encapsulation.
- OOP is a programming paradigm using objects as building blocks.
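The "objects as building blocks" idea in the correct option can be sketched with a minimal Python class (the `Dataset` name and values are illustrative):

```python
class Dataset:
    """An object bundles data (attributes) with behavior (methods)."""

    def __init__(self, name, values):
        self.name = name        # attribute: state stored on the object
        self.values = values

    def mean(self):
        # Method: behavior that operates on the object's own data.
        return sum(self.values) / len(self.values)

temps = Dataset("temperatures", [20.0, 22.0, 24.0])
print(temps.name, temps.mean())  # temperatures 22.0
```

Encapsulating the data and the operations on it in one object is what distinguishes OOP from, say, passing raw lists between free-standing functions.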
23. What is the role of modules in data science programming?
- Modules in data science programming are collections of related functions and variables that help organize code.
- Modules are used to store only raw data without any processing.
- Modules provide a direct interface to databases without any programming.
- Modules are primarily used for hardware integration in data science tasks.
24. What is batch processing in big data?
- Batch processing involves the integration of data from multiple sources manually.
- Batch processing is only for small datasets and requires immediate results.
- Batch processing is processing data in real-time as it arrives.
- Batch processing is processing large datasets in groups or batches.
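The group-at-a-time processing in the correct option can be sketched in a few lines of Python (the records and batch size are arbitrary examples):

```python
def batches(items, size):
    # Split a dataset into fixed-size groups; each group is processed
    # as a unit, rather than record-by-record as data arrives.
    for i in range(0, len(items), size):
        yield items[i:i + size]

records = list(range(10))
totals = [sum(batch) for batch in batches(records, 4)]
print(totals)  # [6, 22, 17]
```

Real batch systems apply the same pattern at much larger scale: jobs run over accumulated data on a schedule, trading immediacy for throughput.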
25. What is Apache Hadoop’s MapReduce?
- Apache Hadoop’s MapReduce is a web server application for file storage.
- Apache Hadoop’s MapReduce is a real-time monitoring tool for system performance.
- Apache Hadoop’s MapReduce is a graphical user interface for data entry.
- Apache Hadoop’s MapReduce is a distributed data processing model and execution environment.
26. What is HBase in Hadoop?
- HBase is a visual analytics tool for Hadoop.
- HBase is a streaming data service in Hadoop.
- HBase is a distributed, column-oriented database in Hadoop.
- HBase is a file processing system for Hadoop.
27. What is Hive in Hadoop?
- Hive is a data visualization tool for big data analysis.
- Hive is an operating system for managing Hadoop clusters.
- Hive is a web server for hosting big data applications.
- Hive is a data warehouse that manages data stored in HDFS.
28. What is Sqoop in Hadoop?
- Sqoop is a framework for running machine learning algorithms in Hadoop.
- Sqoop is a tool for transferring data between Hadoop and structured data stores.
- Sqoop is a visualization tool for big data analytics in Hadoop.
- Sqoop is a programming language designed for data processing in Hadoop.
29. What is Pig in Hadoop?
- Pig is a high-level data processing language used in Hadoop.
- Pig is a visualization tool for Hadoop datasets.
- Pig is an RDBMS for real-time data processing.
- Pig is a storage system for big data analysis.
30. What is Apache Spark in Hadoop?
- Apache Spark is a SQL database management system.
- Apache Spark is a distributed web server technology.
- Apache Spark is an open-source unified analytics engine for large-scale data processing.
- Apache Spark is a deep learning framework for neural networks.