There are two subnets, one containing a bastion host - the question asks which subnet should have which protocols/ports open
a. The bastion host should be reachable over SSH, so port 22 should be open
b. For the CIDR range where the nodes sit, the bastion-to-subnet rule should allow all protocols - you want the nodes in the cluster to access one another as well as the bastion
What CIDR range will accommodate all these nodes?
a. The smaller the number after the / in a CIDR block, the more addresses the range contains
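The rule of thumb above can be checked with Python's standard `ipaddress` module (the example ranges are hypothetical, not from any exam question):

```python
import ipaddress

# Smaller number after the "/" => larger address space.
# These subnet ranges are made up for illustration.
small = ipaddress.ip_network("10.0.0.0/24")
large = ipaddress.ip_network("10.0.0.0/16")

print(small.num_addresses)  # 256
print(large.num_addresses)  # 65536
```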
Which compute shape has the highest total memory per node?
What do you need to do to see the OEL image after you launched it from the Marketplace?
a. Install VNCViewer
b. SSH Key
c. Install Package
What MPI distribution is no longer supported on OCI?
What notebook is used in Oracle Data Science Platform?
Jupyter Notebooks - developed by Project Jupyter (JupyterLab is its newer interface, not a company)
question # 7 on vets
Protocol for load balancer
What metrics collected by the Monitoring service can trigger autoscaling?
a. CPU utilization
b. Memory utilization
What HDFS replication factor should be used for locally attached storage?
What HDFS replication factor should be used for a more cost-efficient, low-risk environment on block storage?
What are some best practices for Big Data Migration?
a. Object Storage
b. Data Transfer Appliance
OCI Block Volume supports sharing a volume among multiple compute instances in read/write or read-only shareable mode. What file system should be used to allow multiple compute instances to read/write data concurrently without data loss?
a. Parallel file systems such as Lustre, IBM Spectrum Scale (GPFS), BeeGFS, etc.
b. Distributed file systems such as Gluster, OCFS2, GFS2
A file system is built using the BM.Standard2.52 compute shape for the file servers. One 25 Gbps NIC is used to connect to 10 Block Volumes of 1 TB each (max 480 MB/s per volume). The other 25 Gbps NIC is used for sending/receiving data to/from client nodes. File-system clients that mount the file system are provisioned with the VM.Standard2.16 compute shape. What is the maximum theoretical I/O throughput a client node can get?
There is another question about file systems that asks for throughput (that one has no 2050 MB/s option).
I chose 4800 MB/s, since throughput = IOPS × block size
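A back-of-envelope check of the figures in the question above (a sketch only: the per-volume and 25 Gbps numbers come from the question; the ~16.4 Gbps bandwidth of VM.Standard2.16 is an assumption about that shape):

```python
# Figures from the question; the client bandwidth is an assumed shape spec.
volumes, per_volume_mb_s = 10, 480
aggregate_mb_s = volumes * per_volume_mb_s   # 4800 MB/s raw volume throughput

server_nic_mb_s = 25 * 1000 / 8              # 25 Gbps NIC ~= 3125 MB/s
client_nic_mb_s = 16.4 * 1000 / 8            # ~= 2050 MB/s (assumed)

# A single client is bounded by the smallest link in the path.
print(min(aggregate_mb_s, server_nic_mb_s, client_nic_mb_s))  # 2050.0
```

If the assumed client bandwidth is right, this is where a 2050 answer option would come from: for a single client node, the client NIC, not the volumes, is the bottleneck.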
Load balancing algorithms
a. IP Hash
b. Round Robin
c. Least Connections
What Big Data solution can run Spark-based workloads?
a. Oracle Data Flow (ODF)
Two BM.2.36 nodes (36 cores per node) took 4 hours; 4 nodes took 3 hours. What is the scaling efficiency?
Efficiency = speedup / node ratio
2 nodes took 4 hours and 4 nodes took 3 hours, so speedup = 4/3 and node ratio = 4/2.
Thus efficiency = (4/3) / (4/2) = (4/3) × (2/4) = 8/12 = 2/3 ≈ 0.667 = 67%
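The efficiency calculation above as a small Python sketch:

```python
def scaling_efficiency(nodes_before, time_before, nodes_after, time_after):
    """Actual speedup divided by the ideal (linear) speedup."""
    speedup = time_before / time_after  # 4 h -> 3 h gives 4/3
    ideal = nodes_after / nodes_before  # 2 -> 4 nodes gives 2
    return speedup / ideal

eff = scaling_efficiency(2, 4.0, 4, 3.0)
print(f"{eff:.0%}")  # 67%
```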
Storage for ODF
How should you double the capacity of a 3000-core cluster?
Make a copy using an instance configuration and an instance pool
What compute shapes are best for ODF?
It needs CPU, so go with VM shapes
How do you store a large amount of data when most of it does not need to be accessed frequently?
What do you not need to worry about with object storage?
Highly available, highly persistent (redundant)
What happens when you delete an instance pool?
When you delete an instance pool, the resources that were created by the pool are permanently deleted, including associated instances, attached boot volumes, and block volumes. (ALL)
How are instances distributed in an instance pool?
a. The instances in a pool are distributed across all fault domains in a best-effort manner based on capacity.
What are two recommended storage options on OCI for Persistent Filesystem (~3TB)?
What are two recommended storage options on OCI for Scratch Filesystem (~3TB)?
File system made with DenseIO nodes
What is the TeraSort phase in TeraSort?
Map, shuffle, and reduce the source data set into a smaller result set - a read/process/write, I/O-intensive operation
What does Data Science integrate with?
ADW, Functions, object storage
What metrics do you need to keep track of in file systems?
Throughput and latency
What are the most common Big Data Workloads?
a. In memory
b. Batch processing
A job needs part CPU and part GPU, what shape do you pick?
What is cluster networking built on top of?
If streaming is embarrassingly parallel, what shape do you choose?
Block volume performance tiers - you have 10 TB of storage and are only editing/changing small chunks of it; what storage do you use?
Put it in a block volume and take advantage of the performance tiers
How to speed up tightly coupled workload?
a. cluster networking and HPC
When an instance pool scales in, in what order are instances terminated?
The number of instances is balanced across ADs, then balanced across FDs. Finally, within an FD, the oldest instance is terminated first
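A rough sketch of that selection order (an illustrative simplification of the documented behavior, not OCI's actual implementation; the dict fields are made up):

```python
from collections import Counter

def pick_instance_to_terminate(instances):
    """Scale-in victim: most-populated AD, then most-populated FD,
    then the oldest instance within that FD."""
    ad = max(Counter(i["ad"] for i in instances).items(),
             key=lambda kv: kv[1])[0]
    in_ad = [i for i in instances if i["ad"] == ad]
    fd = max(Counter(i["fd"] for i in in_ad).items(),
             key=lambda kv: kv[1])[0]
    in_fd = [i for i in in_ad if i["fd"] == fd]
    return min(in_fd, key=lambda i: i["launch_time"])  # oldest first

pool = [
    {"ad": "AD-1", "fd": "FD-1", "launch_time": 1},
    {"ad": "AD-1", "fd": "FD-1", "launch_time": 2},
    {"ad": "AD-2", "fd": "FD-2", "launch_time": 3},
]
print(pick_instance_to_terminate(pool)["launch_time"])  # 1
```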
For a file system built on Higher Performance block volumes, what minimum block volume size would you choose for the highest throughput at a 1 MB block size?
i. 500 GB
ii. 800 GB
iii. 32 TB
iv. 1 TB
With the Higher Performance tier, you get max throughput in the 800 GB - 32 TB size range. Since the question asks for the minimum, the answer is 800 GB for a 1 MB block size.
Which is not a benefit of moving big data to OCI?
a. Oracle Airflow
Note: Airflow is an open-source workload management platform. Although it can be used in OCI, it can be used on-premise as well and therefore isn't in itself an advantage of moving big data to OCI.
What are options for building a filesystem? (choose 2)
a. Block volumes
b. NVMe local storage
A BM.Standard2.52 file system on block volume showed no performance increase even after the number of nodes was increased. What could be done to speed it up?
a. Move the file system to FSS
Note: Consider the performance benefit of an increasing number of nodes having access to the same FSS file system. The documentation advertises FSS as a solution for high-performance shared storage.
Which Hadoop products are supported on OCI? (choose 3)
A large workload uses some GPU but mostly CPU. Which shape do you choose?
Note: Pick the highest ratio of CPU to GPU, i.e. 28 OCPU : 2 GPU. BM.GPU3.8 has a ratio of 52 OCPU : 8 GPU.
You have a data transfer rate of 5 Mb/s for both read and write for an instance. You add a replication of the data that gets stored. What is the data transfer rate for read and write after this?
a. 2.5 Mb/s for read and write
b. 5 Mb/s for read and write
c. 2.5 Mb/s for read, 5 Mb/s for write
d. 5 Mb/s for read, 2.5 Mb/s for write
5 Mb/s for read, 2.5 Mb/s for write
Note: Reading once and writing twice. The question becomes, are the writes being done serially or in parallel?
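Arithmetic behind the answer, assuming the replica write shares the same 5 Mb/s budget serially:

```python
raw_mb_s = 5                    # transfer rate before replication
effective_read = raw_mb_s       # reads still hit a single copy
effective_write = raw_mb_s / 2  # each logical write = two physical writes
print(effective_read, effective_write)  # 5 2.5
```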
Given the latency times (which were 1.something µs and 3.something µs), what should you do?
Do nothing; this is normal.
Note: 1.something µs means within the same rack; 3.something µs means a different rack, but still within the RDMA fabric.
A question asked how to maximize IOPS: you choose among combinations of x block volumes, each with a volume size of y and a block size of z. Note: Consider volume performance, and keep in mind that a smaller block size gives better IOPS, since IOPS = 1 / (latency + average of read and write seek times).
A question asked how to maximize throughput: again you choose among combinations of x block volumes, each with a volume size of y and a block size of z. Note: Consider volume performance, and keep in mind that a larger block size gives better throughput, since throughput = IOPS × block size.
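The trade-off in the two notes above, using the formulas as written (the latency and seek figures are made-up illustrative values, not OCI specs):

```python
def iops(latency_us, avg_seek_us):
    # Formula from the notes; real devices also see latency grow with block size.
    return 1_000_000 / (latency_us + avg_seek_us)

def throughput_mb_s(iops_value, block_size_mb):
    return iops_value * block_size_mb

ops = iops(100, 400)                # 2000.0 IOPS with hypothetical timings
print(throughput_mb_s(ops, 0.004))  # 4 KB blocks -> 8.0 MB/s
print(throughput_mb_s(ops, 1.0))    # 1 MB blocks -> 2000.0 MB/s
```

Same IOPS budget, but the larger block size moves 250× more data per second, which is why throughput questions favor big blocks and IOPS questions favor small ones.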
What does Oracle Data Science help you do?
a. Manage on premises big data workload
b. Manage big data workload on cloud
c. (Another option)
d. (Another option)
Manage big data workload on cloud
What does Oracle Data Science integrate with? (choose multiple)
b. Oracle DB on-premise
c. Azure bucket
d. AWS S3 bucket
e. GCP bucket
b. Oracle DB on-premise
d. AWS S3 bucket
Order in which to consider optimizing performance for storage
NVMe local storage > block volume level > network level
Note: You get faster access to data that is closer to the CPU.
How does an instance pool distribute instances once an instance is deleted?
a. Terminate the instance, distribute instances across availability domains, distribute instances across fault domains
b. (Another variation of this answer)
c. (Another variation of this answer)
d. (Another variation of this answer)
Terminate the instance, distribute instances across availability domains, distribute instances across fault domains
What does Data Flow integrate with?
Which shape is available for Data Science notebook sessions?
Which storage options are supported for direct HDFS use in Hadoop?
a. DenseIO NVMe
b. Block Volume
d. Object Storage
a. DenseIO NVMe
b. Block Volume
Given a customer has RPO of 4 hours and RTO of 1 hour. What DR method would you recommend? (choose 2)
a. Use terraform and build the infrastructure
b. Have a cluster ready in another region and do near real-time replication
c. Use object storage to store data and replicate every hour
d. Replicate data in near real-time
Use terraform and build the infrastructure
Use object storage to store data and replicate every hour