Salesforce Data Architecture and Management Designer
Terms in this set (119)
What is Multitenancy?
Multitenancy is a means of providing a single application to multiple organizations from a single hardware/software stack.
What does Salesforce do when providing its CRM to a new customer?
Instead of providing a complete set of hardware/software resources to an organization, Salesforce inserts a layer of software between the single instance and other customer deployments. This layer is invisible to the organizations, which only see their own data and schemas while Salesforce reorganizes the data behind the scenes to perform efficient operations.
How does Salesforce ensure that tenant-specific customizations do not breach the security of other tenants or affect their performance?
Salesforce uses a runtime engine that generates application components for each organization using the customer's metadata.
How does Salesforce store the application data for each organization?
In a few large database tables that are partitioned by tenant and serve as heap storage. The platform's runtime engine then materializes virtual tables based on the customer's metadata.
What are two side-effects of the way that Salesforce stores customer application data?
1.) Traditional performance-tuning techniques will show little to no results.
2.) You cannot optimize the underlying SQL of the application because it is generated by the system, and not written by each tenant.
How long might it take before the text in a searchable object's created or updated record is searchable?
15 minutes or more
In what order does Salesforce perform indexed searches?
1.) Searches the indexes for appropriate records
2.) Narrows down the results by access permissions, search limits, and other filters, creating a result set
3.) Once a result set reaches a predetermined size, all other records are discarded
4.) Finally, the result set is used to query the records from the database to retrieve the fields that a user sees
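To make the four steps concrete, here is a small Python sketch of the flow (the function names and the result-set cap are illustrative stand-ins, not Salesforce internals):

```python
# Conceptual sketch of the indexed-search flow above. Names and the
# result-set cap are illustrative, not real Salesforce internals.

RESULT_SET_CAP = 3  # stand-in for the platform's predetermined size

def indexed_search(search_index, term, can_access):
    # 1.) Search the indexes for appropriate records
    candidates = [rec_id for rec_id, text in search_index.items() if term in text]
    # 2.) Narrow results by access permissions and other filters
    filtered = [rec_id for rec_id in candidates if can_access(rec_id)]
    # 3.) Once the result set reaches a predetermined size, discard the rest
    result_set = filtered[:RESULT_SET_CAP]
    # 4.) Use the result set to query the database for the fields a user sees
    return result_set

records = {i: "acme widgets" for i in range(10)}
print(indexed_search(records, "acme", lambda rec_id: rec_id % 2 == 0))  # [0, 2, 4]
```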
What are the main areas of the application that are impacted by differing (or suboptimal) architectures in implementations with large data volumes?
1.) The loading or updating of large numbers of records, either directly or with integrations.
2.) Extracting records using reports, list views, or queries.
What is the Force.com Query Optimizer?
The Force.com Query Optimizer works behind the scenes to determine the best path to the data being requested based on the filters in the query. It will determine the best index from which to drive the query, the best table from which to drive the query if no good index is available, and more.
What is PK Chunking?
PK Chunking (or Primary Key Chunking) is a strategy for querying large data sets. PK Chunking is a feature of the Bulk API that splits a query into chunks of records with sequential record Ids (i.e. the Primary Keys). Ids are always indexed, so this is an efficient method for querying large data sets.
When would you use PK Chunking?
When you need to query or extract tens or hundreds of millions of records, for example, when you need to initially query an entire data set to set up a replicated database, or if you need to query a set of data as part of an archival strategy where the record count could be in the millions.
What is the default size for a PK Chunk?
100,000 records.
What is the maximum size for a PK Chunk?
250,000 records.
What is the most efficient chunk size (recommended) for an organization?
It depends on a number of factors, such as the data volume and the filters of the query. Customers may need to experiment with different chunk sizes to determine what is optimal for their implementation. The default is 100,000 records per chunk, with a maximum of 250,000 records per chunk; increasing the records per chunk reduces the number of chunks but makes each chunk less efficient to process.
What is the format of the header to include in the Bulk API to enable PK Chunking?
Sforce-Enable-PKChunking: TRUE
When using PK Chunking, how would you specify the chunk size in the header?
Sforce-Enable-PKChunking: chunkSize=100000
What is a best practice for querying a supported object's share table using PK Chunking?
Determining the chunks is more efficient in this case if the boundaries are defined on the parent object record Ids rather than the share table record Ids. So, for example, the following header could be used for a Bulk API query against the OpportunityShare object table using PK Chunking:
Sforce-Enable-PKChunking: chunkSize=150000; parent=Opportunity
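As a sketch, the header above can be built programmatically, and the sub-queries PK Chunking produces behind the scenes look like Id-range filters (the record Ids below are illustrative placeholders):

```python
# Sketch: building the Bulk API header that enables PK Chunking, and the
# kind of Id-range sub-queries the feature generates behind the scenes.
# The record Ids below are illustrative placeholders.

def pk_chunking_headers(chunk_size=100_000, parent=None):
    value = f"chunkSize={chunk_size}"
    if parent:
        value += f"; parent={parent}"  # used for share-table queries
    return {"Sforce-Enable-PKChunking": value}

def chunk_queries(base_query, boundary_ids):
    """boundary_ids: the sorted first Id of each chunk."""
    queries = []
    for lower, upper in zip(boundary_ids, boundary_ids[1:] + [None]):
        clause = f"Id >= '{lower}'" + (f" AND Id < '{upper}'" if upper else "")
        queries.append(f"{base_query} WHERE {clause}")
    return queries

print(pk_chunking_headers(150_000, parent="Opportunity"))
for q in chunk_queries("SELECT Id FROM Account", ["001A0", "001B0", "001C0"]):
    print(q)
```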
When loading data with parent references into Salesforce, what is more efficient? Using an External Id or a Salesforce Id?
Using the native Salesforce Id is more efficient. An External Id has additional overhead in that it performs a kind of "lookup" to find the record, whereas this additional overhead does not occur (or is bypassed) when using the native Salesforce Id.
What is better? Performing Upserts, or performing Inserts followed by Updates?
Upserts are more costly than performing Inserts and Updates separately. Avoid Upserts when it comes to large data volumes.
What are some other best practices for optimizing your data load performance?
1.) Introduce bypass logic for triggers, validation rules, workflow rules (but not at the cost of data integrity)
2.) Defer Sharing Calculations
3.) Minimize the number of fields loaded for each record. Foreign key, lookup relationships, and roll up summary fields are likely to increase processing times.
4.) Minimize the number of triggers where possible. Also, where possible, convert complex trigger code to Batch Apex that processes asynchronously after data is loaded
What are Skinny Tables?
Skinny tables are tables created by Salesforce that contain frequently used fields in order to avoid joins and increase performance when running reports and queries.
Why would Skinny Tables be needed?
Behind the scenes, for each object, Salesforce maintains separate tables for standard fields and custom fields. Normally, when a query or report contains both types of fields, a join would be needed between these two behind-the-scenes tables. A Skinny Table, which could contain standard and custom fields for an object, would eliminate the need for those joins.
For what objects are Skinny Tables available?
Account, Contact, Opportunity, Lead, Case, and custom objects.
True or False: Picklist fields are available on Skinny Tables.
True.
True or False: Lookup fields are available on Skinny Tables.
False.
True or False: Formula fields are available on Skinny Tables.
False.
True or False: Text Area (Long) fields are available on Skinny Tables.
True.
How do you create a Skinny Table for an object?
Contact Salesforce Support.
How many columns can a Skinny Table contain?
A maximum of 100 columns.
True or False: Skinny Tables cannot contain fields from other objects.
True.
Describe considerations with respect to Skinny Tables and Sandboxes.
Skinny Tables are copied to Full Copy Sandboxes, but not to other Sandboxes. If needed in other Sandboxes, contact Salesforce Support.
For what fields does Salesforce automatically maintain indexes?
1.) Record Id
2.) Name
3.) OwnerId
4.) CreatedDate
5.) SystemModStamp
6.) RecordType Id
7.) Foreign Keys (Lookups and Master-Detail fields)
8.) Email (Leads and Contacts)
Which data types cannot be indexed?
1.) Text Area (Long)
2.) Text Area (Rich)
3.) Multi-Select Picklist
4.) Non-Deterministic Formulas
5.) Encrypted Text
What custom field type is automatically indexed when created?
External Id fields (custom fields marked as Unique are also automatically indexed).
What data types can be External Ids?
Text, Number, and Email.
Describe indexes and tables.
The Salesforce architecture makes the underlying data tables for custom fields unsuitable for indexing. Therefore, Salesforce creates an Index Table that contains a copy of the data, along with information about the data types.
By default, Index Tables do not include records with null (empty) values; however, you can work with Salesforce to include these if needed.
The Force.com Query Optimizer will use an index on a standard field if the filter:
Matches less than 30% of the first million records and less than 15% of the remaining records, up to a maximum of 1 million records.
The Force.com Query Optimizer will use an index on a custom field if the filter:
Matches less than 10% of the total number of records for the object, up to a maximum of 333,333 records.
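The two selectivity thresholds above can be expressed as simple arithmetic (a sketch; the function names are the author's):

```python
# The standard- and custom-field selectivity thresholds above, as arithmetic.

def standard_index_threshold(total_records):
    # 30% of the first million records + 15% of the rest, capped at 1 million
    first = min(total_records, 1_000_000)
    rest = max(total_records - 1_000_000, 0)
    return min(0.30 * first + 0.15 * rest, 1_000_000)

def custom_index_threshold(total_records):
    # 10% of the total number of records, capped at 333,333
    return min(0.10 * total_records, 333_333)

# For a 2-million-record object, a standard-field filter must match fewer
# than 450,000 records, and a custom-field filter fewer than 200,000:
print(standard_index_threshold(2_000_000))  # 450000.0
print(custom_index_threshold(2_000_000))    # 200000.0
```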
What should you always do to prepare for a data load?
Test in a Sandbox environment first.
This is enabled for the Bulk API by default.
Parallel Mode.
Describe Parallel Mode within the Bulk API
It is enabled by default. It allows for faster loading of data by processing batches in parallel.
What are the trade-offs with respect to Parallel Mode?
There is risk of lock contention. Serial mode is an alternative to Parallel mode in order to avoid lock contentions.
When should you use Parallel Mode versus Serial Mode?
Whenever possible, as it is a best practice.
When should you use Serial Mode versus Parallel Mode?
When there is risk of lock contention and you cannot reorganize the batches to avoid these locks.
How can you organize data load batches to avoid risks of lock contention?
By organizing the data by parent Id.
Suppose that you are inserting AccountTeamMember records and you have references to the same Account Id within multiple batches. You risk lock timeouts as these multiple batches process (for example, in parallel) and attempt to lock the Account record at once. To avoid these lock contentions, organize your data by Account Id such that all AccountTeamMember records referencing the same Account Id are in the same batch.
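The grouping described above can be sketched in a few lines of Python (the field names mirror AccountTeamMember; the batching helper is the author's illustration):

```python
# Sketch: organize AccountTeamMember rows so that all rows referencing the
# same Account Id land in the same batch, avoiding cross-batch lock
# contention. (A parent group larger than the batch size would still need
# special handling; this sketch keeps each group intact.)

from collections import defaultdict

def batches_by_parent(rows, parent_key, batch_size):
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[parent_key]].append(row)
    batches, current = [], []
    for parent_rows in grouped.values():
        # start a new batch if this parent's rows will not fit
        if current and len(current) + len(parent_rows) > batch_size:
            batches.append(current)
            current = []
        current.extend(parent_rows)
    if current:
        batches.append(current)
    return batches

rows = ([{"AccountId": "001A", "UserId": f"005{i}"} for i in range(3)]
        + [{"AccountId": "001B", "UserId": f"005{i}"} for i in range(2)])
for batch in batches_by_parent(rows, "AccountId", batch_size=3):
    print([r["AccountId"] for r in batch])
```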
What does the Bulk API do when it encounters locks?
1.) Waits a few seconds for the lock to be released.
2.) If lock is not released, record is marked as failed.
3.) If there are problems acquiring locks for more than 100 records in the batch, the remainder of the batch is put back in the queue and will be tried again later.
4.) When a batch is reprocessed, records that are marked as failed will not be retried. Resubmit these in a separate batch to have them processed.
5.) The batch will be tried again up to 10 times before the batch is marked as failed.
6.) As some records may have succeeded, you should check the results of the data load to confirm success/error details.
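The retry policy above can be modeled in a few lines (a Python sketch of the documented behavior; the platform's actual implementation is internal):

```python
# Illustrative model of the Bulk API's lock-handling policy above; the
# control flow mirrors the documented behavior, not real internals.

def process_batch(records, try_lock, max_attempts=10, failure_cap=100):
    """try_lock(record) -> True if the row lock was acquired after waiting."""
    pending, failed = list(records), []
    for _attempt in range(max_attempts):
        if not pending:
            break
        lock_failures, requeued = 0, []
        for i, rec in enumerate(pending):
            if try_lock(rec):
                continue          # lock acquired: record processed
            failed.append(rec)    # lock never released: record marked failed
            lock_failures += 1
            if lock_failures >= failure_cap:
                # too many lock problems: requeue the rest for a later attempt
                requeued = pending[i + 1:]
                break
        pending = requeued
    # failed records are NOT retried when the batch is reprocessed;
    # resubmit them in a separate batch
    return failed, pending

failed, left = process_batch(range(250), lambda rec: False)
print(len(failed), len(left))  # 250 0
```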
What operations are likely to cause lock contention, and as a result, require data loads to be run in Serial Mode via the Bulk API?
1.) Creating New Users
2.) Updating User Roles
3.) Updating Territories
4.) Changing ownership for records with a Private sharing model
With respect to data loads, any batch job that takes longer than this amount of time is suspended and returned to the queue for later processing.
10 minutes.
With respect to data loads, how can you optimize batch sizes?
All batches should run in under 10 minutes. Start with 5,000 records per batch and adjust based on the processing time: if processing takes more than 5 minutes, reduce the batch size; if it takes only a few seconds, increase it. If you get a timeout error, split your batches into smaller batches.
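The tuning loop above can be sketched as a small function (the 5-minute threshold and 5,000-record starting point follow the guidance; the halving/doubling factors and the 10,000-record Bulk API per-batch limit are stated for illustration):

```python
# Sketch: adjust the Bulk API batch size from observed processing time.
# Thresholds follow the guidance above; the halving and doubling factors
# are the author's choices for illustration.

def next_batch_size(current_size, minutes, timed_out=False):
    if timed_out or minutes > 5:
        return max(current_size // 2, 1)      # too slow: reduce the batch size
    if minutes < 1:
        return min(current_size * 2, 10_000)  # finishes in seconds: grow
    return current_size                       # in the sweet spot: keep it

size = 5_000                                # recommended starting point
size = next_batch_size(size, minutes=7)     # too slow -> 2500
size = next_batch_size(size, minutes=0.2)   # very fast -> back to 5000
print(size)  # 5000
```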
When loading data via batches, if more than N unprocessed requests/batches from a single organization are in the queue, additional batches from that organization will be delayed while batches from other organizations are processed.
N = 2,000.
List several key attributes with respect to determining Data Quality.
1.) Age (when were the records last updated?)
2.) Completeness (make a list of the fields that are required for each business use, then run a report that shows the percentage of blanks for these fields)
3.) Accuracy (AppExchange apps for data quality can be used to help determine accuracy against a trusted source)
4.) Consistency (reports can show the variations for each true value for a given field)
What is the Data.com Assessment App?
The Data.com Assessment App helps customers understand the overall health of their data. The app can be used to analyze Account, Contact, and Lead records in order to gain insights on data completeness and quality.
What are the Data Quality Analysis Dashboards?
Provided for free by Salesforce Labs via the AppExchange, Data Quality Analysis Dashboards leverage custom formula fields on many standard objects to record data quality and completeness. The formulas are then depicted via the dashboards to identify deficiencies in the data.
What is the first step to improving data quality?
Develop a Data Management Plan, which typically includes standards for creating, processing, and maintaining data.
List some standards that are typically included in a Data Management Plan?
1.) Naming
2.) Formatting
3.) Workflow
4.) Quality
5.) Roles and Ownership
6.) Security and Permissions
What features of Salesforce can be used to enforce Data Quality?
1.) Required fields
2.) Validation rules
3.) Workflow rules
4.) Page layouts
5.) Simple dashboards
6.) Data enrichment tools (Data.com)
7.) Duplicate Management
8.) Custom field types
9.) State and Country Picklists
For how long is Field History data retained?
Up to 18 months (up to 24 months via the API).
With respect to Field History Tracking, what happens to fields with more than 255 characters?
Their changes are tracked as edited, but the old/new values are not recorded.
True or false: Tracked fields are automatically translated.
False. Tracked field values are displayed in the language in which the change was made.
If a trigger causes a change on an object that the current user does not have access to edit...
... the change is not tracked because field history honors the permissions of the current user.
Duplicate Management uses Data.com technology. Because of this, is a Data.com license required?
No. Duplicate Management does not require a Data.com license.
With Duplicate Management, what is a Matching Rule?
A rule that determines how duplicates are identified.
With Duplicate Management, what is a Duplicate Rule?
A rule that determines the behavior that occurs when a record being saved has been identified as a possible duplicate.
Standard duplicate rules for these objects are set up and activated by default.
Accounts, Contacts, and Leads
How do you enable Duplicate Management for Person Accounts?
First enable Person Accounts, then define and activate the Matching and Duplicate Rules for Person Accounts.
When defining a Duplicate Rule, what options are available with respect to Record-Level Security?
1.) Enforce Sharing Rules
2.) Bypass Sharing Rules
True or False: The action you select for your Duplicate Rule applies to both Create and Edit actions.
False. You can choose to take a different action for Creates versus Edits.
What actions are available for Duplicate Rules?
1.) Allow
2.) Block
When the Allow action is selected, you can choose to also enable...
1.) An Alert
2.) Reporting (which includes the record created/edited and all its potential duplicates).
When the Block action is selected, this is enabled by default.
An Alert.
Duplicate Rules can be associated to N Matching Rules.
3, and each must be for a different object.
How many Duplicate Rules can exist per object?
Up to 5 active Duplicate Rules per object.
True or False: You can compare matches across objects (ex. Compare Contacts to Leads).
True.
What must happen when you match across objects?
Establish field mapping between the two objects.
For which objects can Duplicate Management be established?
1.) Person Accounts
2.) Business Accounts
3.) Contacts
4.) Leads
5.) Custom Objects
Will Duplicate Rules run for all users or all records?
It depends on whether or not optional "Conditions" have been defined for the Duplicate Rule.
When might Duplicate Rules NOT run?
1.) When records are created via the Quick Create
2.) On Lead Convert in an org where the "Use Apex Lead Convert" is disabled
3.) Records are restored using the Undelete button
4.) Records are added via Lightning Sync
5.) Records are manually merged
6.) Records are created via the Community Self-Registration
7.) A Self-Service user creates a record and the Duplicate Rule contains conditions based on the User object
8.) Duplicate Rule Conditions are set for lookup fields and records with no value for these fields are saved.
What is a compound WHERE clause condition?
A WHERE clause with multiple filter conditions, i.e. WHERE x AND y.
What does the Force.com Query Optimizer do when a query has a compound WHERE clause?
It considers the selectivity of the single-column indexes alone, as well as the intersected selectivity that results from joining two single-column indexes.
List some differences with the Force.com Query Optimizer compared to traditional relational database optimizers.
- Because Salesforce is a multitenant environment, Salesforce keeps tenant-specific statistics to provide insight into each tenant's data distribution.
Composite Index Joins
- The Force.com Query Optimizer considers the selectivity of the single-column indexes alone, as well as the intersected selectivity that results from joining two single-column indexes.
- The Force.com Query Optimizer considers the selectivity of sharing filters alongside traditional filters (i.e. the WHERE clauses) to determine the lowest cost plan for query execution.
If an index is not available for a field in a filter condition...
The only alternative is to scan the entire table/object, even when the filter condition uses an optimizable operator with a selective value.
The platform automatically recalculates optimizer statistics in the background when...
Your data set changes by 25% or more.
What should you do if you change a little less than 25% of your data set for a LDV object and you notice slower query or report performance?
Submit a case to Salesforce Premier Support to see if a manual statistics recalculation for select objects in your org can return operation to peak performance.
What is the Query Plan Tool, and where is it located?
The Query Plan Tool is a tool to help optimize and speed up queries over large volumes. The Query Plan Tool can be found/enabled within the Developer Console.
For the Query Plan Tool, what is the Cardinality?
The approximate # of records returned by the plan.
For the Query Plan Tool, what is the Leading Operation Type?
The primary operation type that Salesforce will use to optimize the query.
For the Query Plan Tool, what is the Cost?
The cost of the query compared to the Force.com Query Optimizer's selectivity threshold.
For the Query Plan Tool, when the Cost is above 1, it means that...
The query will not be selective.
For the Query Plan Tool, what are four Leading Operation Types?
1.) Index (the query is driven by an index)
2.) Sharing (the query is driven by sharing filters)
3.) Table Scan (the query scans the full table)
4.) Other (optimizations internal to Salesforce)
For the Query Plan Tool, what is sObject Cardinality?
The estimated total size/volume/rows of the sObject table
For the Query Plan Tool, what is sObject Type?
The sObject (i.e. Account)
What is Lookup Skew?
When a very large number of records point to the same record in the lookup object.
Why is Lookup Skew bad?
Lookups are foreign key relationships between objects. When a record is inserted or updated, Salesforce locks the target records in each lookup field to ensure that data integrity is maintained. Locks can occur when you try to insert or update records in a LDV environment where lookup skew exists.
What are some techniques for dealing with problems related to Lookup Skew?
1.) Reduce record save time (i.e. increase save performance, optimize trigger/class code, reduce workflow, consider asynchronous operations, etc.)
2.) Consider a Picklist field instead of a Lookup field
3.) Distribute the skew
4.) Reduce the load (i.e. from automated processes and integrations running concurrently)
With respect to Lookup Skew, what is an alternative to having a "catch-all" lookup value?
Leave the value blank, which will reduce/eliminate the skew.
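Distributing the skew can be as simple as spreading child records across a pool of placeholder parents rather than one catch-all record; a sketch (the Ids and helper are illustrative):

```python
# Sketch: distribute lookup skew by spreading child records across a pool
# of placeholder parents (round-robin) instead of a single catch-all parent.
# The Ids below are illustrative placeholders.

def assign_parents(child_ids, parent_pool):
    return {child: parent_pool[i % len(parent_pool)]
            for i, child in enumerate(child_ids)}

children = [f"a0X{i:04d}" for i in range(6)]
pool = ["001AAA", "001BBB", "001CCC"]  # several placeholder parents
print(assign_parents(children, pool))
```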
When should you use a Picklist field instead of a Lookup field?
When you have a relatively low number of values.
What is Index skew?
Essentially similar to Lookup Skew: when a large number of records share the same value in an indexed field, skewing the index.
What is a symptom of Index Skew?
Index row lock (when two updates occur at the same time and the index, which needs to be rebuilt, is large).
What are some exceptions to Index Selectivity that will result in an efficient index not being used (hint: non-optimized operators)?
1.) Negative filter operators (i.e. !=, NOT LIKE, EXCLUDES)
2.) Comparison operators paired with text fields
3.) Leading % wildcards
4.) References to non-deterministic formula fields (i.e. cross-object formula fields)
True or False: Deleted records cannot impact query performance.
False. Add IsDeleted = false to your queries, or empty your Recycle Bin!
When should you add a Custom Index?
When queries regularly filter on a field with selective values, either alone or in conjunction with other fields.
Describe LastModifiedDate.
1.) Updated whenever a user Creates or Updates a record.
2.) Can be imported with any back-dated value if your business requires preserving the original timestamps when migrating data into Salesforce.
3.) NOT indexed
Describe SystemModStamp.
1.) Always READ ONLY.
2.) Updated whenever a user Creates or Updates a record, AS WELL AS whenever an automated system process updates the record.
3.) Indexed
Is this possible?
LastModifiedDate <= SystemModStamp
Yes. SystemModStamp is always equal to or later than LastModifiedDate, because automated system processes can update a record's SystemModStamp without changing its LastModifiedDate.
Is this possible?
LastModifiedDate > SystemModStamp
No. SystemModStamp is updated on every user and system change, so it can never be earlier than LastModifiedDate.
How can LastModifiedDate affect SOQL performance?
LastModifiedDate is NOT indexed. SOQL will intelligently try to use an index on SystemModStamp when LastModifiedDate is included in the WHERE clause of a SOQL query. However, the Force.com Query Optimizer cannot use the index if the SOQL query uses LastModifiedDate to determine the upper boundary of a date range because SystemModStamp can be a greater (i.e. more recent) date than LastModifiedDate. This is to avoid missing records that fall in between the two timestamps.
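The boundary rule above can be captured in a tiny helper (a Python sketch of the rule as stated; the source only describes the boundary cases, and the function name is the author's):

```python
# Sketch of the rule above: an index on SystemModStamp can stand in for a
# LastModifiedDate filter only when LastModifiedDate sets the LOWER boundary
# of the range, because SystemModStamp >= LastModifiedDate always holds.

def can_use_systemmodstamp_index(operator):
    # Lower-bound filters are safe: any record with LastModifiedDate > X
    # also has SystemModStamp > X, so no matching record can be missed.
    # Upper-bound (and exact-match) filters could miss records whose
    # SystemModStamp is later than their LastModifiedDate.
    return operator in (">", ">=")

for op in (">", ">=", "<", "<=", "="):
    print(op, can_use_systemmodstamp_index(op))
```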
True or False: SOQL can use the index on SystemModStamp for this query.
Select Id, Name from Account where LastModifiedDate > 2014-11-08T00:00:00Z
True. LastModifiedDate sets the lower boundary of the range, so no matching records can be missed.
True or False: SOQL can use the index on SystemModStamp for this query.
Select Id, Name from Account where LastModifiedDate = CustomDate__c
False.
True or False: SOQL can use the index on SystemModStamp for this query.
Select Id, Name from Account where LastModifiedDate < CutoffDate__c
False. LastModifiedDate determines the upper boundary of the range, so the index cannot be used.
Which is a best practice: using LastModifiedDate or SystemModStamp to filter your SOQL queries?
SystemModStamp, because it is indexed.
What options exist to optimize performance for LastModifiedDate if your business requirements do not allow you to use SystemModStamp (or if SystemModStamp is not available for the object you are querying)?
Use a custom date field
- Create a custom date field and use a workflow or other mechanism to populate this field with the value of LastModifiedDate. Then contact Salesforce to have a custom index placed on the custom date field.
Use a skinny table
- If your query or report performance over large data volumes is sluggish, consider a skinny table. If LastModifiedDate is added as a column, it can be indexed on a skinny table.
Filter on LastActivityDate
- If your business requirement is to pull up Account or Contact records related to activities, and if you plan on using a Skinny Table, contact Salesforce to request an index on LastActivityDate on the Skinny Table
Use the Data Replication API
- Use getUpdated() to retrieve updated records. Under the hood, the API uses SystemModStamp to determine the matching records, and if it doesn't exist, will automatically use LastModifiedDate or CreatedDate.
List 2 reasons why SOQL queries that filter using a formula field can result in slow performance.
1.) Formula fields are not indexed by default (and therefore require full table scans if they are the primary operation type chosen by the query optimizer)
2.) Formula field values are calculated on the fly (actual values are not stored in the database for these fields)
Which types of formula fields can be indexed?
Deterministic Formula Fields
List some reasons why a Formula field may be non-deterministic.
1.) It has references to other objects (either directly or via referencing another formula field that references another object).
2.) It uses dynamic functions like TODAY()
3.) It has references to fields that Salesforce cannot index
4.) Standard fields with special functionalities (such as IsClosed on Opportunities and Cases, Status on Leads, or Subject on Activities)
5.) References to Owner, AutoNumber, Audit fields, or Divisions
True or False: By default, field indexes will include NULLs.
False.
True or False: Field indexes can never include NULL rows.
False. You can work with Salesforce Support to have indexes updated to include NULL rows.
Which field types cannot be setup such that their indexes include NULL rows?
1.) Picklist fields
2.) Lookup fields
3.) External Ids
Instead, contact Salesforce Support for assistance with creating a two-column (compound) index.
Describe the SOAP API and when to use it.
The SOAP API is optimized for real-time (synchronous) client applications/transactions that update a few records at a time. When the data sets contain hundreds of thousands of records, the SOAP API is less practical.
Describe the Bulk API
The Bulk API is based on REST principles (asynchronous) and is optimized for loading/deleting large volumes of data in batches in the background.