Before we start, please note that if you want to see a table of contents for all the sections of this blog and their various Purview topics, you can locate it at the following link:
This document is not meant to replace any official documentation, including those found at docs.microsoft.com. Those documents are continually updated and maintained by Microsoft Corporation. If there is a discrepancy between this document and what you find in the Compliance User Interface (UI) or inside of a reference in docs.microsoft.com, you should always defer to that official documentation and contact your Microsoft Account team as needed. Links to the docs.microsoft.com data will be referenced both in the document steps as well as in the appendix.
All of the following steps should be done with test data, and where possible, testing should be performed in a test environment. Testing should never be performed against production data.
The Information Protection section of this blog series is aimed at Security and Compliance officers who need to perform data classification using trainable classifiers.
This document is meant to guide an administrator who is “net new” to Microsoft E5 Compliance through.
We will be creating a net new trainable classifier.
This document does not cover any other aspect of Microsoft E5 Compliance, including:
It is presumed that you have a pre-existing understanding of what Microsoft E5 Compliance does and how to navigate the User Interface (UI).
You have files in your data that are based on a template or standard format. Examples would be contracts or resumes. This is different from an Exact Data Match or Sensitive Information Type, which can run off keywords, keyword dictionaries, regexes, or functions.
At the time of writing, you can only select one item at a time for training of your classifier.
For your initial seeding, you’ll need at least 50 files but no more than 500 files.
To fine-tune your classifier, you will need at least 200 files (on top of the initial files). I recommend you begin with 200 and then add more files later on to better train your classifier during second and third passes of training.
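Before uploading anything, it can help to confirm that your local test folders meet the file counts above. The following is a minimal Python sketch, not part of any Microsoft tooling: the function names and the idea of keeping seed and fine-tuning files in two separate local folders are my own assumptions for illustration. It only counts files; it does not check file types or content.

```python
from pathlib import Path
from typing import List

# Thresholds taken from the file-count guidance above.
SEED_MIN, SEED_MAX = 50, 500   # initial seeding: at least 50, no more than 500
TUNE_MIN = 200                 # fine-tuning: at least 200 additional files


def count_files(folder: Path) -> int:
    """Count regular files directly inside a folder (non-recursive)."""
    return sum(1 for p in folder.iterdir() if p.is_file())


def check_counts(seed_count: int, tune_count: int) -> List[str]:
    """Return a list of problems; an empty list means both counts fit."""
    problems = []
    if seed_count < SEED_MIN:
        problems.append(
            f"seed set has {seed_count} files; need at least {SEED_MIN}")
    elif seed_count > SEED_MAX:
        problems.append(
            f"seed set has {seed_count} files; limit is {SEED_MAX}")
    if tune_count < TUNE_MIN:
        problems.append(
            f"fine-tuning set has {tune_count} files; need at least {TUNE_MIN}")
    return problems


def check_folders(seed_dir: str, tune_dir: str) -> List[str]:
    """Hypothetical helper: count files in two local folders and check them."""
    return check_counts(count_files(Path(seed_dir)), count_files(Path(tune_dir)))
```

To use it, call `check_folders` with the paths to your own two folders (the folder layout is up to you) and review any messages it returns before you start seeding the classifier.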
c. I used this one from the MSFT App store
Now we will look at the Overview pane.
Once the classifier is trained, you can use Content Explorer to find data within your tenant that matches the classifier. I recommend you find some extra data (outside the 250 files listed above) and place it in a SharePoint site or OneDrive folder. Then wait up to 14 days for the classifier to find the data, based on the indexing engine that runs in the background of the tenant.
Co-author note – Special thanks to Joseph Ortiz, Microsoft Purview Technical Specialist, for his insights around the BBC article and workflow and his suggestion to use the negative and positive terms to more easily identify training documents for the Trainable Classifier.
Note: This solution is a sample and may be used with Microsoft Compliance tools for dissemination of reference information only. This solution is not intended or made available for use as a replacement for professional and individualized technical advice from Microsoft or a Microsoft certified partner when it comes to the implementation of a compliance and/or advanced eDiscovery solution and no license or right is granted by Microsoft to use this solution for such purposes. This solution is not designed or intended to be a substitute for professional technical advice from Microsoft or a Microsoft certified partner when it comes to the design or implementation of a compliance and/or advanced eDiscovery solution and should not be used as such. Customer bears the sole risk and responsibility for any use. Microsoft does not warrant that the solution or any materials provided in connection therewith will be sufficient for any business purposes or meet the business requirements of any person or organization.