v0.10.0
@mhamilton723 mhamilton723 released this
· 67 commits to master since this release
e9986fe

SynapseML

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights

  • OpenAI Language Models: Embed 175-billion parameter models into your databases with ease (Learn More)
  • .NET, C#, and F# Support: Use or train any SynapseML model from .NET (Getting Started Guide)
  • Full MLFlow Support: Quick and easy MLOps, model management, and autologging (Explore the Docs)
  • Live Demos in Browser: Explore the SynapseML library with zero setup (Run in Browser)

New Features

General

Azure Cognitive Services for Big Data 🧠

Responsible AI at Scale 😇

  • Added partial dependence plots (PDP) to allow for understanding how independent variables affect a model's prediction (#1426)
  • Updated ICE/PDP documentation with PDP-based feature importance and additional examples (#1441, #1352)
  • Added a notebook for ICE and PDP feature explainers (#1318)
  • Updated data balance documentation to better describe how it can be used to ensure model fairness (#1540)
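
As a rough illustration of what the ICE and PDP explainers listed above compute, here is a minimal pure-Python sketch. It is not SynapseML's API: the function names `ice_curves` and `partial_dependence`, the `toy_model`, and the sample rows are all hypothetical and exist only for this example. An ICE curve fixes one feature to each grid value for a single row and records the prediction; the partial dependence curve is the average of those ICE curves across rows.

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]


def ice_curves(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature_index: int,
    grid: Sequence[float],
) -> List[List[float]]:
    """Return one ICE curve per row: predictions with one feature overridden."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)             # copy so the input row is untouched
            modified[feature_index] = value  # force the feature to the grid value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature_index: int,
    grid: Sequence[float],
) -> List[float]:
    """Average the ICE curves point-wise to get the partial dependence curve."""
    curves = ice_curves(predict, rows, feature_index, grid)
    return [mean(column) for column in zip(*curves)]


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a trained model: 2*x0 + 3*x1."""
    return 2.0 * row[0] + 3.0 * row[1]


rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]

ice = ice_curves(toy_model, rows, feature_index=0, grid=grid)
pdp = partial_dependence(toy_model, rows, feature_index=0, grid=grid)
print(pdp)  # [6.0, 8.0, 10.0]
```

For this linear toy model the partial dependence on feature 0 is a straight line with slope 2, offset by 3 times the mean of feature 1. In a distributed setting the same averaging would run over a Spark DataFrame rather than a Python list.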

MLFlow 🔃

LightGBM on Spark 🌳

  • Added the ability to pass in generic argument strings to LightGBM enabling many complex parameterizations (#1444)
  • Added seed parameters to LightGBM (#1387)
  • Added a method to get LightGBM native model string directly (#1515)
  • Fixed issue with validation data creation during useSingleDataset mode (#1527)
  • Fixed multiclass training with initial scores (#1526)
  • Fixed saving LightGBM model iterations with early stopping (#1497)
  • Fixed issue where chunk size parameter was incorrectly specified during data copy (#1490)
  • Fixed issue where an empty partition was chosen as the main worker in singleDatasetMode (#1458)
  • Fixed bug with data repartitioning in LightGBMRanker (#1368)
  • Fixed outdated docs for useSingleDatasetMode (#1562)
  • Refactored LightGBM class structure to improve logging and debugging (#1557)

Vowpal Wabbit 🐇

  • Fixed issues with the saveNativeModel for the VWRegressionModel #1364 (#1366)
  • Fixed issues with building quadratic interaction terms (#1460)

Isolation Forests 🌲

Additional Updates

Maintenance 🔧

  • Removed unused debugging code (#1546)
  • Removed Synapse test exclusion for Explanation Dashboard notebook (#1531)
  • Made python style checks verbose (#1532)
  • Fixed library checking while installing library on Databricks cluster (#1488)
  • Upgraded and fixed Dockerfiles (#1472)
  • Added Developer Docker Image build to pipeline (#1480)
  • Fixed ADO area path in Issue Linker (#1464)
  • Fixed master version badge display
  • Improved Databricks error reporting
  • Updated Azure CLI to stop build errors
  • Fixed SSL handshake flakiness
  • Added itsdangerous as a dependency to ADB tests (#1412)
  • Turned on debug for PR-to-work-item workflow
  • Pointed PR linker to official implementation
  • Changed GitHub action trigger from pull_request_target to pull_request (#1413)
  • Fixed issue where Unit Tests were not executing (#1409)
  • Added Azure DevOps PR linker (#1394)
  • Updated GH PAT name (#1389)
  • Re-enabled Synapse E2E Tests (#1517)
  • Updated SynapseE2E Tests to Spark 3.2 (#1362)
  • Fixed ADO issue/PR linking (#1463)
  • Cleaned up extra MVAD models and improved network resiliency (#1457)
  • Updated azure blob client version (#1563)
  • Fixed docker security vulnerability (#1561)
  • Streamlined scalastyle hook (#1530)
  • Updated CODEOWNERS (#1523)
  • Updated OpenAI resource info (#1525)
  • Fixed semantic PR checking (#1503)
  • Updated docker images to remain compliant (#1500)
  • Added component governance explicitly to build so timeout variable works (#1489)
  • Fixed path for notebook test files in gitignore (#1485)
  • Increased component governance timeout (#1482)
  • Added conda caching to build
  • Stopped build from failing after 1 hour
  • Fixed flaking MVAD test
  • Refactored build pipeline definitions
  • Split Synapse tests into multiple tests (#1377)
  • Moved from ADO Pipelines to GitHub Workflows (#1406)

Website Improvements 💻

  • Fixed MathJax expressions rendering (#1343)
  • Fixed google analytics gtags (#1434)
  • Corrected placement of BingSiteAuth.xml config (#1445, #1439)
  • Fixed website security and upgraded Docusaurus (#1545)
  • Moved Geospatial Services to its own folder (#1345)
  • Bumped minimist from 1.2.5 to 1.2.6 in /website (#1455)
  • Bumped node-forge from 1.2.1 to 1.3.0 in /website (#1451)
  • Bumped prismjs from 1.25.0 to 1.27.0 in /website (#1430)
  • Bumped follow-redirects from 1.14.7 to 1.14.8 in /website (#1402)
  • Bumped nanoid from 3.1.23 to 3.2.0 in /website (#1355)
  • Bumped shelljs from 0.8.4 to 0.8.5 in /website (#1347)
  • Bumped follow-redirects from 1.14.1 to 1.14.7 in /website (#1348)
  • Bumped cross-fetch from 3.1.4 to 3.1.5 in /website (#1496)
  • Bumped async from 2.6.3 to 2.6.4 in /website (#1481)
  • Pinned onnxmltools to a specific version (#1524)

Bug Fixes 🐞

  • Fixed twitter sentiment detection notebook (#1544)
  • Fixed issue with DataConversion serialization (#1505)
  • Fixed typos in TestBase (#1501)
  • Fixed issue in GridSpace python API (#1470)
  • Fixed reflective class loading in IntelliJ (#1456)
  • Removed verbose ComputeModelStatistics output and converted scoredLabelsCol to DoubleType (#1361)
  • Fixed flaking in geospatial notebooks

Code Style 🎶

  • Improved style checks using pre-commit (#1538, #1528, #1535)
  • Formatted code and notebooks with Black style checker (#1522, #1520)

Documentation 📘

  • Tabularized badges for readability (#1486)
  • Added a PR template (#1418)
  • Improved installation readme (#1369, #1422)
  • Added a Security readme (#1511)
  • Updated the Azure Synapse readme (#1372)
  • Removed reference to custom maven resolver
  • Added pointer to docs on synapse pool configuration
  • Fixed typos in readme (#1516)

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:

Serena Ruan

Serena is a Software Engineer II on the Synapse team in Beijing and a force of nature. In this release, Serena has continued her prolific contribution streak by adding language support for .NET, C#, and F# and integrating SynapseML with MLFlow. Additionally, Serena has contributed several features to the MLFlow and Spark.NET open-source communities so that these systems can work better for every user. These contributions are just some of the many amazing things Serena has accomplished during this release, and her devotion and craft are pivotal to the ecosystem.

Ric Serradas

Ric is a Senior Engineering Manager on the OneNote team with a shining personality and drive to collaborate. In just a few weeks Ric hit the ground running by setting up an automated link between GitHub and Azure DevOps, building the first working version of SynapseE2E tests, and re-writing our entire build in GH Actions. Furthermore, Ric worked tirelessly through nights and weekends to land his contributions.

Puneet Pruthi

Puneet is a Senior Engineer on the SynapseML team with a knack for engineering systems and dockerization. Puneet's contributions to the library include architecting the new binder integration, driving our Synapse E2E tests to completion, and improving SynapseML’s infrastructure around community engagement. Puneet is constantly thinking of ways to improve the community and we value his effort.

Mark Niehaus

Mark is a Senior Software Engineer on the SynapseML team with a deep knowledge of the .NET ecosystem and infrastructure development. In this release, Mark architected SynapseML’s .NET binding blob publishing strategy, drove the OpenAI GPT-3 bindings to completion, and wrote a detailed GPT-3 walkthrough. Mark completed these projects while supporting the Time Series Insights service, speaking to his ability to keep multiple plates spinning at a time.

Keerthi Yanda

Keerthi is a Software Engineer II on the SynapseML team. Despite joining Microsoft just a few months ago, Keerthi has quickly learned the SynapseML ropes to take command of our integration with the Azure Synapse platform. Huge kudos to her for braving long build times and daunting error messages to make sure SynapseML works out of the box on Synapse Analytics clusters.

Yagna Oruganti

Yagna is a Senior Data and Applied Scientist on the Industry AI team with a talent for building solutions that integrate many community tools to solve customer challenges. Yagna's first contribution to SynapseML was a masterpiece of a demo showing how to use Isolation Forests, MLFlow, Tabular SHAP, and the interpret-ml explanation dashboard in a single anomaly detection example.

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML:

Serena Ruan @serena-ruan, Eric Dettinger, Scott Votaw @svotaw, Puneet Pruthi @ppruthi, Ric Serradas @riserrad, Mark Niehaus @niehaus59, Kyle Rush @k-rush, Keerthi Yanda @KeerthiYandaOS, Yagna Oruganti @YagnaDeepika, Jason Wang @memoryz, Ilya Matiach @imatiach-msft, Yazeed Alaudah @yalaudah, Elena Zherdeva @ezherdeva, Kashyap Patel @ms-kashyap, Martha Laguna @martthalch @marthalc, Alex Li @liyzcj, Maria Guirguis @maguir, Alexandra Savelieva @alsavelv, @netang, Sudhindra Kovalam @SudhindraKovalam, Markus Cozowicz @eisber, Tom Finley, Markus Weimer, Jeff Zheng, James Verbus @jverbus, Chris Hoder, Misha Desai, Nellie Gustafsson, Eren Orbey, Beverly Kodhek, Louise Han @jr-MS, Justyna Lucznik, Kim Manis, Mitrabhanu Mohanty, Bogdan Crivat, Anand Raman, William T. Freeman, James Montemagno, Luis Quintanilla, Dennis Kennedy, Ryan Hurey, Jarno Ensio, Brian Mouncer, Steve Suh @suhsteve, Akshaya Annavajhala (AK), Guolin Ke, Tara Grumm, Niharika Dutta @Niharikadutta, Andrew Fogarty, Juanyong Duan, Weichen Xu @WeichenXu123, Spark.NET Team, ONNX Team, Azure Global, Vowpal Wabbit Team, LightGBM Team, MSFT Garage Team, MSR Outreach Team, Speech SDK Team, MLflow Team

Learn More

  • Visit our website for the latest docs, demos, and examples
  • Read more about SynapseML's GA release in the Microsoft Research Blog
  • Learn more about our .NET bindings and code generation system
  • Watch a demonstration of SynapseML to create a multilingual search engine
  • Read our Paper from IEEE Big Data '21
  • Explore our integration with the Azure OpenAI Service