Customers can also use Snowflake to store and materialize features using Streams and Tasks, or their existing ELT tools such as Airflow and dbt to manage transformations. Snowflake partners with companies like AtScale, Rasgo, and Tecton, enabling users to adopt out-of-the-box platforms that enhance the management, discovery, and processing of features at scale through deep integrations with Snowflake.
To give data scientists access to all relevant data, including data beyond their own organization, the Snowflake Data Cloud simplifies data sharing among partners, suppliers, vendors, and customers using Secure Data Sharing, and provides access to third-party data through the Snowflake Data Marketplace. This offers access to unique data sets that can help increase model accuracy without complex data pipelines. Secure Data Sharing in Snowflake doesn’t require transferring data via FTP or configuring APIs to link applications. It simplifies ETL integration and automatically synchronizes “live” data between data providers and data consumers. Because the source data is shared rather than copied, customers don’t need additional cloud storage. The Snowflake Data Marketplace and Data Exchange enable data scientists to collaborate on models easily by sharing both raw and processed data.
To make model results easily accessible to the users who can act on them, Snowflake allows users to consume those results via dashboards, reports, and business analytics tools through connections with ecosystem partners like Looker, Sigma, Tableau, and ThoughtSpot. With Snowflake, your data can be stored in any cloud and in any region to best suit your needs, with a consistent data experience for collaboration and migration when needed.
Overall, by offering a single platform, Snowflake removes the need to run separate systems whenever tools, libraries, or languages change. In addition, model results from data scientists are fed back into Snowflake so that they are available to data apps and to nontechnical users for generating business value.
Flexibility of Language and Framework
By providing developers with a single platform that supports their language of choice, and popular open source and commercial solutions, Snowflake enables developers to spend more time on generating actionable business insights.
Snowpark is a new developer framework for Snowflake. It allows data engineers, data scientists, and data developers to code in their language of choice — including Python, Scala, and Java — and to execute data pipelines and ML workflows faster and more securely on a single platform. Developers want flexibility when working with data, elastically scalable environments that require near-zero administrative work and maintenance, and immediate access to the data they need. With Snowpark, developers can unlock the scale and performance of Snowflake’s engine and leverage the native governance and security controls built into Snowflake’s platform.
Through Snowflake’s broad partner ecosystem, customers can take advantage of direct connections to existing and emerging data science tools and languages such as Python, R, Java, and Scala; open source libraries such as PyTorch, XGBoost, scikit-learn, and TensorFlow; notebooks like Jupyter and Zeppelin; and platforms such as Amazon SageMaker, Dataiku, DataRobot, and H2O.ai. According to Snowflake, users of Amazon SageMaker can either leverage prebuilt integrations with Amazon SageMaker Data Wrangler or Amazon SageMaker Autopilot, or use the Snowflake Connector for Python to directly populate pandas DataFrames in their notebook instances. This high-speed connection accelerates model development and optimizes data preparation and feature engineering. Dataiku, DataRobot, and H2O.ai offer built-in Snowflake integrations that let users quickly connect their accounts to Snowflake and push processing across multiple steps of the workflow down to Snowflake’s elastic performance engine.
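The connector-to-DataFrame path described above might look roughly like the sketch below. The connection call and table are shown only in comments, and the column names and feature logic are invented for illustration; the derivation is written as plain Python so the example stands alone.

```python
# Hypothetical sketch: pulling rows into a notebook and deriving a simple
# feature locally. Column names ("total_spend", "order_count") are assumptions.

def add_avg_order_value(rows):
    """Enrich each row with an average-order-value feature."""
    out = []
    for row in rows:
        enriched = dict(row)
        enriched["avg_order_value"] = row["total_spend"] / row["order_count"]
        out.append(enriched)
    return out

# With the Snowflake Connector for Python, rows would instead come from a
# query against Snowflake, e.g.:
#   import snowflake.connector
#   conn = snowflake.connector.connect(account=..., user=..., password=...)
#   df = conn.cursor().execute("SELECT * FROM customers").fetch_pandas_all()

rows = [{"total_spend": 100.0, "order_count": 2},
        {"total_spend": 300.0, "order_count": 3}]
print(add_avg_order_value(rows)[0]["avg_order_value"])  # → 50.0
```

In practice the same transformation would more often be pushed down to Snowflake rather than computed client side, which is the motivation for Snowpark discussed below.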
Through the Snowflake and Anaconda partnership and product integrations, Snowflake users can now seamlessly access one of the most popular ecosystems of Python open source libraries without the need for manual installs and package dependency management. The integration can fuel a productivity boost for Python developers.
In summary, with Snowpark for Python (in preview), data teams can:
- Accelerate their pace of innovation using Python’s familiar syntax and thriving ecosystem of open source libraries to explore and process data where it lives.
- Optimize development time by using an integrated Python package dependency manager that eliminates time lost to broken Python environments.
- Operate with improved trust and security by eliminating ungoverned copies of data with all code running in a highly secure sandbox directly inside Snowflake.
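The Snowpark for Python workflow above can be illustrated with a sketch like the following. The session setup and registration call appear only in comments, and the scoring function, its name, and the table and column names are all invented for the example; the logic itself is plain Python that would run inside Snowflake's sandbox.

```python
# Hypothetical sketch of registering Python logic as a Snowpark UDF
# (Snowpark for Python was in preview at the time of writing).

def churn_score(logins_last_30d: int, tickets_open: int) -> float:
    """Toy scoring logic: less engagement and more tickets -> higher risk."""
    base = 1.0 / (1 + logins_last_30d)
    return min(1.0, base + 0.1 * tickets_open)

# With a live Snowpark session, the function could be registered and then
# called from SQL or DataFrame expressions, e.g.:
#   churn_udf = session.udf.register(churn_score, name="churn_score")
#   session.table("users").select(churn_udf("logins", "tickets")).show()

print(churn_score(9, 0))  # active user, no tickets → prints 0.1
```

Because the function executes where the data lives, no ungoverned copy of the `users` table ever leaves Snowflake, which is the trust benefit the bullets above describe.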
To make all of this functionality available to their rich partner ecosystem, the Snowpark Accelerated Program highlights partners with Snowpark integrations that extend Snowflake’s engine to their customers.
Performance Across the ML Workflow Steps
Snowflake can handle large amounts of data and users simultaneously. Its intelligent, multicluster compute infrastructure automatically scales to meet feature engineering demands without any bottlenecks or user concurrency limitations. To automate and scale feature engineering pipelines, users can leverage Streams and Tasks to have their data ready for model inference.
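The change-capture pattern that Streams and Tasks automate can be sketched in miniature as follows. This is not Snowflake code: the lists, offsets, and table names are stand-ins invented to show the idea that a stream exposes only rows appended since it was last consumed, and a scheduled task folds those rows into a feature table.

```python
# Toy simulation of the Streams and Tasks pattern for feature pipelines.

RAW_EVENTS = []      # stands in for a raw source table
FEATURES = {}        # stands in for the derived feature table
_stream_offset = 0   # a stream tracks what has already been consumed

def consume_stream():
    """Return only the rows appended since the last consumption."""
    global _stream_offset
    new_rows = RAW_EVENTS[_stream_offset:]
    _stream_offset = len(RAW_EVENTS)
    return new_rows

def feature_task():
    """The scheduled task: fold new events into per-user event counts."""
    for row in consume_stream():
        FEATURES[row["user"]] = FEATURES.get(row["user"], 0) + 1

RAW_EVENTS.extend([{"user": "a"}, {"user": "b"}, {"user": "a"}])
feature_task()
print(FEATURES)  # → {'a': 2, 'b': 1}
```

In Snowflake itself, `CREATE STREAM` and `CREATE TASK ... SCHEDULE = ...` provide this incrementality and scheduling natively, so the feature table stays current for model inference without custom offset bookkeeping.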
For bulk inference, Snowflake simplifies the path to production with the flexibility to deploy models inside Snowflake as user-defined functions (UDFs). Partners such as Dataiku, DataRobot, and H2O.ai are building a more integrated experience for users to have a guided workflow to deploy trained models into Snowflake effortlessly. For real-time inference, users can deploy models in an external layer (e.g., Docker) and easily request predictions directly from inside Snowflake by using External Functions to communicate with the model’s API endpoint.
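For the real-time path, Snowflake's External Functions call the model's API endpoint with rows batched in a JSON body of the form `{"data": [[row_number, arg1, ...], ...]}`, and the endpoint must return results in the same shape. The sketch below shows a minimal handler for that contract; the linear `score` function is a placeholder for a real model, not part of any Snowflake API.

```python
import json

# Minimal sketch of an endpoint handler for a Snowflake External Function.
# Rows arrive batched with a leading row number; results must echo it back.

def score(x: float) -> float:
    return 2 * x + 1  # placeholder for a real trained model

def handler(request_body: str) -> str:
    rows = json.loads(request_body)["data"]
    results = [[row[0], score(row[1])] for row in rows]
    return json.dumps({"data": results})

resp = handler(json.dumps({"data": [[0, 1.5], [1, 2.0]]}))
print(resp)  # → {"data": [[0, 4.0], [1, 5.0]]}
```

A service like this would typically sit behind an API gateway (e.g., fronting a Docker-hosted model), with the External Function definition in Snowflake pointing at its URL.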
Enterprise-Grade Security and Governance
Snowflake enables organizations to enforce consistent, enterprise-grade governance and security controls across all AI/ML workflows, helping limit AI bias. Snowflake’s Data Cloud is built on a multilayered security foundation that includes encryption, access control, network monitoring, and physical security measures, supporting the adversarial robustness an AI/ML solution needs. In addition to industry-standard certifications such as ISO/IEC 27001 and SOC 1/SOC 2 Type 2, Snowflake complies with important government and industry regulations such as PCI DSS, HIPAA/HITRUST, and FedRAMP. This compliance is critical for AI/ML deployments across industry use cases. Snowflake’s scalable data governance and security features enable organizations to address their machine learning trust initiatives easily. With security features like anonymized views, dynamic data masking, and row- and column-level policies, organizations can restrict data scientists from using sensitive information that could introduce bias into models.
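The idea behind dynamic data masking is that the value a query returns depends on the querying role, so data scientists never see raw PII. The sketch below illustrates that rule as plain Python; the role names and masking format are invented for the example (in Snowflake this would be a `CREATE MASKING POLICY` expression applied to a column, not application code).

```python
# Illustrative sketch of a role-dependent masking rule for an email column.

def mask_email(value: str, role: str) -> str:
    if role == "PII_ADMIN":
        return value                 # privileged roles see the raw value
    _, _, domain = value.partition("@")
    return "*****@" + domain         # all other roles see a masked form

print(mask_email("jane@example.com", "DATA_SCIENTIST"))  # → *****@example.com
```

Because the policy is attached to the data rather than to each application, every tool querying the column — notebooks, dashboards, partner platforms — sees the same governed view.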
Today, enterprises are confronted with a complicated set of business challenges, including an increasing pace of business, an expanding volume of business data, the need to think about shared data strategies to truly derive value from data, a growing scope of global commerce, and a multitude of risks for customers, employees, and suppliers. The sheer number of customers and suppliers, along with regulatory demands and multi-industry operations, makes such complexity the norm in global businesses.
Enterprises are rationalizing, modernizing, and transforming their enterprise application portfolios. Machine learning, natural language processing, assistive user interfaces, and advanced analytics coupled with curated data sets are advancing traditional applications to become intelligent.
These intelligent applications enable more employee insights by automating transactions that were previously stalled and bringing more data into the equation so organizations can make better decisions immediately. Organizations need a data strategy for AI, which will vary greatly depending on the size, nature, and complexity of their business and AI strategy. To accelerate innovation and time to value and enjoy a sustainable competitive advantage, technology buyers are advised to:
- Build a talent pool of industry domain and technical experts like data engineers, data scientists, and machine learning engineers.
- Get employee buy-in and trust for the data strategy with inclusivity and transparency.
- Create a workflow for bringing in third-party and/or net-new data sources into the organization, including testing, buying, and seamless integration with existing internal data sets and processes.
- Ensure the process is cross-functional across IT, procurement, legal, compliance, and security.
- Select a secure and governed data platform with support for all data types to support the entire AI/ML life-cycle workflow.
- Ensure flexibility in programming with support for multiple programming languages like Python, Java, and Scala, as well as leading machine learning frameworks like TensorFlow, PyTorch, and scikit-learn.
- Embrace an intelligent data grid that helps:
- Automate and enforce universal data and usage policies across multicloud ecosystems.
- Automate how data is discovered, cataloged, and enriched for users.
- Automate how to access, update, and unify data spread across distributed data and cloud landscapes without the need for data movement or replication.
Many companies adopt AI as they undergo digital transformation — not just because they can, but because they must. AI is the technology helping businesses be agile, innovative, and scalable. Successful enterprises will become “AI first” organizations able to synthesize information (i.e., use AI to convert data into information and then into knowledge), learn (i.e., use AI to understand relationships between knowledge and apply learning to business problems), and deliver insights at scale (i.e., use AI to support decisions and automation). AI is becoming ubiquitous across all the functional areas of a business. IDC forecasts the overall AI software market will approach $596 billion in revenue by 2025, growing at a CAGR of 17.7%.
Data is the heart of AI initiatives. Organizations need to strengthen their data strategy for AI and adopt a secure, governed, collaborative, and scalable data platform that helps data science professionals focus on data science and scale AI initiatives seamlessly.