Informatica World 2019 Recap: Improving Data for Better AI

This year’s Informatica World featured discussions around the preparation and maintenance of data for use in artificial intelligence and machine learning.

The biggest news from this year’s Informatica World, explained by the speakers who were there.

Artificial intelligence is popping up everywhere, from factory floors to hospital rooms to IT departments and more. The use cases are varied, the circumstances are diverse, but every AI tool has one thing in common: a need for data.

There’s plenty of data. Years ago, organizations began building data lakes, waiting for new technologies and tools that would help turn that data into binary gold. Those tools are now here, driven by AI and machine learning (ML), but there’s one big problem: much of the data is not clean or reliable enough to use.

Data Management for a Data-Driven World

This year’s Informatica World, titled CLAIRITY UNLEASHED (after CLAIRE, Informatica’s metadata-driven AI engine), dedicated a lot of time to improving the relationship between data and AI.

One of the difficulties with this relationship is that it’s a bit of a catch-22: AI requires high-quality data, but at enterprise scale that data can only be found, cleaned, and governed with the help of AI and machine learning.

Informatica CEO Anil Chakravarthy spoke about this relationship during the opening general session. He was later invited onto The Cube for a live-stream interview, where he expanded on his ideas:

“One of the key components [of AI and ML] is the availability of the right data —because you have to train these machine learning algorithms, the data scientists have to be able to find the right data, and then they have to prepare the right data, make sure they have access to the right data, clean it up, and then put it into the AI models. Because the training of the algorithms is very sensitive to the quality of the data.

“The flip side is … how do they manage all of that data, because the management of data is not just about availability of data, or the performance of the systems. It’s also the security of the data, the governance of the data, the availability of the data to the right users at the right time. Trying to do all of that manually, you just can’t keep up, and that’s where you need machine learning and AI to be able to do that for you in an automated manner.”

Data Lakes

Maintaining massive volumes of data can be a convoluted process, especially when the data is kept in a heterogeneous data lake. Databricks is helping organizations clean, store, and prepare data, and was at Informatica World to introduce a new partnership with Informatica.

“Every enterprise on the planet wants AI. It’s crucial for them,” said Ali Ghodsi, CEO and co-founder of Databricks. “So every software, every service, every product, will in the next ten years be adding AI capabilities to it.”

“Dealing with petabyte-scale data and cleaning it, making sure that it’s actually the right data for the task at hand, is not easy, so that’s the part that people are struggling with.”

“They’re saying these [AI] projects are actually being delayed, they’re having challenges, and the challenges typically have to do with data that is not reliable, it doesn’t have the quality it needs to have, it’s a little bit of garbage in, garbage out,” explained Ghodsi.

It’s a data hygiene problem. Training an AI model requires high-quality data that must be cleaned, scrubbed, prepared, and formatted.
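
To make "cleaned, scrubbed, prepared, and formatted" a little more concrete, here is a minimal Python sketch of one hygiene pass over a handful of records. The field names (name, age) and the clean_records helper are made up for illustration; real pipelines at the scale Ghodsi describes would run on tools such as Spark rather than plain Python lists.

```python
def clean_records(records):
    """Drop incomplete rows, normalize fields, and remove duplicates.

    Hypothetical helper for illustration only; the 'name' and 'age'
    fields are assumptions, not taken from any real dataset.
    """
    seen = set()
    cleaned = []
    for row in records:
        # Normalize the text field: trim whitespace and lowercase it.
        name = (row.get("name") or "").strip().lower()
        age = row.get("age")
        # Scrub rows that are missing a required value.
        if not name or age is None:
            continue
        # Format the numeric field consistently as an int.
        age = int(age)
        # Skip rows that duplicate one already kept.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": " Alice ", "age": "34"},
    {"name": "alice", "age": 34},     # duplicate once normalized
    {"name": "Bob", "age": None},     # missing age
    {"name": "", "age": 51},          # missing name
    {"name": "Carol", "age": 29},
]
cleaned = clean_records(raw)
print(cleaned)
```

Of the five raw rows, only two survive: the duplicate and the two incomplete rows are dropped before anything reaches a model.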

The new integration between Informatica and Databricks will help data engineers “discover the right datasets and ingest high volumes of data from multiple sources into Delta Lakes,” using intelligent governance and auditing to prepare the data for machine learning.

Google Cloud

During the opening general session, Chakravarthy introduced Google Cloud CEO Thomas Kurian to announce a new partnership between Informatica and Google Cloud.

“If your data is not clean and well-organized, you’re not going to get a good outcome in your algorithm. Because if you have a skew in your algorithm, the algorithm is going to generate the skew in the result,” explained Kurian.
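
A toy Python example can illustrate the point about skew. It assumes a made-up set of training labels that is 90 percent "approve", and it uses the simplest possible stand-in for a model, a baseline that always predicts the most common label. This is a deliberately crude sketch, not how a production model is trained, but it shows skewed input coming straight back out as skewed output.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the training set."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def train_majority_baseline(labels):
    """'Train' a trivial model that always predicts the most common label."""
    most_common_label, _ = Counter(labels).most_common(1)[0]
    return lambda _features: most_common_label


# Hypothetical, heavily skewed training labels.
train_labels = ["approve"] * 9 + ["reject"] * 1

shares = label_shares(train_labels)
model = train_majority_baseline(train_labels)

# Whatever the input, the baseline echoes the skew in its training data.
predictions = [model(x) for x in range(5)]
print(shares)
print(predictions)
```

Checking label shares like this before training is a cheap way to spot skew early.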

To better prepare data for use in AI and ML, Informatica Intelligent Cloud Services (IICS) and Master Data Management (MDM) will be available on the Google Cloud Platform (GCP), making it easier for organizations to build end-to-end data pipelines regardless of whether the data is stored on-premises, in a hybrid IT environment, or in the cloud.

“We want customers to be able to move data in real-time, or in batch, into the cloud,” said Kurian. For customers “to be able to describe the data they are moving into a catalog, to be able to then process that —and processing that could be cleansing it, standardizing it, and then running things like Spark and Hadoop calculations— to put the results in a warehouse, and be able to do analysis in the warehouse, and lastly to visualize it in your favorite visualization tool.”
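
The flow Kurian describes (catalog the data, cleanse and standardize it, load it into a warehouse, then analyze it) can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the country and amount fields, and the in-memory dict standing in for a warehouse are illustrative assumptions, not Informatica or Google Cloud APIs.

```python
from collections import Counter


def catalog(rows):
    """Describe a batch: which fields it has and how many rows."""
    fields = sorted({key for row in rows for key in row})
    return {"fields": fields, "row_count": len(rows)}


def standardize(rows):
    """Cleanse and standardize: trim and uppercase codes, parse amounts."""
    return [
        {
            "country": row["country"].strip().upper(),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]


def load(warehouse, table, rows):
    """Append rows to a table in the stand-in warehouse (a plain dict)."""
    warehouse.setdefault(table, []).extend(rows)


def analyze(warehouse, table):
    """Total the amounts per country for one warehouse table."""
    totals = Counter()
    for row in warehouse[table]:
        totals[row["country"]] += row["amount"]
    return dict(totals)


# A small hypothetical batch arriving from a source system.
batch = [
    {"country": " us", "amount": "10.50"},
    {"country": "US ", "amount": "4.50"},
    {"country": "de", "amount": "7"},
]

entry = catalog(batch)
warehouse = {}
load(warehouse, "sales", standardize(batch))
totals = analyze(warehouse, "sales")
print(entry)
print(totals)
```

Without the standardize step, " us" and "US " would be counted as two different countries, which is the kind of quiet error that cleansing is meant to prevent before results reach a visualization tool.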

Free Data Assessment

Later in the day, Chakravarthy was joined onstage by Ariel Kelman, AWS VP of worldwide marketing, to announce a partnership between Informatica, AWS, and Cognizant.

“What’s great about the partnership is all of [AWS’s] services need data, and a lot of that data is still on-premises,” said Kelman. “We see a lot of companies going through very large efforts to move their data to the cloud.”

But in order to do that, they must connect multiple data silos, forcing companies to focus on “documenting and coming up with a real strategy around governance and security and access control of the data as a prerequisite to even starting the migration of the data,” which is where “Informatica has been able to help [AWS] customers a lot.”

The partnership between Informatica, AWS, and Cognizant is aimed at helping enterprises migrate their data to AWS. Customers will be able to leverage a free Intelligent Data Migration Assessment powered by Informatica’s Enterprise Data Catalog, providing deeper insights about the data customers are interested in migrating.

“We talk to a lot of customers who are very eager to start migrating large amounts of data to the cloud, but they need to develop a plan,” explained Kelman. “They need to document the different data sources, the data governance rules, the work they need to do, how much of it to clean before the move.”

Chakravarthy then described the partnership as an “intelligent data migration assessment for migrating your on-premises data to AWS. It starts with a free self-service assessment that can be done by yourselves, you can download the tools and it will give you a free self-service assessment.”

The Future of CLAIRITY

“What you’re seeing now is the application of AI to accelerate the digital transformation we’ve been talking about for the last couple of years,” said Chakravarthy.

That was during the opening general session. For the closing general session, the audience was given a look at the long-term development of AI.

Informatica Executive Vice President and Chief Marketing Officer Sally Jenkins began the closing general session. She spoke about the journeys organizations take towards digital transformation.

“This year, we’re showing you how CLAIRITY can be unleashed through trusted data, and AI has been with us throughout. Data and AI are disrupting everything —your industries, your customers’ industries— in ways that we couldn’t even imagine just a few short years ago. But data and AI are not your end goal, they are a means to achieving all of your business outcomes, no matter where you are on your data journey.”

Merging AI and data with business outcomes is critical to the success of AI as a technology and to the enterprises that implement these new tools. Amit Walia, Informatica president of products and marketing, spoke about the future of Informatica’s CLAIRITY and the journey moving forward.

“We believe AI will complement each and every one of us in this room. AI is going to be our personal assistant, a white-glove treatment, for us to automate, accelerate our journeys so we can truly focus on strategic activities and we can truly become data superheroes…. And as you take that journey forward, clearly, with the help of CLAIRE, and with the help of AI, you can definitely take the knowledge of your past and help reduce the uncertainty of your future, and truly transform your organizations.”

Caroline Boyland was a contributor to IT Automation Without Boundaries, covering workload automation, data center automation, cloud management, and more.