Machine learning operations don’t belong with cloudops

It’s Monday morning, and after a long weekend of system trouble, the cloud operations team is discussing what happened. Several systems associated with a new, very advanced inventory management system enabled with machine learning had problems over the weekend. The postmortem reached the following conclusions:

  • The batch process that moved raw data from the operational database to the training database failed, and so did the automatic recovery process. An ops team member who was working over the weekend attempted to resubmit the job but triggered not one but four partial updates, which left the training database in an unstable state.
  • As a result, the knowledge models in the machine learning systems were trained on bad data, so the new information in the knowledge base had to be removed and the models rebuilt.
  • Several outside data feeds, such as pricing and tax data, were also loaded into the training database at the same time. Although those updates worked fine, they too had to be backed out of the knowledge database because the operational data was not in a good state.
  • The system was unavailable for two days, and the company lost $4 million once lost productivity, customer reactions, and PR issues were taken into account.
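The first finding, a resubmitted batch job that produced four partial updates, is the failure mode an idempotent, all-or-nothing load is meant to prevent. The postmortem does not say how the pipeline was built, so this is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the training database, and the tables `training_rows` and `loaded_batches`, the `batch_id` key, and the `(sku, quantity)` row shape are all hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical row shape pulled from the operational database: (sku, quantity).
Row = Tuple[str, int]


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical training table and a ledger of applied batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS training_rows "
        "(batch_id TEXT NOT NULL, sku TEXT NOT NULL, quantity INTEGER NOT NULL)"
    )
    # The PRIMARY KEY means a given batch ID can be recorded only once.
    conn.execute("CREATE TABLE IF NOT EXISTS loaded_batches (batch_id TEXT PRIMARY KEY)")
    conn.commit()


def load_batch(conn: sqlite3.Connection, batch_id: str, rows: Iterable[Row]) -> bool:
    """Apply one batch atomically and at most once.

    Returns True if the batch was written, False if it had already been loaded.
    Any exception rolls back every row written by this attempt.
    """
    with conn:  # commits on normal exit, rolls back if an exception escapes
        seen = conn.execute(
            "SELECT 1 FROM loaded_batches WHERE batch_id = ?", (batch_id,)
        ).fetchone()
        if seen is not None:
            return False  # a resubmitted batch becomes a no-op
        conn.executemany(
            "INSERT INTO training_rows (batch_id, sku, quantity) VALUES (?, ?, ?)",
            ((batch_id, sku, qty) for sku, qty in rows),
        )
        # Recording the batch in the same transaction ties the ledger entry to the rows;
        # if another attempt already recorded this ID, the PRIMARY KEY makes this
        # INSERT fail and the whole attempt rolls back.
        conn.execute("INSERT INTO loaded_batches (batch_id) VALUES (?)", (batch_id,))
    return True
```

With a load shaped like this, rerunning a batch with the same ID returns False instead of writing another partial update, and a run that fails partway leaves nothing in the training table to clean up.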

This is not 2025; this is today. As enterprises find more uses for “cheap and good” cloud-based machine learning systems, we’re finding that the systems that leverage machine learning are complex to operate. Ops teams do not expect this degree of difficulty and complexity, and they are finding that they are undertrained, understaffed, and underfunded.


Via InfoWorld Cloud Computing


Marketing Ops: A New Spin on Process Refinement
At Gartner for Marketing, we help identify the key challenges marketing leaders face and recommendations to address them. Refining standardized processes through frameworks alongside trial & error is how marketing…

Via Gartner Blog Network

Blazor Tips and Tricks | Visual Studio Toolbox

In this episode, Robert is joined by Ed Charbeneau. Ed was on Toolbox last fall to introduce Blazor, a framework for building interactive client-side web UI using C# instead of JavaScript. He returns to answer some of the questions he is asked most often when he shows people Blazor.

Via Channel 9

Amazon SQS Now Supports Tag-on-Create

To more easily identify the purpose of Amazon Simple Queue Service (Amazon SQS) queues and track costs associated with messaging, you can categorize queues using metadata tags. For example, you can use tags to identify all Amazon SQS queues used by a particular department, project, or application. With tag-on-create, you can apply these tags when you create a queue instead of adding them in a separate step.
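In boto3, the AWS SDK for Python, tagging on creation means passing a `tags` mapping to the SQS client's `create_queue` call, so the queue and its tags are created in one request. The sketch below assumes an already configured client is passed in (for example, one from `boto3.client("sqs")`); the helper names, queue name, and tag keys are hypothetical examples, not conventions from the announcement.

```python
from typing import Any, Dict, Mapping, Optional


def build_create_queue_request(
    queue_name: str,
    tags: Mapping[str, str],
    attributes: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the SQS CreateQueue call, tags included."""
    if not queue_name:
        raise ValueError("queue_name must not be empty")
    request: Dict[str, Any] = {"QueueName": queue_name}
    if tags:
        request["tags"] = dict(tags)  # boto3 spells this parameter in lowercase
    if attributes:
        request["Attributes"] = dict(attributes)
    return request


def create_tagged_queue(
    sqs_client: Any,
    queue_name: str,
    tags: Mapping[str, str],
    attributes: Optional[Mapping[str, str]] = None,
) -> str:
    """Create the queue with its tags in a single request and return the queue URL."""
    response = sqs_client.create_queue(
        **build_create_queue_request(queue_name, tags, attributes)
    )
    return response["QueueUrl"]


# Example usage (requires AWS credentials; the names and tags are hypothetical):
#   import boto3
#   sqs = boto3.client("sqs")
#   url = create_tagged_queue(
#       sqs,
#       "inventory-ml-events",
#       {"department": "supply-chain", "project": "inventory-ml"},
#   )
```

Passing the client in, rather than creating it inside the helper, keeps the request-building logic testable without calling AWS.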

Via Recent Announcements