Spark Summit spotlights Machine Learning, but that's not all

Paraphrasing Garrison Keillor, it’s been a quiet week in the Apache Spark community, at least compared to last year, when the definitive Spark 2.0 was unveiled. Last week, Spark Summit pulled into Boston, and so did one of those nor’easters that make Boston so alluring in February.

And so the Spark project, for now, is engaging in the blocking and tackling chores of cleaning up or optimizing APIs. For instance, a recent update of Spark 2.0 has added pipeline processing for enabling more efficient running of complex machine learning jobs.

And of course, while we’re on the topic of machine learning, it was virtually impossible to avoid presentations covering it. This is 2017, after all. Almost every customer in the enterprise case study track spoke of using machine learning in their solutions, or of adding it as a next step. More often than not, ML was paired with streaming, SQL queries, and graphs built to identify critical interrelationships.

Netflix described its use of Spark ML, the emerging set of machine learning libraries that is Spark’s future direction, as the core of its personalized recommendation engine. GoDaddy’s small business success index, which provides health scores for the effectiveness of its customers’ websites, incorporates natural language processing to parse interactions, predictive statistical models to identify topics, and machine learning to identify which content is the most effective.

For others, machine learning is the next step. Capital One discussed the success of its Second Look app, which sends timely alerts to credit card customers about unexpected and potentially mistaken charges. The company uses Spark in conjunction with Kafka to provide an efficient queuing system for processing incoming streams and screening them for suspicious charges. It is looking at adding machine learning as a means of personalizing the alerts in the future.

With Spark 2.0 in general availability for less than a year, a key theme of the summit was showing attendees how to take advantage of new features such as Structured Streaming, which lets you run the same SQL calls against data in motion and data at rest: you can aim the same query at data flowing in through Spark Streaming and at data stored in columnar formats such as Parquet.
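To make the "same query, two kinds of data" idea concrete, here is a minimal sketch in plain Python. It is not the Spark API: the function `count_by_status`, the helper `micro_batches`, and the sample `EVENTS` rows are all hypothetical names made up for illustration. The sketch only assumes that a query can be written once and then applied either to a complete dataset or incrementally to small batches as they arrive, with the two results agreeing.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional

Row = Dict[str, str]


def count_by_status(rows: Iterable[Row], state: Optional[Counter] = None) -> Counter:
    """The 'query': count rows per status value.

    If `state` is given, the counts are added to it, which is how the
    streaming case keeps a running result across batches.
    """
    counts = Counter() if state is None else state
    for row in rows:
        counts[row["status"]] += 1
    return counts


def micro_batches(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Stand-in for a stream: hand out the rows a few at a time."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical sample data, used for both the batch and the streaming run.
EVENTS: List[Row] = [
    {"user": "a", "status": "ok"},
    {"user": "b", "status": "error"},
    {"user": "c", "status": "ok"},
    {"user": "d", "status": "ok"},
    {"user": "e", "status": "error"},
]

# Data at rest: run the query once over the whole dataset.
batch_result = count_by_status(EVENTS)

# Data in motion: run the same query on each small batch, carrying state.
running_result: Counter = Counter()
for batch in micro_batches(EVENTS, size=2):
    count_by_status(batch, state=running_result)

print(dict(batch_result))
print(batch_result == running_result)
```

The point of the design is that the query logic lives in one place; only the way rows are fed to it differs between the two runs.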

Spark is one of a growing number of paths for collapsing the Lambda Architecture, which specifies the design of separate batch and real-time processing tiers. That’s not surprising; the compute engine or data platform that can readily accommodate both needs becomes a good candidate for becoming your gateway platform to big data compute. Others, such as Google with Apache Beam, and MapR and Hortonworks, with their respective data flow management engines, are making such bids.

Much of the anticipation this year was over the plans for UC Berkeley’s successor to AMPLab, the research center that gave rise to Spark, plus projects such as Mesos and Alluxio (formerly Tachyon). AMPLab’s mission targeted advanced analytics through batch processing. RISELab, the successor, is picking up where AMPLab left off, focusing on secure real-time processing.

And one of the first projects on RISELab’s agenda is particularly pertinent for Spark: coming up with a pure streaming engine that’s faster than Spark Streaming (which technically performs microbatching, not streaming). The new project, Drizzle, was actually unveiled at Spark Summit West last summer. Early benchmarks show it processing 10x faster than Spark Streaming at tens of millions of events per second, but the superior performance of Flink at the extreme end of the scale (at the 20 million event mark) shows there’s still plenty of work to be done.
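The difference between microbatching and per-event streaming can be illustrated with a toy latency model. This is a simplified sketch, not how Spark Streaming, Drizzle, or Flink are implemented: it assumes an event in a microbatch engine waits until the end of the batch interval it arrived in, and it ignores scheduling and processing costs entirely. The function names, the 500 ms interval, and the arrival times are hypothetical.

```python
from typing import List


def microbatch_latencies(arrivals_ms: List[int], interval_ms: int) -> List[int]:
    """Wait time per event when events are only handled at batch boundaries.

    An event arriving at time t is picked up at the end of the interval
    that contains t, so its wait is (end of that interval) - t.
    """
    latencies = []
    for t in arrivals_ms:
        batch_end = (t // interval_ms + 1) * interval_ms
        latencies.append(batch_end - t)
    return latencies


def per_event_latencies(arrivals_ms: List[int]) -> List[int]:
    """Wait time per event when each event is handled as soon as it arrives.

    Processing cost is ignored in this model, so every wait is zero.
    """
    return [0 for _ in arrivals_ms]


# Hypothetical arrival times in milliseconds and a 500 ms batch interval.
arrivals = [0, 120, 480, 510, 990]
batched = microbatch_latencies(arrivals, interval_ms=500)
streamed = per_event_latencies(arrivals)

print("microbatch waits (ms):", batched)
print("per-event waits (ms): ", streamed)
print("worst microbatch wait (ms):", max(batched))
```

In this model the worst-case wait for a microbatch engine is one full batch interval, which is why shrinking the interval, or removing it, is the lever a lower-latency engine would pull.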

One of the findings of Databricks’ recent Spark survey is the level of community activity. Now that the conference is over, somebody else wants the last word.

Today, a team from Yahoo announced their contribution to the Spark community: the open sourcing of TensorFlowOnSpark. As the unwieldy project name implies, it’s about making TensorFlow, the deep learning library open sourced by Google in late 2015, run on Spark.

There was plenty of excitement last summer when Databricks and Google collaborated on TensorFrames, providing a way for TensorFlow to execute via Spark’s DataFrame. But according to Yahoo data scientist Andy Feng, the result was a compromise, as performance couldn’t equal running TensorFlow natively on the Google Cloud Platform. Yahoo’s package, TensorFlowOnSpark, has a smaller API and allows TensorFlow operations to execute asynchronously, without having to go through the bottleneck of the Spark driver. Oh, and if your cluster has a higher-bandwidth InfiniBand network, TensorFlowOnSpark can optimize memory management for that.

While Spark 2.0 has made great strides in defining a stable target for developers (as they now know how the APIs are organized), there are still many blanks left to fill. There’s little doubt, for instance, that many in the R and Python developer communities would like optimizations that surpass what’s possible with the DataFrame. Putting R and Python programming on a level playing field with Scala will merit its own post.

Either way, while this year’s Spark Summit was relatively short on news, it’s hardly a sign that it’s time to close down the patent office.

(via PCMag)
