OrangeBot.AI Digest — 2026-06-26
90 headlines across 8 sources, aggregated for this day.
Hacker News(15)
- U.S. government will decide who gets to use GPT-5.6 (www.washingtonpost.com)
- Data centers trigger voter backlash (www.newsweek.com)
- Previewing GPT‑5.6 Sol: a next-generation model (openai.com)
- Show HN: Smart model routing directly in Claude, Codex and Cursor (github.com)
- MicroVMs: Run isolated sandboxes with full lifecycle control (aws.amazon.com)
- The AI industry is pouring millions into US elections (www.bloodinthemachine.com)
- Jolla Phone (October 2026) (commerce.jolla.com)
- Springer Nature has removed two studies by Max Planck (www.science.org)
- Ultrasound imaging of the brain (alephneuro.com)
- Incident CVE-2026-LGTM (nesbitt.io)
- My Steam Machine is a 50ft HDMI cable (blog.matthewbrunelle.com)
- Why current LLM costs are not sustainable (aditya.patadia.org)
- 22-year-old Mozart's handwritten notebook unearthed in 'major discovery' (www.classicfm.com)
- US Govt to individually approve who gets GPT 5.6 (old.reddit.com)
- We all depend on open source. We will defend it together (akrites.org)
GitHub Trending(15)
- simplex-chat / simplex-chat
- google-labs-code / design.md
- commaai / openpilot
- kunchenguid / no-mistakes
- grafana / grafana
- ripienaar / free-for-dev
- opendatalab / MinerU
- alchaincyf / zhangxuefeng-skill
- mauriceboe / TREK
- xbtlin / ai-berkshire
- calesthio / OpenMontage
- aws / agent-toolkit-for-aws
- NanmiCoder / MediaCrawler
- garrytan / gstack
- IceWhaleTech / CasaOS
Product Hunt(15)
- Basedash for Excel
Turn any Excel file into a live dashboard
- ModuleX
AI workspace that’s already connected to everything
- DMV by Agent Community
A community-governed namespace for AI agents
- Sleek Analytics
See who's on your site. Right now.
- SquidHub
Multiplayer mode for humans and AI
- note.md
your notes and research documentation now a local LLM Memory
- Cewsco
All-in-one AI assistant — chat, images, voice & market data
- Gemini Spark
Your 24/7 personal AI agent
- Atlas
Every AI tool you use should know how your company works
- Aurora Notch
A private notch workspace for every Mac
- LockIn MCP
Let AI block distractions for you when you need to lock in
- AI Slide Editor by CubeOne
The editor PowerPoint should've shipped
- Agent Arena
The first public arena for AI agents
- Animdock Motion Templates in the Browser
Create trend motions in your browser!
- Group Subscriptions by beehiiv
Sell subscriptions to teams, companies, and organizations.
Hugging Face(15)
- DanceOPD: On-Policy Generative Field Distillation
Modern image generation demands a single model that unifies diverse capabilities, including text-to-image (T2I), local editing, and global editing. However, these capabilities are rarely naturally aligned and often conflict. For instance, editing tends to degrade T2I performance, while global and local editing interfere with each other. Consequently, effectively composing these capabilities has become a central challenge for image generation model training. To tackle this, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models that routes each sample to one capability field, queries one low-noise student-induced state, and trains with a simple velocity MSE objective. With each capability source defined as a velocity field over the shared flow state space, the student learns from fields queried on its own rollout states to compose expert capabilities. This formulation also absorbs operator-defined fields such as classifier-free guidance. Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption show that our approach improves multi-capability composition, strengthening target capabilities while preserving anchor generation quality. We believe this work establishes a practical route for generative field distillation in flow-matching models.
- ViQ: Text-Aligned Visual Quantized Representations at Any Resolution
A unified representation for text and vision is a natural pursuit, as it enables simpler multimodal modeling and more efficient training. However, representing images as discrete signals in the same way as text inevitably introduces severe information loss. Existing work struggles to balance low-level details and high-level semantics in discrete representations: reconstruction-oriented representations often lack semantic information, whereas semantically stronger features typically suffer from severe loss of detail. We present ViQ, a Visual Quantized Representations framework, which is designed to balance semantics and details in discrete representations while supporting inputs at native resolutions, thereby enabling it to serve as a unified and general discrete representation for arbitrary visual inputs. Our approach structures quantization learning into two stages: text-aligned pre-training and feature discretization. With text-aligned pre-training, we enhance the visual encoder with semantically rich supervision from the pretrained language model and enable it to process native-resolution visual inputs. During discretization, we propose a proximal representation learning strategy to progressively compact the feature space, along with a position-aware head-wise quantization mechanism that enables flexible processing of arbitrary resolutions. Extensive experiments on multimodal tasks demonstrate that ViQ achieves competitive performance compared to state-of-the-art multimodal vision encoders with continuous and high-dimensional visual features, while maintaining high precision in low-level reconstruction. We also show that multimodal training with visual quantized representations largely improves efficiency, yielding 20% to 70% acceleration with different base LLMs and training recipes.
- OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning
Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide little guidance on which intermediate decisions should be reinforced or suppressed. On-policy self-distillation offers dense token-level supervision, yet existing skill-conditioned variants often rely on external skill memories or retrieved privileged context, which are costly to maintain and can be mismatched with the state distribution induced by the current policy in multi-turn interaction. We propose OPID (On-Policy Skill Distillation), a framework that extracts skill supervision directly from completed on-policy trajectories. OPID represents trajectory hindsight as hierarchical skills: episode-level skills capture global workflows or failure-avoidance rules, while step-level skills capture local decision knowledge at critical timesteps. A critical-first routing mechanism uses step-level skills when critical decisions are identified and falls back to episode-level skills as default guidance otherwise. The selected skill is injected into the interaction history, allowing the old policy to re-score the same sampled response under both original and skill-augmented contexts. The resulting log-probability shift yields a token-level self-distillation advantage, which is combined with the outcome advantage for policy optimization. OPID thus preserves RL as the primary training objective while introducing dense, distribution-matched hindsight supervision. Experiments on ALFWorld, WebShop and Search-based QA demonstrate that OPID generally improves agent performance, sample efficiency, and robustness over outcome-only RL and existing skill-distillation baselines. Our code is available at https://github.com/jinyangwu/OPID/tree/main.
- Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation
While text-to-image (T2I) models have achieved remarkable progress, they struggle with real-world requests that are often underspecified, implicit, or dependent on up-to-date knowledge. We identify this challenge as the Context Gap: the mismatch between the user context and the sufficient generation context for T2I models. To bridge this gap, we propose Qwen-Image-Agent, a unified agentic framework that integrates plan, reason, search, memory and feedback in a context-centric manner. Qwen-Image-Agent treats user input as partial context and progressively constructs the generation context through Context-Aware Planning and Context Grounding. Specifically, Context-Aware Planning identifies missing context and plans how it should be acquired and used, while Context Grounding gathers this context from reason, search, memory, and feedback. To evaluate agentic image generation, we further introduce Image Agent Bench (IA-Bench), a benchmark covering four core image agent capabilities: Plan, Reason, Search, and Memory. Experiments on IA-Bench, Mindbench and WISE-Verified show that Qwen-Image-Agent outperforms strong baselines and achieves state-of-the-art performance.
- The Verification Horizon: No Silver Bullet for Coding Agent Rewards
A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering harnesses grow more sophisticated, generating complex candidate solutions is no longer difficult -- reliably verifying them has become the harder problem. Every verifier we can build is only a proxy for human intent, never the intent itself. This makes verification subject to a twofold difficulty: first, intent is underspecified by nature, making it inherently hard to faithfully check whether it has been fulfilled; second, during model training, optimization widens the gap between proxy and intent -- manifesting as reward hacking or signal saturation. To address this, we characterize the quality of verification signals along three dimensions -- scalability, faithfulness, and robustness -- and argue that achieving all three simultaneously is the central challenge. We further study four reward constructions: a test verifier for general coding tasks, a rubric verifier for frontend tasks, the user as verifier for real-world agent tasks, and an automated agent verifier for long-horizon tasks. Across different task types and policy capability levels, we conduct in-depth analysis and experiments on the core challenges of reward design and how to more effectively leverage reward signals. Experiments show that targeted verification design can effectively suppress reward hacking, improve task completion quality, and achieve significant gains across multiple internal and public benchmarks. These experiences collectively point to a core observation: no fixed reward function can remain effective as policy capability continues to grow; and verification must co-evolve with the generator.
- JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
Speculative decoding (SD) accelerates autoregressive Large Language Models (LLMs) by drafting multiple tokens and verifying them in parallel, but it faces a scaling limitation: increasing the draft budget improves speed only when acceptance remains high and drafting overhead stays low. This ceiling has been difficult to break because prior head-based SD methods face a causality-efficiency dilemma. Autoregressive drafters produce path-conditioned candidates that are effective for tree speculative decoding with higher acceptance length, but their drafting cost grows with tree depth. Bidirectional block-diffusion drafters generate all positions in one pass, but their branch-agnostic marginals can form individually plausible yet mutually inconsistent trees, wasting budget and reducing acceptance. We propose JetSpec, a head-based SD framework that combines one-forward drafting efficiency with branch-wise causal conditioning. JetSpec trains a causal parallel draft head over fused hidden states from the frozen target model, producing candidate trees whose scores align with the target model's autoregressive factorization. This enables JetSpec to convert larger draft budgets into longer accepted prefixes and higher end-to-end speedup. Across math, coding, and chat benchmarks on dense and MoE Qwen3 models, JetSpec consistently outperforms bidirectional-head and tree-based SD baselines. On H100 GPUs, JetSpec achieves up to 9.64x speedup on MATH-500 and 4.58x on open-ended conversational workloads, with further latency gains demonstrated through vLLM integration under realistic serving loads. Our code and models are available at https://github.com/hao-ai-lab/JetSpec.
- GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents
Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, where screen-only GUI agents and skill-mediated CLI agents receive identical goals, states, and final-state verifiers while being restricted to modality-native actions. In this controlled setting, the strongest GUI agent reaches a 59.1% full pass rate, outperforming the strongest original-skill CLI agent at 48.2%; however, verifier-guided skill augmentation raises CLI success to 69.3%, showing that much of the CLI deficit comes from incomplete skill coverage rather than model capability alone. These results suggest that GUI and CLI expose different execution bottlenecks: GUI agents are limited by reliable grounded interaction over long-horizon workflows, whereas CLI agents are limited by the coverage and scalability of their skill interfaces.
- Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited gains in tool-use tasks. In our experiments, some models exhibit catastrophic collapse, where performance abruptly drops and tool-invocation structures fail. The analysis reveals that these failures stem from unexpected probability spikes in specific control tokens, disrupting structured execution, yet the underlying tool-use capability remains intact, merely obscured by specific formats. To address this, we systematically investigate a diverse set of supervisory signals, including off-policy supervision, hint-based guidance, erroneous example supervision, and others, applied under both synchronous and interleaved training schemes. We find that interleaving supervised fine-tuning (SFT) with RL substantially improves stability, but exhibits degraded performance under format and content out-of-distribution (OOD) evaluation. We also analyze the impact of learning rates and generalization across settings. These results highlight the importance of understanding RL failures and demonstrate how diverse supervisory signals can guide exploratory learning, enabling robust training of LLMs for complex, multi-step tool-use tasks. Our code is available at https://github.com/hypasd-art/Tool-RL-Box.
- Fast LeWorldModel
Joint-Embedding Predictive Architectures (JEPAs), including recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action sequences by repeatedly applying a local one-step latent transition model. This autoregressive rollout makes planning computationally expensive and exposes the predicted trajectory to accumulated latent errors as the horizon grows. We propose Fast LeWorldModel (Fast-LeWM), a fast latent world model that replaces repeated local rollout with action-prefix prediction. Given the current latent and a candidate action sequence, Fast-LeWM encodes its prefixes and predicts the future latents reached after executing those prefixes in parallel. By making action prefixes the basic prediction unit, Fast-LeWM directly models action effects accumulated to different extents over multiple horizons. This prefix-level supervision forces the model to learn how states continuously evolve under different action prefixes, rather than only fitting one-step state transitions. During planning, the predictor can use the last prefix token from the encoded action sequence to evaluate the corresponding future latent without explicitly rolling through each intermediate imagined state. Across multiple tasks, Fast-LeWM improves average success over LeWM while substantially reducing planning time, achieving lower open-loop latent loss whose growth becomes significantly slower as the rollout horizon increases.
- Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments
As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relatively simple tasks and focus on a narrow set of capabilities while overlooking broader dimensions, resulting in saturated performance on modern agents and failing to probe their limitations. To this end, we introduce GauntletBench, a web-based benchmark for evaluating agent generalisation in challenging scenarios, focusing on three underexplored capabilities (temporal perception, graphical understanding, and 3D reasoning), across five less-covered professional applications (Video Editor, Workflow Builder, 3D Modeller, Flight Analyser, and Circuit Designer), each with 20 vision-intensive tasks (100 in total). Our benchmark provides a modular pipeline that comprises an environment compatible with both open- and closed-source agent frameworks, a controlled web-based application, a well-structured task suite, and an automated evaluation engine with diverse metrics. Contrary to widespread expectations, our empirical results reveal that frontier agentic systems remain far from achieving human-level performance. Even the state-of-the-art agent achieves only a 19.1% success rate on our GauntletBench, highlighting the limitations in these overlooked capabilities and generalisation. By comparison, non-expert human annotators achieve over 80% success on our challenging yet feasible tasks, revealing the substantial gap between current agent capabilities and those required for complex real-world scenarios.
- LISA: Likelihood Score Alignment for Visual-condition Controllable Generation
The prevalent dual-branch paradigm, i.e., training a side network to encode visual conditions and fusing its intermediate-layer features to a frozen pretrained main network, has shown remarkable success in visual-condition controllable generation. Despite its widespread adoption, the role of the side branch and its training efficiency remain underexplored. In this paper, we first revisit this mainstream paradigm through the lens of score-based generative modeling: 1) The main network preserves visual perceptual quality by providing a prior unconditional score. 2) The side network steers conditional control by implicitly contributing a likelihood score. Guided by this perspective, we propose LIkelihood Score Alignment (LISA), an effective regularization method that explicitly aligns the intermediate feature of the side network with an approximated likelihood score. Specifically, we first hook features from a designated layer of the side network and project them into the score latent space by a lightweight decoder. Then, we construct an approximated likelihood score target and calculate the distance between the decoder's output and this target as an additional regularization loss. Finally, we jointly optimize the side network and decoder with both standard diffusion loss and our regularization loss. Experiments across various image/video tasks, architectures, and diffusion/flow models demonstrated that LISA can not only consistently accelerate the training convergence and improve final synthetic results, but also encourage the side network's features to be more disentangled for conditional modeling with negligible additional training cost and zero extra inference cost.
- In-Context World Modeling for Robotic Control
Modern Vision-Language-Action (VLA) models often fail to generalize to novel setups, such as altered camera viewpoints or robot morphologies, because they are typically conditioned only on current observations and language instructions. By ignoring the underlying system configuration as a variable, these models implicitly assume a fixed execution context encountered during training, necessitating data-intensive fine-tuning for any new environment. In this work, we introduce In-Context World Modeling (ICWM), a framework that treats system identification as an in-context adaptation problem. ICWM enables robot policies to autonomously infer essential system variables from a short history of self-generated, task-agnostic interactions. Unlike traditional In-Context Learning that uses demonstrations to specify what task to perform, ICWM leverages the context window to understand how the system operates. By processing these interactions before task execution, the model implicitly captures the world dynamics of the current system, enabling adaptation to novel configurations without parameter updates. Extensive experiments in simulation and on real-world robot platforms demonstrate that ICWM significantly outperforms standard VLA baselines on novel camera viewpoints.
- Confidence-Aware Tool Orchestration for Robust Video Understanding
Video reasoning language models implicitly assume that every input frame is equally reliable. This leads to what we term the Blind Trust Problem: under realistic perturbations such as motion blur, glare, or occlusion, frontier video reasoning models can suffer 15-30%p accuracy drops on real-world embodied benchmarks, while remaining unaware that their visual evidence has been degraded. To address this challenge, we propose Robust-TO, an agentic video understanding framework that explicitly integrates per-frame trustworthiness into every stage of reasoning. Robust-TO organizes heterogeneous visual perception tools under a unified evidence interface. Each tool receives a sub-query derived from the original question and a set of trustworthy frames selected by the reliability-relevance score. It returns evidence in a shared format: a concrete prediction (e.g., a bounding box, motion trajectory, recognized text, or action label), temporal grounding, and a calibrated reliability score. During reasoning, these calibrated scores guide evidence weighting in a three-tier synthesis process (high/medium/low) and define a confidence-cost GRPO reward that jointly optimizes correctness, evidence reliability, and efficiency. On two video reasoning benchmarks spanning eight tasks, Robust-TO achieves 56.4% average accuracy on clean inputs, surpassing the strongest open-source baseline by 10.6%p and outperforming Gemini-2.5-Pro (46.2%). Under five realistic corruption types, Robust-TO maintains 54.3% average accuracy, 5.8%p above the strongest open-source baseline, while exhibiting the smallest clean-to-corrupted accuracy drop among all compared methods.
- CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies
As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate a single agent interacting with a passive environment, economic systems are inherently multi-agent, requiring autonomous agents to communicate, negotiate, and transact while pursuing their own objectives over extended periods. We introduce CoffeeBench, a benchmark for evaluating LLM agents in a long-horizon multi-agent economy composed of heterogeneous firms. In CoffeeBench, two farmers, two roasters, and two retailers autonomously operate their businesses over a 90-day simulation, each seeking to maximize cumulative net income through communication and transactions while managing cash, inventory, and pricing. The evaluated model controls one coffee roaster, while the remaining firms are controlled by fixed reference agents. Across several recent open-weight and proprietary LLMs, all models outperform a passive baseline that takes no actions, with most achieving positive net income. Analysis of agent behavior reveals substantial differences in long-horizon economic interaction: higher-performing models communicate more actively with other firms, whereas Claude Haiku 4.5 exhibits an idle-drift failure mode, repeatedly choosing inaction despite producing coherent assessments and plans. We release our code and agent trajectories to support future research.
- Discretizing Reward Models
Despite their widespread use, the role of reward models in shaping reinforcement learning is poorly understood. Reward models offer a tempting promise: they automatically estimate response quality in the absence of verifiers or human judges. Unlike "verifiable rewards" which typically produce binary scores, reward models typically produce continuous scores, allowing them to be sensitive to fine-grained differences in responses. However, we show this apparent strength is a serious weakness: many popular reward models are oversensitive, assigning different scores to equally good responses. Theoretically, we show that seemingly perfect reward models can be highly oversensitive; empirically, this oversensitivity can lead to bad policies. In place of existing notions of "reward model accuracy," we propose evaluating reward models using distinct measures of "discriminative ability" and "specificity" (the complement of oversensitivity). As a solution, we describe a training-free algorithm that uses Monte Carlo dropout on any neural reward model to produce discrete reward clusters. Theoretically, we prove there exist discretizations that reduce oversensitivity at minimal expense of discriminative ability; empirically we show, in both controlled and natural RL settings, that discretizing rewards leads to less reward hacking and better policies than training on the original rewards.
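The abstract above does not spell out the discretization algorithm, so the following is only a rough sketch of the general idea under stated assumptions: each response has several noisy scores (standing in for dropout-enabled forward passes of a reward model), and responses whose mean scores differ by less than their combined spread share one discrete reward. The function name `discretize_rewards`, the tolerance `z`, and the adjacent-merge rule are all hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def discretize_rewards(samples, z=1.0):
    """Map each response's noisy reward samples to a discrete cluster index.

    samples: list of lists; samples[i] holds the scores that several
             stochastic forward passes gave response i (at least 2 each).
    z:       hypothetical tolerance, in units of the combined spread.
    """
    # Per-response mean and sample standard deviation.
    stats = [(mean(s), stdev(s)) for s in samples]
    # Visit responses from lowest to highest mean score.
    order = sorted(range(len(samples)), key=lambda i: stats[i][0])
    rewards = [0] * len(samples)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        gap = stats[cur][0] - stats[prev][0]
        spread = (stats[prev][1] ** 2 + stats[cur][1] ** 2) ** 0.5
        # Open a new cluster only when the gap exceeds the noise level.
        if gap > z * spread:
            cluster += 1
        rewards[cur] = cluster
    return rewards


# Two near-identical responses and one clearly better response.
scores = [
    [0.50, 0.52, 0.48],
    [0.51, 0.49, 0.53],
    [0.90, 0.92, 0.88],
]
print(discretize_rewards(scores))  # [0, 0, 1]
```

Because only neighbouring responses are compared, a long chain of small gaps can merge into one cluster. A real implementation would need a more careful clustering rule than this sketch.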
Techmeme(15)
- How AI-native law firms use "management services organisation" structures to access capital historically barred from US law firms, including PE and VC funds (Stephen Foley/Financial Times)
Interest in a model that separates legal casework from other operations exploded alongside the new tech
- Sources: Zuckerberg urged execs to explore Polymarket and Kalshi partnerships, as the Arena prediction app targets 100M monthly active "predictors" aged 18-34 (Mike Isaac/New York Times)
Mr. Zuckerberg's plans for Arena, a prediction markets app that Meta is building, also include appealing to 18- to 34-year-old users.
- AWS hikes prices for Nvidia GPUs in its EC2 Capacity Blocks service, which let businesses rent AI compute in advance, by 20%; Trainium chip pricing is unchanged (Catherine Perloff/The Information)
Amazon Web Services is raising the price for its AI workload rental service by 20%, the company said on Friday …
- Oracle's stock fell 19% this week, the steepest weekly drop since a 20% plunge in August 2001, amid concerns about its debt load and AI investments (Jordan Novet/CNBC)
Oracle just wrapped up its worst week on Wall Street in 25 years as concerns continue to mount about the software company's debt load …
- The FTC fast-tracks approval for SpaceX to acquire Mesh, which raised a $50M Series A in February to make high-efficiency optical transceivers for data centers (Bloomberg)
Elon Musk received a regulatory greenlight to acquire startup Mesh Optical Technologies, a company founded by former SpaceX engineers working …
- OpenAI says GPT-5.6 Sol and Terra were capable of identifying vulnerabilities but were unable to execute autonomous, end-to-end attacks against hardened targets (OpenAI)
GPT-5.6 is a new family of three models: Sol, our new flagship model; Terra, a capable lower-cost option; and Luna, our fastest and most cost-efficient model.
- Sources: Paul Meade, Apple's top executive in charge of Vision Pro and smart glasses efforts, is leaving for OpenAI to work on the company's AI-powered devices (Mark Gurman/Bloomberg)
Apple Inc.'s top executive in charge of the Vision Pro headset and the company's smart glasses efforts is leaving for OpenAI …
- Uber expands the list of criminal convictions that disqualify US drivers and expands the background-check timeline, possibly removing ~0.5% of active US drivers (Natalie Lung/Bloomberg)
Uber Technologies Inc. is tightening driver background checks in the US and applying the new standards retroactively to existing workers …
- Sources: Russian hackers were behind a 2025 ransomware attack on Jaguar Land Rover that used "mind-blowing" encryption and cost UK's economy an estimated $2.5B (New York Times)
A loose collective of cybercriminals initially took credit for crippling Jaguar Land Rover last year.
- OpenAI appoints ex-Uber India head Prabhjeet Singh as its first managing director for India, to scale its presence in its second-largest market after the US (Jagmeet Singh/TechCrunch)
OpenAI is making yet another big, visible bet on India. It has appointed former Uber India and South Asia president Prabhjeet Singh …
- GPT-5.6 Sol matches Mythos Preview on ExploitBench, adds Ultra mode with subagents for complex workflows, and max reasoning for deep problem-solving (OpenAI)
We're beginning a limited preview of the GPT-5.6 series: Sol, our flagship model; Terra, a balanced model for everyday work; and Luna, a fast and affordable model.
- OpenAI hopes to make GPT-5.6 generally available in the coming weeks and says "this kind of government access process" should not become the long-term default (Amrith Ramkumar/Wall Street Journal)
Amrith Ramkumar / Wall Street Journal : OpenAI hopes to make GPT-5.6 generally available in the coming weeks and says “this kind of government access process” should not become the long-term default — Company says White House review of AI releases shouldn't become long-term default; ban on Anthropic's Mythos model remains
- OpenAI releases three versions of GPT-5.6, called Sol, Terra, and Luna, as a limited preview to ~20 companies, with participants disclosed to the US government (Axios)
Axios : OpenAI releases three versions of GPT-5.6, called Sol, Terra, and Luna, as a limited preview to ~20 companies, with participants disclosed to the US government — OpenAI is rolling out GPT-5.6 Friday, but says it's limiting access to all three versions of the new model at the behest of the U.S. government.
- President Trump threatens to impose a 100% tariff on any country that imposes a digital services tax on US companies (Kevin Breuninger/CNBC)
Kevin Breuninger / CNBC : President Trump threatens to impose a 100% tariff on any country that imposes a digital services tax on US companies — President Donald Trump on Friday threatened to impose a “100% TARIFF” on the goods of any country that imposes a digital services tax on U.S. companies.
- Sources: Revolut told new hires they'll have to work in office at least three days a week from next year, retreating from its long-held remote-first approach (Financial Times)
Financial Times : Sources: Revolut told new hires they'll have to work in office at least three days a week from next year, retreating from its long-held remote-first approach — Europe's most valuable fintech will require junior staff to spend at least three days a week in the office from next year
Solidot(15)
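- From praising virtue to celebrating depravity
Researchers at Queen Mary University of London analyzed the lyrics of more than 380,000 songs released between 1960 and 2023 and found that the emotional and moral language used in popular music has changed markedly. Words expressing moral virtues such as care and decency have become rarer over time, while language related to harm, deception, subversion, and degradation has gradually increased. The researchers note that music is not just entertainment but one of the ways a society tells stories about itself, and that analyzing decades of lyrics makes it possible to see how emotional expression and moral narratives have evolved. The study also found that female artists are more often associated with virtues such as care and loyalty, while male artists and mixed-gender groups are more often associated with negative themes reflecting harm, subversion, and degradation.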
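- Great ape laughter shares its rhythm with human laughter and has existed for 15 million years
According to a study published in the journal Communications Biology, the rhythm of great ape laughter may resemble that of modern humans, and the pattern has persisted for at least 15 million years. The results also suggest that over the course of great ape evolution, laughter became faster, more variable, and increasingly shaped by context. All great apes (hominids) laugh, including species closely related to humans, such as bonobos, and more distantly related species, such as Bornean orangutans. How the rhythm of laughter evolved over time, and how it might relate to the evolution of human language, was previously unclear. In the study, researchers at the University of Warwick in the UK analyzed laughter recordings from 4 Bornean orangutans (Pongo pygmaeus), 2 gorillas (Gorilla gorilla), 3 bonobos (Pan paniscus), 4 chimpanzees (Pan troglodytes), and 4 humans, all aged between 6 months and 7 years. The scientists examined 140 laughter sequences and measured the interval between each vocalization. They found that laughter in every species follows a regular rhythmic pattern, with evenly spaced intervals between successive vocalizations. Because the pattern appears in all the species studied, the researchers infer that rhythmic laughter may already have been present in their common ancestor 15 million years ago. They also infer that laughter became faster and more varied over time: humans change the speed of their laughter with context, laughing faster when tickled than when playing, whereas other apes do not. In addition, the more closely related an ape is to humans, the more variable its laughter rhythm. These findings suggest that vocal flexibility and control may have gradually increased during the evolution of great apes and humans, which the authors speculate may have contributed to the emergence of language. Studies with larger samples are needed to confirm the findings.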
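- Walking five minutes every hour helps offset the harm of prolonged sitting
Prolonged sitting is a health risk, but interventions against sedentary behavior need to weigh feasibility against effectiveness. In a study published in the British Journal of Sports Medicine, researchers evaluated interventions in which participants stood up and walked for 5 minutes every 30, 60, or 120 minutes. A total of 19,342 adults took part, of whom 11,484 were split into three groups that followed the three schedules. Participants in all intervention groups reported significantly less fatigue and negative mood and significantly more positive mood. After weighing feasibility and effectiveness, the researchers concluded that standing up to walk for 5 minutes every hour strikes the best balance between the two.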
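- Micron signs supply agreements of up to five years with its major customers
On the company's latest earnings call, Micron CEO Sanjay Mehrotra disclosed that it has signed "strategic customer agreements" with 16 major customers. Most of the agreements run from 2026 through 2030. Customers commit to buying a set volume of product at prices that fall within a band bounded by a floor and a ceiling, which means they would be largely shielded if memory prices rise further. Mehrotra said customers recognize that the shortage of memory and storage will take a considerable time to ease. Micron expects supply to improve gradually in 2028 but cannot yet predict when memory supply will catch up with steadily growing demand. He said customers have agreed to make prepayments, which the company will use to expand its fabs.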
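- Using a phone at night is linked to higher risk of eye disease
A research team at Shanghai General Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, used UK Biobank data and ultimately included 82,826 participants who had no eye disease at baseline. Each participant wore a wrist-worn accelerometer with a high-resolution light sensor for 7 consecutive days to objectively record personal light exposure. The results show that an average ambient light intensity above 1,000 lux during the evening (20:00 to 23:30) was associated with a significantly higher subsequent risk of degenerative eye disease. The risk of age-related macular degeneration rose by 31%, the risk of cataract by 18%, and the risk of primary open-angle glaucoma by 47%. The researchers also observed a significant time-dose-response relationship: the longer the exposure to very high light intensity (for example, above 2,250 lux), the higher the risk of overall age-related eye disease and of glaucoma.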
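- Remastered Arma: Cold War Assault is open sourced
Bohemia Interactive has released the source code of the remastered Arma: Cold War Assault under the GPL v3.0 license, with the project hosted on GitHub. Arma: Cold War Assault was originally released in 2001 as Operation Flashpoint: Cold War Crisis. The game offers a 12.5 km × 12.5 km open-world map, and its realistic simulation of modern, multi-dimensional field combat won it a large following among military game enthusiasts. Its openness and powerful scripting capabilities also led to a large number of mods. The remastered code has been modernized to C++20, builds with CMake and Clang, and supports platforms including Windows x64 and Linux x64. Bohemia Interactive says the game code is free software, but the name and trademarks cannot be used freely, and game data such as models, textures, sound effects, missions, and voice recordings has not been released and must be purchased separately.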
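- Microsoft extends free Windows 10 security updates by another year
Windows 10 reached end of support on October 14, 2025, after which Microsoft originally planned to stop providing free security updates. Because Windows 10 still has a large user base, the software giant announced last year that it would provide free security updates for one year. With several months still left before that period expires, Microsoft has extended the free security updates by another year, and Windows 10 users do not need to do anything to receive them. The latest Extended Security Updates will expire on October 12, 2027. According to StatCounter, 26% of PCs still run Windows 10, and because Microsoft raised the hardware requirements for Windows 11, most Windows 10 PCs cannot upgrade to Windows 11.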
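- Trump administration requires OpenAI to release its new model in stages
Citing security concerns, the Trump administration has required OpenAI to release its new GPT-5.6 model in stages. The Information reports that the new model will initially be offered to a small group of partners, and that the government will "approve customer access one by one" during the preview period. According to the report, the requirement grew out of discussions between the Office of the National Cyber Director and the Office of Science and Technology Policy.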
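- US Department of Defense reinstates vaccine mandate
After more than 200 recruits at a US Air Force base contracted influenza, the US Army, Navy, and Air Force reinstated vaccination requirements for recruits. Two months earlier, Defense Secretary Pete Hegseth had ended the decades-old mandatory flu vaccination order, arguing that it was unreasonable and that lifting it would restore service members' "freedom". History has long shown, however, that enclosed environments such as barracks are breeding grounds for pathogens, and infectious disease has always been a major threat to military readiness. Lackland Air Force Base in Texas recently reported 222 confirmed flu cases and 4 hospitalizations. One recruit, Keon McDaniel, died, though it is not yet clear whether his death was related to the flu. Only about 40% of recruits at the base had been vaccinated, and the outbreak began in early June. A Pentagon spokesperson said the Pentagon has approved exemptions from Hegseth's voluntary flu vaccination policy for the Army, Navy, Air Force, National Security Agency, and Defense Health Agency.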
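- LastPass discloses another user data breach
Password manager LastPass has disclosed another user data breach, this time caused by its external partner Klue. Attackers accessed customer information and support case data. LastPass says the data accessed includes customer names, phone numbers, email addresses, and physical addresses, as well as support case data and sales-related data. It says that after learning of the breach it immediately revoked employee access to Klue, rotated the exposed API tokens, and notified law enforcement. LastPass warned customers to be alert to phishing and social engineering attacks and published IP addresses and email domains associated with the attackers.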
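- Apple officially raises product prices
Days after Apple CEO Tim Cook hinted at the move, Apple raised prices across its entire product line, with increases ranging from 50 US dollars to more than a thousand. Even Apple can no longer absorb the high cost of memory and storage on its own. In a statement, Apple said it has never seen a component's price rise so quickly and by so much. The company said it had done its best so far to shield customers from the increases but has reached the point where it must begin raising prices on some products, including today's increases for iPad and Mac. Apple acknowledged that this is not good news and said it is sparing no effort to find a solution.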
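- Ovaries may turn into an organ with immune function after menopause
Reproductive specialists once believed that after menopause the ovaries become useless, much like the appendix. When examining the ovaries of women aged 50 to 75, researchers found that the organ's cells produce different proteins with age. To study age-related changes in the ovary more closely, the researchers turned to laboratory mice. Although mice do not show features specific to human menopause, such as a sharp drop in estrogen, their ovaries also stop functioning late in their two-year lifespan. The researchers removed ovaries from young mice, mice at the end of their reproductive period, and "postmenopausal" mice. For each animal, they sequenced the RNA of one ovary to measure gene expression. For the other ovary, they analyzed the tissue visually under a microscope to identify distinct cell populations and to measure the extent of fibrosis, the buildup of stiffened tissue that occurs naturally with age. Analysis of the "postmenopausal" ovaries showed that levels of every type of immune cell were higher than those typical of young mice. In addition, genes encoding various pro-inflammatory compounds were more active in the ovaries of old mice, and these immune molecules may be secreted into the blood and carried to other parts of the body. It is not yet clear whether the aging ovary genuinely plays a role in immune signaling or is merely an incidental gathering place for immune cells. The finding may help explain why women, despite living longer, tend to be in poorer health than men as they age. The postmenopausal ovary may secrete certain molecules that cause chronic inflammation in women during menopause.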
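- Chinese scientists develop rice with reduced cadmium uptake
Cadmium is not an essential element for plant growth, but when it enters the human body through the soil-rice-food chain and is ingested over a long period, it can cause serious health problems such as kidney damage, cancer, and osteoporosis. OsNramp5 is the key transporter protein in rice responsible for moving cadmium from the roots to the shoots, but it also transports manganese and other metal ions essential for plant growth. Knocking out OsNramp5 effectively reduces cadmium transport but also leaves the plant deficient in other essential metals, sharply cutting rice yield. According to a study published in PNAS, the Institute of Genetics and Developmental Biology of the Chinese Academy of Sciences and collaborators used base substitution technology to make targeted edits to OsNramp5, the core transporter gene responsible for cadmium uptake in rice. They created superior artificial allelic variants and identified a new mechanism that specifically reduces cadmium uptake without affecting the uptake of manganese and other key metal ions. This resolves the difficulty of combining low cadmium with high yield and offers a practical new breeding approach for safely growing staple crops on cadmium-contaminated farmland.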
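- OpenAI announces Jalapeño, an in-house AI chip dedicated to inference
OpenAI has announced Jalapeño, its first in-house chip, designed and manufactured in partnership with Broadcom and built specifically for AI inference. OpenAI did not disclose technical details, saying only that initial tests show performance per watt significantly better than the most advanced comparable products available today. OpenAI and Broadcom formally announced their partnership last October, and OpenAI claims it used its own models to accelerate the chip's design. The in-house AI chip is intended to reduce reliance on Nvidia. Google and Amazon have also developed their own chips.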
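- Wikipedia staff in the UK seek to form a union
Wikipedia staff in the UK are the first to seek to form a union. On Wednesday, June 24, UK employees of the Wikimedia Foundation wrote to management asking to be represented by United Tech and Allied Workers (UTAW), a branch of the Communication Workers Union (CWU). The employees called on the Wikimedia Foundation, as the body that effectively runs the global nonprofit, to honor its leadership's recent public commitment to protect employees' right to organize and form unions. More than a thousand Wikipedia volunteers and community members have signed a petition in support of the employees. The UK is the Wikimedia Foundation's second-largest source of employees after the US.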