TEXT VIEW · TODAY'S DIGEST · 36 HEADLINES ACROSS 8 SOURCES


ISSUE 0854
SUN, MAY 3, 2026
Discover the best information organized by OrangeBot.AI

The web,
read by a bot.

Ten sources — Hacker News, Product Hunt, HuggingFace, Techmeme and more — filtered, tagged, and summarized every morning for builders who don’t have time to scroll.

01

AI DIGEST

UPDATED DAILY · EDITOR'S PICK

AI News Summary

May 3, 2026

Here is a summary of today's key news events.

AI Sector Faces Intense Scrutiny Ahead of Major IPOs

What: Major AI companies like OpenAI are preparing for public offerings (IPOs), causing large index funds to change their investment rules in anticipation. This comes as OpenAI's CEO faces mounting pressure, and the company is criticized for its chatbot generating dangerous content. Meanwhile, AI technology continues to boost global stock markets and is being adopted by industries like pharmaceuticals to accelerate drug discovery.

Ukraine's Strikes on Russian Oil Have Limited Financial Impact

What: Kyiv is continuing its campaign of strikes against Russian oil infrastructure. However, reports suggest the attacks have had only a modest impact on Moscow's finances, which remain supported by a steady flow of oil revenue. The conflict, now in its tenth week, continues to contribute to high global energy prices.

UK's Main Political Parties Face Voter Discontent

What: The UK’s Labour party is bracing for what could be its worst-ever performance in local elections. Polls show that both the Labour and Conservative parties have historically low support, with their combined share representing barely a third of voters. This signals significant political uncertainty and dissatisfaction with the two-party system.

Corporate Deals and Rising Costs Impact Multiple Sectors

What: Several major companies are navigating market shifts. Royal Bank of Canada and Bank of Montreal are looking to sell their payment processing businesses. Meanwhile, various manufacturing sectors are facing rising prices for supplies like aluminum and plastic. In gaming, Nintendo warns that higher memory chip prices could increase the cost of its upcoming Switch 2 console.

Middle East Conflict Disrupts Regional Politics

What: Tensions in the Middle East have forced Saudi Arabia to postpone its response to the UAE's decision to leave the OPEC oil cartel. Elsewhere, former President Trump has retaliated against German opposition leader Friedrich Merz for his criticism of the US-Israeli war effort, highlighting the conflict's broad international political impact.

02

ON THE WIRE

6 SOURCES
02

HACKER NEWS


Hacker News - May 3, 2026

Hacker News Feed: Highlighting key posts and discussions.

Do_not_track (donottrack.sh) · 408126
Dav2d (code.videolan.org) · 532150
NetHack 5.0.0 (nethack.org) · 471156
Russia Poisons Wikipedia (www.bettedangerous.com) · 255200
Ask.com has closed (www.ask.com) · 454225
03

HUGGINGFACE

HuggingFace - May 3, 2026

HuggingFace Daily Papers: trending papers and abstracts.

Heterogeneous Scientific Foundation Model Collaboration

Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.

192
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis toward intelligent visual generation: plausible visuals grounded in structure, dynamics, domain knowledge, and causal relations. To frame this shift, we introduce a five-level taxonomy: Atomic Generation, Conditional Generation, In-Context Generation, Agentic Generation, and World-Modeling Generation, progressing from passive renderers to interactive, agentic, world-aware generators. We analyze key technical drivers, including flow matching, unified understanding-and-generation models, improved visual representations, post-training, reward modeling, data curation, synthetic data distillation, and sampling acceleration. We further show that current evaluations often overestimate progress by emphasizing perceptual quality while missing structural, temporal, and causal failures. By combining benchmark review, in-the-wild stress tests, and expert-constrained case studies, this roadmap offers a capability-centered lens for understanding, evaluating, and advancing the next generation of intelligent visual generation systems.

78
Co-Evolving Policy Distillation

Reinforcement learning with verifiable rewards (RLVR) and on-policy distillation (OPD) have become standard paradigms for post-training. We provide a unified analysis of these two paradigms for consolidating multiple expert capabilities into a single model, and identify that each loses capability in a different way: mixed RLVR suffers from an inter-capability divergence cost, while the pipeline of first training experts and then performing OPD, though avoiding divergence, fails to fully absorb teacher capabilities due to large behavioral-pattern gaps between teacher and student. We propose Co-Evolving Policy Distillation (CoPD), which trains experts in parallel and introduces OPD during each expert's ongoing RLVR training rather than after complete expert training, with experts serving as mutual teachers (making OPD bidirectional) to co-evolve. This enables more consistent behavioral patterns among experts while maintaining sufficient complementary knowledge throughout. Experiments validate that CoPD achieves all-in-one integration of text, image, and video reasoning capabilities, significantly outperforming strong baselines such as mixed RLVR and MOPD, and even surpassing domain-specific experts. The model-parallel training pattern offered by CoPD may inspire a novel training-scaling paradigm.

44
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

Humanoid control systems have made significant progress in recent years, yet modeling fluent interaction-rich behavior between a robot, its surrounding environment, and task-relevant objects remains a fundamental challenge. This difficulty arises from the need to jointly capture spatial context, temporal dynamics, robot actions, and task intent at scale, which is a poor match to conventional supervision. We propose ExoActor, a novel framework that leverages the generalization capabilities of large-scale video generation models to address this problem. The key insight in ExoActor is to use third-person video generation as a unified interface for modeling interaction dynamics. Given a task instruction and scene context, ExoActor synthesizes plausible execution processes that implicitly encode coordinated interactions between robot, environment, and objects. Such video output is then transformed into executable humanoid behaviors through a pipeline that estimates human motion and executes it via a general motion controller, yielding a task-conditioned behavior sequence. To validate the proposed framework, we implement it as an end-to-end system and demonstrate its generalization to new scenarios without additional real-world data collection. Furthermore, we conclude by discussing limitations of the current implementation and outlining promising directions for future research, illustrating how ExoActor provides a scalable approach to modeling interaction-rich humanoid behaviors, potentially opening a new avenue for generative models to advance general-purpose humanoid intelligence.

35
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists

Existing research infrastructure is fundamentally document-centric, providing citation links between papers but lacking explicit representations of methodological evolution. In particular, it does not capture the structured relationships that explain how and why research methods emerge, adapt, and build upon one another. With the rise of AI-driven research agents as a new class of consumers of scientific knowledge, this limitation becomes increasingly consequential, as such agents cannot reliably reconstruct method evolution topologies from unstructured text. We introduce Intern-Atlas, a methodological evolution graph that automatically identifies method-level entities, infers lineage relationships among methodologies, and captures the bottlenecks that drive transitions between successive innovations. Built from 1,030,314 papers spanning AI conferences, journals, and arXiv preprints, the resulting graph comprises 9,410,201 semantically typed edges, each grounded in verbatim source evidence, forming a queryable causal network of methodological development. To operationalize this structure, we further propose a self-guided temporal tree search algorithm for constructing evolution chains that trace the progression of methods over time. We evaluate the quality of the resulting graph against expert-curated ground-truth evolution chains and observe strong alignment. In addition, we demonstrate that Intern-Atlas enables downstream applications in idea evaluation and automated idea generation. We position methodological evolution graphs as a foundational data layer for the emerging automated scientific discovery.

33
Efficient Training on Multiple Consumer GPUs with RoundPipe

Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer from an inherent limitation termed the weight binding issue. Binding uneven model stages (e.g., the LM head is large) to GPUs limits the pipeline's throughput to that of the GPU with the heaviest load, leading to severe pipeline bubbles. In this paper, we propose RoundPipe, a novel pipeline schedule that breaks the weight binding constraint on consumer GPU servers. RoundPipe treats GPUs as a pool of stateless execution workers and dynamically dispatches computation stages across devices in a round-robin manner, achieving a near-zero-bubble pipeline. To ensure training correctness and system efficiency, RoundPipe integrates a priority-aware transfer scheduling engine, a fine-grained distributed event-based synchronization protocol, and an automated layer partitioning algorithm. Evaluations on an 8× RTX 4090 server demonstrate that RoundPipe achieves 1.48-2.16× speedups over state-of-the-art baselines when fine-tuning 1.7B to 32B models. Remarkably, RoundPipe enables LoRA fine-tuning of the Qwen3-235B model with 31K sequence length on a single server. RoundPipe is publicly available as an open-source Python library with comprehensive documentation.
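The round-robin dispatch idea can be sketched in a few lines. This is a toy illustration of the scheduling pattern only, not RoundPipe's implementation; the function name and parameters are assumptions.

```python
# Toy sketch of round-robin stage dispatch: GPUs are treated as a pool
# of stateless workers, so no single device is permanently bound to a
# heavy stage (e.g., the LM head).

def round_robin_schedule(num_stages, num_gpus, num_microbatches):
    """Map each (microbatch, stage) pair to a GPU in round-robin order."""
    schedule = {}
    slot = 0
    for mb in range(num_microbatches):
        for stage in range(num_stages):
            schedule[(mb, stage)] = slot % num_gpus
            slot += 1
    return schedule

sched = round_robin_schedule(num_stages=3, num_gpus=2, num_microbatches=2)
# With a stage count not divisible by the GPU count, the same stage lands
# on different GPUs across microbatches, spreading a heavy stage's cost
# over the whole device pool instead of pinning it to one card.
assert sched[(0, 0)] != sched[(1, 0)]
```

In the real system, correctness then hinges on shipping the right weights to the right worker in time, which is what the transfer-scheduling and synchronization machinery described in the abstract addresses.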

32
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether a task was executed. We introduce Claw-Eval-Live, a live benchmark for workflow agents that separates a refreshable signal layer, updated across releases from public workflow-demand signals, from a reproducible, time-stamped release snapshot. Each release is constructed from public workflow-demand signals, with ClawHub Top-500 skills used in the current release, and materialized as controlled tasks with fixed fixtures, services, workspaces, and graders. For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions. The release contains 105 tasks spanning controlled business services and local workspace repair, and evaluates 13 frontier models under a shared public pass rule. Experiments reveal that reliable workflow automation remains far from solved: the leading model passes only 66.7% of tasks and no model reaches 70%. Failures are structured by task family and execution surface, with HR, management, and multi-system business workflows as persistent bottlenecks and local workspace repair comparatively easier but unsaturated. Leaderboard rank alone is insufficient because models with similar pass rates can diverge in overall completion, and task-level discrimination concentrates in a middle band of tasks. Claw-Eval-Live suggests that workflow-agent evaluation should be grounded twice, in fresh external demand and in verifiable agent action.

29
Leveraging Verifier-Based Reinforcement Learning in Image Editing

While Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm for text-to-image generation, its application to image editing remains largely unexplored. A key bottleneck is the lack of a robust general reward model for all editing tasks. Existing edit reward models usually give overall scores without detailed checks, ignoring different instruction requirements and causing biased rewards. To address this, we argue that the key is to move from a simple scorer to a reasoning verifier. We introduce Edit-R1, a framework that builds a chain-of-thought (CoT) verifier-based reasoning reward model (RRM) and then leverages it for downstream image editing. The Edit-RRM breaks instructions into distinct principles, evaluates the edited image against each principle, and aggregates these checks into an interpretable, fine-grained reward. To build such an RRM, we first apply supervised fine-tuning (SFT) as a ``cold-start'' to generate CoT reward trajectories. Then, we introduce Group Contrastive Preference Optimization (GCPO), a reinforcement learning algorithm that leverages human pairwise preference data to reinforce our pointwise RRM. After building the RRM, we use GRPO to train editing models with this non-differentiable yet powerful reward model. Extensive experiments demonstrate that our Edit-RRM surpasses powerful VLMs such as Seed-1.5-VL and Seed-1.6-VL as an editing-specific reward model, and we observe a clear scaling trend, with performance consistently improving from 3B to 7B parameters. Moreover, Edit-R1 delivers gains to editing models like FLUX.1-kontext, highlighting its effectiveness in enhancing image editing.
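The principle-decomposed reward the abstract describes can be sketched schematically. All names here are assumptions for illustration, not the paper's API: an instruction is split into checkable principles, each is scored, and the checks are aggregated into one fine-grained reward.

```python
# Schematic of a principle-decomposed reward: score each principle the
# instruction implies, then aggregate into a single scalar reward.

def rrm_reward(principle_scores, weights=None):
    """Weighted mean of per-principle checks, in [0, 1] for scores in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(principle_scores)
    return sum(w * s for w, s in zip(weights, principle_scores)) / sum(weights)

# e.g. "replace the sky AND keep the subject unchanged" -> two checks
scores = [1.0, 0.5]  # sky replaced: yes; subject preserved: partly
assert rrm_reward(scores) == 0.75
```

Compared with a single holistic score, this aggregation makes the reward interpretable per requirement, which is the property the abstract argues a verifier-style reward model should have.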

28
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

Tokens serve as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value-estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length-matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy compared to 6% for a token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications, that token length can be effectively modeled as a token-level value signal, and that LenVM holds promise as a general framework for length modeling and as a length-specific value signal for future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
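The value formulation above admits a closed form: with a constant per-token reward of -c and discount gamma, the discounted return from position t of an H-token generation is -c * (1 - gamma**(H - t)) / (1 - gamma), which is bounded below by -c / (1 - gamma) and monotone in the remaining horizon. A minimal sketch (c and gamma are arbitrary illustrative choices, not the paper's hyperparameters):

```python
# Per-token value targets when every future token carries reward -c.
# The return is a bounded, monotone proxy for how many tokens remain.

def length_value_targets(seq_len, c=1.0, gamma=0.99):
    """Discounted return at each position of a seq_len-token generation."""
    targets = []
    for t in range(seq_len):
        remaining = seq_len - t  # tokens still to be generated
        # Geometric series: sum of -c * gamma**k for k in [0, remaining)
        targets.append(-c * (1 - gamma ** remaining) / (1 - gamma))
    return targets

vals = length_value_targets(5)
# Values rise toward -c as fewer tokens remain, and are bounded below
# by -c / (1 - gamma) no matter how long the sequence is.
assert all(vals[t] < vals[t + 1] for t in range(4))
assert all(v > -1.0 / (1 - 0.99) for v in vals)
```

The dense supervision follows directly: every token of every training sequence yields a target, with no annotation needed beyond the sequence's own length.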

19
Representation Fréchet Loss for Visual Generation

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr^k, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
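For Gaussian fits, the Fréchet distance has a closed form; in one dimension it reduces to (mu1 - mu2)^2 + (sd1 - sd2)^2. A minimal sketch of the decoupling described above, with a large population for the statistics and a small batch (1-D only for brevity; the paper works in high-dimensional representation spaces, and this is not its code):

```python
import random
import statistics

def frechet_1d(xs, ys):
    """Frechet distance between 1-D Gaussian fits of two samples."""
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2

random.seed(0)
population = [random.gauss(0.0, 1.0) for _ in range(50_000)]  # FD population
batch = [random.gauss(0.0, 1.0) for _ in range(1_024)]        # gradient batch
# Same underlying distribution, so the distance should be near zero.
assert frechet_1d(population, batch) < 0.1
assert frechet_1d(population, population) == 0.0
```

The point of the decoupling is that the population statistics can be held fixed (or updated slowly) while gradients flow only through the small batch, keeping the estimator stable and the backward pass cheap.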

18
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

We introduce Nemotron 3 Nano Omni, the latest model in the Nemotron multimodal series and the first to natively support audio inputs alongside text, images, and video. Nemotron 3 Nano Omni delivers consistent accuracy improvements over its predecessor, Nemotron Nano V2 VL, across all modalities, enabled by advances in architecture, training data and recipes. In particular, Nemotron 3 delivers leading results in real-world document understanding, long audio-video comprehension, and agentic computer use. Built on the highly efficient Nemotron 3 Nano 30B-A3B backbone, Nemotron 3 Nano Omni further incorporates innovative multimodal token-reduction techniques to deliver substantially lower inference latency and higher throughput than other models of similar size. We are releasing model checkpoints in BF16, FP8, and FP4 formats, along with portions of the training data and codebase to facilitate further research and development.

15
Synthetic Computers at Scale for Long-Horizon Productivity Simulation

Realistic long-horizon productivity work is strongly conditioned on user-specific computer environments, where much of the work context is stored and organized through directory structures and content-rich artifacts. To scale synthetic data creation for such productivity scenarios, we introduce Synthetic Computers at Scale, a scalable methodology for creating such environments with realistic folder hierarchies and content-rich artifacts (e.g., documents, spreadsheets, and presentations). Conditioned on each synthetic computer, we run long-horizon simulations: one agent creates productivity objectives that are specific to the computer's user and require multiple professional deliverables and about a month of human work; another agent then acts as that user and keeps working across the computer -- for example, navigating the filesystem for grounding, coordinating with simulated collaborators, and producing professional artifacts -- until these objectives are completed. In preliminary experiments, we create 1,000 synthetic computers and run long-horizon simulations on them; each run requires over 8 hours of agent runtime and spans more than 2,000 turns on average. These simulations produce rich experiential learning signals, whose effectiveness is validated by significant improvements in agent performance on both in-domain and out-of-domain productivity evaluations. Given that personas are abundant at billion scale, this methodology can in principle scale to millions or even billions of synthetic user worlds with sufficient compute, enabling broader coverage of diverse professions, roles, contexts, environments, and productivity needs. We argue that scalable synthetic computer creation, together with at-scale simulations, is highly promising as a foundational substrate for agent self-improvement and agentic reinforcement learning in long-horizon productivity scenarios.

15
Step-level Optimization for Efficient Computer-use Agents

Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and slow in practice, since most systems invoke large multimodal models at nearly every interaction step. We argue that this uniform allocation of compute is fundamentally inefficient for long-horizon GUI tasks. Such trajectories are highly heterogeneous: many steps are routine and can be handled reliably by smaller, cheaper policies, while errors tend to concentrate at a relatively small number of high-risk moments. Across computer-use benchmarks, these failures repeatedly take two forms: progress stalls, where the agent loops, repeats ineffective actions, or fails to make meaningful progress, and silent semantic drift, where the agent continues taking locally plausible actions after already deviating from the user's true goal. To address this inefficiency, we propose an event-driven, step-level cascade for computer-use agents that runs a small policy by default and escalates to a stronger model only when lightweight learned monitors detect elevated risk. Our framework combines two complementary signals: a Stuck Monitor that detects degraded progress from recent reasoning-action history and triggers recovery, and a Milestone Monitor that identifies semantically meaningful checkpoints where sparse verification is most informative for catching drift. This design turns always-on frontier-model inference into adaptive, on-demand compute allocation over the course of an evolving interaction. The framework is modular and deployment-oriented: it can be layered on top of existing computer-use agents without changing the underlying agent architecture or retraining the large model.
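The event-driven cascade above reduces to a small routing rule per step. The sketch below is illustrative only; the monitor and policy names are hypothetical stand-ins for the learned components the abstract describes.

```python
# Toy routing rule for a step-level cascade: a cheap policy acts by
# default, and control escalates to a stronger model only when a
# monitor flags elevated risk (e.g., the agent appears stuck).

def run_step(state, small_policy, large_policy, stuck_monitor):
    """Route one interaction step to the cheap or the strong policy."""
    if stuck_monitor(state):  # degraded progress: loops, repeated actions
        return large_policy(state), "large"
    return small_policy(state), "small"

# Minimal monitors/policies just to exercise the routing logic.
stuck = lambda s: s["repeats"] >= 3  # same action 3+ times in a row
small = lambda s: "retry"
large = lambda s: "replan"

assert run_step({"repeats": 1}, small, large, stuck) == ("retry", "small")
assert run_step({"repeats": 4}, small, large, stuck) == ("replan", "large")
```

A second monitor for milestone verification would slot in the same way: another predicate over recent history that, when true, triggers a sparse check by the stronger model rather than a takeover.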

11
The Last Human-Written Paper: Agent-Native Research Artifacts

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (ARA), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an ARA Compiler that translates legacy PDFs and repos into ARAs; and an ARA-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, ARA raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in ARA accelerate progress, but can also constrain a capable agent from stepping outside the prior-run box depending on the agent's capabilities.

10
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?

With the advancement of multimodal large language models (MLLMs) and coding agents, website development has shifted from manual programming to agent-based project-level code synthesis. Existing benchmarks rely on idealized assumptions, especially well-structured, information-rich inputs and static execution settings. In contrast, real-world development is constrained by a critical bottleneck: the semantic misalignment between ambiguous, low-quality instructions from non-expert users and model understanding, which results in a failure mode that we term blind execution. To address this gap, we introduce InteractWeb-Bench, the first multimodal interactive benchmark for website generation under non-expert, low-code user conditions. InteractWeb-Bench introduces four types of user agents and persona-driven instruction perturbations to systematically simulate diverse user behaviors, including ambiguity, redundancy, and contradiction, grounded in requirement-engineering defect taxonomies. We develop an interactive execution environment for agents, featuring a unified action space comprising Clarify, Implement, Verify, and Submit, enabling iterative intent refinement, code synthesis, and visual feedback-based validation. Extensive experiments and analysis reveal that frontier MLLM-based agents remain trapped in blind execution, exposing limitations in intent recognition and adaptive interaction.

9
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons

Recent methods for arbitrary-skeleton motion capture from monocular video follow a factorized pipeline, where a Video-to-Pose network predicts joint positions and an analytical inverse-kinematics (IK) stage recovers joint rotations. While effective, this design is inherently limited, since joint positions do not fully determine rotations and leave degrees of freedom such as bone-axis twist ambiguous, and the non-differentiable IK stage prevents the system from adapting to noisy predictions or optimizing for the final animation objective. In this work, we present the first fully end-to-end framework in which both Video-to-Pose and Pose-to-Rotation are learnable and jointly optimized. We observe that the ambiguity in pose-to-rotation mapping arises from missing coordinate system information: the same joint positions can correspond to different rotations under different rest poses and local axis conventions. To resolve this, we introduce a reference pose-rotation pair from the target asset, which, together with the rest pose, not only anchors the mapping but also defines the underlying rotation coordinate system. This formulation turns rotation prediction into a well-constrained conditional problem and enables effective learning. In addition, our model predicts joint positions directly from video without relying on mesh intermediates, improving both robustness and efficiency. Both stages share a skeleton-aware Global-Local Graph-guided Multi-Head Attention (GL-GMHA) module for joint-level local reasoning and global coordination. Experiments on Truebones Zoo and Objaverse show that our method reduces rotation error from ~17 degrees to ~10 degrees, and to 6.54 degrees on unseen skeletons, while achieving ~20x faster inference than mesh-based pipelines. Project page: https://animotionlab.github.io/MoCapAnythingV2/

8
PhyCo: Learning Controllable Physical Priors for Generative Motion

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their underlying properties. We present PhyCo, a framework that introduces continuous, interpretable, and physically grounded control into video generation. Our approach integrates three key components: (i) a large-scale dataset of over 100K photorealistic simulation videos where friction, restitution, deformation, and force are systematically varied across diverse scenarios; (ii) physics-supervised fine-tuning of a pretrained diffusion model using a ControlNet conditioned on pixel-aligned physical property maps; and (iii) VLM-guided reward optimization, where a fine-tuned vision-language model evaluates generated videos with targeted physics queries and provides differentiable feedback. This combination enables a generative model to produce physically consistent and controllable outputs through variations in physical attributes, without any simulator or geometry reconstruction at inference. On the Physics-IQ benchmark, PhyCo significantly improves physical realism over strong baselines, and human studies confirm clearer and more faithful control over physical attributes. Our results demonstrate a scalable path toward physically consistent, controllable generative video models that generalize beyond synthetic training environments.

8
Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific problem instances remains a critical challenge for model controllability, and for shedding light on reasoning controllability. In this paper, we present the first systematic investigation of this problem through the lens of reasoning conflicts: an explicit tension between parametric and contextual information induced by mandating logical schemata that deviate from those expected for a target task. Our evaluation reveals that LLMs consistently prioritize sensibility over compliance, favoring task-appropriate reasoning patterns despite conflicting instructions. Notably, task accuracy is not strictly determined by sensibility, with models often maintaining high performance even when using conflicting patterns, suggesting a reliance on internalized parametric memory that increases with model size. We further demonstrate that reasoning conflicts are internally detectable, as confidence scores significantly drop during conflicting episodes. Probing experiments confirm that reasoning types are linearly encoded from middle-to-late layers, indicating the potential for activation-level controllability. Leveraging these insights, we steer models towards compliance, increasing instruction following by up to 29%. Overall, our findings establish that while LLM reasoning is anchored to concrete instances, active mechanistic interventions can effectively decouple logical schemata from data, offering a path toward improved controllability, faithfulness, and generalizability.

5
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization

Human visual preferences are inherently multi-dimensional, encompassing aesthetics, detail fidelity, and semantic alignment. However, existing datasets provide only single, holistic annotations, resulting in severe label noise: images that excel in some dimensions but are deficient in others are simply marked as winner or loser. We theoretically demonstrate that compressing multi-dimensional preferences into binary labels generates conflicting gradient signals that misguide Diffusion Direct Preference Optimization (DPO). To address this, we propose Semi-DPO, a semi-supervised approach that treats consistent pairs as clean labeled data and conflicting ones as noisy unlabeled data. Our method starts by training on a consensus-filtered clean subset, then uses this model as an implicit classifier to generate pseudo-labels for the noisy set for iterative refinement. Experimental results demonstrate that Semi-DPO achieves state-of-the-art performance and significantly improves alignment with complex human preferences, without requiring additional human annotation or explicit reward models during training. We will release our code and models at: https://github.com/L-CodingSpace/semi-dpo
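The consensus split at the heart of Semi-DPO can be sketched as follows. The dimension names and vote encoding are illustrative assumptions, not the paper's data format:

```python
# Conceptual sketch of the Semi-DPO data split described in the abstract.
# Each pair carries per-dimension preference votes (e.g. aesthetics, detail,
# semantic alignment); pairs whose dimensions agree are treated as clean
# labeled data, conflicting pairs as noisy data to be pseudo-labeled later.

def split_by_consensus(pairs):
    """Separate preference pairs into consensus (clean) and conflict (noisy)."""
    clean, noisy = [], []
    for votes in pairs:
        # votes: +1 means the dimension prefers image A, -1 prefers image B
        if all(v == votes[0] for v in votes):
            clean.append(votes)   # all dimensions agree -> clean label
        else:
            noisy.append(votes)   # dimensions disagree -> needs pseudo-label
    return clean, noisy

pairs = [
    [+1, +1, +1],   # A wins on aesthetics, detail, and semantics
    [+1, -1, +1],   # conflicting: detail prefers B
    [-1, -1, -1],   # B wins everywhere
]
clean, noisy = split_by_consensus(pairs)
# Stage 1 trains DPO on `clean`; stage 2 uses that model as an implicit
# classifier to pseudo-label `noisy` and iterates.
```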

3
World2Minecraft: Occupancy-Driven Simulated Scenes Construction

Embodied intelligence requires high-fidelity simulation environments to support perception and decision-making, yet existing platforms often suffer from data contamination and limited flexibility. To mitigate this, we propose World2Minecraft to convert real-world scenes into structured Minecraft environments based on 3D semantic occupancy prediction. In the reconstructed scenes, we can effortlessly perform downstream tasks such as Vision-Language Navigation (VLN). However, we observe that reconstruction quality heavily depends on accurate occupancy prediction, which remains limited by data scarcity and poor generalization in existing models. We introduce a low-cost, automated, and scalable data acquisition pipeline for creating customized occupancy datasets, and demonstrate its effectiveness through MinecraftOcc, a large-scale dataset featuring 100,165 images from 156 richly detailed indoor scenes. Extensive experiments show that our dataset provides a critical complement to existing datasets and poses a significant challenge to current SOTA methods. These findings contribute to improving occupancy prediction and highlight the value of World2Minecraft in providing a customizable and editable platform for personalized embodied AI research. Project page: https://world2minecraft.github.io/.

3
Instruction-Guided Poetry Generation in Arabic and Its Dialects

Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry within Large Language Models (LLMs) has primarily focused on analysis tasks such as interpretation or metadata prediction, e.g., rhyme schemes and titles. In contrast, our work addresses the practical aspect of poetry creation in Arabic by introducing controllable generation capabilities to assist users in writing poetry. Specifically, we present a large-scale, carefully curated instruction-based dataset in Modern Standard Arabic (MSA) and various Arabic dialects. This dataset enables tasks such as writing, revising, and continuing poems based on predefined criteria, including style and rhyme, as well as performing poetry analysis. Our experiments show that fine-tuning LLMs on this dataset yields models that can effectively generate poetry that is aligned with user requirements, based on both automated metrics and human evaluation with native Arabic speakers. The data and the code are available at https://github.com/mbzuai-nlp/instructpoet-ar

2
ViPO: Visual Preference Optimization at Scale

While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on such noisy datasets fails to learn preferences, hindering effective scaling. To enhance robustness against noise, we propose Poly-DPO, which extends the DPO objective with an additional polynomial term that dynamically adjusts model confidence based on dataset characteristics, enabling effective learning across diverse data distributions. Beyond biased patterns, existing datasets suffer from low resolution, limited prompt diversity, and imbalanced distributions. To facilitate large-scale visual preference optimization by tackling data bottlenecks, we construct ViPO, a massive-scale preference dataset with 1M image pairs at 1024px across five categories and 300K video pairs at 720p+ across three categories. State-of-the-art generative models and diverse prompts ensure reliable preference signals with balanced distributions. Remarkably, when applying Poly-DPO to our high-quality dataset, the optimal configuration converges to standard DPO. This convergence validates dataset quality and Poly-DPO's adaptive nature: sophisticated optimization becomes unnecessary with sufficient data quality, yet remains valuable for imperfect datasets. We validate our approach across visual generation models. On noisy datasets like Pick-a-Pic V2, Poly-DPO achieves 6.87 and 2.32 gains over Diffusion-DPO on GenEval for SD1.5 and SDXL, respectively. For ViPO, models achieve performance far exceeding those trained on existing open-source preference datasets. These results confirm that addressing both algorithmic adaptability and data quality is essential for scaling visual preference optimization.
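Since the abstract reports that on clean data Poly-DPO converges to standard DPO, a minimal sketch of that standard DPO objective on log-probability ratios is useful context. The abstract does not specify the polynomial confidence term Poly-DPO adds, so it is omitted here; beta and the toy numbers are illustrative assumptions:

```python
# Standard DPO objective that Poly-DPO reduces to on high-quality data:
# -log sigmoid(beta * (policy log-ratio margin - reference log-ratio margin)).
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (winner, loser) pair given policy and reference log-probs."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the winner gains probability relative to the reference, the loss
# drops below log 2 (the value at zero margin).
loss = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
assert loss < math.log(2.0)
```

On conflicting pairs like those in Pick-a-Pic V2, this margin receives contradictory gradient signals, which is the failure mode the polynomial term is designed to dampen.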

1
FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

Long-context large language models (LLMs), such as Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks faced by LLMs under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic attacks and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially in long-context scenarios. The resource-intensive nature poses a major obstacle for the community (especially academic researchers) to systematically evaluate the security risks of long-context LLMs and assess the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve the efficiency (in terms of both computation and memory) of optimization-based prompt injection and knowledge corruption attacks on long-context LLMs. Through extensive evaluations, we find that FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., reducing from 264.1 GB to 65.7 GB GPU memory for a 32K token context) compared to the state-of-the-art baseline nanoGCG. FlashRT can be broadly applied to black-box optimization methods, such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool to enable systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT

0
Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

Foundation models are routinely fine-tuned for use in particular domains, yet safety assessments are typically conducted only on base models, implicitly assuming that safety properties persist through downstream adaptation. We test this assumption by analyzing the safety behavior of 100 models, including widely deployed fine-tunes in the medical and legal domains as well as controlled adaptations of open foundation models alongside their bases. Across general-purpose and domain-specific safety benchmarks, we find that benign fine-tuning induces large, heterogeneous, and often contradictory changes in measured safety: models frequently improve on some instruments while degrading on others, with substantial disagreement across evaluations. These results show that safety behavior is not stable under ordinary downstream adaptation, raising critical questions about governance and deployment practices centered on base-model evaluations. Without explicit re-evaluation of fine-tuned models in deployment-relevant contexts, such approaches fall short of adequately managing downstream risk, overlooking practical sources of harm -- failures that are especially consequential in high-stakes settings and challenge current accountability paradigms.

0
05

PRODUCT HUNT

05.00
PRODUCT HUNT

Product Hunt - May 3, 2026

Product Hunt Daily Feed: Featuring noteworthy tech launches.

Aximote In-Car App icon
Aximote In-Car App

The fitness tracker for your car

0
Mockin 2.0 icon
Mockin 2.0

Ultimate career toolkit for UX/UI & Product designers

0
PandaProbe icon
PandaProbe

open source agent engineering platform

0
Huddle01 VMs icon
Huddle01 VMs

Virtual Machines for Your Agents

0
Radar icon
Radar

The missing open-source Kubernetes UI

0
Rosentic icon
Rosentic

Catch when coding agents break each other before merge

0
Scholé icon
Scholé

Turn everyday work into personalized AI learning

0
Cloud Computer by Manus icon
Cloud Computer by Manus

A dedicated cloud machine for bots and software

0
Filect icon
Filect

Organize Your Files With AI

0
Microsoft Copilot Health icon
Microsoft Copilot Health

Dedicated space to bring your personal health data together

0
Feather icon
Feather

Photo editor with local AI

0
Ara icon
Ara

Build an entire business by texting

0
YouTube TV Custom Multiview icon
YouTube TV Custom Multiview

Mix and match up to 4 live streams at once

0
Breaks icon
Breaks

A quiet Pomodoro that lives in your menu bar.

0
HiveTerm icon
HiveTerm

One workspace for Claude, Codex, Gemini and your stack

0
Zed 1.0 icon
Zed 1.0

High-performance, open source, multiplayer code editor

0
Marx Finance icon
Marx Finance

AI agents debate the markets

0
Zush icon
Zush

Updated: docs support, BYOK, Local AI (Ollama), Windows App

0
Postiz icon
Postiz

Agentic social media scheduler for agents like OpenClaw

0
Ghosted icon
Ghosted

Pause media or lock your screen when you step away

0
ScreenVeil icon
ScreenVeil

Hide what shouldn’t be seen on your computer

0
nudge icon
nudge

Drop your tasks. AI auto-schedules your whole week.

0
LaunchCut icon
LaunchCut

Interactive iOS Demo Builder

0
AnyDrop icon
AnyDrop

AirDrop for the browser: share files, chat and sync notes

0
Buda icon
Buda

Recruit agents to run your company as a synchronous team

0
Beauty Diagram icon
Beauty Diagram

Diagrams that don't look like they were auto-generated

0
Bitgrain icon
Bitgrain

Design studio lighter than Figma & more flexible than Canva

0
Genspark for Word icon
Genspark for Word

Draft, edit, and research inside Microsoft Word with AI

0
Montage icon
Montage

The runtime framework for agentic user interfaces!

0
PeekFocus icon
PeekFocus

One keystroke blurs everything behind your active window

0
MUSIXQUARE icon
MUSIXQUARE

Turn any room into a surround system with your devices

0
CipherLock icon
CipherLock

Learn ciphers by breaking them

0
TrafficClaw icon
TrafficClaw

Have a conversation with your SEO & analytics data

0
ElevenMusic icon
ElevenMusic

AI-assisted music creation with built-in discovery, royalty

0
Crin AI icon
Crin AI

Learn about AI & watch text become tokens in a node graph

0
Docky icon
Docky

Pin, group, and remove apps easily from your dock

0
Gemini Deep Research Agent icon
Gemini Deep Research Agent

Web and MCP research agents, now in Gemini API

0
Adoptly icon
Adoptly

Turn product releases into feature adoption

0
SuperMind icon
SuperMind

Business that Runs Itself

0
doola MCP for US LLC Formation icon
doola MCP for US LLC Formation

Start your business using AI in Claude and Replit

0
ElevenLabs Agent Templates icon
ElevenLabs Agent Templates

Deploy pre-built voice and chat agents for support, sales

0
Miaw AI secretary icon
Miaw AI secretary

Non-invasive AI secretary to help without context switching

0
Symphony icon
Symphony

An open-source spec for Codex orchestration

0
Motorola Razr Fold icon
Motorola Razr Fold

A foldable phone built for pen-first productivity

0
Sync-in icon
Sync-in

Open-source file storage, sharing, collaboration & syncing

0
WooTrack - POAS Plugin for WooCommerce icon
WooTrack - POAS Plugin for WooCommerce

Real profit tracking for WooCommerce + Google Ads.

0
VideoOS by Jupitrr AI icon
VideoOS by Jupitrr AI

Your all-in-one video workflow

0
Mintlify Editor icon
Mintlify Editor

AI-native collaborative editor

0
Tabstack icon
Tabstack

Extract web data and automate browsers, no scraper required.

0
Tinfoil icon
Tinfoil

AI chat and API that keeps your conversations fully private

0
06

TECHMEME

06.00
TECHMEME

Techmeme - May 3, 2026

Techmeme Digest: Major tech headlines and industry conversations.

How Amazon's expansion into fashion helped Jeff Bezos enter fashion's inner circle, as he and Lauren Sánchez Bezos become underwriters for this year's Met Gala (Chavie Lieber/Wall Street Journal)
Source: Techmeme · Published: May 3, 2026

The Amazon founder and Lauren Sánchez Bezos have become front-row fixtures through business expansion and charitable giving.

Nintendo's share price has fallen by ~45% since August 2025, as rising memory chip costs drive investor concerns over profit margins for the Switch 2 (David Keohane/Financial Times)
Source: Techmeme · Published: May 3, 2026

Higher memory chip costs fuel fears of a price rise for the Switch 2 and cast a shadow over the console's success. The latest Super Mario movie released …

A profile of BlackBerry's QNX division, whose operating system controls safety features in 275M cars and accounts for half of BlackBerry's revenue (Ben Cohen/Wall Street Journal)
Source: Techmeme · Published: May 3, 2026

John Wall has spent nearly his entire career working for the same company. And when he tells people where he works, nobody has any clue what he's talking about.

An evaluation by NIST's CAISI says DeepSeek V4 Pro lags behind leading US AI models by about eight months and is the most capable Chinese AI model to date (NIST)
Source: Techmeme · Published: May 3, 2026

In April 2026, the Center for AI Standards and Innovation (CAISI) evaluated the open-weight AI model DeepSeek V4 Pro ("DeepSeek V4").

A slew of top Boston Dynamics execs have left the Hyundai-owned company in recent months, as sources say it faces pressure to speed the delivery of humanoids (Rachyl Jones/Semafor)
Source: Techmeme · Published: May 3, 2026

In recent months, a slew of top executives at Boston Dynamics have left the company, which Hyundai bought a majority stake in back in 2021.

Sources: OpenAI employees have raised alarms internally over failures to alert law enforcement when users describe plans for real-world violence to ChatGPT (Georgia Wells/Wall Street Journal)
Source: Techmeme · Published: May 3, 2026

OpenAI's chatbot dispenses advice on weapons and role-plays mass shootings. The carnage is raising scrutiny on when and how companies intervene.

Palo Alto Networks agrees to acquire Portkey, which develops AI gateway tech to manage and secure AI agents; sources say the deal values Portkey at $120M-$140M (The Economic Times)
Source: Techmeme · Published: May 3, 2026

Cybersecurity giant Palo Alto Networks is acquiring AI infrastructure startup Portkey to bolster its defences for autonomous AI systems.

Amadeus IT Group, which operates the world's largest travel booking system, plans to acquire French biometrics company Idemia Public Security for €1.2B in cash (Javi West Larrañaga/Reuters)
Source: Techmeme · Published: May 3, 2026

Spanish travel technology firm Amadeus (AMA.MC) on Wednesday announced a plan to acquire French biometrics company Idemia Public Security …

Ask.com shutters, as its owner IAC "continues to sharpen its focus"; a dot-com era icon, Ask Jeeves launched in 1997, a year before Google (Chase DiBenedetto/Mashable)
Source: Techmeme · Published: May 3, 2026

"As IAC continues to sharpen its focus, we have made the decision to discontinue our search business, which includes Ask.com."

Maryland becomes the first US state to ban surveillance pricing in grocery stores, as other states including CO, CA, MA, IL, and NJ consider similar bills (Sanya Mansoor/The Guardian)
Source: Techmeme · Published: May 3, 2026

Critics say Maryland's new law, which bans rapidly changing product costs based on consumer data, is full of carveouts.

Analysis: after Trump's World Liberty raised $550M from investors, tokens worth hundreds of millions in USD were privately sold in "white glove" transactions (Olga Kharif/Bloomberg)
Source: Techmeme · Published: May 2, 2026

The pitch was straightforward: Invest in the cryptocurrency venture of Donald Trump and his family …

Investigation: Nobitex was founded by two brothers from Iran's elite Kharrazi family; the crypto exchange processed hundreds of millions beyond US sanctions (Reuters)
Source: Techmeme · Published: May 2, 2026

Two brothers from the elite Kharrazi family, using an alternative surname, started up Nobitex in 2018.

A Chinese court ruled that companies cannot terminate staff just to replace them with AI, following a similar ruling by another Chinese court in December 2025 (Victor Swezey/Bloomberg)
Source: Techmeme · Published: May 2, 2026

A Chinese court ruled that companies cannot terminate employees just to replace them with artificial intelligence systems …

Study: OpenAI's o1 correctly diagnosed 67% of emergency room patients using electronic records and a few sentences from nurses, vs. 50-55% for triage doctors (Robert Booth/The Guardian)
Source: Techmeme · Published: May 2, 2026

Researchers say the results mark a "profound change in technology that will reshape medicine". From George Clooney in ER …

Sources: Nigerian mobile payments service OPay is preparing for a US IPO at a $4B valuation with Citigroup, Deutsche Bank, and JPMorgan Chase advising (Bloomberg)
Source: Techmeme · Published: May 2, 2026

Opay Digital Services Ltd. is working with Citigroup Inc., Deutsche Bank AG, and JPMorgan Chase & Co. as the Nigeria-focused payments platform prepares …

07

STARTUP ARCHIVE

07.00
STARTUP ARCHIVE

Startup News - May 3, 2026

Startup News Roundup: Aggregating key funding and launch updates.

Marc Andreessen on the 5 personality traits of an innovator
Source: Startup · Published: Mar 31, 2026

“When you’re talking about real innovators—people who actually do really creative, breakthrough work—I think you’re talking about a couple things:”

Steve Jobs explains the importance of both thinking and doing
Source: Startup · Published: Mar 30, 2026

“The doers are the major thinkers. The people who really create the things that change this industry are both the thinker-doer in one person.”

Tobi Lutke explains what the VCs who passed on Shopify got wrong
Source: Startup · Published: Mar 27, 2026

“What a lot of free-market thinkers don’t understand is that between the demand and eventual supply lies friction."

Sam Altman explains how he decides to invest in a startup after 10 minutes
Source: Startup · Published: Mar 26, 2026

"Does this person have the potential to be the next Mark Zuckerberg?… [You don’t get to] 100% accuracy, obviously, but it’s good enough that our business model works.”

Jony Ive recounts the time Steve Jobs called him vain
Source: Startup · Published: Mar 25, 2026

In the clip below, Jony Ive recounts the time he asked Steve Jobs to be less harsh in his critique of a piece of work.

Jeff Bezos’s two pieces of advice for aspiring entrepreneurs
Source: Startup · Published: Mar 24, 2026

“The advice that I would give entrepreneurs is don't chase the hot new thing. It's so hard to catch something that everybody already knows is hot."

Elad Gil: “Things that work tend to work pretty fast”
Source: Startup · Published: Mar 23, 2026

“I do think there’s a bit of a myth in Silicon Valley that you should keep grinding no matter what and it’s just about perseverance, and I think that’s really bad advice."

Paul Graham on why starting with a “small, intense fire" is the key to startup growth
Source: Startup · Published: Mar 20, 2026

"You have to know who those first users are and how you're going to get them."

Keith Rabois on how to identify great talent
Source: Startup · Published: Mar 19, 2026

“What you want to do with every single employee every single day is expand the scope of their responsibilities until it breaks… and that’s the role they should stay in.”

Wealthfront CEO on why advertising spend makes it harder to find product/market fit
Source: Startup · Published: Mar 18, 2026

“The way that you know you have product/market fit is if you have exponential organic growth."

Eric Schmidt on why most companies get strategy wrong
Source: Startup · Published: Mar 17, 2026

“Work very, very hard to figure out what the world’s going to look like in five years. What will people be doing? What will your customers want? Where will costs be?"

Mark Zuckerberg: “You can’t 80/20 everything”
Source: Startup · Published: Mar 16, 2026

"There’s the famous 80/20 rule where you get 80% of the benefit by doing 20% of the work, but you can’t just 80/20 everything. There have to be certain things that you are just the best at."

Marc Andreessen on Mark Zuckerberg’s founder “superpower”
Source: Startup · Published: Mar 13, 2026

“A great superpower that Mark Zuckerberg has that is probably not well-understood enough is he does not get emotionally upset in stressful situations"

Sam Altman explains how to come up with a great startup idea
Source: Startup · Published: Mar 12, 2026

"If you start a startup without a good idea… you’ll be under pressure to make something up and it won’t work that well."

Jeff Bezos on the problems with proxies and managing to metrics
Source: Startup · Published: Mar 11, 2026

“One of the things that happens in business is that you develop certain things that you’re managing to—a typical case would be a metric. And that metric isn’t the real underlying thing.”

Airbnb founder Brian Chesky on how to design an amazing user experience
Source: Startup · Published: Mar 10, 2026

“If you can design something really amazing using the hand-crafted part of your brain, then you can reverse-engineer how to industrialize this millions of times over."

Spencer Rascoff: "I will never invest in a consumer startup with paid marketing”
Source: Startup · Published: Mar 9, 2026

"If you’re actually trying to grow a product, the best levers for doing that are often within the product itself.”

Patrick Collison explains why it sometimes makes sense to quit
Source: Startup · Published: Mar 6, 2026

“One thing I’ve learned myself the hard way, is that it is easier to tear down a company and restart it in Silicon Valley, than it is to constantly try to pivot or keep something alive."

Jeff Bezos recounts the time he called Amazon’s customer service number mid-meeting to prove a metric was wrong
Source: Startup · Published: Mar 5, 2026

“I have a saying, which is when the data and the anecdotes disagree, the anecdotes are usually right"

Ben Horowitz: “Nobody was born a great manager. It’s a very unnatural job.”
Source: Startup · Published: Mar 4, 2026

“If you can’t build a great product, it doesn’t matter if you can build a great company.”

03

ALSO TODAY

3 MORE SOURCES
08

SOLIDOT

08.00
SOLIDOT

Solidot News - May 3, 2026

Solidot Feed: Highlighting essential tech & open-source news.

UK NHS prepares to close nearly all of its open-source repositories, citing AI

Last month the scheduling platform Cal.com announced it was moving from open source to closed source, arguing that AI tools make it easier to find vulnerabilities in open code, that security depends on obscurity, and that closing the source therefore improves security. Now the UK's National Health Service (NHS) is preparing to close nearly all of its open-source repositories for the same reason, a decision that has drawn widespread controversy and criticism. Critics point out that most of the repositories the NHS has published are datasets, internal tools, guidelines, research tools, and front-end designs, none of which are affected by advances in security-scanning technology. Moreover, whether code is open makes no difference to AI tools like Anthropic Mythos, which can also analyze binaries for vulnerabilities. Critics have published an open letter calling on the NHS to keep its code public.

Hangzhou court rules that layoffs justified by replacing humans with AI are illegal

The Hangzhou Intermediate People's Court has published a case about "AI taking over from human employees," ruling that dismissing a worker because "AI costs less than labor" is illegal; the company involved must pay 260,000 yuan in compensation. In this case, Xiao Zhou (a pseudonym), now 35, joined a Hangzhou tech company in 2022 as a "quality inspector" for a large AI model, responsible for judging the correctness of answers produced in interactions between the model and users. In 2025 the company, citing "upgrades to the AI model: quality-inspection work that used to require humans can now be done by the AI itself," tried to demote Xiao Zhou and cut his pay, from supervisor to ordinary employee and from a monthly salary of 25,000 yuan to 15,000 yuan. Xiao Zhou refused, and the company then terminated his labor contract. He applied for labor arbitration, and the tribunal ruled the company should pay over 260,000 yuan in compensation for unlawful termination. The company disagreed and went to court. The Hangzhou Intermediate People's Court found that the termination was not driven by negative factors such as discontinuing a business line, poor operations, or cutting losses, but by AI's cost advantage, which does not constitute a "major change in objective circumstances" rendering the labor contract unperformable. Moreover, the demotion-and-pay-cut plan the company had offered amounted to a sharp drop in compensation and was not a reasonable negotiated proposal. The court therefore found the termination unlawful, upheld the arbitration result, and ordered the company to pay Xiao Zhou compensation at the 2N standard. Ding Ye, presiding judge of the court's fifth civil division, told media that from a company's perspective, using AI to raise efficiency and cut costs is an inevitable choice in market competition, while from a worker's perspective, losing a job or taking a pay cut because of technological change is in essence the company shifting the normal risks of technological iteration onto its workers.

People can communicate and learn while dreaming

Many people have had the experience of gaining inspiration in a dream, a phenomenon that has prompted scientists to study sleep learning. In 1954, Charles W. Simon and William H. Emmons argued that participants in most sleep-learning studies were actually awake, rendering such research meaningless. They classified sleep learning as science fiction and pseudoscience, and for decades afterward little research was done on it. In recent years, however, scientists have begun to try again. The new research focuses on lucid dreamers: people who remain conscious during sleep and are aware that they are dreaming. In a study published in Neuroscience of Consciousness, 20 lucid dreamers tried to solve puzzles in their dreams in a laboratory. Each puzzle was paired with a specific sound intended to prompt them to resume working on the corresponding puzzle. In the lab, participants solved 42% of the puzzles that appeared in their dreams, versus only 17% of the puzzles that did not. Most people do not have lucid dreams, so the subjects are not representative. One explanation the researchers offer is that while asleep we are more likely to link unrelated stimuli. They do not recommend disrupting sleep for the sake of sleep learning: sleep is an important physiological process, and interfering with it may cost more than it gains.

Ask.com shuts down

The 30-year-old search engine Ask.com shut down on May 1, 2026. Founded in June 1996, it was originally called AskJeeves.com; in 2006 it dropped the name Jeeves and became Ask.com, a search engine with its own crawler and algorithms. Facing competition from the large search engines, it outsourced its web-search technology in 2010 and reverted to being a question-and-answer site. Although Ask.com has closed, AskJeeves.com continues to operate. A "Jeeves" is a personal valet; the name comes from British author P. G. Wodehouse's Jeeves series, in which Jeeves is the valet of the gentleman Bertie Wooster.

Why OpenAI's system prompt specifically restricts goblins

OpenAI's Codex CLI system prompt includes a dedicated restriction on words like "goblins": "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query". The official explanation is that starting with GPT-5.1, the company's models began mentioning words like "goblin" in metaphors far more often: usage of "goblin" in ChatGPT rose 175% and "gremlin" rose 52%. An investigation found that the Nerdy personality had inadvertently rewarded such metaphors, causing the high-frequency goblin usage to spread. To fix the problem, OpenAI retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered the related examples from its training data to prevent them from reappearing inappropriately.

Switzerland to vote in June on capping its population at 10 million

Switzerland will hold a referendum on June 14 to decide whether to cap its resident population at 10 million by 2050. Switzerland's birth rate is 1.29 children per woman, far below the replacement rate of 2.1, and its population growth is driven mainly by immigration. The population has already passed 9 million, and official figures show foreign citizens made up more than 27% of the total in 2024. The proposal, backed by the right-wing Swiss People's Party, demands that "Switzerland's resident population must not exceed 10 million before 2050, and Switzerland should abandon its free-movement agreement with the EU." The latest poll of 16,176 Swiss respondents found 52% supporting or leaning toward the proposal, 46% opposed, and the rest undecided.

Kernel hit by "Copy Fail" root privilege-escalation vulnerability

The Xint Code team has reported a kernel root privilege-escalation vulnerability dubbed Copy Fail. The flaw is very easy to exploit and affects nearly every kernel version since 2017. The kernel security team's failure to notify distributions before disclosure has also drawn controversy. The kernel does not mark the corrupted pages for writeback, so the file contents on disk are unchanged, but the page cache in memory has been tampered with. Since the system reads the page cache when a file is accessed, the corrupted data immediately affects the whole system. A local unprivileged user can gain root by corrupting the page cache of a setuid binary. Because the page cache is shared between the host and containers, attackers can exploit the flaw across container boundaries. The vulnerability affects nearly all distributions; the major distributions have released or are preparing patches.

Mozilla opposes Chrome's Prompt API

In 2025 Google Chrome proposed the Prompt API, a unified JavaScript API for a local model integrated into the browser (which must be downloaded before use). Google also intends for the API to become a W3C standard. The large model integrated into desktop Chrome is Gemini Nano; using it requires a local device with at least 4 GB of VRAM, 16 GB of RAM, and at least 22 GB of free space on the drive the browser is installed on. Mozilla developers have published a statement opposing Chrome's Prompt API. They argue the API has serious interoperability problems: different models each have their own quirks, so system prompts need model-specific tuning, and an adjustment made for one model may be an overcorrection for another. To achieve interoperability, Mozilla and Apple might have to license Google's model, or ship a model compatible with the characteristics of Google's. Another major problem is the lack of model neutrality.

Data center developer Pure Data pauses Middle East investment projects

After its facilities were damaged in attacks, data center developer Pure Data has paused all investment projects in the Middle East. Pure Data operates or is developing more than 1 GW of data centers across Europe, Asia, and the Middle East. As infrastructure, data centers have become an important target in wartime. Three Amazon AWS data centers in the Middle East were attacked, causing large-scale service outages for customers in the region and forcing Amazon to waive all fees for customers of its Middle East cloud region, at a cost of roughly $150 million. Pure Data's data center campus on Yas Island in Abu Dhabi was hit by shrapnel; the company has not disclosed when the strike occurred or the extent of the damage.

Germany's births fall in 2025 to lowest level since 1946

Preliminary figures from Germany's Federal Statistical Office show that the number of births in 2025 fell to the lowest level since 1946. Germany recorded about 655,000 newborns in 2025, far below the 1.36 million at the peak of the 1964 baby boom; the 2024 figure was 680,000. Meanwhile, deaths approached 1.01 million, putting the excess of deaths over births in 2025 above 352,000, a postwar record. Germany's birth rate has fallen for the fourth consecutive year, to a record low of 1.35 children per woman, far below the 2.1 needed to keep the population stable. Hamburg was the only German state where fertility rose, up 0.5% in 2025.

The price tag Google puts on you

Using 2025 ad-auction data, Swiss email provider Proton analyzed more than 54,000 demographic profiles to estimate what advertisers pay to reach different kinds of Americans. The gaps turn out to be far larger than expected. The average American generates about $1,605 in ad value per year; a man aged 35-44 living in Bozeman, Montana, with no children, performing high-value business searches on a desktop is worth an estimated $17,929.30, while a father aged 18-24 in Fort Smith, Arkansas, doing low-value searches on an Android phone is worth just $31.05. The $1,605 average set against the $760 median shows that a small number of high-value users pull the average up, and the business model depends on those users. The analysis found that users without children are worth about 17% more on average than users with children: once a user is flagged as a parent, the ads targeting them shift from wealth-management ads at $6 per click to minivan and kindergarten ads at $2 per click. Desktop users are worth 4.9 times as much as Android users, and Apple iPhone users 2.7 times as much. Ad value peaks between ages 35 and 44 and declines after 65; although older users are worth less overall, the ads aimed at them fall into high-spend categories such as Medicare supplement insurance, pharmaceuticals, and financial products, so advertisers target them more precisely even as their total value drops. Why are residents of Bozeman, Montana worth so much? An influx of remote tech workers and heavy outdoor-recreation spending has made it one of the most competitive local ad markets in the US.
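The gap between the $1,605 mean and the $760 median reported above is what a heavily skewed distribution looks like: a few very high-value profiles inflate the average while the typical user sits far lower. The toy values below are illustrative, not Proton's data:

```python
# Skewed ad-value distribution: one outlier pulls the mean far above the median.
from statistics import mean, median

# Nine ordinary users plus one outlier in the spirit of the $17,929 profile
ad_values = [31, 400, 500, 600, 760, 800, 900, 1000, 1100, 17929]

print(mean(ad_values))    # 2402 -- inflated by the single outlier
print(median(ad_values))  # 780  -- close to the typical user
assert median(ad_values) < mean(ad_values)
```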

Asian countries ramp up coal power in response to the energy crisis

The latest Middle East energy crisis has prompted Asian countries to increase coal-fired power generation. Coal is a highly polluting source of emissions, and if the trend continues, global climate change will grow even more severe. India announced it would postpone maintenance inspections of domestic coal plants. International Energy Agency (IEA) data show that as of 2023 coal accounted for 74% of India's power generation, with oil and gas together at about 3%; with purchases from the Middle East constrained, India is burning more coal to avoid the risk of blackouts. Thailand's power utility has restarted two coal units that had been scheduled for retirement. South Korea has temporarily lifted the cap limiting coal plants to 80% of capacity and postponed the closure of two plants originally due to shut in June. Japan will also raise the utilization rate of its coal plants, and Bangladesh is adding new sources of coal supply. Indonesia, the world's largest exporter of power-station coal, plans to raise its 2026 production plan above the originally planned 600 million tons, and the government of Australia, the second-largest exporter, also plans to expand coal production.

Event invitation | NVIDIA Developer Meetup: expert deep dives across the full stack, from infrastructure to agents

From low-level GPU development to long-context LLM inference to agentic AI capable of autonomous planning, the evolution of AI is reshaping the paradigm of software development across the board. To help developers meet increasingly complex full-stack challenges, the NVIDIA enterprise developer community invites you to an upcoming NVIDIA Developer Meetup in Suzhou. The meetup will bring together NVIDIA technical experts from around the world and locally to share practical insights across the full stack, from infrastructure optimization to cutting-edge agent applications. Read the full article

Greenhouse gas emissions from aquaculture

A study published in Frontiers of Agricultural Science and Engineering found that greenhouse gas emissions from aquaculture come mainly from four stages: feed production, energy use during farming, biochemical processes in ponds and water bodies (such as the release of methane and nitrous oxide), and land-use change and infrastructure construction. Feed production is the largest source in most fed-aquaculture systems, accounting for 52% in the study of China. In regions dominated by freshwater pond farming, such as China, methane emissions are especially pronounced, contributing about 90% of a farming system's greenhouse gas emissions. Emissions differ markedly across species. Unfed bivalves (such as oysters and clams) and seaweed farming have extremely low or even negative emissions, and can act as carbon sinks through carbon fixation, while herbivorous or omnivorous fish such as carp and tilapia also emit relatively little at moderate farming intensity. By contrast, intensively farmed carnivorous fish (such as salmon and trout) and shrimp have markedly higher carbon intensity because of their high feed and energy demands, in some cases comparable to terrestrial livestock farming.

Microsoft releases the 86-DOS 1.00 source code

Microsoft released the source code of MS-DOS 1.25 and 2.11 in 2018 and of MS-DOS 4.0 in 2024; in April 2026, on the 45th anniversary of 86-DOS 1.00's release, it continued the tradition by publishing the 86-DOS 1.00 source code. 86-DOS was written by Tim Paterson and later became the basis of MS-DOS. The GitHub release includes the 86-DOS 1.00 kernel source code, several snapshots of the kernel, and well-known utilities such as CHKDSK.

Genomics pioneer Craig Venter dies at 79

Genomics pioneer Craig Venter has died at the age of 79. He was best known for racing the Human Genome Project in the late 1990s to build a genome database, but because his database was envisioned as pay-to-access it was unpopular in the scientific community, spurring other research teams to accelerate the public release of genome sequencing results. In 2000, brokered by US President Clinton, the Human Genome Project and Venter's company Celera Genomics agreed that all human genome data would be the common heritage of humanity, not subject to patent protection and open to all researchers.

GCC 17 adds support for Hygon's C86-4G CPU

The GCC compiler project has merged patches supporting Hygon's C86-4G CPU. Hygon began as a semiconductor venture with AMD, licensed to offer localized versions of AMD's Zen 1 CPUs, with its products sold only in the domestic Chinese market. Hygon announced a merger with Sugon last May, but said at the end of the year that the merger plan had been terminated. The C86-4G is a 16-core/32-thread processor with performance approaching Intel's Raptor Lake CPUs, and it supports DDR5 and PCIe Gen 5. Hygon claims the C86-4G uses a new, independently developed microarchitecture, though judging from the GCC patches alone it still shares many similarities with AMD Zen. The C86-4G family comprises the C86-4G-M4, C86-4G-M6, and C86-4G-M7 series, of which the C86-4G-M7 supports the AVX-512 instruction set.

09

APP STORE RANK

09.00
APP STORE RANK