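In today's mobile internet landscape, more and more platforms expose their core data only through their apps; the web version is either stripped of features or serves data behind layers of encryption. As a developer who has worked on data collection for a long time, I have felt first-hand how little traditional crawling techniques can do in this situation.
Faced with these challenges, I found through repeated practice that Frida dynamic instrumentation combined with Python scripts handles these problems well. Unlike static reverse engineering, which requires decompiling the entire APK, Frida lets us observe and modify an app's behavior at runtime, which makes reversing far more efficient.
Before starting, prepare the following environment: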
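Python side:

pip install frida==16.0.19 frida-tools==12.1.1

These versions are pinned because, in my work in 2026, I found that newer releases can introduce compatibility problems. A virtual environment is recommended to isolate the dependencies:

python -m venv frida_env
source frida_env/bin/activate  # Linux/Mac
frida_env\Scripts\activate     # Windows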
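Android device side:
Download the frida-server build that matches the device architecture:

frida-server-16.0.19-android-arm64.xz
frida-server-16.0.19-android-arm.xz
frida-server-16.0.19-android-x86.xz

Decompress the archive, rename the binary to frida-server, then push and run it. frida-server has to run as root, so on devices where adb shell is not already root, wrap the last command in su -c:

adb push frida-server /data/local/tmp/
adb shell "chmod 755 /data/local/tmp/frida-server"
adb shell "/data/local/tmp/frida-server &"

Note: some vendor ROMs restrict background processes, so it is safer to keep the server alive with nohup:

adb shell "nohup /data/local/tmp/frida-server >/dev/null 2>&1 &"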
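Picking the build that "matches the device architecture" can be scripted. The sketch below is only an illustration: it assumes the ABI string comes from `adb shell getprop ro.product.cpu.abi`, and the helper name `frida_server_asset` and the mapping table are mine, not part of Frida.

```python
# Map Android ABI strings (as printed by `adb shell getprop ro.product.cpu.abi`)
# to the architecture suffix used in frida-server release file names.
ABI_TO_FRIDA_ARCH = {
    "arm64-v8a": "arm64",
    "armeabi-v7a": "arm",
    "armeabi": "arm",
    "x86": "x86",
    "x86_64": "x86_64",
}


def frida_server_asset(abi: str, version: str = "16.0.19") -> str:
    """Return the release file name for a given ABI (hypothetical helper)."""
    arch = ABI_TO_FRIDA_ARCH.get(abi.strip())
    if arch is None:
        raise ValueError(f"Unrecognized ABI: {abi!r}")
    return f"frida-server-{version}-android-{arch}.xz"


if __name__ == "__main__":
    # getprop output usually ends with a newline, hence the strip() above.
    print(frida_server_asset("arm64-v8a\n"))
```

Keeping the version as a parameter makes it easy to keep the server in step with the frida package pinned on the Python side.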
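For iOS devices (jailbroken, with SSH access), the setup is a little more involved:

# On the Mac
brew install libusbmuxd  # provides iproxy
iproxy 2222 22 &  # forward local port 2222 to the device's SSH port
scp -P 2222 frida-server root@localhost:/var/root/
ssh -p 2222 root@localhost "chmod +x /var/root/frida-server"
ssh -p 2222 root@localhost "/var/root/frida-server &"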
Taking the signature-generation function of an e-commerce app as an example, a typical hook flow looks like this:

# frida_hook_signature.py
import sys

import frida


def on_message(message, data):
    """Print payloads sent by the script and any script-side errors."""
    if message['type'] == 'send':
        print(f"[*] {message['payload']}")
    elif message['type'] == 'error':
        print(f"[!] {message.get('stack', message.get('description'))}")


device = frida.get_usb_device(timeout=5)  # wait up to 5 s for the device
try:
    # Recent Frida releases match this string against the process name shown
    # by `frida-ps -U`, which may be the app label rather than the package name.
    session = device.attach("com.example.shop")
except frida.ProcessNotFoundError:
    print("Target process not found; make sure the app is running")
    sys.exit(1)

js_code = """
Java.perform(function() {
    // Locate the signing utility class
    var SignUtils = Java.use("com.security.SignUtils");
    // Hook every overload
    SignUtils.generateSign.overloads.forEach(function(method) {
        method.implementation = function() {
            // Print the call stack
            console.log(Java.use("android.util.Log").getStackTraceString(
                Java.use("java.lang.Throwable").$new()
            ));
            // Collect the arguments as strings so send() can serialize them
            var args = [];
            for (var i = 0; i < arguments.length; i++) {
                args.push(String(arguments[i]));
            }
            console.log(`args: ${args.join(', ')}`);
            // Call the original method
            var result = this.generateSign.apply(this, arguments);
            console.log(`result: ${result}`);
            // Forward to the Python side
            send({
                type: 'signature',
                args: args,
                result: result
            });
            return result;
        };
    });
});
"""

script = session.create_script(js_code)
script.on('message', on_message)
script.load()
sys.stdin.read()  # keep the script running
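Key points:
overloads.forEach catches every overload of the method.
apply keeps the original call intact by passing the arguments through unchanged.
When the key algorithm lives in a .so file, a lower-level hook is needed: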
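
// Resolve the export first: findExportByName returns null if it is missing
var target = Module.findExportByName("libcrypto.so", "aes_encrypt");
if (target === null) {
    console.log("aes_encrypt not found; is libcrypto.so loaded yet?");
} else {
    Interceptor.attach(target, {
        onEnter: function(args) {
            this.arg0 = args[0]; // keep the argument for use in onLeave
            console.log("input data:", hexdump(args[0], {
                length: args[2].toInt32(),
                header: false
            }));
        },
        onLeave: function(retval) {
            console.log("encrypted result:", hexdump(retval, {
                length: 16, // one AES block is 16 bytes
                header: false
            }));
            // Read the 16 bytes at offset 16 of the first argument as the key
            var key = this.arg0.add(16).readByteArray(16);
            send({
                type: 'aes_key',
                // readByteArray returns an ArrayBuffer; wrap it before converting
                key: Array.from(new Uint8Array(key))
            });
        }
    });
}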
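Both hooks above tag their payloads with a `type` field (`signature` and `aes_key`), so the Python side can route them instead of just printing. The following is a minimal sketch of such a handler, not part of the original scripts: the class name `HookCollector` is hypothetical, and it assumes the key arrives as a list of byte values.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class HookCollector:
    """Collect payloads sent by the hooks, grouped by their 'type' field."""

    def __init__(self) -> None:
        self.records: Dict[str, List[Any]] = defaultdict(list)
        self.errors: List[str] = []

    def on_message(self, message: dict, data: Optional[bytes] = None) -> None:
        # Script-side exceptions arrive as messages of type 'error'.
        if message.get("type") == "error":
            self.errors.append(message.get("stack") or message.get("description", ""))
            return
        if message.get("type") != "send":
            return
        payload = message.get("payload")
        if not isinstance(payload, dict):
            return
        kind = payload.get("type", "unknown")
        if kind == "aes_key":
            # Assumption: the native hook sends the key as a list of ints (0-255).
            self.records[kind].append(bytes(payload["key"]).hex())
        else:
            self.records[kind].append(payload)


if __name__ == "__main__":
    collector = HookCollector()
    collector.on_message({"type": "send", "payload": {"type": "aes_key", "key": [0, 255, 16]}})
    print(collector.records["aes_key"])
```

The method has the same `(message, data)` signature as `on_message` in the hook script, so it could be registered with `script.on('message', collector.on_message)`.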
Once the signing logic has been recovered through hooking, it needs to be reimplemented in Python:

# signature.py
import hashlib
import hmac


class AppSigner:
    def __init__(self, secret_key):
        self.secret = secret_key
        self.cache = {}  # cache of computed signatures

    def _md5_sign(self, params: str) -> str:
        cache_key = f"md5_{params}"
        if cache_key in self.cache:
            return self.cache[cache_key]
        sign = hashlib.md5(
            (params + self.secret).encode('utf-8')
        ).hexdigest()
        self.cache[cache_key] = sign
        return sign

    def _hmac_sign(self, params: str) -> str:
        return hmac.new(
            self.secret.encode('utf-8'),
            params.encode('utf-8'),
            hashlib.sha256
        ).hexdigest()

    def generate(self, method: str, params: dict) -> str:
        """Single entry point for signing."""
        # Canonical form: key=value pairs sorted by key and joined with '&'
        sorted_params = '&'.join(
            f"{k}={v}" for k, v in sorted(params.items())
        )
        if method == 'MD5':
            return self._md5_sign(sorted_params)
        elif method == 'HMAC':
            return self._hmac_sign(sorted_params)
        else:
            raise ValueError("Unsupported sign method")
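Before relying on a reimplementation, it helps to replay it against input/output pairs captured by the hook. The sketch below shows one way to do that; `md5_sign` repeats the MD5 construction of `_md5_sign` above so the block is self-contained, while `find_mismatches`, the sample string, and the secret are hypothetical illustration values.

```python
import hashlib
from typing import Callable, Iterable, List, Tuple


def md5_sign(canonical: str, secret: str) -> str:
    """MD5 over canonical string + secret, as in _md5_sign above."""
    return hashlib.md5((canonical + secret).encode("utf-8")).hexdigest()


def find_mismatches(
    sign_fn: Callable[[str], str],
    captures: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every capture the replica gets wrong."""
    mismatches = []
    for canonical, expected in captures:
        actual = sign_fn(canonical)
        if actual != expected:
            mismatches.append((canonical, expected, actual))
    return mismatches


if __name__ == "__main__":
    secret = "extracted_secret"  # placeholder value
    # In practice these pairs would come from the hook's send() payloads.
    captures = [("a=1&b=2", md5_sign("a=1&b=2", secret))]
    print(find_mismatches(lambda p: md5_sign(p, secret), captures))
```

An empty result means the replica reproduces every captured signature; any entry points to a difference in parameter ordering, encoding, or the secret.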
A complete crawler request example:

# crawler.py
import time
import random

import httpx

from signature import AppSigner


class AppCrawler:
    def __init__(self):
        self.signer = AppSigner("extracted_secret")
        self.device_id = "d7a1b2c3-4567-8910"
        self.session = httpx.Client(
            headers={
                "User-Agent": "Dalvik/2.1.0",
                "X-Device-ID": self.device_id,
            },
            timeout=30
        )

    def _gen_common_params(self):
        """Parameters attached to every request."""
        return {
            "t": int(time.time()),
            "nonce": random.randint(100000, 999999),
            "device_id": self.device_id,
            "version": "6.8.2"
        }

    def get_product_list(self, page: int):
        params = self._gen_common_params()
        params.update({
            "page": page,
            "size": 20,
            "category": "electronics"
        })
        # Sign all other parameters first, then attach the signature
        params['sign'] = self.signer.generate("MD5", params)
        resp = self.session.get(
            "https://api.example.com/products",
            params=params
        )
        resp.raise_for_status()
        return resp.json()

    def __del__(self):
        self.session.close()
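Anti-anti-scraping techniques:
Some hardened apps detect Frida. Common countermeasures include renaming the server binary:

mv frida-server fs123

and hooking the app's detection method so that it always returns false:

Java.perform(function() {
    var AntiFrida = Java.use("com.security.AntiFrida");
    AntiFrida.isFridaRunning.implementation = function() {
        return false; // always report that Frida is not running
    };
});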
When parameters come from several sources, the whole call chain can be traced:

Java.perform(function() {
    // Trace SharedPreferences reads. SharedPreferences is an interface, so the
    // hook goes on the framework's concrete class, SharedPreferencesImpl.
    var SharedPreferencesImpl = Java.use("android.app.SharedPreferencesImpl");
    SharedPreferencesImpl.getString.implementation = function(key, defValue) {
        var result = this.getString(key, defValue);
        if (key === "device_id") {
            console.log("device ID read:", result);
            send({type: 'device_id', value: result});
        }
        return result;
    };
    // Trace system properties. get() is overloaded, so select get(String) explicitly.
    var SystemProperties = Java.use("android.os.SystemProperties");
    SystemProperties.get.overload('java.lang.String').implementation = function(key) {
        var result = this.get(key);
        if (key.includes("serial")) {
            console.log("serial number read:", result);
        }
        return result;
    };
});
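Frida's RPC feature can be used to move complex logic to the Python side.
In real projects, we must pay attention to the following:
Carry out a legal risk assessment before development and consult a professional lawyer when necessary. The technology itself is neutral; how it is used determines whether it is legal.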