Title: [Help] Classifying data in a text file
阿智兄
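Thank you very much for taking the time to answer. Here is the problem:
Classify the data that follows the colon on each line of a text file:
1. From the 6 numbers after the colon on line A0001, draw 4 at a time (15 draws in total, i.e. C(6,4)). For each draw, search every line of the data for lines that contain all 4 drawn numbers, and write each matching line to a newly created text file.
2. Example: if the draw from the first line is 02,06,24,29, searching all the data gives the result below, written to a new text file named after the 4 drawn numbers (02,06,24,29.txt):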
A0001:02,06,07,24,28,29
A0005:01,02,06,11,24,29
A0017:01,02,06,10,24,29
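
As a quick check on the count in step 1, choosing 4 of 6 numbers gives C(6,4) = 15 draws. A minimal sketch with the Python standard library (the variable names are illustrative and not from the post; `math.comb` needs Python 3.8 or later):

```python
from itertools import combinations
from math import comb

# Numbers after the colon on the first line, A0001.
first_row = ["02", "06", "07", "24", "28", "29"]

# Every way to pick 4 of the 6 numbers, in the order they appear in the row.
picks = list(combinations(first_row, 4))

print(len(picks), comb(6, 4))            # both are 15
print(("02", "06", "24", "29") in picks) # True: the draw used in the example
```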

Below is part of the data in the text file:
A0001:02,06,07,24,28,29
A0002:02,08,21,24,29,31
A0003:05,07,09,16,18,27
A0004:13,17,19,24,26,30
A0005:01,02,06,11,24,29
A0006:12,14,18,20,26,28
A0007:04,10,12,18,23,25
A0008:02,12,13,14,31,33
A0009:01,05,08,15,16,30
A0010:06,11,21,25,28,31
A0011:08,09,13,27,29,31
A0012:08,10,15,17,22,30
A0013:11,16,23,26,27,33
A0014:05,07,09,16,20,26
A0015:03,05,13,19,20,25
A0016:06,08,14,17,18,30
A0017:01,02,06,10,24,29
A0018:02,21,22,23,27,28
A0019:10,15,17,25,31,32
A0020:04,08,17,24,28,33
A0021:01,03,06,07,09,11
A0022:12,16,17,24,28,29
A0023:02,08,14,23,25,32
A0024:09,15,18,29,32,33
A0025:03,08,09,13,15,18
A0026:06,14,16,26,28,29
A0027:04,15,21,25,29,33
A0028:06,09,12,16,27,31
A0029:03,06,14,18,20,26
A0030:05,10,16,23,27,33
2021-04-18 22:40
zyb159357
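Hint:
random.sample(items, 4)  # randomly picks 4 elements from a sequence such as a list (on Python 3.11+ a set must first be converted to a list or tuple)
set1.issubset(set2)  # True when set1 is a subset of set2, i.e. set2 contains every element of set1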
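
To make the subset test in the hint concrete, here is a small sketch on three rows from the question. The dictionary layout and names are assumptions for illustration only. Note that the task needs all 15 draws rather than a random one, which is what `itertools.combinations` (used in the code later in this thread) provides.

```python
# Three rows from the question, keyed by row id, numbers stored as sets.
rows = {
    "A0001": {"02", "06", "07", "24", "28", "29"},
    "A0002": {"02", "08", "21", "24", "29", "31"},
    "A0005": {"01", "02", "06", "11", "24", "29"},
}

pick = {"02", "06", "24", "29"}

# A row matches when it contains all four drawn numbers.
matches = [row_id for row_id, numbers in rows.items() if pick.issubset(numbers)]
print(matches)  # ['A0001', 'A0005']
```

A0002 is rejected because it lacks 06, even though it shares 02, 24 and 29 with the draw.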
2021-04-19 10:09
阿智兄
Reply to #2 zyb159357
Thanks for your reply. Could you go into a bit more detail?
2021-04-19 20:56
fall_bernana
Reply to #3 阿智兄
Code:
from itertools import combinations
m=["A0001:02,06,07,24,28,29","A0002:02,08,21,24,29,31","A0003:05,07,09,16,18,27","A0004:13,17,19,24,26,30","A0005:01,02,06,11,24,29","A0006:12,14,18,20,26,28","A0007:04,10,12,18,23,25","A0008:02,12,13,14,31,33","A0009:01,05,08,15,16,30","A0010:06,11,21,25,28,31","A0011:08,09,13,27,29,31","A0012:08,10,15,17,22,30","A0013:11,16,23,26,27,33","A0014:05,07,09,16,20,26","A0015:03,05,13,19,20,25","A0016:06,08,14,17,18,30","A0017:01,02,06,10,24,29","A0018:02,21,22,23,27,28","A0019:10,15,17,25,31,32","A0020:04,08,17,24,28,33","A0021:01,03,06,07,09,11","A0022:12,16,17,24,28,29","A0023:02,08,14,23,25,32","A0024:09,15,18,29,32,33","A0025:03,08,09,13,15,18","A0026:06,14,16,26,28,29","A0027:04,15,21,25,29,33","A0028:06,09,12,16,27,31","A0029:03,06,14,18,20,26","A0030:05,10,16,23,27,33"]
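result=[]  # number lists of all rows
txt={}  # maps each row's joined numbers to the full row text, used for output (assumes no two rows have identical numbers)
copydict={}  # combinations that have already been reported
for one in m :
    onelist=one.split(':')[1].split(',')  # numbers on this row, e.g. ["02","06","07","24","28","29"]
    result.append(onelist)  # builds [["02","06","07","24","28","29"],["02","08","21","24","29","31"],...]
    txt['-'.join(onelist)]=one  # e.g. {"02-06-07-24-28-29": "A0001:02,06,07,24,28,29"}
result.reverse()  # reversed so that pop() below takes rows in file order
while result:
    check=result.pop()  # take the next row off result; all of its combinations are checked here, so it never needs to be compared again later, hence pop
    comblist=list(combinations(check,4))  # the C(6,4)=15 combinations of this row
    for comblist_one in comblist:
        if '-'.join(comblist_one) in copydict:  # already reported for an earlier row: skip to avoid duplicates
            continue
        sign=0
        for result_one in result:
            if set(comblist_one).issubset(result_one):  # issubset checks whether result_one contains all 4 numbers of comblist_one
                sign=1
                copydict['-'.join(comblist_one)]=1  # remember this combination
                print(txt['-'.join(result_one)],end='\t')  # print the matching row from result
        if sign==1:
            print(txt['-'.join(check)],end='\t')  # at least one other row matched, so print the current row too
            print(comblist_one,end='\t')  # prints e.g. ('02', '06', '24', '29')
            print("")

Output: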
A0017:01,02,06,10,24,29 A0005:01,02,06,11,24,29 A0001:02,06,07,24,28,29 ('02', '06', '24', '29')    
A0014:05,07,09,16,20,26 A0003:05,07,09,16,18,27 ('05', '07', '09', '16')
A0017:01,02,06,10,24,29 A0005:01,02,06,11,24,29 ('01', '02', '06', '24')
A0017:01,02,06,10,24,29 A0005:01,02,06,11,24,29 ('01', '02', '06', '29')
A0017:01,02,06,10,24,29 A0005:01,02,06,11,24,29 ('01', '02', '24', '29')
A0017:01,02,06,10,24,29 A0005:01,02,06,11,24,29 ('01', '06', '24', '29')
A0029:03,06,14,18,20,26 A0006:12,14,18,20,26,28 ('14', '18', '20', '26')
A0030:05,10,16,23,27,33 A0013:11,16,23,26,27,33 ('16', '23', '27', '33')
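
The original request asked for one text file per draw, named after the 4 numbers, with rows in file order. One possible adaptation is sketched below. It is not the poster's code: the function names `group_rows` and `write_groups`, the use of a temporary output directory, and the decision to keep only draws shared by two or more rows are all assumptions.

```python
import tempfile
from itertools import combinations
from pathlib import Path


def group_rows(lines, size=4):
    """Map each size-number draw to the rows that contain it, in input order."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # "A0001:02,06,..." -> ["02", "06", ...]; sorting makes the key order-independent.
        numbers = sorted(line.split(":", 1)[1].split(","))
        for pick in combinations(numbers, size):
            groups.setdefault(pick, []).append(line)
    # Assumption: a draw is only of interest when two or more rows share it.
    return {pick: rows for pick, rows in groups.items() if len(rows) > 1}


def write_groups(groups, out_dir):
    """Write each group to a file such as '02,06,24,29.txt' inside out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pick, rows in groups.items():
        path = out_dir / (",".join(pick) + ".txt")
        path.write_text("\n".join(rows) + "\n", encoding="utf-8")


sample = [
    "A0001:02,06,07,24,28,29",
    "A0005:01,02,06,11,24,29",
    "A0017:01,02,06,10,24,29",
]
groups = group_rows(sample)
print(groups[("02", "06", "24", "29")])  # the three sample rows, in input order

# A temporary directory keeps the demo free of side effects.
with tempfile.TemporaryDirectory() as tmp:
    write_groups(groups, tmp)
    print(sorted(p.name for p in Path(tmp).iterdir()))
```

For a real data file, the lines could be passed in directly, for example `group_rows(open("data.txt", encoding="utf-8"))`, where `data.txt` is a hypothetical file name. Building the dictionary in a single pass avoids comparing every draw against every remaining row.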


[This post was edited by its author on 2021-4-23 09:29]

2021-04-21 11:00
阿智兄
Reply to #4 fall_bernana
Thank you very much for your help. Could you add some comments to make it easier for me to learn from?
2021-04-23 01:40
阿智兄
Reply to #4 fall_bernana
Thanks again for your help.
2021-04-23 10:55